PyData
  • 3 531
  • 13 511 775
Jakub Hettler - Jupyter(Hub/Lab): Journey from On-prem to AWS [PyData Prague #18]
Let’s look at how we, the Business Intelligence team at @AlmaCareer Czechia, moved JupyterHub and JupyterLab from on-premise infrastructure to AWS: why we used Amazon SageMaker Studio for just 3 weeks, and why we ended up happy with Jupyter running on top of Coder (coder.com) in AWS. The talk takes an infrastructure point of view, with a deeper dive into the pros and cons of on-prem JupyterHub/Lab on HashiCorp Nomad, Amazon SageMaker Studio, and Coder, all considering the requirements of 20 working users in JupyterLab.
Presented at PyData Prague #18 - A Vector from Lab to Hub (29.2.2024 at Pure Storage)
www.pydata.org
PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.
PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.
00:00 Welcome!
00:10 Help us add time stamps or captions to this video! See the description for details.
Want to help add timestamps to our YouTube videos to help with discoverability? Find out more here: github.com/numfocus/YouTubeVideoTimestamps
Views: 60

Videos

James Powell - Are generator-coroutines really the answer? | PyData London 2024
907 views · 1 day ago
www.pydata.org As we all know (or, at least, as I've been trying to tell everyone,) generators in Python are an extremely powerful API design technique. A generator represents the linear decomposition of a single computation into multiple parts, and such decomposition proves very useful in practice. For example, we can model an infinite computation and only execute the portions we desire. Very ...
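As a rough illustration of the point about infinite computations (a minimal sketch, not taken from the talk; the `fibonacci` function is a made-up example), a generator can describe an unbounded sequence while the caller executes only the portion it asks for:

```python
from itertools import islice
from typing import Iterator


def fibonacci() -> Iterator[int]:
    """Yield Fibonacci numbers forever; no work happens until a value is requested."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# Only the first ten steps of the infinite computation are ever executed.
first_ten = list(islice(fibonacci(), 10))
print(first_ten)
```

`islice` stops pulling values after ten items, so the `while True` loop never runs away.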
Dan Gibson - An Introduction to Retrieval Augmented Generation - PyData London 2024
439 views · 1 day ago
How do you build chatbots that answer questions using your organisation's data? The answer is Retrieval Augmented Generation (RAG). In this session you'll be introduced to RAG and build a simple RAG powered chatbot in Python. Until very recently, if an organisation wanted a bespoke chatbot application, they had to spend millions of pounds and fund highly specialised teams, often training and ho...
Dr. Adam Hill - Empower Your Projects with Prefect's Pipeline Magic | PyData London 2024
498 views · 1 day ago
Dr. Adam Hill : Mastering Data Flow: Empower Your Projects with Prefect's Pipeline Magic | PyData London 2024 Embark on a transformative journey into the realm of data engineering with our 90-minute workshop dedicated to Prefect 2. In this hands-on session, participants will learn the ins and outs of building robust data pipelines using the latest features and enhancements of Prefect 2. From da...
Issac Godfried - Multimodal Deep Learning in the Real World | PyData London 2024
285 views · 1 day ago
Many real world business problems are multi-modal in nature and would benefit from using a combination of text, imagery, audio, and numerical data. Recently, there has been a surge in powerful deep learning models that fuse multiple modalities of data, however, fine-tuning, deploying, and versioning these models remains challenging for most companies. This tutorial will discuss some of the late...
Lex Avstreikh & Raymond Cunningham - Real-time AI Lakehouse | PyData London 2024
231 views · 1 day ago
PyData Website: www.pydata.org LinkedIn: www.linkedin.com/company/pydata-global Twitter: PyData In this tutorial we will build an AI system to assist you in finding the best bar for you to go to in London - maybe even this evening after the PyData conference. ...
Nick Radcliffe - Test-Driven Data Analysis in Python | PyData London 2024
909 views · 1 day ago
Test-driven data analysis is a methodology and open-source Python library for improving quality in data processes. It covers three main areas: • Testing data (generating constraints and using them to validate new data) ...
Sultan Al Awar - Generating Customers Insights with Topic Modelling and HuggingFace SetFit Method
235 views · 1 day ago
Stop data skimming and dive deep into your customer voices! Are you working with a load of unstructured reviews and you would like to gain an understanding on what customers are commenting about? This hands-on tutorial equips you with powerful text analysis techniques to unlock hidden ins...
Marco Gorelli - How you (yes, you!) can write a Polars Plugin | PyData London 2024
621 views · 1 day ago
Polars is a dataframe library taking the world by storm. It is very runtime and memory efficient and comes with a clean and expressive API. Sometimes, however, the built-in API isn't enough. And that's where its killer feature comes in: plugins. You can extend Polars, and solve practicall...
Andy Terrel & Jacob Tomlinson - GPU development in Python 101 | PyData London 2024
197 views · 1 day ago
Since joining NVIDIA I’ve gotten to grips with the fundamentals of writing accelerated code in Python. I was amazed to discover that I didn’t need to learn C and I didn’t need new development tools. Writing GPU code in Python is easier today than ever, and in this tutorial, I will share w...
Kehinde Richard Ogunyale - Graph Database and Retrieval Augmented Generation | PyData London 2024
292 views · 1 day ago
In the era of large language models (LLMs), the integration of external, structured knowledge bases has emerged as a frontier for enhancing AI's textual comprehension and generation capabilities. The Retrieval-Augmented Generation (RAG) architecture represents a pivotal advancement in thi...
Datta & Rodríguez - Building the composable Python data stack with Kedro & Ibis | PyData London 2024
179 views · 1 day ago
For the past decade, SQL has reigned king of the data transformation world, and tools like dbt have formed a cornerstone of the modern data stack. Until recently, Python-first alternatives couldn't compete with the scale and performance of modern SQL. However, now Ibis can provide the sam...
Fonnesbeck & Wiecki - Probabilistic Programming and Bayesian Computing with PyMC | PyData London 2024
295 views · 1 day ago
Bayesian statistical methods provide powerful tools for solving various data science problems. The Bayesian approach yields easy-to-interpret results and automatically accounts for uncertainty in our estimates or predictions. Although computational challenges have historically been an obs...
Keynote: Dr. Rebecca Bilbro - Mistakes were made: Data science ten years in | PyData London 2024
249 views · 1 day ago
To honor ten years of PyData London, join Dr. Rebecca Bilbro as she takes us back in time to reflect on a little over ten years working as a data scientist. One of the many renegade PhDs who joined the fledgling field of data science of the 2010's, Rebecca will share lessons learned the hard way, often from watching data science projects go sideways and learning to fix broken things. Through th...
Ines Montani - A practical guide to human-in-the-loop distillation | PyData London 2024
331 views · 1 day ago
As the field of natural language processing advances and new ideas develop, we’re seeing more and more ways to use compute efficiently, producing AI systems that are cheaper to run and easier to control. Large Language Models (LLMs) have enormous potential, but also challenge existing wor...
Colombo et al. - Building Multi-Agent Generative-AI Applications with AutoGen | PyData London 2024
142 views · 1 day ago
Patrick Hoefler - Dask DataFrame 2.0: Comparison to Spark, DuckDB and Polars | PyData London 2024
162 views · 1 day ago
Noe Achache - RAG for a medical company: the technical and product challenges | PyData London 2024
145 views · 1 day ago
Hendrik Makait - Observability for Dask in Production | PyData London 2024
74 views · 1 day ago
Emeli Dral - How continuous testing keeps your LLM on track | PyData London 2024
162 views · 1 day ago
Keynote: Dr. Matthew Crooks - Data: Faithful or Traitor? | PyData London 2024
75 views · 1 day ago
Cas Wognum - Using Zarr for drug discovery datasets in Polaris | PyData London 2024
99 views · 1 day ago
Carlos Samey - Linear Programming for Resource Allocation | PyData London 2024
186 views · 1 day ago
Hajime Takeda - How to Enhance Customer Targeting in Marketing - PyData London 2024
247 views · 1 day ago
Jim Dowling - Function Calling for LLMs | PyData London 2024
206 views · 1 day ago
Adam Glustein - Enabling real-time insights through stream processing in Python | PyData London 2024
307 views · 1 day ago
Luca Baggi - Uncertainty estimation at scale with functime | PyData London 2024
97 views · 1 day ago
Chris Wilkin - Growing user engagement with RL-driven personalisation | PyData London 2024
130 views · 1 day ago
Alex Owens - What a serverless database means for users | PyData London 2024
146 views · 1 day ago
Keynote: Tania Allard - The art of building and sustaining successful OSS tools and infrastructure
79 views · 1 day ago

COMMENTS

  • @masonholcombe3327
    @masonholcombe3327 1 day ago

    smoothing having a similar closed form solution as ridge regression is so satisfying

  • @Sidsel-zo5iz
    @Sidsel-zo5iz 1 day ago

    The spirit of generosity and collaboration here is encouraging. It reminds us that we are stronger together.😻

  • @nicpetit318
    @nicpetit318 1 day ago

    12:14 😬

  • @kirill-markin
    @kirill-markin 1 day ago

    🔥

  • @LucelDaSilva
    @LucelDaSilva 2 days ago

    It's time to dueel🎉🎉🎉

  • @KhalilMuhammad
    @KhalilMuhammad 2 days ago

    Lovely presentation, Jimmy

  • @azizxojaxusanov5455
    @azizxojaxusanov5455 2 days ago

    Or where can I get the pptx?

  • @azizxojaxusanov5455
    @azizxojaxusanov5455 2 days ago

    Hello. Can you give me the pptx? Your topic is interesting and I would like to get to know it.

  • @_ue_la_
    @_ue_la_ 2 days ago

    Wow😭❤️❤️

  • @PapiJack
    @PapiJack 2 days ago

    Thanks for the video. It would help next time to have the pointer on the screen as well. It's hard to follow sometimes because we can't see where you are pointing.

  • @moose304
    @moose304 2 days ago

    90% of my dev work is on Windows and I switched to uv ~6 months ago. Favorite package manager so far, but I also don't create packages for PyPI. I've also always avoided conda after trying it and it just feeling very "heavy." But to each their own, use what works for ya. Nice entertaining talk! 👍

  • @graziellakelisamey4950
    @graziellakelisamey4950 3 days ago

    Proud of you Mr SAMEY very good presentation and well explained!

  • @ivannz01
    @ivannz01 3 days ago

    the title of the video (“achieving concurrency in streamlit”) doesn’t match the title of the talk and the description underneath (“open source leadership”)

  • @labeeb_ibrahim
    @labeeb_ibrahim 3 days ago

    The code repo please.

  • @SerapioSergiovich
    @SerapioSergiovich 3 days ago

    Nice video shows methods to create a business..

  • @SerapioSergiovich
    @SerapioSergiovich 3 days ago

    Great method to viralize contents.

  • @chanebenchantre5055
    @chanebenchantre5055 4 days ago

    Good presentation ❤

  • @fburton8
    @fburton8 4 days ago

    Well, that was meaty!

  • @fmind-dev
    @fmind-dev 4 days ago

    Great talk Emili. Looking forward to testing these new features.

  • @SerapioSergiovich
    @SerapioSergiovich 4 days ago

    Nice video shows methods to develop a commercial startup.

  • @SerapioSergiovich
    @SerapioSergiovich 4 days ago

    Nice video shows methods to create a business.

  • @FeverBonus
    @FeverBonus 4 days ago

    What about LLMs

  • @fmind-dev
    @fmind-dev 4 days ago

    Great talk. I will recommend it to newcomers in AI/ML forecasting.

  • @altvaro
    @altvaro 4 days ago

    Great, Feregrino 🙌

  • @herewegoagain2
    @herewegoagain2 4 days ago

    Aren't the constraints restrictive in the sense they're univariate? They're definitely helpful but not exhaustive

    • @herewegoagain2
      @herewegoagain2 4 days ago

      Most practical model 'failures' are due to relationship breakdowns even if they stay within individual constraints. I understand this library isn't meant to be a drift detection library but I think the current setup would work great with that use case

  • @sofdff
    @sofdff 5 days ago

    Always good to see new players in the Python streaming ecosystem!

  • @polyagent
    @polyagent 5 days ago

    Amazing overview of multi-modal landscape from the practitioner point of view.

  • @dimadem
    @dimadem 5 days ago

    very useful speech!

  • @vitalizzare
    @vitalizzare 5 days ago

    0:00 Introduction
    2:10 Workflow >> Tools
    2:47 Make's design decisions
    5:37 Use Git
    6:06 Use Virtual Environments
    10:52 Virtual environment name == git repo name
    11:34 Check-in your virtual environments
    12:00 environment.yml
    13:31 Script your environment management with "make"
    13:57 Makefile
    15:30 Never install packages manually
    16:58 Use Auto Documentation
    19:55 Separate "what you want" from "what you need"
    22:28 Don't be afraid to "nuke it from orbit"
    23:49 Summary
    25:47 Q&A

  • @hadianasliwa9161
    @hadianasliwa9161 6 days ago

    Learned a lot of paradoxes from Allen!

  • @GuorongLi-re7kt
    @GuorongLi-re7kt 6 days ago

    One question: if we get overfitted propensity scores, then the overlap we want will be very small. These look like conflicting arguments.

  • @RoulDukeGonzo
    @RoulDukeGonzo 9 days ago

    Any idea why the GPU version of this method can't take a pre-computed distance matrix?

  • @prathameshyeole4566
    @prathameshyeole4566 10 days ago

    Can you please tell me how to get the output as a percentage change? That is, if I vary an input parameter by 1%, by how much will my output change, using either of the analysis methods (Sobol or Morris)?

    • @HumbertoDaSilvaSantos-pz2nu
      @HumbertoDaSilvaSantos-pz2nu 9 days ago

      Hello Dear, hope you are doing well. Reading your question, to me that sounds more like a local sensitivity analysis (LSA) problem, which is much simpler to run. With the Sobol or Morris methods, one is usually dealing with Global Sensitivity Analysis (GSA), and the investigation relies mainly on the contributions of model input uncertainty to model output uncertainty. In your case you can use matrix perturbation theory (The Computational Structure of Life Cycle Assessment, by Heijungs and Suh) or, even simpler, you can run OAT (one-factor-at-a-time): choose one input, vary it by 1% while the remaining inputs are held constant, run the model, and see the effect of this 1% on the model output of interest. Then repeat the procedure for the remaining parameters. I hope it helps. If you want to increase the reliability of your model, then move on to Morris and Sobol; for this you need a statistical analysis of the inputs, because the ranges (standard deviations) are needed to capture the variability of the inputs.

    • @itssharavariyeole2712
      @itssharavariyeole2712 9 days ago

      Thank you, I will apply this
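The one-factor-at-a-time procedure described in the reply above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `model` function, its parameter names, and the baseline values are all made up, and the helper `oat_percent_change` is hypothetical rather than part of any sensitivity analysis library.

```python
from typing import Callable, Dict


def model(params: Dict[str, float]) -> float:
    """Hypothetical toy model; stands in for whatever model is being analysed."""
    return params["a"] ** 2 + 3.0 * params["b"] + params["a"] * params["c"]


def oat_percent_change(
    model: Callable[[Dict[str, float]], float],
    baseline: Dict[str, float],
    bump: float = 0.01,
) -> Dict[str, float]:
    """Vary each input by `bump` (1% by default) with the others held at baseline.

    Returns the resulting percentage change in the model output per input.
    """
    base_out = model(baseline)
    changes = {}
    for name, value in baseline.items():
        perturbed = dict(baseline)           # copy so other inputs stay constant
        perturbed[name] = value * (1.0 + bump)
        changes[name] = 100.0 * (model(perturbed) - base_out) / base_out
    return changes


baseline = {"a": 2.0, "b": 1.0, "c": 0.5}   # assumed baseline values
result = oat_percent_change(model, baseline)
for name, pct in result.items():
    print(f"{name}: {pct:+.3f}% output change for a +1% input change")
```

This gives a local picture around one baseline point only; it says nothing about interactions between inputs, which is where the global methods (Morris, Sobol) come in.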

  • @NaveenSiddareddy
    @NaveenSiddareddy 10 days ago

    really neat .. embedded graph.. pretty much creating a graph data struct for programming lang

  • @SerapioSergiovich
    @SerapioSergiovich 11 days ago

    Nice video shows methods to create a business..

  • @SerapioSergiovich
    @SerapioSergiovich 11 days ago

    Nice video shows methods to create a business..

  • @oniricosoy
    @oniricosoy 13 days ago

    very inspiring 😄

  • @0MVR_0
    @0MVR_0 14 days ago

    clustering is highly driven by the formatting of how the data relates to itself and is near impossible to accomplish using a single method of approach.

    • @RoulDukeGonzo
      @RoulDukeGonzo 9 days ago

      Agree, but in practical terms, where do you start?

    • @0MVR_0
      @0MVR_0 8 days ago

      @@RoulDukeGonzo An intimate descriptive knowledge of the data is recommended.

  • @xcimpe
    @xcimpe 14 days ago

    is the notebook available online?

  • @johannes_81
    @johannes_81 15 days ago

    Wow - awesome!!!

  • @mohamedibrahim1836
    @mohamedibrahim1836 15 days ago

    Bayesians do over-fit: a maximum entropy prior will tend to over-fit and encode all the sample noise :)

  • @UtkarshKoppikar
    @UtkarshKoppikar 15 days ago

    Great talk!

  • @monamichoudhary7160
    @monamichoudhary7160 18 days ago

    Thanks for sharing such an insightful video. Could you please link the Jupyter notebook and the additional notes?

  • @danielthompson2561
    @danielthompson2561 19 days ago

    I wish it could lazy scan from a database - the lazy frame function is excellent, but for data security reasons, I’m working from a secure database and not parquet files.

  • @vccuong1
    @vccuong1 20 days ago

    Need to slow down.

  • @salarshahryari4843
    @salarshahryari4843 21 days ago

    Can we train a deep learning model in an incremental way?

  • @hannahnelson4569
    @hannahnelson4569 22 days ago

    A very impressive presentation and algorithm! Thank you for teaching all this!

  • @jp_morr
    @jp_morr 22 days ago

    Could you possibly make the presenter's display a bit smaller? The text is a bit too big and is preventing me from admiring the Escher-style background.

  • @javierbenito2150
    @javierbenito2150 23 days ago

    Nice seeing "the industry" apply vertical scaling techniques that have been used in games (for world data) for at least the last 20-30 years (since a cache miss is orders of magnitude more expensive than a normal cache read, and SIMD/MMX instructions are available on x86). I thought that compilers had taken care of this processor "bowels" stuff for the last two decades, and they probably did until now. It seems that today the data sets are so huge that we must micromanage memory access and vertical parallelism explicitly again. For a little while at least. So, in hindsight, explicit optimisation of code data access patterns was outrun by processor might, and then was further rendered obsolete by multi-core CPUs: in theory, of course, because good usage of parallelism is still in its infancy. But now it seems data set size growth has outrun CPU growth 🤣 Low level 3D engine game programmers, you have a whole new market opening!

  • @dmytrooliinyk3083
    @dmytrooliinyk3083 24 days ago

    That's a great talk!