David Chudzicki

Director of Product Engineering at Coiled

    All Sessions by David Chudzicki

    Day 3 04/25/2024
    12:30 pm - 12:40 pm

    Spark, Dask, DuckDB, Polars: TPC-H Benchmark Results

    Machine Learning

    Large-scale dataframe computations are critical for efficient, user-friendly data manipulation. The space has grown quickly in recent years, and there are many new choices. In this talk we run the major contenders (Spark, Dask, DuckDB, Polars) through the TPC-H benchmarks, both locally and in the cloud, at scales ranging from 10 GB to 10 TB, and see how they perform. The results teach us about these specific libraries and about how to measure and reason about performance in the cloud. We'll cover topics such as I/O bandwidth, CPU saturation, and memory constraints, as well as challenges in deployment and hardware selection. We'll also factor in hardware and networking costs to get a sense of overall cost efficiency. The presenters are biased towards Dask, so we'll use that project to dive a bit deeper into tuning and what matters most, but the overall results should be broadly interesting to anyone in the data infrastructure space.

    Open Data Science
    One Broadway
    Cambridge, MA 02142
    info@odsc.com