SciPy 2024: Pandas + Dask DataFrame 2.0 - Comparison to Spark, DuckDB and Polars

Tips for effective communication in open source

July 11, 2024

Data-proximate Computing with Coiled Functions

The original version of this post appears on blog.coiled.io. Coiled Functions make it easy to improve performance and reduce costs by moving your computations next to your cloud data. It’s common practice for data scientists and researchers to analyze data on their local work computer (often a laptop). This works great for data that’s stored locally on their machine. However, data is increasingly moving to cloud storage systems like AWS S3 and Google Cloud Storage....

August 8, 2023 · James Bourbeau

SciPy 2023: Advanced Dask Tutorial

This tutorial was given with Naty Clementi, Julia Signell, and Charles Blackmon-Luca.

July 11, 2023

Distributed printing

The original version of this post appears on blog.coiled.io. Dask makes it easy to print whether you’re running code locally on your laptop, or remotely on a cluster in the cloud. One of the most basic things programmers do is print text to their screen. Printing is often used for things like debugging:

    # ...
    print("Made it here...")
    # ...

or to signal progress:

    for i in range(10):
        print(f"Done with iteration {i}")

However, when running code at scale on a Dask cluster even simple print calls can become non-intuitive....

May 18, 2023 · James Bourbeau

Upstream testing in Dask

The original version of this post appears on blog.coiled.io. Dask has deep integrations with other libraries in the PyData ecosystem like NumPy, pandas, Zarr, PyArrow, and more. Part of providing a good experience for Dask users is making sure that Dask continues to work well with this community of libraries as they push out new releases. This post walks through how Dask maintainers proactively ensure Dask continuously works with its surrounding ecosystem....

April 18, 2023 · James Bourbeau