SciPy 2024: Pandas + Dask DataFrame 2.0 - Comparison to Spark, DuckDB and Polars
Tips for effective communication in open source
The original version of this post appears on blog.coiled.io. Coiled Functions make it easy to improve performance and reduce costs by moving your computations next to your cloud data. It’s common practice for data scientists and researchers to analyze data on their local work computer (often a laptop). This works great for data that’s stored locally on their machine. However, data is increasingly moving to cloud storage systems like AWS S3 and Google Cloud Storage....
This tutorial was given with Naty Clementi, Julia Signell, and Charles Blackmon-Luca.
The original version of this post appears on blog.coiled.io. Dask makes it easy to print whether you’re running code locally on your laptop, or remotely on a cluster in the cloud. One of the most basic things programmers do is print text to their screen. Printing is often used for things like debugging:

    # ...
    print("Made it here...")
    # ...

or to signal progress:

    for i in range(10):
        print(f"Done with iteration {i}")

However, when running code at scale on a Dask cluster, even simple print calls can become non-intuitive....
The original version of this post appears on blog.coiled.io. Dask has deep integrations with other libraries in the PyData ecosystem like NumPy, pandas, Zarr, PyArrow, and more. Part of providing a good experience for Dask users is making sure that Dask continues to work well with this community of libraries as they push out new releases. This post walks through how Dask maintainers proactively ensure Dask continuously works with its surrounding ecosystem....