I built Buckaroo as a data table UI for Jupyter and Pandas/Polars. It starts by letting you look at your data in a modern, performant table with histograms, formatting, and summary stats.
Yesterday I released autocleaning for Buckaroo. It looks at the data and heuristically chooses cleaning methods, each expressed as concrete code. It is fast (less than 500ms). You can cycle through multiple cleaning strategies and choose the best approach for your data. For simple problems, we shouldn't need to consult an LLM to do the obvious things.
All of this is open source and extensible.
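To give a rough feel for what "heuristic cleaning with swappable strategies" can mean, here is a minimal sketch in plain pandas. It is not Buckaroo's API or its actual heuristics. The names `STRATEGIES` and `autoclean`, the thresholds, and the sample data are all made up for illustration. The idea: for each string column, check what fraction of values parse as numbers, and convert the column only if that fraction clears the threshold of the chosen strategy.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype

# Hypothetical strategies: minimum fraction of non-null values that must
# parse as numbers before a string column gets converted.
STRATEGIES = {"conservative": 0.8, "aggressive": 0.5}


def autoclean(df, strategy="conservative"):
    """Return (cleaned_df, steps); steps holds the code for each chosen fix."""
    threshold = STRATEGIES[strategy]
    cleaned = df.copy()
    steps = []
    for col in df.columns:
        series = df[col]
        # Only string-like columns are candidates for this heuristic.
        if not (is_object_dtype(series) or is_string_dtype(series)):
            continue
        non_null = series.notna().sum()
        if non_null == 0:
            continue
        # Unparseable values become NaN instead of raising.
        parsed = pd.to_numeric(series, errors="coerce")
        fraction_numeric = parsed.notna().sum() / non_null
        if fraction_numeric >= threshold:
            cleaned[col] = parsed
            steps.append(
                f"df[{col!r}] = pd.to_numeric(df[{col!r}], errors='coerce')"
            )
    return cleaned, steps


# Made-up sample data: 'age' is mostly numeric, 'score' is about half
# numeric, 'name' is not numeric at all.
df = pd.DataFrame({
    "age": ["31", "45", "n/a", "28", "52", "39", "41", "36", "27", "60"],
    "score": ["1", "2", "x", "y", "3", "z", "4", "w", "5", "6"],
    "name": ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"],
})

for name in STRATEGIES:
    _, steps = autoclean(df, name)
    print(name, steps)
```

With this sample data, the conservative strategy converts only `age`, while the aggressive one also converts `score`. Because each strategy is a deterministic pass over the columns, trying another one is just a second function call, and the recorded steps show exactly what was done.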
[1] https://youtube.com/shorts/4Jz-Wgf3YDc
[2] https://github.com/paddymul/buckaroo
[3] https://marimo.io/p/@paddy-mullen/buckaroo-auto-cleaning Live WASM notebook that you can play with - no downloads or installs required
The chat for exploratory data analysis ("what can you tell me about this column I just added?"), the worksheets, and the column lineage are real game-changers for dbt development. These features feel purposefully designed for how I actually work.
Claire and Christophe are super responsive to feedback, implementing features and fixes quickly. You can see the product evolving in all the right directions!