madman2890 (u/madman2890)

madman2890 commented on Why Most Valuable AI Systems Are Still Tabular Models · Posted by u/madman2890

amazonbezos · 7 days ago

totally agree

madman2890 · 7 days ago

with all of it? wow :)

madman2890 commented on Global warming has accelerated significantly researchsquare.com/articl... · Posted by u/morsch

madman2890 · 10 days ago

We need the government to release the alien technology to the public now.

madman2890 commented on Tabular data is the frontier – graphs can help · Posted by u/madman2890

amazonbezos · 11 days ago

This is definitely where most of the time is spent - very cool!

madman2890 · 11 days ago

Glad you find it useful.

madman2890 commented on Tabular Foundation Models Still Need One Thing: Multi-Table Aggregation wesmadrigal.github.io/Gra... · Posted by u/madman2890

madman2890 · 11 days ago

Tabular foundation models like TabPFN and related work are extremely promising. They’re starting to show strong results on many classical tabular ML benchmarks and can reduce the amount of manual modeling work required from data scientists. However, there is a structural reality of enterprise data that these models don’t remove. Most real-world machine learning problems are not stored in a single clean table. Instead they live across dozens or hundreds of relational tables: orders, customers, events, transactions, shipments, products, logs, etc. Each table captures part of the signal, often with one-to-many relationships, time dependencies, and high cardinality entities. Before any tabular model can be trained, those signals have to be integrated. In practice this means: Traversing relational graphs of tables Aggregating child tables to parent entities Handling time windows and temporal leakage Collapsing many-to-many relationships into meaningful features Producing a single wide training dataset This step is usually the most time-consuming part of the entire ML workflow. Even if the model itself becomes automated via a tabular foundation model, the data still has to be prepared. This is where GraphReduce comes in. GraphReduce treats the relational database as a graph of entities and relationships. Instead of manually writing large SQL pipelines, the user defines the nodes (tables) and their relationships. GraphReduce then walks the graph and performs the required aggregations automatically, generating a single training dataset.

madman2890 commented on Cognitive load is what matters github.com/zakirullin/cog... · Posted by u/nromiun

madman2890 · 7 months ago

“Smart developer’s quirks” tend to peak in 3-8 years of experience and fade off thereafter. A hipster will never fade off and instead continue hipster coding alongside their identity in perpetuity.