vibecodemaster commented on Metaflow: Build, Manage and Deploy AI/ML Systems   github.com/Netflix/metafl... · Posted by u/plokker
LaserToy · 8 months ago
CloudKitchens uses it as well: https://techblog.cloudkitchens.com/p/ml-infrastructure-doesn...

They call it a DREAM stack (Daft, Ray Engine or Ray and Poetry, Argo and Metaflow)

vibecodemaster · 8 months ago
There's actually a lot of companies using Metaflow, big and small: https://outerbounds.com/stories
lazarus01 · 8 months ago
I went to the GitHub page. The descriptions of the service seem redundant to what cloud providers offer today. I looked at the documentation and it lacks concrete examples for implementation flows.

Seems like something new to learn, an added layer on top of existing workflows, with no obvious benefit.

vibecodemaster · 8 months ago
> redundant to what cloud providers offer today

It may look redundant on the surface, but those cloud services are infrastructure primitives (compute, storage, orchestration). Metaflow sits one layer higher, giving you a data/model-centric API that orchestrates and versions the entire workflow (code, data, environment, and lineage) while delegating the low-level plumbing to whatever cloud provider you choose. That higher-level abstraction is what lets the same Python flow run untouched on a laptop today and a K8s GPU cluster tomorrow.

> Adds an extra layer to learn

I would argue that it removes layers: you write plain Python functions, tag them as steps, and Metaflow handles scheduling, data movement, retry logic, versioning, and caching. You no longer glue together five different SDKs (batch + orchestration + storage + secrets + lineage).

> lacks concrete examples for implementation flows

There are examples in the tutorials: https://docs.outerbounds.com/intro-tutorial-season-3-overvie...

> with no obvious benefit

There are benefits, but perhaps they're not immediately obvious:

1) Separation of what vs. how: declare the workflow once; toggle @resources(cpu=4, gpu=1) to move from dev to a GPU cluster, with no YAML rewrites.

2) Reproducibility & lineage: every run immutably stores code, data hashes, and parameters, so you can reproduce any past model or report by resuming a previous run (python flow.py resume --origin-run-id <id>).

3) Built‑in data artifacts: pass or version GB‑scale objects between steps without manually wiring S3 paths or serialization logic.
