categories of startups that will be affected by these launches:
- vectorDB startups -> don't need embeddings anymore
- file processing startups -> don't need to process files anymore
- fine tuning startups -> can fine tune directly from the platform now, with GPT4 fine tuning coming
- cost reduction startups -> they literally lowered prices and increased rate limits
- structuring startups -> json mode and GPT4 turbo with better output matching
- vertical ai agent startups -> GPT marketplace
- anthropic/claude -> now GPT-turbo has 128k context window!
That being said, Sam Altman is an incredible founder for being able to have this close a watch on the market. Pretty much any "ai tooling" startup that was created in the past year was affected by this announcement.
For those asking: vectorDB, chunking, retrieval, and RAG are all implemented in a new stateful AI for you! No need to do it yourself anymore. [2] Exciting times to be a developer!
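For anyone wondering what "chunking" and "retrieval" actually involve when you do it yourself, here is a rough sketch. It is not how OpenAI implements it: bag-of-words counts stand in for a real embedding model, a plain list stands in for a vector DB, and every function name here is made up for illustration.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into overlapping chunks of at most `chunk_size` words."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


document = (
    "The turbo model has a 128k context window. "
    "JSON mode makes the model return valid JSON output. "
    "Fine tuning can be done directly from the platform."
)
chunks = chunk_text(document)
top = retrieve("128k context window", chunks)
print(top[0])
```

In a RAG setup the retrieved chunk would then be pasted into the prompt; that step is left out here.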
[1] https://youtu.be/smHw9kEwcgM
[2] https://openai.com/blog/new-models-and-developer-products-an...
To make matters worse, the FDA limits the amount of potassium that can be present in supplements to 100mg[2]. So good luck taking 30 supplements to meet your daily requirements!
One option I'd like to advertise is salt alternatives at grocery stores, which are filled with potassium, some with at least 800mg per tsp. This can be another way to supplement potassium and magnesium in the diet [2].
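To put the numbers from this comment side by side, here is the arithmetic spelled out. Every figure is taken from the comment itself (the 100mg cap, the "30 supplements", the 800mg per tsp); the variable names are mine, and this is only arithmetic, not dietary guidance.

```python
# Figures as stated in the comment above.
SUPPLEMENT_CAP_MG = 100     # per-supplement limit cited in the comment
SUPPLEMENTS_PER_DAY = 30    # the comment's "30 supplements"
SALT_SUB_MG_PER_TSP = 800   # "at least 800mg per tsp"

# Daily amount implied by "30 supplements" at the cap.
implied_daily_mg = SUPPLEMENT_CAP_MG * SUPPLEMENTS_PER_DAY

# How many capped supplements one teaspoon of salt alternative corresponds to.
supplements_per_tsp = SALT_SUB_MG_PER_TSP / SUPPLEMENT_CAP_MG

print(implied_daily_mg, supplements_per_tsp)
```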
[1] https://ods.od.nih.gov/factsheets/Potassium-HealthProfession...
[2] https://www.health.harvard.edu/staying-healthy/should-i-take...
More generally, we can embed the transformation logic of each stage of your data pipelines into the edges between nodes (such as two columns). Like you said, in the case of SQL there are lots of ways to statically analyze that pipeline, but it becomes much more complicated with something like pure Python.
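As a rough illustration of "transformation logic on the edge", here is a minimal sketch of a column-level lineage graph. This is plain Python, not Grai's actual data model or API; the class names, table names, and transformation strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Column:
    """A node in the lineage graph: one column of one table."""
    table: str
    name: str


@dataclass
class LineageGraph:
    # Maps (source column, destination column) -> transformation logic.
    edges: dict = field(default_factory=dict)

    def add_edge(self, source: Column, destination: Column, transformation: str) -> None:
        self.edges[(source, destination)] = transformation

    def downstream(self, column: Column) -> set:
        """Every column reachable from `column` by following edges."""
        seen = set()
        stack = [column]
        while stack:
            current = stack.pop()
            for src, dst in self.edges:
                if src == current and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen


graph = LineageGraph()
raw_amount = Column("raw.orders", "amount_cents")
clean_amount = Column("staging.orders", "amount_usd")
revenue = Column("marts.daily_revenue", "revenue_usd")

# The edge itself records how the destination is derived from the source.
graph.add_edge(raw_amount, clean_amount, "amount_cents / 100.0")
graph.add_edge(clean_amount, revenue, "SUM(amount_usd)")

# Which columns could be affected by a change to raw.orders.amount_cents?
affected = graph.downstream(raw_amount)
```

The hard part the comment points at is filling in those edges automatically: with SQL they can often be derived from the query text, while arbitrary Python gives you much less to work with.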
As an intermediate solution you can manually curate data contracts or assertions about application behavior into Grai but these inevitably fall out of sync with the code.
Airflow has a really great API for exposing task-level lineage, but we've held off integrating it because we weren't sure how to convert that into robust column- or field-level lineage as well. How are y'all handling testing / observability at the moment?
- we have a dedicated dev environment for analysts to experience a dev/test loop. None of the pipelines can be run locally unfortunately.
- we have CI jobs and unit tests that are run on all pipelines
Observability:
- we have data quality checks for each dataset, organized by tier. This also integrates with our alerting system to send pagers when data quality dips.
- Airflow and our query engines (Hive/Spark/Presto) each integrate with our in-house lineage service. We have a lineage graph that shows which pipelines produce/consume which assets, but it doesn't work at the column level because our internal version of Hive doesn't support that.
- we have a service that essentially surfaces observability metrics for pipelines in a nice UI
- our Airflow is integrated with PagerDuty to send pagers to owning teams when pipelines fail.
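To make the "data quality checks organized by tier" idea concrete, here is a minimal sketch of how a tier could decide whether a failing check pages someone. The tiers, thresholds, and the "ticket" fallback are assumptions for illustration, not a description of the in-house system above.

```python
from dataclasses import dataclass

# Hypothetical policy: stricter pass-rate thresholds for more important tiers.
PASS_RATE_THRESHOLD = {1: 0.99, 2: 0.95, 3: 0.90}
# Hypothetical policy: only tier 1 datasets page; lower tiers open a ticket.
PAGING_TIERS = {1}


@dataclass
class CheckResult:
    dataset: str
    tier: int
    rows_checked: int
    rows_passed: int

    @property
    def pass_rate(self) -> float:
        return self.rows_passed / self.rows_checked if self.rows_checked else 0.0


def alert_action(result: CheckResult) -> str:
    """Return 'ok', 'ticket', or 'page' for one data quality check result."""
    if result.pass_rate >= PASS_RATE_THRESHOLD[result.tier]:
        return "ok"
    return "page" if result.tier in PAGING_TIERS else "ticket"


# The same 98% pass rate is fine for a tier 3 dataset but pages for tier 1.
tier1 = CheckResult("orders", tier=1, rows_checked=1000, rows_passed=980)
tier3 = CheckResult("clickstream", tier=3, rows_checked=1000, rows_passed=980)
print(alert_action(tier1), alert_action(tier3))
```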
We'd like to do more, but nobody has really put in the work to make a good static analysis system for Airflow/Python. Couple that with the lack of support for column-level lineage OOTB and it's easy to get into a mess. For large migrations (Airflow/infra/Python/dependency changes) we still end up doing ad hoc analysis to make sure things go right, and we often miss important things.
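For a sense of what even basic static analysis of a DAG file looks like, here is a small sketch using Python's standard `ast` module to pull task-to-task edges out of `>>` expressions without importing or running the file. It is deliberately naive: it only sees variable names (not `task_id`s), ignores `set_downstream`, loops, and dynamically generated tasks, and would also match an ordinary bit shift between two variables. The function names are mine.

```python
import ast


def _names(node: ast.expr) -> list:
    """Task variable names that one side of a `>>` expression refers to."""
    if isinstance(node, ast.Name):
        return [node.id]
    if isinstance(node, (ast.List, ast.Tuple)):
        return [name for elt in node.elts for name in _names(elt)]
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.RShift):
        # `a >> b` evaluates to `b`, so a chained expression stands for
        # its right-hand side.
        return _names(node.right)
    return []


def extract_edges(source: str) -> set:
    """Collect (upstream, downstream) pairs from every `>>` in the source."""
    edges = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.RShift):
            for upstream in _names(node.left):
                for downstream in _names(node.right):
                    edges.add((upstream, downstream))
    return edges


# The source is only parsed, never executed, so the names need not exist.
dag_source = """
extract >> transform >> load
transform >> [validate, notify]
"""
edges = extract_edges(dag_source)
```

This only gets you task-level edges; it says nothing about which columns a task reads or writes, which is the part that remains hard for arbitrary Python.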
Happy to talk more about this if you're interested.
Any plans to support Airflow in the future? Would love to have something like this for our company's 500k+ Airflow jobs.
Excerpt [1]:
> Honeybees accumulate an electric charge during flying. Bees emit constant and modulated electric fields during the waggle dance. Both low- and high-frequency components emitted by dancing bees induce passive antennal movements in stationary bees. The electrically charged flagella of mechanoreceptor cells are moved by electric fields and more strongly so if sound and electric fields interact. Recordings from axons of the Johnston's organ indicate its sensitivity to electric fields. Therefore, it has been suggested that electric fields emanating from the surface charge of bees stimulate mechanoreceptors and may play a role in social communication during the waggle dance.
I'm not sure about the time tracking though. Is this more for people working on contract for billing? I see the value in having the data but collecting the data seems difficult.
[1] https://help.obsidian.md/Plugins/Daily+notes