Posted by u/eallam 6 months ago
Launch HN: Trigger.dev (YC W23) – Open-source platform to build reliable AI apps
Hi HN, I’m Eric, CTO at Trigger.dev (https://trigger.dev). We’re a developer platform for building and running AI agents and workflows, open-source under the Apache 2.0 license (https://github.com/triggerdotdev/trigger.dev).

We provide everything needed to create production-grade agents in your codebase and deploy, run, monitor, and debug them. You can use just our primitives or combine them with tools like Mastra, LangChain, and the Vercel AI SDK. You can self-host or use our cloud, where we take care of scaling for you. Here’s a quick demo: (https://youtu.be/kFCzKE89LD8).

We started in 2023 as a way to reliably run async background jobs/workflows in TypeScript (https://news.ycombinator.com/item?id=34610686). Initially we didn’t deploy your code, we just orchestrated it. But we found that most developers struggled to write reliable code with implicit determinism, found breaking their work into small “steps” tricky, and wanted to install whatever system packages they needed. Serverless timeouts made this even more painful.
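To make the “small steps” idea concrete, here is a minimal, hypothetical sketch of how step-based durable execution generally works (this is not the Trigger.dev API; all names are illustrative). Each step's result is journaled as it completes; when a workflow is replayed after a crash, completed steps return their recorded results instead of re-executing, so only the remaining work runs again:

```typescript
// A journal of completed step results. In a real system this would be
// persisted in a database, not held in memory.
type Journal = Map<string, unknown>;

async function step<T>(
  journal: Journal,
  id: string,
  fn: () => Promise<T>
): Promise<T> {
  // Replaying after a crash: the step already ran, reuse its result.
  if (journal.has(id)) return journal.get(id) as T;
  const result = await fn(); // first execution: run for real
  journal.set(id, result);   // persist before moving on
  return result;
}

// A workflow written as named steps. `calls` just tracks real executions
// so the replay behaviour is observable.
async function workflow(journal: Journal, calls: string[]) {
  const a = await step(journal, "fetch", async () => {
    calls.push("fetch");
    return 2;
  });
  const b = await step(journal, "transform", async () => {
    calls.push("transform");
    return a * 10;
  });
  return b;
}
```

The burden the post describes is that developers must slice their code into these steps and keep everything between them deterministic, which is exactly what deploying and snapshotting the whole process avoids.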

We also wanted to allow you to wait for things to happen: on external events, other tasks finishing, or just time passing. Those waits can take minutes, hours, or forever in the case of events, so you can’t just keep a server running.
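The reason you can't just keep a server running is that a wait only needs two durable facts: which run is paused, and what should wake it. A hypothetical sketch (illustrative names, not the Trigger.dev API) of that bookkeeping:

```typescript
// A paused run: in practice this record lives in durable storage, and the
// worker process can be torn down entirely once it's written.
interface SleepingRun {
  runId: string;
  wakeAt: number; // epoch ms to resume at
}

class WaitStore {
  private sleeping: SleepingRun[] = [];

  // Called when a task wants to pause: record when to resume,
  // then release the compute.
  suspend(runId: string, wakeAt: number) {
    this.sleeping.push({ runId, wakeAt });
  }

  // Called periodically (or when an external event arrives): returns the
  // runs that are due, so the platform can restore and resume them,
  // potentially on a different machine.
  due(now: number): string[] {
    const ready = this.sleeping.filter((r) => r.wakeAt <= now);
    this.sleeping = this.sleeping.filter((r) => r.wakeAt > now);
    return ready.map((r) => r.runId);
  }
}
```

The hard part, which the next paragraph addresses, is restoring the paused run's in-flight state when it wakes, since local variables and the call stack aren't captured by a record like this.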

The solution was to build and operate our own serverless cloud infrastructure. The key breakthrough was realizing we could snapshot the CPU and memory state of running code. This lets us pause a run, store the snapshot, and restore it later on a different physical server. We currently use Checkpoint/Restore In Userspace (CRIU), which Google has been using at scale inside Borg since 2018.

Since then, our adoption has really taken off, especially because of AI agents/workflows. This has opened up a ton of new use cases, like compute-heavy tasks such as generating videos using AI (Icon.com), real-time computer use (Scrapybara), AI enrichment pipelines (Pallet, Centralize), and vibe coding tools (Hero UI, Magic Patterns, Capy.ai).

You can get started with Trigger.dev cloud (https://cloud.trigger.dev), self-hosting (https://trigger.dev/docs/self-hosting/overview), or read the docs (https://trigger.dev/docs).

Here’s a sneak peek at some upcoming changes: 1) warm starts for self-hosting; 2) switching to MicroVMs for execution – this will be open source, self-hostable, and will include checkpoint/restore.

We’re excited to be sharing this with HN and are open to all feedback!

flippyhead · 6 months ago
Super happy customer here. We've been using trigger.dev on various projects for over a year now. It's been a great experience and awesome to see them grow. I don't know how long it will last, but I regularly get answers to questions from the founders on Discord, often within hours. I am sure there are a bunch of competitors, but we've never really felt the need to even research them as trigger has consistently met our needs (again, across a range of projects) and seems to be anticipating the features we'll need as we AI more and more of our projects. We're cheering for you Trigger team ;)
tao_oat · 6 months ago
How does Trigger compare to tools like Temporal or Restate? If we put aside the AI use case, it seems like the fundamental feature is durable execution, where there are a few other options in the space.
matt-aitken · 6 months ago
The core is a durable execution engine, but there's a lot more needed to build good applications. Like being able to get realtime progress to users, or being able to use system packages you need to actually do the work (like ffmpeg, browsers etc).

Both of them are focused more on being workflow engines.

Temporal is a workflow engine – if you use their cloud product you still have to manage, scale, and deploy the compute.

With Temporal you need to write your code in a very specific way for it to work, including how you use the current time, randomness, process.env, setTimeout… This means you have to be careful using popular packages because they often use these common functions internally. Or you need to wrap all of these calls in side effects or activities.

Restate is definitely simpler than Temporal, in a good way. You wrap any code that's non-deterministic in their helpers so it won't get executed twice. I don't think you can install system packages that you need, which has been surprisingly important for a lot of our users.
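The "wrap non-determinism in helpers" pattern described above (Temporal's side effects, Restate's helpers) can be sketched in a few lines. This is a hypothetical illustration, not either vendor's API: the first execution runs the function and records its value; a replay returns the recorded value, so calls like `Date.now()` or `Math.random()` can't diverge between attempts:

```typescript
// Recorded side-effect values, in execution order. In a real engine this
// log would be persisted alongside the workflow's event history.
type Recorded = unknown[];

function makeSideEffect(recorded: Recorded) {
  let cursor = 0;
  return function sideEffect<T>(fn: () => T): T {
    if (cursor < recorded.length) {
      return recorded[cursor++] as T; // replaying: reuse the recorded value
    }
    const value = fn(); // first run: execute for real and record
    recorded.push(value);
    cursor++;
    return value;
  };
}
```

The footgun Matt mentions is that any non-deterministic call *not* routed through such a helper (including ones buried inside npm dependencies) silently breaks replay.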

scottydelta · 6 months ago
I have used Temporal in the past at my previous job and am currently using Prefect. All are similar in terms of durable and traced executions, but this one seems to be tailored towards AI use cases, whereas the others are more general.

I haven't tried Trigger, planning to give it a spin this weekend!

paduc · 6 months ago
We're very satisfied customers since January of this year.

We use it as an extension of our node app, for all things asynchronous (long or short). The fact that it's the same codebase on our server and trigger cloud is a huge plus.

For me, it's the most accessible incarnation of serverless. You can add it to your stack for one task and gradually use it for more and more tasks (long or short). Testing and local development are as easy as can be. The tooling is just right. No complex configuration. You can incrementally use the queuing, wait points, and batch triggers for more power.

We've had some issues migrating from v3 to v4. The transition felt rushed (some of the docs/examples still show v3 code that is deprecated in v4). I understand that it might take some time to update the docs and examples, because there is a lot of content.

eallam · 6 months ago
That's great to hear, and thanks for the kind words.

Sorry you had some issues migrating. You're right, it was our biggest docs update so far, and unfortunately a few things did get missed which we have (hopefully) since rectified. Please do let us know if there's anything else we missed and we'll get it sorted.

jumski · 6 months ago
Congrats on the launch - CRIU snapshot/restore is very cool, especially for data-heavy pipelines like video processing.

Question: is a first-class Supabase/Postgres integration on the roadmap so we can (a) start Trigger jobs from SQL functions and (b) read job status via a foreign data wrapper? That "SQL-native job control (invoke from SQL, query from SQL)" path would make Trigger.dev feel native in Supabase apps.

Disclosure: I'm building pgflow, a Postgres-first workflow/background jobs layer for Supabase (https://pgflow.dev).

selinkocalar · 6 months ago
The reliability piece is interesting - we've seen AI apps fail in ways that are hard to predict or reproduce. Traditional monitoring doesn't really work when your 'bugs' are stochastic. How are you handling failure modes that only show up statistically? Like when a model starts giving subtly wrong answers 2% of the time after a deployment.
scottydelta · 6 months ago
This looks great, I wish I had discovered it 4 months ago. I had to build the entire coordination between Prefect and a Django app for https://listingheroai.com

Listing Hero lets ecom brands generate consistent templated infographics, so I reinvented all of these things via data sharing between Django, Celery processes, Prefect, and webhooks. Users can start multiple generations at the same time, all running in parallel in Prefect, with realtime progress visible in the frontend via webhooks.

I will try playing with Trigger next weekend and probably integrate it with a stack like Cloudflare Workers. Excited to try it out!

weego · 6 months ago
This looks really interesting, congrats!

One thing I did notice though from looking through the examples is this:

Uncaught errors automatically cause retries of tasks using your settings. Plus there are helpers for granular retrying inside your tasks.

This feels like one of those gotchas that is absolutely prone to a benign refactoring causing huge screwups, or at least someone will find they pinged a paid service 50x by accident without realising.

Ergonomics like your await retry.onThrow helper feel like they should be the developer-friendly default "safe" approach rather than just an optional helper, though granted it's not as magic-feeling when you're trying to convert eyeballs into users.

matt-aitken · 6 months ago
Yep you do need to be careful with uncaught errors.

When you set up your project you choose the default number of retries and the back-off settings. Generally people don't go as high as 50, and they set up alerts for when runs fail. Then you can use the bulk replaying feature when things go wrong, or if services you rely on have long outages.

I think on balance it is the correct behaviour.
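A capped retry-with-backoff policy like the defaults described above can be sketched in a few lines. This is a hypothetical illustration of the general pattern, not the Trigger.dev implementation: a bounded attempt count keeps a transient failure from hammering a paid downstream service, delays grow exponentially between attempts, and an exhausted retry surfaces the error so alerting can fire:

```typescript
interface RetryOptions {
  maxAttempts: number; // total tries including the first, e.g. 3 (not 50)
  baseDelayMs: number; // delay before the first retry
  factor: number;      // backoff multiplier per subsequent attempt
}

async function withRetries<T>(
  fn: () => Promise<T>,
  opts: RetryOptions,
  // Injectable sleep so tests can skip real waiting.
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise((resolve) => setTimeout(resolve, ms))
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < opts.maxAttempts) {
        // Exponential backoff: base, base*factor, base*factor^2, ...
        await sleep(opts.baseDelayMs * opts.factor ** (attempt - 1));
      }
    }
  }
  // Exhausted: rethrow so the run is marked failed and alerts can fire.
  throw lastError;
}
```

The design choice debated in this subthread is whether the workflow engine applies a policy like this to *all* uncaught errors by default, or only to code explicitly wrapped in a helper.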