The core idea: geo-replication should be a deployment concern, not something you architect into every line of application code. You write normal business logic, then configure replication policies at deployment time and let Restate handle the rest.
The configuration is straightforward: `default-replication = "{region: 2, node: 3}"` guarantees that data is replicated to at least 2 regions and at least 3 nodes, so your apps can tolerate a region outage or the loss of any two nodes while staying fully available. Behind the scenes, Restate handles leader election, log replication, and state synchronization. We use S3 cross-region replication for snapshots, with delayed log trimming to ensure consistency.
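For reference, the entire policy is that one line in the server config (where exactly it sits in your config file depends on how you deploy Restate, so treat this as a sketch rather than copy-paste config):

```toml
# Replicate every log entry to at least 2 regions and at least 3 nodes.
# With a 6-node / 3-region cluster, this is what lets a whole region
# (or any two nodes) disappear without losing data or availability.
default-replication = "{region: 2, node: 3}"
```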
We tested this with a 6-node cluster across 3 AWS regions under a 400 req/s load. Killing an entire region triggered automatic failover in under 60 seconds, with zero downtime and no data loss; only 1% of requests saw latency spikes during the failover window. After the us-east-1 nodes went away, P50 latency did increase, because replication shifted from the nearby us-east-1/us-east-2 pair to the more distant us-east-2/us-west-1 pair.
Happy to answer technical questions or discuss tradeoffs!
The core insight: durability matters more than you'd think for agents. When an agent takes 5-10 minutes on a task, crashes become inevitable. Rate limits hit. Sandboxes time out. Users interrupt mid-task. Traditional retry logic gets messy fast.
Our approach uses Restate for durable execution (workflows continue from the last completed step) and Modal for ephemeral sandboxes. We get automatic failure recovery, the ability to interrupt a running agent with new input, great scalability, and scale-to-zero, all without custom retry code. The tradeoffs: coupling to Restate's execution model and the discipline required around deterministic replay.
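Concretely, the pattern looks roughly like this with the Restate TypeScript SDK (a simplified sketch, not our actual handlers; `planTask` and `runInSandbox` are hypothetical stand-ins for the real LLM and Modal sandbox calls):

```typescript
import * as restate from "@restatedev/restate-sdk";

// Hypothetical stand-ins for the real LLM planning call and Modal sandbox API.
async function planTask(task: string): Promise<string[]> {
  return [`research: ${task}`, `draft: ${task}`];
}
async function runInSandbox(step: string): Promise<string> {
  return `completed: ${step}`;
}

const agent = restate.service({
  name: "agent",
  handlers: {
    run: async (ctx: restate.Context, task: string) => {
      // Each ctx.run result is journaled by Restate. If the process crashes,
      // hits a rate limit, or gets interrupted, the invocation is retried and
      // replays the journal, resuming after the last completed step instead
      // of redoing 5-10 minutes of work.
      const steps = await ctx.run("plan", () => planTask(task));

      const results: string[] = [];
      for (const [i, step] of steps.entries()) {
        results.push(await ctx.run(`step-${i}`, () => runInSandbox(step)));
      }
      return results;
    },
  },
});

// Expose the handler over HTTP; the Restate runtime drives invocations,
// retries, and recovery.
restate.endpoint().bind(agent).listen(9080);
```

The step names and their order have to stay deterministic across replays, which is where the replay discipline mentioned above comes in.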
How are you handling long-running agent workflows to keep them reliable at scale?