bkrausz · 5 years ago
I was responsible for Stripe's API abstractions, including webhooks and /events, for a number of years. Some interesting tidbits:

Many large customers eventually had some issue with webhooks that required intervention. Stripe retries webhooks that fail for up to 3 days: I remember $large_customer coming back from a 3 day weekend and discovering that they had pushed bad code and failed to process some webhooks. We'd often get requests to retry all failed webhooks in a time period. The best customers would have infrastructure to do this themselves off of /v1/events, though this was unfortunately rare.
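That kind of self-serve replay can be sketched roughly like this (a Python sketch; `fetch_page` is a hypothetical stand-in for the HTTP call to a Stripe-style newest-first, cursor-paginated events list, and the exact field names are assumptions, not taken from the real API):

```python
def replay_failed_events(fetch_page, process, start_ts, end_ts):
    """Walk a newest-first, cursor-paginated event list and re-run
    `process` on every event created inside [start_ts, end_ts]."""
    cursor = None
    replayed = []
    while True:
        page = fetch_page(starting_after=cursor, limit=100)
        for event in page["data"]:
            if event["created"] > end_ts:
                continue                 # newer than the outage window
            if event["created"] < start_ts:
                return replayed          # walked past the window: done
            process(event)
            replayed.append(event["id"])
        if not page["has_more"]:
            return replayed
        cursor = page["data"][-1]["id"]
```

The consumer just re-runs its normal event handler over the outage window, which is why building this against /v1/events is so much nicer than asking the provider to retry for you.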

The biggest challenges with webhooks:

- Delivery: some customers' endpoints would hold connections open for 30s before timing out, causing the delivery queues to get backed up (Stripe was much smaller back then).

- Versioning: synchronous API requests can use a version specified in the request, but webhooks, by virtue of rendering the object and showing its changed values (there was a `previous_attributes` hash), need to be rendered to a specific version. This made upgrading API versions hard for customers.

There was constant discussion about building some non-webhook pathway for events, but they all have challenges and webhooks + /v1/events were both simple enough for smaller customers and workable for larger customers.
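For illustration, consuming the `previous_attributes` hash mentioned above might look like this (the payload shape is modeled loosely on Stripe's event format, not copied from any spec):

```python
def changed_fields(event):
    """Given a Stripe-style event, return {field: (old, new)} for every
    attribute listed in data.previous_attributes."""
    obj = event["data"]["object"]
    prev = event["data"].get("previous_attributes", {})
    return {k: (old, obj.get(k)) for k, old in prev.items()}
```

The versioning pain comes from the fact that both `object` and `previous_attributes` have to be rendered to one specific API version at delivery time.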

alexbouchard · 5 years ago
Shameless plug, but I've built https://hookdeck.com precisely to tackle some of these problems. It generally falls onto the consumer to build the necessary tools to process webhooks reliably. I'm trying to give everyone the opportunity to be the "best customers" as you describe them. Stripe is a big inspiration for the work.
Redsquare · 5 years ago
Do you provide the ability to consume, translate, then forward? I'm after a ubiquitous endpoint I can point webhooks at, which then translates to the schema of another service and sends it on. You could then share these 'recipes' and allow customers to reuse well-known transforms.
rattray · 5 years ago
> We'd often get requests to retry all failed webhooks in a time period.

(I worked on the same team as bkrausz, non-concurrently).

For teams that are building webhooks into your API, I'd recommend including UI to view webhook attempts and resend them individually or in bulk by date range. Your customers are guaranteed to have a bad deploy at some point.
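A minimal sketch of the data model behind that kind of resend tooling (all names here — `Attempt`, `AttemptLog`, `deliver` — are invented for illustration):

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Attempt:
    event_id: str
    at: datetime
    status: int          # HTTP status the customer's endpoint returned

@dataclass
class AttemptLog:
    attempts: list = field(default_factory=list)

    def failed_between(self, start, end):
        return [a for a in self.attempts
                if start <= a.at <= end and not 200 <= a.status < 300]

    def resend(self, deliver, start, end):
        """Re-deliver every failed attempt in [start, end]; returns ids."""
        targets = self.failed_between(start, end)
        for a in targets:
            deliver(a.event_id)
        return [a.event_id for a in targets]
```

The "resend in bulk by date range" UI is then a thin wrapper over `resend` — exactly the tool your customer reaches for the Monday after a bad Friday deploy.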

Eiriksmal · 5 years ago
At Lawn Love, we naively coupled our listening code directly to the Stripe webhook... but it worked flawlessly for years. I wasn't a big fan of the product changes necessitating us switching from the Transfer API for sending money to the complicated--and very confusing for the lawn pros--Connect product, but its webhooks also ran without issue from the moment we first implemented them. So thanks for making my life somewhat easier, Mr. Krausz.

Like many others, I now pattern my own APIs after Stripe's.

bkrausz · 5 years ago
Don’t fully thank me, I was also the architect of the Transfers API to Connect transition :). There’s a lot I would have done differently there were I doing it again, though much of the complexity (e.g. the async verification webhooks) were to satisfy compliance needs. Hard to say how much easier the v1 could’ve been given the constraints at the time, though I’m very impressed with the work Stripe has done since to make paying people easier (particularly Express).
alexgartrell · 5 years ago
I think the Stripe API stuff you did was fine, but you really did your best work as a concepts of mathematics TA.
ctas · 5 years ago
Can you share a bit about how these events are stored on Stripes backend e.g. Kafka, Postgres?
bastawhiz · 5 years ago
It's all just Kafka and Mongo. The events can be stored in any simple k/v store. There's no magic.

Edit: not sure why I'm being downvoted. I work at stripe and this is literally how it works.

spullara · 5 years ago
Pretty easy for a customer to set up an SQS queue and a lambda for receiving them, rather than relying on their own infrastructure to do all the actual receiving. Way more reliable than coupling your code directly to the callback.
jon-wood · 5 years ago
This is precisely what we do where I work. We have a service which has just one responsibility: receive webhooks, do very basic validation that they're legitimate, then ship the payload off to an SQS queue for processing. Doing it this way means that whatever’s going on in the service that wants the data, the webhooks get delivered, and we don’t have to worry about how 3rd party X has configured retries.
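A sketch of such a thin receiver, assuming an HMAC-SHA256 signature header (the exact signing scheme varies by provider) and with `enqueue` standing in for the SQS client call:

```python
import hashlib
import hmac

def make_receiver(secret: bytes, enqueue):
    """Build a handler that verifies a webhook signature and, only if
    valid, ships the raw payload to the queue. Returns an HTTP-ish
    status code so failures surface to the sender's retry logic."""
    def handle(raw_body: bytes, signature: str) -> int:
        expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, signature):
            return 400          # reject forged or corrupted payloads
        enqueue(raw_body)       # e.g. sqs.send_message(...) in production
        return 200              # ack only after the enqueue succeeded
    return handle
```

Returning non-2xx on any enqueue failure is the important part: it pushes the retry burden back onto the sender instead of silently dropping the event.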
tasn · 5 years ago
These reasons are exactly why we started Svix[1] (we do webhooks as a service). I wish we existed to serve you guys back when you started working on it. :)

[1] https://www.svix.com

throwaway290232 · 5 years ago
I always laugh when people end up with designs like this. They could have just used SMTP! It's designed to reliably deliver messages to distributed queues using a loosely-coupled interface while still being extensible. It scales to massive amounts of traffic. It's highly failure-resistant and will retry operations in various scenarios. And it's bi-directional. But it's not "cool" technology or "web-based" so developers won't consider it.

Watch me get downvoted like crazy by all Nodejs developers. Even though they could accomplish exactly what they want with much less code and far less complex systems to maintain.

chaps · 5 years ago
I pitched an idea like this years ago to essentially backfill one ticketing system into a shiny new system that could read an email inbox. The idea was that if we dropped an email into that inbox, in the desired format, for each old ticket's updates, the new system would do all the necessary inserts and voila. They told me no -- not for any technical reason, but because their email infrastructure was subject to SEC audits, and this would have opened them up to significantly more auditing. Instead, I ended up having to do it through painful, painful SQL.

Lesson being, that sometimes there are unexpected reasons why a specific piece of technology shouldn't be used.

rendall · 5 years ago
The suggestion to use SMTP is interesting.

I didn't downvote you, but I bet the downvotes come from this part. People don't like this kind of negativity.

> But it's not "cool" technology or "web-based" so developers won't consider it. Watch me get downvoted like crazy by all Nodejs developers.

vidarh · 5 years ago
I actually did use SMTP as queuing middleware for a registrar platform years ago.

It worked very well.

EDIT: To add some context, my team had come off building a webmail platform, and so we'd done lots of interesting stuff to qmail and knew it inside out. We then launched the .name tld and built a model registrar platform that on registration would bring up web and mail forwarding for users that wanted it. We used SMTP to handle the provisioning of those while keeping the registration part decoupled from the servers handling the forwarding. We also used it to live-update a custom DNS server I wrote.

erikpukinskis · 5 years ago
Your suggestion about SMTP is a good one. Disappointing that I had to downvote your comment for the ad hominem on us old Node developers.

Why you need to insult a whole body of people, rather than just make a claim about the technology, I don’t know.

mrzimmerman · 5 years ago
Honestly this is so stupid brilliant I love it (stupid as in I can’t believe I hadn’t considered this). Honestly it really is about storing, sending, and checking messages so SMTP makes so much sense!

I’ve been building for the web for 15 years, and it shows how deeply I can hyper-focus on certain communications implementations without looking at pre-existing options that really meet a large number of use cases. I suppose it also means making sure your data consumers are comfortable working with the protocol, but it’s a really top-notch idea.

kbenson · 5 years ago
SMTP used to be a lot more reliable than it is now. Now, with all the changes to help with blocking spam, you have to be very careful or have a lot of control over the receiving server to ensure you actually get delivery. Some anti-spam systems will just discard if the matching rules indicate the spam likelihood score is above a certain threshold, and mistakes in rules at system levels can and do happen.

But here's another way you could (ab)use the mail system for delivery, provide a mailbox for the client and just allow IMAP or POP access and throw the messages into that. The client can log in to access and process them (which they would likely be automating on their own mailboxes anyway). It does mean it's housed at the provider, but it's also pretty easy to scale. There's lots of info on how to set up load balanced dovecot clusters out there, and even specialized middleware modes (dovecot director) to make it work better so you can scale it to very very large systems.

wruza · 5 years ago
SMTP would raise too many questions, from how both datacenters tolerate it (spam), to who will manage the receiving server itself and certificates on your side, and overall security of this setup. For a nodejs developer it’s really easier to spin up a separate handmade queue process rather than managing SMTP-related things. Webhook (for runtime) and long-polled /events?since= (for startup) have all upsides with little downsides.
bkrausz · 5 years ago
When designing something like this as a service, the biggest question is what other developers will find easy to use. Every cheap host supports inbound HTTP requests, and most web developers know how to receive them.

Stripe needs to be usable by both the developers building intense, scalable, reliable systems and the people teaching themselves to code in a limited context on a limited platform.

pmelendez · 5 years ago
>And it's bi-directional. But it's not "cool" technology or "web-based" so developers won't consider it.

I might be missing a point or two here, but I don't see how SMTP can work for this case at all. You would require every API consumer to set up an SMTP server (another piece of infrastructure to maintain), and then somehow add a layer of authentication so the recipient can control who posts messages to that server (overhead for the publisher per new customer). And we still haven't resolved the issues on the customer side (bad code could pop all the messages, and now we might require the publisher to replay them again).

I haven't even started to think about security and network hardening challenges yet. Again, I might be missing the point but this is not a case of cool tech overuse to me.

dataflow · 5 years ago
I'm confused, "use SMTP" doesn't even type-check for me. Isn't SMTP just a transfer protocol? Meaning it defines a bunch of commands and gives them meanings (like EHLO and DATA and such), just like how HTTP defines commands like GET and POST and all that? Isn't the problem here about e.g. the storage & retry logic rather than about the data transfer itself? Can't you retry transmission as frequently as you like using whatever protocol you like? How does transferring the data over SMTP gain you anything compared to HTTP?
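To illustrate this point — that the retry/durability logic is a layer above the transport — here's a protocol-agnostic retry loop with exponential backoff (a sketch; `send` could wrap an HTTP POST, an SMTP transaction, or anything else that reports success):

```python
import time

def deliver_with_retries(send, payload, max_attempts=5, base_delay=0.01):
    """Attempt delivery; on failure, back off exponentially.
    `send` returns True on success (e.g. a 2xx response)."""
    for attempt in range(1, max_attempts + 1):
        if send(payload):
            return True
        if attempt < max_attempts:
            time.sleep(base_delay * 2 ** attempt)
    return False  # exhausted: hand off to a dead-letter store
```

Nothing here cares what protocol moved the bytes, which is exactly the parent's point.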
kgwxd · 5 years ago
What about events that need faster than 1-minute response times? Any push-notification-like system is going to be just as error prone. And what about multiple message handlers? And what happens when the send fails? Did someone write the code to check the inbox for those and handle them? When a send fails multiple times, is that logged, and is there a system for clients to check that log? Message transfer isn't the hard problem in this domain.
nijave · 5 years ago
>so developers won't consider it

I think it depends on the developer. There's developers hammering out boring business logic as fast as possible and there's developers with a deep understanding of machine internals, protocols, and infrastructure. For the former, SMTP is black magic they'd probably never think of and involves engaging the one infra person that's always busy

It also means standing up and managing "infrastructure"

user5994461 · 5 years ago
SMTP won't work for the customers.

Developers won't be able to use the existing email systems of the company, too critical and managed by another team. They will never be able to reconfigure it and get API access to read emails. Note that it may or may not be reliable at all (depends on the company and the IT who manages it).

Developers won't be able to setup new email servers for that use case. Security will never open the firewall for email ports. If they do, the servers will be hammered by vulnerability scanners and spam as soon as it's running. Note that large companies like banks run port scanners and they will detect your rogue email servers and shut it down (speaking from experience).

thakoppno · 5 years ago
You have this Nodejs developer’s upvote.

At this point in my career (10 years in the game), let me simply defend node as the tool that got me here. Using it then to bootstrap my career was just as practical as using SMTP as you describe now.

paulddraper · 5 years ago
> SMTP

But... Why?

The HTTP protocol is so much easier to manage, load balance, use, etc.

fny · 5 years ago
So how are people supposed to consume this? With an SMTP client?

I think the bigger issue is that consumption isn't particularly friendly. Also, you still haven't solved the versioning issues.

jmiserez · 5 years ago
There are better options than SMTP. Basically any message-oriented middleware / message queuing service can provide this. It's great for both sides, maintenance/outages can happen independently, as long as the queue stays online and has space everything is fine.
johannes1234321 · 5 years ago
E-Mail isn't trustworthy. You may get a confirmation that an initial SMTP server accepted a mail, but that's it. There's also no good way to detect that an endpoint (receiver address) is gone for good to stop sending messages.

You will probably point me to SMTP success messages, but a removed mailbox might only be known by a backend server.

Also mail infrastructure will potentially include heavy spam filters etc. making it quite inconvenient. Not even mentioning security aspects with limited availability of transport layer encryption with proper signatures.

go_prodev · 5 years ago
I think that would be a great solution for these types of scenarios.

In an enterprise setting it becomes more complex if a 365 subscription is required, or active directory authentication is needed to receive emails. Does someone need to monitor the inbox to confirm it's working etc.

But after you mentioned it, I do wish that this was an alternative to webhooks that more service providers offered.

teh_klev · 5 years ago
We used to do this for domain name registrations and it worked fairly well for years. However once you've been added to a spam blacklist it quickly breaks down, especially for time critical operations such as domain name renewals when you're scrabbling around trying to appease the Spamhaus gods.
oalae5niMiel7qu · 5 years ago
SMTP doesn't reliably deliver messages, implementations of it do. A webshit could easily create an SMTP server (with the help of a library written by someone with actual programming skills) that silently drops messages when any error occurs instead of implementing all that robustness.
daniellarusso · 5 years ago
The very first startup I worked at used this for a sweepstakes leadgen form to send to MySQL via a Perl script running from cron.
alephu5 · 5 years ago
Another option would be to publish an AMQP endpoint, I'm not sure what the security implications of this are though.
NicoJuicy · 5 years ago
And far too slow for a lot of use-cases


andyxor · 5 years ago
going down this non-traditional path you might also consider using XMPP and ejabberd for machine-to-machine messaging
aidenn0 · 5 years ago
SMTP no longer reliably delivers messages. Try setting up an MTA on a Hetzner VPS and see how many messages get through


XorNot · 5 years ago
This is a hill I find myself frequently fighting on (and losing): webhooks are terrible to maintain, because they start from the premise "this never breaks", and that's about where development in an organization stops.

The only event API I ever want is notifications there's new data, and then an interface by which I can query all new data which has arrived by some sort of index marker - because this is fundamentally reliable. It means whatever happens to my system, I can reliably recover missed events, skipped events, or rebuild from previous events.

And this is in fact exactly how something like Kafka actually works! Complete with first-class support for compacting queues to produce valid "summarized" starting points.

Any streaming system essentially should never start as a streaming system - it should start as a slow-path pull-based system, and have a fast-path push system added on top of it if needed - because then you've built your recovery path already, rather then what happens way too often which is just "oh yeah, we'll develop that when it breaks".
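A minimal sketch of that pull-based design — an append-only log read by offset, with the push notification (if you ever add one) carrying nothing but "there's something newer than N" (names invented for illustration):

```python
class EventLog:
    """Append-only log that consumers read by offset. Recovery after any
    failure is just re-reading from the last offset you durably stored."""
    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)
        return len(self._events)          # new high-water mark

    def read_since(self, offset, limit=100):
        """Return (events, next_offset); the consumer persists next_offset."""
        chunk = self._events[offset:offset + limit]
        return chunk, offset + len(chunk)
```

Missed events, skipped events, and full rebuilds all reduce to the same operation: read from an earlier offset.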

sroussey · 5 years ago
I agree. A simple ping with the latest ID (which you can optionally use to fetch events from your last seen ID up to the newest). Then go get the events, which likely reuses existing code. Polling is crap.

Extra points for being able to set something like 1s between pings (now you see why I like the optional ID for a range).

delusional · 5 years ago
> Any streaming system essentially should never start as a streaming system - it should start as a slow-path pull-based system, and have a fast-path push system added on top of it if needed - because then you've built your recovery path already, rather then what happens way too often which is just "oh yeah, we'll develop that when it breaks".

I think this is quite an interesting and important point. When we talk about "doing the simple thing first", too often we end up building something that is technically simple but fickle. The trick to making the simple thing reliable is to figure out which part is the slow path (or failure mode), and then only build that. Unfortunately, it often means our result ends up technically "boring", since all the interesting optimizations are what we cut out, but I think that's worth it if the end result is a more useful product.

It's something I've been working with and thinking about for a while. I think it applies to a way broader scope than this discussion.

rattray · 5 years ago
(I worked on the same team as bkrausz, elsewhere on this thread, albeit not concurrently).

Yes, this is pretty much the right thing to do. It can be a bit more work for the API consumer, partly because they need to track the state of their last-read ID, and there are more moving parts.

If you're building a webhook+events system like Stripe's, you might consider adding an option for a mostly-empty webhook body, which can speed things up in this use case but still allows "the easy way" of just processing the event from within the webhook body.

(For readers thinking of implementing this, note that "query for new data" means hitting a dedicated /events api, not individual tables, which might have unpleasant load/performance consequences).
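One way the mostly-empty-body option might be consumed — treating the webhook as a hint and re-fetching the authoritative event from the /events API before acting (all names here are hypothetical):

```python
def handle_thin_webhook(body, fetch_event, process):
    """Take only the event id from the webhook body, then fetch the
    authoritative copy before acting. This sidesteps payload-versioning
    problems, since you only ever parse your own versioned API read."""
    event = fetch_event(body["id"])
    if event is None:
        return 404      # unknown id: let the sender retry or alert
    process(event)
    return 200
```

A nice side effect: a spoofed webhook body can only ever trigger a fetch, never inject data.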

danudey · 5 years ago
My company has recently switched to Microsoft Teams, where unsupported integrations happen via webhooks. For example, if we wanted to be able to trigger builds in Jenkins or Gitlab, or acknowledge alerts via AlertManager, we'd have to set them up as webhooks to the appropriate service.

The problem is that all of those services are internal to our network, and aren't accessible from the outside world. We cannot set up a webhook to Jenkins because Jenkins does not have a publicly accessible URL. We cannot set up a webhook to Gitlab, or to Prometheus, or to Sentry, or anything else, because those are all internal services.

The only option there would be to create a new, public-facing server, set it up with a domain name and SSL certificate, expose it to the world, and then give it access to those services - which defeats the point of having those services internal and secure if we just create a non-internal system and give it access to them.

Alternately, we have that new, public-facing server buffer those requests and have other services poll them, somehow, so that it cannot connect in, but now we're getting into the same situation as described in the article.

If there were an API, I could easily create a small daemon that would watch for events and dispatch them accordingly, and then respond to them as needed; instead, my only option is to build some kind of Frankenstein - or to give up entirely, which is the more reasonable solution.

Then again, this is Microsoft Teams, where creating an application requires an Azure account and jumping through a ton of hoops, so they're no stranger to stupid ideas that no one wants to deal with.

pjgalbraith · 5 years ago
If you are using Teams and use Azure AD then something like Azure AD Application Proxy might be a good option https://docs.microsoft.com/en-us/azure/active-directory/app-...
carlosf · 5 years ago
+1

My company's internal apps use a mix of VPNs and IP fenced load balancers. We are migrating to app proxy.

No inbound connections + access based on Azure AD identity with conditional access (restrict apps to Intune enabled corporate devices) and MFA is an absolute killer.

My only complaint is that the connectors are not very DevOps friendly. Cloudflare Tunnel is much better in this area.

graton · 5 years ago
You have the same sort of issue that I do.

You might look into Cloudflare Tunnel (formerly Argo). It is free and allows you to poke a hole in your firewall to a specific service. If that meets your security requirements.

https://www.cloudflare.com/products/tunnel/

jffry · 5 years ago
I don't believe Cloudflare Tunnel is free, the free tier pricing page [1] lists Argo Smart Routing at "Starting at $5 per month" ("Argo includes: Smart Routing, Tunnel, and Tiered Caching")

[1] https://www.cloudflare.com/plans/


bastawhiz · 5 years ago
> The only option there would be to create a new, public-facing server

This is a problem with receiving any inbound data from a third party. At least with HTTP, it's pretty trivial to set up a robust reverse proxy with nginx.

oscargrouch · 5 years ago
I'm finishing a browser-based application platform where installed applications expose an RPC API, so all applications can call others on the same local (or remote) node(s).

The beauty of this is that you can also compose with other nodes and form a distributed service, by calling the local service as a proxy and routing the requests to the other nodes of the same API.

It took more time than I'd predicted because it's also expected to deliver UI and most of the 'HTML5' API to native applications (instead of JavaScript), which is a massive platform by now (and the #1 reason newcomers to browser technology can't compete, given the feature-creep tax imposed on them).

The idea is also to distribute over a DHT, so you can just serve your application over torrent without needing to register anything.

The only way to get there is by empowering users and developers and taking some of the control from the cloud platform giants.

In my view, the only way to break the browser monopoly now is to create a new path forward, a branch. It's not the time to follow the rules; it's time to break them, or else the future doesn't look so bright in my opinion.

BeefWellington · 5 years ago
> The only option there would be to create a new, public-facing server, set it up with a domain name and SSL certificate, expose it to the world, and then give it access to those services - which defeats the point of having those services internal and secure if we just create a non-internal system and give it access to them.

That's not the only solution -- you could also develop a bot that will do those specific things.

In the days of yore I know of at least three companies that were using IRC bots to similar effect long before webhooks ever existed.

Because of that prior experience, this is how I currently manage a similar set of problems, albeit not on Teams in my current role.

ec109685 · 5 years ago
Really good point that corporate firewalls can trip you up. With slack it was so much easier to call into their events API than receive an outgoing webhook for precisely this reason.

The downside was that the event api required a huge amount of scope, so if you weren’t careful and were compromised, someone could use that token to scrape all messages in the system.

Slack recently added socket mode for precisely this reason: https://api.slack.com/apis/connections/socket

iamtheworstdev · 5 years ago
orf · 5 years ago
A small Lambda (or your cloud equivalent) is perfect for this
tabbott · 5 years ago
Zulip's API is built on roughly this design pattern:

* https://zulip.com/api/real-time-events

* https://zulip.com/api/register-queue

* https://zulip.readthedocs.io/en/latest/subsystems/events-sys...

We use this same long-polling based /events API interface for all official clients (web, mobile, terminal), our interactive bots ecosystem (https://zulip.com/api/running-bots), and many integrations (E.g. bridges with IRC/Matrix/etc.).

We also offer webhooks, because some platforms like Heroku or AWS Lambda make it much easier to accept incoming HTTP requests than do longpolling, but the events system has always felt like a nicer programming model.

(Zulip's events system was inspired by separate ~2012 conversations I had with the Meteor and Quora founders about the best way to do live updates in a web application).

Arathorn · 5 years ago
Matrix has the same, except we call it /sync these days rather than /events, and it long-polls :)
paxys · 5 years ago
There are lots of reasons to want to immediately respond to an external event besides building an eventually consistent data syncing system. Polling an API endpoint works fine for the latter case, but not much else.

A good platform should offer both of these and more (for example Slack does webhooks, REST endpoint, websocket-based streaming and bulk exports), and let the client pick what they want based on their use case.

benlivengood · 5 years ago
Long-polling is the way to immediately retrieve events. It's more efficient and lower latency than waiting for a sender to initiate a TCP and TLS handshake.
andrewstuart2 · 5 years ago
A persistent connection has a cost. Your statement may be true in some circumstances but definitely not all. Namely, for infrequent events it is much more efficient to be notified than to be asking nonstop. Sure, the latency is lowest if the connection is already established, but for efficiency the answer is not cut and dry but is rather a tradeoff decision based on the expected patterns.
mikepurvis · 5 years ago
One nice benefit of long polling is the built in catch-up-after-a-break functionality: When the client initiates the poll, it tells the server the state it knows about (timestamp, sequence number, hash, whatever), and the server either replies right away if it's different, or waits and replies once it's different.

With webhooks, as in the article, you only get state changes; you need some separate mechanism to achieve (or recover) the initial state.
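That catch-up behavior can be sketched with a version number and a condition variable (illustrative only — in-process rather than over HTTP, but the contract is the same: the client reports what it knows, the server replies immediately if that's stale, otherwise waits):

```python
import threading

class LongPollState:
    """Single value with a version; poll(known_version) returns as soon
    as the server-side version differs from what the client has."""
    def __init__(self):
        self._cond = threading.Condition()
        self._version = 0
        self._value = None

    def publish(self, value):
        with self._cond:
            self._version += 1
            self._value = value
            self._cond.notify_all()

    def poll(self, known_version, timeout=30.0):
        with self._cond:
            self._cond.wait_for(lambda: self._version != known_version,
                                timeout=timeout)
            return self._version, self._value
```

A client that was offline simply polls with its stale version and immediately gets the current state — the recovery path and the live path are the same code.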

hakunin · 5 years ago
Someone has to maintain an always-running listener for `/events`. If a server does that, and triggers client calls, we call that webhooks. If a client does that, and triggers internal functions, it's what the op describes. I think that for APIs, `/events` should indeed be the fundamental feature, and "webhooks" should be a nice-to-have service on top of `/events`, for those who don't want to maintain a local subscriber.
sk5t · 5 years ago
If the webhook events are coming at some sort of a brisk pace, the sender well may be able to reuse an already-open connection. And if they're rather infrequent, is the efficiency or latency likely to be a significant concern?


IshKebab · 5 years ago
If you're using HTTP use websockets or server-sent events, not long polling. Long polling is obsolete.
mywittyname · 5 years ago
> To mitigate both of these issues, many developers end up buffering webhooks onto a message bus system like Kafka, which feels like a cumbersome compromise.

Kafka solves exactly the issue that the author is complaining about. This is a safeguard to ensure that data isn't dropped in the event of an issue, and provides mechanisms to replay events.

The tradeoff between pushing and polling have been argued since forever.

In other news, mechanics who work with bolts often do so with ratchets. This is a cumbersome compromise, just give me Torx fasteners!

jerf · 5 years ago
It would if the source was pushing into the Kafka stream directly. It doesn't solve the problem of going out of sync if my code to push to the Kafka stream is entirely down and I miss POSTs.

(And, of course, I don't want Kafka. I want Google PubSub. No, wait, I mean SQS. No, wait, I mean I want zeroMQ. No, I mean....)

nine_k · 5 years ago
The question is: who maintains the queue of events, and pays for it?

Certainly the event producer is in a better position to maintain a queue without missing events, but it also means they need to buffer more data in their queue system to accommodate your receiver's downtime.

l_t · 5 years ago
Not disagreeing with your point, and I'm sure you already know this, I just wanted to point out (for the benefit of people that don't have other options) that it is possible to build "webhooks" in such a way that you're confident nothing is dropped and nothing goes (permanently) out of sync. (At least, AFAIK -- correct me if this sounds wrong!)

Conceptually, the important thing is each stage waits to "ACK" the message until it's durably persisted. And when the message is sent to the next stage, the previous stage _waits for an ACK_ before assuming the handoff was successful.

In the case that your application code is down, the other party should detect that ("Oh, my webhook request returned a 502") and handle it appropriately -- e.g. by pausing their webhook queue and retrying the message until it succeeds, or putting it on a dead-letter queue, etc. Your app will be "out of sync" until it comes back online and the retries succeed, but it will eventually end up "in sync."

Of course, the issue with this approach is most webhook providers... don't do that (IME). It seems like webhooks are often viewed as a "best-effort" thing, where they send the HTTP request and if it doesn't work, then whatever. I'd be inclined to agree that kind of "throw it over the fence" webhook is not great and risks permanent desync. But there are situations where an async messaging flow is the right decision and believe it or not, it can work! :)
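The ack-after-durable-persist boundary described above, in miniature (hypothetical names; in practice `persist` would be a durable write such as a DB insert or queue publish):

```python
def receive(event, persist, ack):
    """At-least-once handoff: acknowledge only after durable persistence.
    If persist raises, no ack is emitted (e.g. the sender sees an HTTP 5xx)
    and the upstream retry redelivers the event."""
    try:
        persist(event)
    except Exception:
        return False    # failure surfaces to the sender's retry logic
    ack(event["id"])
    return True
```

Every stage in the pipeline repeating this pattern is what turns "best-effort" webhooks into a chain that can only lose a message if a durable store loses it.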

mywittyname · 5 years ago
As long as you guarantee delivery to your message queue before acknowledging receipt, you should be golden.

Also, swapping out one messaging system for another is trivial. Pick the one best suited to the environment you're working in, and if that environment changes, changing message queues is going to be one of the easiest transitions you'll make.

toomuchtodo · 5 years ago
You meant Apache Pulsar! :)
danudey · 5 years ago
Having helped manage a Kafka cluster, I do not want to run a Kafka cluster just so that Microsoft Teams can webhook me events now and then.
Floegipoky · 5 years ago
Yeah I was scratching my head reading this article; they're bending so far backwards to avoid the obvious solution that I thought they were gearing up to pitch some competing tech.

> If the sender's queue starts to experience back-pressure, webhook events will be delayed, and it may be very difficult for you to know that this slippage is occurring

I've never before seen anyone try to argue that properly dealing with backpressure is a bad thing. The author's proposed model makes this situation even worse. With kafka, consumers can continue processing the event stream and you can continue to serve reads from your primary datastore. With the author's model the event stream lives in your primary datastore, so if that starts to lock up the blast radius is much larger.

closeparen · 5 years ago
Are you going to expose your Kafka brokers directly to your integration partners? Are they going to use the Kafka client library and wire protocol to send you data? That’s the thing about webhooks, HTTP is universal and if you’re comfortable exposing anything externally, it’s going to be a web service.
mywittyname · 5 years ago
I would not expose kafka directly. I would implement this as:

HTTP Endpoint -> Push to message queue (kafka, SQS, etc) -> Acknowledge receipt

That's a pretty straight-forward design that's widely used, robust, and easy to put together. I've probably done that same workflow 100s of times without issue.

As long as you guarantee the message was pushed to the queue before acknowledging, that will be fabulously reliable. You need to make contingencies for duplicate messages, but that's not usually difficult.
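A minimal sketch of that receive path (using an in-process `queue.Queue` as a stand-in; in production the enqueue would be a Kafka or SQS producer call that only returns once the message is durably accepted):

```python
import json
import queue

event_queue = queue.Queue()  # stand-in for Kafka/SQS; must be durable in production

def handle_webhook(raw_body: bytes) -> int:
    """Receive a webhook: enqueue first, acknowledge second.
    Returns the HTTP status code to send back to the provider."""
    try:
        event = json.loads(raw_body)
    except json.JSONDecodeError:
        return 400                 # malformed payload: don't ack, don't enqueue
    try:
        event_queue.put(event)     # must succeed *before* we acknowledge
    except Exception:
        return 503                 # enqueue failed: let the provider retry
    return 200                     # safe to ack: the event is on the queue
```

The ordering is the whole trick: returning 200 before the enqueue succeeds is how events get silently lost.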

mrkurt · 5 years ago
We expose the NATS protocol to our users. Exposing non-http protocols is fun, sometimes.
aunty_helen · 5 years ago
It's a common writing style as of late, set down a premise and solve that premise decisively.

Now, if that premise isn't based in reality, or if it's already been solved some other way, discredit it without giving it too much air time.

A one-liner about Kafka being cumbersome and then building your own solution, warts and all, don't need to exist in the same thought if you've made the reader mentally disregard Kafka as a possible solution.

alexbouchard · 5 years ago
Totally, things can get very reliable if you start processing webhooks asynchronously. Personally I've found it pretty cumbersome and complicated to build the necessary infrastructure in the past. I've been building https://hookdeck.com as a simpler alternative specifically for ingesting incoming webhooks.
mbrevda1 · 5 years ago
Are events and webhooks mutually exclusive? How about a combination of both: events for consuming at leisure, webhooks for notification of new events. This allows instant notification of new events but allows for the benefits outlined in the article.
sb8244 · 5 years ago
What about supporting fast lookup of the event endpoint, so it can be queried more frequently?

I think that a combo of webhooks / events is nice, but "what scope do we cut?" is an important question. Unfortunately, it feels like the events part is cut, when I'd argue that events is significantly more important.

Webhooks are flashier from a PM perspective because they are perceived as more real-time, but polling is just as good in practice.

Polling is also completely in your control: you will get an event within X seconds of it going live. That isn't true for webhooks, where a vendor may have delays on their outbound pipeline.

jacobr1 · 5 years ago
The article advocates for long polling
coldacid · 5 years ago
I think that's what the author was getting at, after reading through the whole article. The idea isn't to get rid of webhooks, but provide an endpoint that can be used when webhooks won't necessarily work.
snarkypixel · 5 years ago
Very similar to how I built my previous application.

1) /events for the source of truth (i.e. cursor-based logs)

2) websockets for "nice to have" real-time updates, as a way to hint the clients to refetch what's new
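The /events half of that design can be sketched as a cursor-driven drain loop (the pagination fields here are made up for illustration; a real endpoint's scheme, e.g. Stripe's `starting_after`, would differ):

```python
def poll_events(fetch_page, apply_event, cursor=None):
    """Drain all events newer than `cursor` from a cursor-paginated
    /events endpoint, applying each one and returning the new cursor."""
    while True:
        page = fetch_page(cursor)          # e.g. GET /events?after=<cursor>
        for event in page["events"]:
            apply_event(event)             # handler should be idempotent
            cursor = event["id"]           # advance past each processed event
        if not page["has_more"]:
            return cursor                  # persist this; resume here next poll
```

The websocket (or webhook) hint then just triggers this loop early instead of waiting for the next scheduled poll, so the push channel never has to be reliable.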

saurik · 5 years ago
Yeah... I'd go so far as to argue that this is the only architecture that should even ever be considered, as only having one half of the solution is clearly wrong.
alexbouchard · 5 years ago
This is the way to go and I'd love to see more APIs with a robust events endpoint for polling & reconciliation. Deletes are especially hard to reconcile with many APIs since they aren't queryable and you have to individually check whether every ID still exists. Shopify I'm looking at you.
shvedsky · 5 years ago
Yes to the combination of both. I worked on architecture and was responsible for large-scale systems at Google. Reliable giant-scale systems do both event subscription and polling, often at the same time, with idempotency guarantees.
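The idempotency guarantee is what lets the two paths overlap safely: the same event may arrive once via a webhook and again via polling, so the handler must deduplicate. A minimal sketch (in production the seen-ID set would live in a database with a unique constraint, not in process memory):

```python
processed_ids = set()  # in production: a unique-keyed table, not process memory

def handle_event(event, apply_effect):
    """Apply an event at most once, no matter how many times
    (or by which delivery path) it arrives."""
    if event["id"] in processed_ids:
        return False               # duplicate delivery: already applied
    apply_effect(event)            # the actual side effect
    processed_ids.add(event["id"])
    return True
```

With this in place, push and poll can run simultaneously: push gives low latency, polling guarantees completeness, and duplicates are harmless.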
j_san · 5 years ago
Sorry if I'm daft, could you/someone explain why one would want to use both at the same time for the same system?

One thing that makes sense: if you go down, use polling so you can work at your own pace. But this isn't really at the same time. When/why does it make sense to do both simultaneously?

kissgyorgy · 5 years ago
What's the point of implementing webhooks once you implemented long polling for the /events endpoint?
mbrevda1 · 5 years ago
I'd argue against long/persistent polling. Webhooks allow for zero resource usage until a message needs to be delivered.
luuio · 5 years ago
I don't think the original comment meant long polling (i.e. keeping the connection alive), they meant periodically call the endpoint to check for events.
toomim · 5 years ago
There's a much better approach than /events or webhooks: add synchronization directly into HTTP itself.

The underlying problem is that HTTP is a state transfer protocol, not a state synchronization protocol. HTTP knows how to transfer state between client and server once, but doesn't know how to update the client when the state changes.

When you add a /events resource, or a webhooks system, you're trying to bolt state synchronization onto a state transfer protocol, and you get a network-layer mismatch. You end up with the equivalent of HTTP request/response objects inside of existing HTTP request/responses, like you see in /events! You end up sending "DELETE" messages within a GET to an /events resource. This breaks REST.

A much better approach is to just fix HTTP, and teach it how to synchronize! We're doing that in the Braid project (https://braid.org) and I encourage anyone in this space to consider this approach. It ends up being much simpler to implement, more general, and more powerful.

Here's a talk that explains the relationship between synchronization and HTTP in more detail: https://youtu.be/L3eYmVKTmWM?t=235

wruza · 5 years ago
You may send POST /events instead. It also breaks "REST", which is just a sort of obsession rather than a requirement here, but more importantly it wouldn't break the idempotence and proxy caching that GET implies.

Edit: from the network point of view, it’s either call-back or a persistent call-wait/socket, or polling. The exact protocol is irrelevant, because it’s networking limits and efficiency that prevent everyone from having a persistent connection to everyone. A persistent connection can’t be much better than any other persistent connection in that regard, and what happens inside is unrelated story. Or am I missing something?

jonny_eh · 5 years ago
> just fix HTTP

Oh yes, changing HTTP is so easy.

toomim · 5 years ago
HTTP is actually quite malleable, and adding synchronization is easy.

You can add it to your own website with a few simple headers, and a response that stays open (like SSE) to send multiple updates when state changes: https://datatracker.ietf.org/doc/html/draft-toomim-httpbis-b...

...and you can get these features for free using off-the-shelf polyfill libraries. If you're in Javascript, try braidify: https://www.npmjs.com/package/braidify
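For comparison, the plain-SSE version of "a response that stays open" looks something like this (this is ordinary Server-Sent Events framing, not the Braid protocol itself):

```python
import json

def sse_frames(updates):
    """Format a stream of state updates as Server-Sent Events frames,
    suitable for writing to an HTTP response the server holds open."""
    for update in updates:
        # Each frame is one or more "field: value" lines ending in a blank line.
        yield f"event: {update['type']}\n"
        yield f"data: {json.dumps(update['data'])}\n\n"
```

The server keeps the connection open and writes a frame whenever state changes; the client applies each frame as it arrives instead of polling for a fresh snapshot.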

derptron · 5 years ago
The website seems to be crammed into the left side of my screen unnecessarily.
top_kekeroni_m8 · 5 years ago
Centering a div is hard!