Hey everyone, my name is Tom, and I'm excited to be launching Diahook on HN!
Diahook makes it easy for developers to send webhooks. Developers make one API call and we take care of deliverability, retries, and offer a great developer experience for their users. Essentially, we make it possible for everyone to offer a Stripe-like webhooks experience.
At my previous company, our users were constantly asking us for webhooks, both for consuming in their own services and for integrations with no-code solutions like Zapier. However, we kept on deferring building them because we weren't willing to commit the engineering time, resources and ongoing maintenance required of a webhook delivery system.
There are a variety of challenges when it comes to sending webhooks. For example, customer endpoints fail or hang much more often than you would think, so you need to implement retries, but also make sure that such failures don't slow down or block your send queue or the rest of your system. Additionally, because of how webhooks work, anyone can send fake webhooks to your customers, so you need to make sure to cryptographically sign the payload, and make it easy for your users to verify it. You also want to avoid overloading your users' endpoints, so you want to automatically rate-limit webhook sending, disable failing endpoints, and notify your users when you do.
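To make the "disable failing endpoints and notify the user" part concrete, here is a minimal sketch of a per-endpoint failure counter. The class name `EndpointHealth` and the threshold of 5 are assumptions for illustration, not a description of how Diahook does it.

```python
from dataclasses import dataclass


@dataclass
class EndpointHealth:
    """Tracks consecutive delivery failures for one customer endpoint."""

    max_consecutive_failures: int = 5  # hypothetical threshold
    consecutive_failures: int = 0
    disabled: bool = False

    def record(self, success: bool) -> bool:
        """Record one delivery attempt.

        Returns True only on the call that disables the endpoint, so the
        caller knows when to notify the endpoint's owner.
        """
        if self.disabled:
            return False
        if success:
            # Any success resets the streak.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.max_consecutive_failures:
            self.disabled = True
            return True
        return False
```

A real system would probably look at a time window rather than a raw count, but the shape is the same: count, disable, notify once.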
I love webhooks, and I think everyone should be offering them! Our goal with Diahook is to make it faster for developers to add webhooks to their service, take care of the above challenges (and more), relieve them of having to worry about maintenance and scaling, and offer their users a UI for inspecting, debugging and replaying of past webhooks out of the box.
I'd love to hear about your experience building (or using) webhooks systems. What's a must have? Any war stories to share? Got any questions? Suggestions? Please let me know!
I don't think they do, our whole job as an API company is to have uptime (and redundancy for uptime). I think it's the same with every other API company, such as Sendgrid, Twilio and etc.
User endpoints on the other hand, fail all the time, and often require a few retries.
With that being said, we plan on having client side (in our libraries) redundancy (try another endpoint if one fails), and ways for you to gracefully handle errors locally for users who need this.
It's not just your infra though. Say I'm a client sending you webhooks. My internet connection (if not in cloud infra) could be down, or my cloud provider might not have a route to your delivery endpoint due to a routing or similar outage. If those requests block, the client's app is going to hang or back up. If they don't block, they have to go somewhere (perhaps journaled with a task that can be executed on demand or on a recurring schedule).
My suggestion would be not only a library, but a local queue that can be managed with the library. Also, having some experience with working at a no code provider previously, ensure that it's trivial to query your service for what webhooks you can confirm were received by your infra and then delivered, perhaps by UUID.
Last, make sure it's dreadfully difficult to make changes to your systems where data loss could occur (messages dropped while returning 200 to counterparty systems, for example).
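For what it's worth, the local queue idea can be sketched as a small journal: write the message down first, then try to deliver it, and only mark it delivered on confirmed success. This is a rough sketch using SQLite from the standard library; the table name `outbox` and the `send` callback are assumptions, not part of any existing library.

```python
import json
import sqlite3
import uuid
from typing import Callable


def open_outbox(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local journal of outgoing messages."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outbox ("
        "id TEXT PRIMARY KEY, "
        "payload TEXT NOT NULL, "
        "delivered INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()
    return conn


def enqueue(conn: sqlite3.Connection, payload: dict) -> str:
    """Journal a message before any network call; returns its UUID."""
    msg_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO outbox (id, payload) VALUES (?, ?)",
        (msg_id, json.dumps(payload)),
    )
    conn.commit()
    return msg_id


def flush(conn: sqlite3.Connection, send: Callable[[str, dict], bool]) -> int:
    """Try to deliver every pending message; returns how many succeeded.

    `send` is a hypothetical callback that returns True only when the
    remote side confirmed receipt. Failures (False or an exception)
    leave the message pending for the next flush.
    """
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE delivered = 0 ORDER BY rowid"
    ).fetchall()
    delivered = 0
    for msg_id, payload in rows:
        try:
            ok = send(msg_id, json.loads(payload))
        except Exception:
            ok = False
        if ok:
            conn.execute("UPDATE outbox SET delivered = 1 WHERE id = ?", (msg_id,))
            conn.commit()
            delivered += 1
    return delivered
```

Calling `flush` from a recurring task gives you the "executed on demand or on a recurring schedule" behavior, and the UUID doubles as the handle for asking the service what it actually received.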
It's a good point, and actually some of our users already have this local queue implementation anyway because they want to send webhooks asynchronously. So it's definitely something to keep in mind. We already make it easy to query the service for what webhooks were delivered, but we also let you use the API with idempotency keys to ensure messages are not accepted twice.
I agree about all the issues you outlined though. One thing we plan on doing in terms of internet being down and cloud provider not having routes: have a lot of endpoints spread geographically in every region, and maybe in the same/nearby data centers.
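To illustrate the idempotency key idea mentioned above: the client picks a key per logical message, and a retry with the same key gets the original message back instead of creating a second one. A minimal in-memory sketch follows; the `MessageStore` class and its method names are made up for illustration and are not Diahook's API.

```python
import uuid
from typing import Dict, Tuple


class MessageStore:
    """In-memory stand-in for the service side of an idempotent create call."""

    def __init__(self) -> None:
        self._by_key: Dict[str, str] = {}
        self.messages: Dict[str, dict] = {}

    def create_message(self, idempotency_key: str, payload: dict) -> Tuple[str, bool]:
        """Return (message_id, created).

        A repeated key returns the first message's id with created=False,
        so a client can safely retry after a timeout.
        """
        existing = self._by_key.get(idempotency_key)
        if existing is not None:
            return existing, False
        message_id = str(uuid.uuid4())
        self._by_key[idempotency_key] = message_id
        self.messages[message_id] = payload
        return message_id, True
```

Combined with a local queue, this means a retry after an ambiguous failure (request sent, response lost) does not turn into a duplicate webhook.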
> I don't think they do, our whole job as an API company is to have uptime (and redundancy for uptime). I think it's the same with every other API company, such as Sendgrid, Twilio and etc.
That doesn't sound realistic to me. Sendgrid, Twilio, and all other APIs have downtimes and other problems, just check their status page: https://status.sendgrid.com/history.
I always had to implement some retry logic in API clients at some point or another. That doesn't mean your service doesn't bring value but it's not enough to just expect it to work because it is focused on uptime. When implementing a service that requires external dependencies I would always expect them to fail and try to design with that fact in mind.
It's not always your service crashing you need to worry about; reliability also includes the network (and all of the routers) between you and your customers.
Yeah, I replied to a sibling comment. You are right, I was a bit sloppy with my message. I was just trying to make the point that it was similar to other APIs, and that we take uptime seriously.
Congratulations on launching this much-needed product! I've rolled my own webhooks several times, so I personally know all the pain points you mentioned.
I work in healthcare tech, and in the last couple years I've worked with legacy medical systems that use FTP to implement "pseudo-webhooks". For each integration with a third-party technology provider, there is a dedicated FTP server that reads/writes files to an FTP server owned by the third party. When this service is interrupted there is no retry logic - someone has to spend time manually checking if files made it across.
If someone could make this legacy process easier on developers and HIPAA compliant they would be rewarded handsomely, as these organizations (hospitals, medical device makers, health-tech companies, etc) are comfortable paying enterprise-level prices for any technology they use.
Very interesting. What’s the best way to learn more about this problem? Are you working for a healthcare tech company integrating with these legacy systems via FTP, or are you working for a provider or insurance company’s IT organization trying to integrate with said legacy systems for in-house software?
Happy to elaborate more for you. I work on the health tech side, so we’re integrating with legacy systems. I’d say there isn’t any standard for the actual mechanics of how the data is interchanged, but usually organizations adopt the HL7 standard for payloads. Most of our integrations look like “we’ll send you this HL7 payload, then we’ll periodically check this folder for any HL7 payloads to stick into the patient record”. If you want an analogy, think of any vehicle that human lives are dependent on, held together by duct tape. :)
I will say there are EMRs like Athena Health which actually build for the developer experience. Integrating with Athena is relatively easy, and you can scale up the integration across customers easily. Epic (the market leader) is much harder and more expensive to do this for, since every distribution usually has some customization.
We are going to start working towards HIPAA soon, and we are also going to offer a way to end-to-end encrypt the payload in the very near future. The encryption doesn't help with HIPAA (does it?), but it definitely helps with peace of mind and reducing the attack surface.
This is great =). I have owned a large webhook delivery system myself, and considered starting a SaaS around it. I'm also a big fan of webhooks and I think there is definitely a market for this out there. Kudos to you for taking the plunge and launching!
Here's the only post I wrote for it that focuses on security, which is pretty critical for a webhook system https://www.easywebhooks.com/how-to-secure-a-webhooks-api. You might want to consider adding protection against Replay Attacks and support for Challenge-Response Checks if you do not already!
We already protect against replay attacks (we have a signed timestamp as part of the webhook), and I'm not sure what you mean by challenge-response checks. Do you mean things to protect against SSRF? If so, yeah, we also sign the webhooks in order to make sure they come from the right source.
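For readers wondering how a signed timestamp helps against replays, here is a generic sketch using HMAC-SHA256. The signed string layout (`id.timestamp.body`), the function names, and the 5 minute tolerance are assumptions for illustration; see the docs for the actual scheme.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_webhook(secret: bytes, msg_id: str, timestamp: int, body: bytes) -> str:
    """Sign the id, timestamp, and body together so none can be swapped out."""
    signed_content = f"{msg_id}.{timestamp}.".encode() + body
    return hmac.new(secret, signed_content, hashlib.sha256).hexdigest()


def verify_webhook(
    secret: bytes,
    msg_id: str,
    timestamp: int,
    body: bytes,
    signature: str,
    tolerance_s: int = 300,  # hypothetical replay window
    now: Optional[float] = None,
) -> bool:
    """Reject stale timestamps first, then compare signatures in constant time."""
    current = time.time() if now is None else now
    if abs(current - timestamp) > tolerance_s:
        return False  # too old (or too far in the future): possible replay
    expected = sign_webhook(secret, msg_id, timestamp, body)
    return hmac.compare_digest(expected.encode(), signature.encode())
```

Because the timestamp is inside the signed content, an attacker who captured an old request cannot bump the timestamp without invalidating the signature.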
A challenge response check will help ensure that the webhook consumer is actually using the webhook signature, and improve the likelihood that you are sending data to the right target. I saw a number of times that systems weren't verifying the data we were sending them came from us even though we had gone to all of this effort to help them :facepalm:.
Basically you periodically send a GET request to the target API with a token, and have them respond with the token encrypted with the same secret they'd use to decrypt your webhook signature.
You could also consider sending dummy ping messages that may or may not have a valid signature (of course make sure this behavior is documented) that you would expect the target API to return a 4xx error if the signature is incorrect.
These extra steps are definitely not table stakes for a webhooks system, but could be enough to make sure the webhook event providers are being the best possible stewards of their user's data that they can =D. A lot of this complexity can also be wrapped by a client library you provide, which is a big win for everyone on its own.
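A rough sketch of both checks, to make the idea concrete. Note that this uses an HMAC of the token rather than encryption, which is a substitution on my part (it proves possession of the shared secret the same way), and all function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def new_challenge() -> str:
    """Provider side: a random token to send in the periodic GET request."""
    return secrets.token_hex(16)


def answer_challenge(secret: bytes, token: str) -> str:
    """Consumer side: prove possession of the shared secret."""
    return hmac.new(secret, token.encode(), hashlib.sha256).hexdigest()


def check_answer(secret: bytes, token: str, answer: str) -> bool:
    """Provider side: constant-time comparison against the expected answer."""
    expected = answer_challenge(secret, token)
    return hmac.compare_digest(expected.encode(), answer.encode())


def rejects_bad_signature(status_code: int) -> bool:
    """For the dummy ping with an invalid signature: a consumer that
    actually verifies signatures should answer with a 4xx status."""
    return 400 <= status_code < 500
```

The bad-signature ping needs nothing new from the consumer beyond verifying correctly, while the challenge requires them to implement an extra endpoint.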
Ah I get what you mean now! Interesting way to ensure they are actually doing the right thing! I think the ping with the bad signature to check they are actually verifying the webhook is much lighter weight, and almost solves the same problem. The one with the signed response is also interesting, but a bit harder to get people to actually implement.
I just skimmed through the post (it's late here, haha). It looks like Centrifuge is cool and is a part of what a webhook system needs, but not all of it. Check out the landing page for some of the other things we provide.
Thanks for sharing this though, a cool read for tomorrow!
A go-to architecture topic of mine for backend engineer interviews is to design a webhook system (the push side). I’ve gotten a lot of mileage out of it because depending on the person’s experience dealing with a running production system, I’d get varying level of detail unprompted and that usually is a nice proxy for depth of experience.
It turns out my interview question is now a SaaS!
Kudos on the launch OP. This is a much needed service and it’s one of those things that once you see exist you go d’oh, why didn’t we have something like this all this while.
Square had a similar interview question but it was more along the lines of how to keep a socket open to a financial services company. Lots of the same issues.
We connected with Tom just as we were getting ready to build webhooks into our product (Kitemaker W21). We had a few teams asking for webhooks but didn’t have much time to dedicate to it right now so we were in the market for something like Diahook.
It ended up taking us one afternoon (this afternoon in fact!) to integrate Diahook. Was super straightforward and Tom was highly responsive.
This looks nice but don't I still have to invest all the engineering to make sure my calls reach the Diahook service in the first place? The argument from Diahook seems to be "we'll never go down". That's covered by SLAs by all companies that Diahook would be forwarding my calls to. Does Diahook claim 100% uptime? That's impossible. Even if it were possible, there are many other points of failure between my service and Diahook and I need to take all of them into account. Once I do that, I don't see much point in using Diahook. This would've been better off as a library I think.
There are other features/reason that would make me use a "webhook service" and I've contemplated building one before but mainly for async message passing between systems. Anything sync and you almost always need to take care of all the failure states.
This could be useful as a library or side-car. I can't help but think of this as a plugin for something like Envoy even though Envoy probably does a lot of this already.
Hey, thanks for the feedback! We don't claim 100% SLA, as you said, it's impossible. I already replied to this a few times in this thread.
I think the same can be said on every other API company, including Stripe, Sendgrid, Twilio and etc. If this is a concern for you, you need to take care of that, if it's not a concern for you, then Diahook is not any different.
Anyhow, we take care of things other than just deliverability. We sign the webhooks for you to prevent SSRF, we implemented a retry logic, we have monitoring on all of these, and all of the other things I mentioned elsewhere.
I mentioned it elsewhere in this thread, I don't see how this can be a library. It's a standalone service that needs to be run and monitored, needs to scale with your usage, and you need to make sure it doesn't hang, the queue doesn't get too long, and etc. I actually say it in the post, webhooks aren't as simple as they seem.
> We sign the webhooks for you to prevent SSRF, we implemented a retry logic, we have monitoring on all of these, and all of the other things I mentioned elsewhere.
Nice. These are very useful features and I can see myself subscribing for these. May I suggest you highlight these as the main features and demote, remove or re-word "not having to invest significant engineering resources in making a reliable webhook calling system" bit (not exact words). Just a suggestion, don't claim to know more about your business than you but as a possible customer, this claim being at the top would throw me off TBH.
> I mentioned it elsewhere in this thread, I don't see how this can be a library. It's a standalone service that needs to be run and monitored, needs to scale with your usage, and you need to make sure it doesn't hang, the queue doesn't get too long, and etc. I actually say it in the post, webhooks aren't as simple as they seem.
They definitely aren't and I know that from building some quite non-trivial ones and I did build some of those as libraries used across projects in the same company. Also, wouldn't my scale be the same irrespective of whether I call Diahook or anything else directly? I still need to make same amount of calls, have same queue, same retries etc.
> I think the same can be said on every other API company, including Stripe, Sendgrid, Twilio and etc. If this is a concern for you, you need to take care of that, if it's not a concern for you, then Diahook is not any different.
This should definitely be a concern for everyone unless missing outgoing webhooks is fine for a service and I agree Diahook wouldn't be any different which was my original comment. Sorry if I came across as dismissive. I was just pointing out how using something like this cannot be a reason to eschew fault tolerance in outgoing webhook code.
Thanks for all of the feedback, definitely a lot of gems I plan to act on!
> Nice. These are very useful features and I can see myself subscribing for these. May I suggest you highlight these as the main features and demote, remove or re-word "not having to invest significant engineering resources in making a reliable webhook calling system" bit (not exact words). Just a suggestion, don't claim to know more about your business than you but as a possible customer, this claim being at the top would throw me off TBH.
The landing page needs improvements. I agree that this can come across a bit wrong.
> They definitely aren't and I know that from building some quite non-trivial ones and I did build some of those as libraries used across projects in the same company. Also, wouldn't my scale be the same irrespective of whether I call Diahook or anything else directly? I still need to make same amount of calls, have same queue, same retries etc.
The webhook system is another system you need to scale (including monitoring), it's not the same as your main system. You need to make sure that your queue and workers can handle the load, monitor backlog, and etc. I don't think it's quite the same.
> This should definitely be a concern for everyone unless missing outgoing webhooks is fine for a service and I agree Diahook wouldn't be any different which was my original comment. Sorry if I came across as dismissive. I was just pointing out how using something like this cannot be a reason to eschew fault tolerance in outgoing webhook code.
I was definitely too sloppy in my original comment, I didn't try to eschew fault tolerance, I know how important it is! What I was trying to say is that people who care about this high level of fault tolerance already have systems in place for the rest of the APIs they use, and the people who don't, don't. I don't think it's substantially different to other critical APIs in that sense.
It seems to me that people saying your service is also potentially unreliable are missing the point a bit. By paying you money, there's a customer relationship, an SLA, etc. As a (hypothetical) SAAS company, I don't get an SLA from my customers that their webhook consumer endpoints won't go down, but I do get an SLA from Diahook.
> As a (hypothetical) SAAS company, I don't get an SLA from my customers that their webhook consumer endpoints won't go down
Really? I thought this would be covered under most SLAs.
> It seems to me that people saying your service is also potentially unreliable are missing the point a bit.
The reason I (and _I guess_ other people) are thinking of this is that I read not having to invest significant engineering resources into building a robust failure resistant system as one of the main, if not the main selling point and I don't see how that is possible if I have to build it for Diahook in the first place. I understand Diahook has a great SLA and promises to be up as much as possible but things will always go wrong and there are so many points of failure outside of Diahook's control. So as a service maintainer, I still need to invest the same amount of engineering in the component that calls the webhooks.
SLA is a business thing, not a technical one. When I evaluate multiple services and compare their SLAs, I see the probability of one service causing less disruptions to my business than the other. I don't see that as a reason to not write fault tolerant code. Engineers cannot rely on a higher SLA as a reason to throw fault tolerance out of the window. That's my main issue with using a service like this. However, there can be other features that could still make me sign up. Not having to write fault tolerant code just isn't one and anyone buying into that is shooting themselves in the foot.
Spent so much time rolling my own webhook system for my eSign side project [0]- there are so many edge cases and thorny behaviors. This is a rare perfect fit of a SaaS offering. I can't wait to integrate with some of my other projects! The pricing model is generous as well.
Thanks a lot for your kind words! Please feel free to reach out at tom @ the domain. Happy to help you get started, and I'm super interested in hearing about your other projects!
Similar questions from last time there was a Webhooks as a Service on HN [1]
1. How do you deal with endpoints that are down or 500'ing? What kind of retry policy or backoff occurs? Related, how do you notify clients when their endpoints are having trouble? (I ask because that's PII that I then have to share with you).
2. Is there support for message signing (e.g., HMAC) to let clients verify that the webhook really came from us? How does it work?
3. Any kind of deduplication support? One use case we had was that certain webhooks only required the latest delivery. e.g., product inventory. If we previously failed to deliver a webhook, but have a newer version pending, our next attempt should try to send the newest version.
4. Is it possible to delay hydrating data in the payload? It seems the expected usage is that I send a blob of data and you take care of the last mile and send it. Is it possible to send you an id and then you call back to my service and fetch the latest version of that object just before delivery?
5. How are webhook subscriptions actually managed? e.g., my app lets users register webhook URLs? How do I get those URLs to your service?
6. The big question: What do I do if your service is down?
1. Exponential backoff. We don't notify clients, but we notify you with a webhook. We actually don't have it implemented yet, but will have in the next week or two.
2. Yes, and libraries for some languages (more coming) to easily verify it. See the docs for how it works, but standard stuff. We also plan on offering end-to-end encryption in the very near future so that we can't even access your payloads.
3. No, and no plans for supporting it for now. Does anyone do it? I think sending all of them is actually a feature.
4. Could you elaborate? You mean changing the webhook content in each attempt? No plan at the moment. Similar to the previous point. We consider webhook payloads as immutable messages.
5. You can either use our API to add them, or redirect users to a management UI we built with a one-time password. We will also offer JS libraries to make it easy to build your own UI.
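As a concrete picture of the exponential backoff in answer 1, here is a small sketch that computes a retry schedule. The base delay, growth factor, cap, and attempt count are all made-up defaults, not Diahook's actual retry policy.

```python
import random
from typing import List, Optional


def backoff_delays(
    base_s: float = 5.0,
    factor: float = 2.0,
    max_delay_s: float = 3600.0,
    attempts: int = 6,
    jitter: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return the delay (in seconds) to wait before each retry attempt.

    The delay grows geometrically and is capped at max_delay_s. A non-zero
    jitter spreads each delay by up to +/- that fraction, so many failing
    endpoints do not all get retried at the same instant.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        delay = min(base_s * factor ** attempt, max_delay_s)
        if jitter:
            delay *= 1 + rng.uniform(-jitter, jitter)
        delays.append(delay)
    return delays
```

For example, `backoff_delays(base_s=1, attempts=5)` gives delays of 1, 2, 4, 8 and 16 seconds.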
The real value creation here would be turning a GET API endpoint into a webhook service. So your service would monitor the service and push updates on changes / diffs based on certain time interval.
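The core of that "GET endpoint to webhook" idea is just change detection between polls. A minimal sketch, assuming the response is JSON and that `fetch` is whatever function performs the GET (both are assumptions for illustration):

```python
import hashlib
import json
from typing import Any, Callable, Optional, Tuple


def detect_change(
    fetch: Callable[[], Any],
    last_digest: Optional[str],
) -> Tuple[str, bool]:
    """Poll once and report whether the response differs from the last poll.

    Returns (digest, changed). The caller stores the digest and, when
    changed is True, sends a webhook describing the update.
    """
    body = fetch()
    # sort_keys makes the digest independent of JSON key order.
    canonical = json.dumps(body, sort_keys=True).encode()
    digest = hashlib.sha256(canonical).hexdigest()
    return digest, digest != last_digest
```

Run on a timer, this turns any pollable endpoint into a push source. Sending an actual diff rather than a "something changed" event would require storing the previous body instead of only its digest.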
This is just an illustrative example. I used NPM because it doesn't require any authorization. We also support proper subscriptions for NPM that listen to their couchdb change feed.
Fanout founder here! I appreciate the mention, although we don't actually support sending webhooks. Well, awhile back we did, but lately we only support pushing to client-initiated connections.
Good to see this being addressed! We used to offer webhooks-as-a-service at Fanout, but it wasn't very popular and last year we discontinued the feature.
There are several reasons I believe we weren't successful with it:
1) There is a perception that webhooks are "easy", and so developers might not even think to look for a solution. After all, anyone can send an HTTP request. The devil is in the details of course, and a good webhooks system is actually a lot of work.
2) I believe most devs looking for a "push" solution are usually looking for something like WebSockets (which has a more common perception of being hard). And so the people finding us or landing on our page had a specific goal in mind, and our additional offering of webhooks wasn't enticing.
3) Our webhooks feature simply wasn't that good. It handled retries, fan-out, rate limiting, ordered delivery, and full payload customization. That's a start, but a complete solution also needs inspection, response feedback, test calls, and UI widget.
With the right product and GTM, maybe it can work. Good luck!
I can't remember how many times I've written Redis-backed retry timers in Go to handle unreliable endpoints. It'd be great to not have to do that again. I like that you're focusing on such a concrete problem; I can see the shape of the need it fits into.
What happens when the requests to your API fail? Do I need to retry? Will there be an SDK that can help with this?
Essentially I think it's a risk with any external API you use. What happens when Stripe goes down? Twilio? Sendgrid (when you use magic links login)? Our whole focus is uptime, that's what we do. :)
This is one of the advantages of using background jobs. Any call to a 3rd party service should be in the background to handle network and reliability issues. The job system handles automatic retry.
There is something really cool about this. It is the kind of service I think I could implement if I invested enough time, but at the same time since you are taking care of all edge cases and just provide it as a service with such a simple interface, I'd rather just pay for it!
Thanks for the kind words! Yeah, there is no magic (well, there is some), it's just the product of experience and hard work. We make it so you don't have to earn this experience the hard way, and not have to waste your time on doing the grunt work. :)
Hey, this is an amazing product! Congratulations on the launch. At Hookdeck (hookdeck.io), we are solving a similar problem. Hookdeck sits between the API provider and your service, ensuring you never miss a webhook notification. We provide the queues for reliable webhook ingestion and the tooling for easy monitoring and troubleshooting: automatic and manual retries, and event filtering. You should check us out.
This is very cool. I scratched down "webhooks-as-a-service" into my "business ideas" note years ago after building my own webhook system for keygen.sh (mine is backed by Redis+Sidekiq). Lots of edge cases over the years. Glad somebody is running with the idea!
This is cool! One problem I've run into with Webhooks in the past is not being able to audit every call after the fact. I want to be able to see which endpoints were called, when, with what payloads (logs scrubbed of PII etc.). This information is invaluable in responding to support queries.
We have the ability to audit and inspect past calls. We don't yet scrub PII, though we have a few features down the pipeline that will enable just that. Thanks for your comment!
PS. On a side note, it would be very nice to have a standard for webhooks similarly to OAuth. We're designing a solution for accepting various webhooks and the variety of negotiation schemes required by various cloud services is absolutely counter-productive.
At NewsCatcher [1] we've been asked for webhooks to our data a few times. We still did not start the implementation. Do you think Diahook would be a good solution to consider?
Also, $1 per 1,000 hooks. What exactly does it mean? Like any push through your hook?
Yeah, you are probably right and the writing is sloppy. I was going to fix it, but I just saw we have an explanation under the pricing boxes. Does that clarify things?
This looks awesome - I'm in the middle of building https://onlineornot.com and was just about to put the webhook functionality on hold for a bit before seeing this
Hehe, I know this story too well, I've been there before! Feel free to reach out (email: tom @ the domain) if you have any questions or need any help! Oh, and please share feedback.
Extremely good idea. I've had to implement webhooks in the past and it was a huge pain (and my implementation likely had tons of bugs anyways). Turning it into a well-architected service seems like a perfect fit.
Our whole focus is uptime, and I think it's similar to how you use other APIs in your service, how do you deal when you can't reach them?
With that being said, this is something we plan on helping with too. We will have retries to different endpoints (in different geographies), and ways to easily implement it locally. We already offer idempotency so that helps.
At some point you're going to have to quantify that and provide enough evidence (of one sort or another) for this claim to be credible. After all, every service under the sun will claim things like 'scalability', 'performance', 'uptime', 'resiliency', whatever. Devil is in the details.
I don't want to nitpick but ... are you really 'focused' on this? You're a startup with limited resources trying to get a product up and running with a commercially-viable feature-set. You're not set up for high availability. You just aren't. Proper HA is very hard and as you climb the '9s' in uptime, you introduce a huge amount of complexity, cost, planning and manpower - which is something you haven't done and can't do as a startup and, honestly, it isn't worth doing for a Webhook SaaS.
And that's OK! The nice thing about Webhooks is that nobody runs mission-critical workflows with them because there is an implicit expectation of unreliability. This means that for HA/uptime you just need something reasonable. Your service may not survive an AWS outage (or whatever cloud service or datacenter you're deployed in) or a directed DDOS attack, but if your infrastructure can handle traffic spikes and occasional node going down - you're golden.
That's a fair assessment! We are indeed a startup, and definitely have limited resources. What I meant is that this is one of our top concerns that we constantly think about and deploy (our limited) resources on! We are definitely not where we want to be just yet!
Half of the magic is in the infra (scalability, monitoring, and etc), so we can't deliver the same value with just a library. Though I would love to hear what you have in mind!
P.S, what you are doing with Fly.io is super cool!
I've built a serverless webhook dispatcher at my previous job and you can actually cover most of the scalability and resiliency requirements with a few AWS services. I think what is hard to build and the core value that your service provides is the management, governance, and monitoring around it. I would focus there as the main selling point. Kudos for launching it and good luck.
Hey thanks! If you want to run Diahook on it we can help.
We happen to be pretty good at infra, so a library / self hosted setup is usually preferable. This is partially because we prefer to monitor things ourselves, and partially because we don't want to send things like auth tokens through a third party service. We might be unique, though.
Thanks, I'll keep it in mind when we are ready! I especially like the easy postgres replication. :)
I'll reach out about Diahook, as we can also have a self-hosted offering, and I think it could still make sense for you, since you can still benefit from the rest of the offering, including client side libraries and management UI (will be fully customisable soon).
Looks great! Sorry for the off-topic but what is the "2021 bootstrap" that all startup landing pages are using? I like the look and keep seeing it everywhere
GitHub sends webhooks to our Jenkins servers, but sometimes services on our side are down. Can we use your service as an intermediary to ensure delivery?
This is a great idea, congrats! We could definitely use this, although I’m cautious about passing our customer data through yet another data processor.
Your concern is fair, and I wish more companies were as responsible about their user data as you are!
I'm actually very privacy conscious myself, which is why we plan on offering optional end-to-end encryption in the very near future. This way only you and your customers can access the data, not us. :)
Our libraries will deal with the encryption/decryption for you.
We take care of related webhooks channels. We touched on some of them on the landing page, but, in no particular order:
Deliverability, monitoring, retries, security (signing), a nice UI for your users to debug, inspect and etc. :)
Nice work! I'm just glad to see more people in the space and there has been very little pressure on platforms to offer good webhooks, many integrations are pretty barebone and I hope you manage to help with that. We've also been building https://hookdeck.io which is tackling a related problem but for incoming webhooks instead of outgoing. Receiving webhooks at scale reliably requires a lot of work and well-thought-out infrastructure. I'm just about to do a Show HN of our own. I'll reach out to you!
Docs: https://www.diahook.com/docs/
API viewer (and OpenAPI specs): https://api.diahook.com/docs/