Debezium supports the outbox[1] pattern out of the box. All your application needs to do is write to the outbox table; Debezium does the rest described in the post and publishes your events to Kafka.
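A minimal sketch of the application side, assuming Postgres (table and column names are only illustrative; Debezium's outbox event router has its own configurable column expectations):

    -- the outbox table lives next to your business tables
    CREATE TABLE outbox (
      id             uuid  PRIMARY KEY,
      aggregate_type text  NOT NULL,   -- e.g. 'order', used for topic routing
      aggregate_id   text  NOT NULL,
      event_type     text  NOT NULL,   -- e.g. 'OrderCreated'
      payload        jsonb NOT NULL
    );

    -- the business change and the event are written in one transaction;
    -- Debezium reads the row from the WAL and publishes it to Kafka
    BEGIN;
    INSERT INTO orders (id, customer_id, total) VALUES (42, 7, 99.90);
    INSERT INTO outbox (id, aggregate_type, aggregate_id, event_type, payload)
    VALUES (gen_random_uuid(), 'order', '42', 'OrderCreated',
            '{"order_id": 42, "total": 99.90}');
    COMMIT;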
I’m working on a DB migration project from MySQL to a KV database and Debezium saved the project. Don’t know how we would have done it without it. Really cool tech.
Interesting approach. I use a very similar methodology in Marmot (https://github.com/maxpert/marmot) to do CDC for SQLite via triggers; the changes are then published over NATS and replicated to other nodes. So it's row-level replication powered by a similar outbox, except it uses NATS and JetStream's quorum property to ensure reliable recovery.
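The general shape of the trigger-based capture (a sketch, not Marmot's actual schema):

    -- each write is copied into a local change-log table; a separate
    -- process reads this table and publishes the rows over NATS
    CREATE TABLE IF NOT EXISTS __change_log (
      seq      INTEGER PRIMARY KEY AUTOINCREMENT,
      tbl      TEXT NOT NULL,
      op       TEXT NOT NULL,           -- 'INSERT' | 'UPDATE' | 'DELETE'
      row_id   INTEGER NOT NULL,
      captured TEXT NOT NULL            -- JSON snapshot of the row
    );

    CREATE TRIGGER IF NOT EXISTS users_insert_cdc
    AFTER INSERT ON users
    BEGIN
      INSERT INTO __change_log (tbl, op, row_id, captured)
      VALUES ('users', 'INSERT', NEW.id,
              json_object('id', NEW.id, 'name', NEW.name));
    END;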
If you just use NOTIFY to tell listeners “there is work in the queue to grab”, can you still lose messages? I’ve always seen it implemented where workers have to go and claim items (transactionally), not just wait for items to be pushed to them.
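Something like this is what I mean by claiming (a sketch; table and column names are only illustrative):

    -- NOTIFY is only a wake-up call; the table is the source of truth,
    -- so a missed notification just means a later poll picks the row up
    LISTEN work_available;

    -- on wake-up (or on a fallback timer), claim a batch transactionally
    UPDATE work_queue
       SET status = 'processing'
     WHERE id IN (
             SELECT id FROM work_queue
              WHERE status = 'pending'
              ORDER BY id
              FOR UPDATE SKIP LOCKED
              LIMIT 10)
    RETURNING id, payload;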
What happens if the message processing fails, e.g. your message bus broker is down -- is there a way of signalling to Postgres "hey, try me again later"?
I've always found the first method, keeping bookmarks to positions in a table, to be the best. Personally, I don't think the three problems given are an issue:
- just loop at whatever interval gives a delay your users can accept
- querying a database 10 times a second is nothing, even 100 times; this wouldn't cause resource limitations
- the scaling concern made no sense to me, it seems a bit arbitrary.
Cool to know this is an option though, but I much prefer not relying on database internals like the WAL; that hurts scaling more, IMO.
The only thing you need to worry about with the table method is out-of-order increment IDs, which is always possible in a transactional database. But there are many solutions for this.
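For reference, the table method's poll loop is roughly this (a sketch; names are illustrative):

    BEGIN;
    -- current position for this consumer
    SELECT last_id FROM bookmarks
     WHERE consumer = 'events-to-kafka' FOR UPDATE;
    -- everything past the bookmark
    SELECT * FROM events
     WHERE id > :last_id ORDER BY id LIMIT 1000;
    -- ... publish the batch ...
    -- advance the bookmark in the same transaction
    UPDATE bookmarks SET last_id = :max_id_seen
     WHERE consumer = 'events-to-kafka';
    COMMIT;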
I really like this solution and agree "bookmarks" are a lot better than the outbox, but it is hard to guard against races where the reader advances the bookmark past an ID that an in-flight transaction has allocated but not yet committed, i.e. when transactions commit in a different order than their IDs are allocated.
Perhaps this is what you mean by "out of order increment"... but what are the "many" solutions to this?
I struggled for a long time to find a good way of doing this in Microsoft SQL and I'm still not perfectly happy with the solution we found.
Do you have a few pointers to solutions for the out-of-order IDs? I'm thinking of keeping track of gaps (yet-unseen IDs) in another column and retrieving them on the next poll.
Not OP, but one approach here is to use a dedicated loop worker to assign a post-commit ID sequence, i.e. using the outbox pattern once, simply to assign a post-commit ID.
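A rough sketch of what that could look like (illustrative names; the key point is that a single sequencer process does the stamping, so by the time a consumer sees seq = N, every seq below N is also committed):

    -- rows are inserted with seq = NULL inside the business transaction
    ALTER TABLE events ADD COLUMN seq bigint;
    CREATE SEQUENCE events_post_commit_seq;

    -- one dedicated worker loops over this; it only ever sees committed
    -- rows, so readers following seq can never skip an in-flight row
    UPDATE events
       SET seq = nextval('events_post_commit_seq')
     WHERE seq IS NULL;

    -- consumers bookmark on seq instead of the insert-time id
    SELECT * FROM events
     WHERE seq > :bookmark
     ORDER BY seq
     LIMIT 1000;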
I wish DBs had this more built in; it seems like a critical feature of a DB these days, and the commit log already has very similar sequence numbers internally...
1) Well, assuming data is stored in SQL as the transactional store, how do you move data safely from SQL to the event store? You at least need the outbox pattern. It is not clear to me that the outbox pattern is less hacky than listening to a SQL change feed.
E.g., with Azure Cosmos DB you would not use the outbox pattern; you listen to the DB change feed, since that is provided and easily accessible.
2) In our case the schemas in SQL are mostly event-based already (we prefer insert over update... even if we do business logic / flexible querying on the data). So using an event store would mainly be a duplication of the data.
An event store is a database too. What exactly is it about a database that makes it a "proper event store"?
I honestly think the focus on duplicating data you already have in your DB into a separate event store may be something of a fad that will pass in a while -- similar to NoSQL. It's needed for volumes larger than what a SQL DB can handle, but if you don't need such large volumes, why introduce the extra component? Event sourcing architecture is great, but such thinking at the architecture level is really orthogonal to the storage and transport chosen.
I wrote a blog post on this. We were using event sourcing and went with the locking + batching method described. For other tables that we wanted to read and move to a data warehouse, we used the txid check.
A bit hacky, but on AWS RDS we didn't want to deal with WAL intricacies.
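The txid check is roughly this idea (a sketch of the general approach, names illustrative):

    -- each row records the transaction that inserted it
    ALTER TABLE events
      ADD COLUMN txid bigint NOT NULL DEFAULT txid_current();

    -- readers only take rows whose transaction is older than every
    -- transaction still in flight, so nothing uncommitted can later
    -- appear behind the cursor
    SELECT *
      FROM events
     WHERE txid > :last_txid
       AND txid < txid_snapshot_xmin(txid_current_snapshot())
     ORDER BY txid, id;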
How would this work from a fault-tolerance perspective? For example, the listening application happens to be offline while the database is inserting records. How would the application catch up?
If the listener is offline, Postgres will see that a subscriber is behind and mark more internal data as needing to be retained -- this keeps certain maintenance tasks (i.e. VACUUM) from doing their work. If VACUUM is held off for long enough, it will cause a catastrophic failure in your DB.
The application can catch up when restarted, if it retains the last WAL position. When it restarts, it can ask Postgres to start replaying from that point.
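With logical decoding a replication slot does the position bookkeeping for you; a sketch using the built-in test_decoding plugin:

    -- create the slot once: Postgres retains WAL from the slot's confirmed
    -- position onward, so an offline consumer can catch up later
    SELECT pg_create_logical_replication_slot('my_consumer', 'test_decoding');

    -- after a restart: consume (and acknowledge) everything since the last
    -- confirmed position; the peek_ variant reads without advancing the slot
    SELECT * FROM pg_logical_slot_get_changes('my_consumer', NULL, NULL);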
Well of course, but I don't feel like this is a noteworthy limitation here; it applies to any form of queue that persists messages.
I had imagined the GP might have been insinuating something like: a configurable number of WAL files would be written, and then they'd be overwritten once they're all full.
What do I lose if, instead of this or Debezium, I simply have a column representing the status of the message as “pending” or “complete”, a job that periodically processes pending messages and marks them complete upon success, and another job that archives the complete ones?
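i.e. something along these lines (a sketch; table and column names are only illustrative):

    -- job 1: claim a batch of pending messages and mark them complete;
    -- run inside a transaction: publish the returned rows, COMMIT on
    -- success, ROLLBACK otherwise so they stay pending
    WITH batch AS (
      SELECT id, payload FROM messages
       WHERE status = 'pending'
       ORDER BY id
       FOR UPDATE SKIP LOCKED
       LIMIT 100
    )
    UPDATE messages m
       SET status = 'complete'
      FROM batch b
     WHERE m.id = b.id
    RETURNING b.id, b.payload;   -- hand these to the message bus

    -- job 2: move completed messages out of the hot table
    WITH moved AS (
      DELETE FROM messages
       WHERE status = 'complete'
      RETURNING *
    )
    INSERT INTO messages_archive SELECT * FROM moved;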
The author says the outbox pattern should "always" be used.
I agree with the goal -- one should definitely never publish an event externally before it is committed to the DB! But I think using a "post-commit sequence number" is even more powerful than the outbox pattern.
Sadly, few DBs seem to support this well with low latency. Cosmos DB has great support for a low-latency post-commit change feed.
What advantage would this offer over a durable, transactional queue like, say, Pulsar? I don't see much of an advantage, and depending on how much you're pushing through your queue, I could see the outbox table causing issues for your production system if you're using that DB to hold the data.
I've been using this pattern with Elixir (which is well suited for this type of workload) in production. I put together a little library (which is mostly ripped off from Supabase's realtime but with some helpers):
1: https://debezium.io/documentation/reference/stable/transform...