But you have a fairly major problem with all these solutions and that is when a ...

grzm · on Feb 14, 2017

IIRC (it's been a while), PgQ doesn't keep transactions open while events are processed: events are fetched and marked as in process, then marked as succeeded or failed to finish the event. The bulk copy of initial data that bootstraps replication is a long-running process, and I don't believe a transaction is held open for that entire time.

If I understand you correctly, this handles your work queue case.

einhverfr · on Feb 14, 2017

Not quite.

In my experience you have several critical issues:

1. What happens when a job silently fails?

2. What happens when a job takes a lot longer than expected to succeed?

If you solve the first with a timeout, the second leads to a job rerun. The best (only?) solution I have found is to have some awareness in the job queue of the fact that the job is currently being processed. In my previous work we used advisory locks for that.
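To make the difference concrete, here is a rough sketch in Python. It is an in-memory model, not PostgreSQL code: `SessionLocks`, `Job`, `claimable_by_timeout`, and `claim_by_lock` are hypothetical names made up for illustration. The one property it borrows from real session-level advisory locks (e.g. `pg_try_advisory_lock`) is that the lock disappears when the session holding it ends.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Job:
    job_id: int
    started_at: Optional[float] = None  # None means never picked up
    done: bool = False


class SessionLocks:
    """Hypothetical in-memory stand-in for session-level advisory locks."""

    def __init__(self) -> None:
        self._owner: Dict[int, str] = {}  # lock key -> session holding it

    def try_lock(self, key: int, session: str) -> bool:
        # Non-blocking: succeeds only if the key is free or already ours.
        return self._owner.setdefault(key, session) == session

    def release_session(self, session: str) -> None:
        # Models a dropped connection: every lock it held goes away.
        self._owner = {k: s for k, s in self._owner.items() if s != session}


def claimable_by_timeout(job: Job, now: float, timeout: float) -> bool:
    # Timeout policy: cannot tell a dead worker from a slow one.
    if job.done:
        return False
    return job.started_at is None or now - job.started_at > timeout


def claim_by_lock(job: Job, locks: SessionLocks, session: str) -> bool:
    # Lock policy: a job is claimable only if no live session holds it.
    return not job.done and locks.try_lock(job.job_id, session)
```

Under the timeout policy, a job started at t=0 with a 30-second timeout looks claimable at t=100 even if its worker is alive and still busy, so it gets rerun. Under the lock policy, a second worker cannot claim the job while the first session holds the lock, and can claim it as soon as that session goes away, however long or short that takes.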

grzm · on Feb 14, 2017

It wasn't clear to me how closely you've looked at PgQ. Have you looked into the design (other than the README), or used it and found these failings? I'm certainly not going to be able to answer your questions off the top of my head given the time passed since I last used it.

Given your critiques of everything else out there (from what I gather from the rest of your comments in this thread), it seems like you've identified a possible business opportunity.

einhverfr · on Feb 14, 2017

It's been a little while, but I actually read through the source code of both PgQ and Londiste. It's possible I missed something, but I didn't see anything that would automatically reset messages if a connection goes away between receiving the message and marking it as completed.