Kafka persists events locally, which, when mishandled, can cause synchronization issues. If an event-based system has to cold-restart, it becomes difficult, if not impossible, to determine which events must be carried out again in order to resume processes that were in progress when the system went down.

This is a characteristic of all event-based systems, but persistence-enabled event systems (such as Kafka) make it even harder because now there are events already "in flight" that have to be taken into account. Event-based systems that do not have persistence (and thus are simply message queues used as a transport mechanism) have a strong guarantee that _no_ events will be in flight on a cold start, and thus you have an easier time figuring out the current overall state of the system in order to make such decisions.

The only other way around this is to make every possible consumer of the event-based system strongly idempotent, which (in most of the problem spaces I've worked in) is a pipe dream; a large portion certainly can be idempotent, but it's very hard to have a completely idempotent system. Keep in mind, anything with the element of time tends not to be idempotent, and since event systems inherently have the element of time made available to them (queueing), idempotency becomes even harder with event-based systems.
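To make the idempotency point concrete, here is a minimal sketch in Python. Everything in it (the `Account` class, the event dictionary shape, the `is_expired` helper) is hypothetical and invented for illustration; it is not tied to any Kafka client API. It contrasts a handler that tolerates replays by tracking event IDs with one that does not, and shows why a time-dependent check gives different answers depending on when the event is processed.

```python
import time


class Account:
    """Hypothetical consumer state: a balance plus the IDs of events already applied."""

    def __init__(self):
        self.balance = 0
        self.applied_ids = set()

    def apply_deposit(self, event):
        # Idempotent: a replayed event with an already-seen ID is ignored.
        if event["id"] in self.applied_ids:
            return False
        self.applied_ids.add(event["id"])
        self.balance += event["amount"]
        return True


def apply_deposit_naive(account, event):
    # Not idempotent: every replay of the same event changes the balance again.
    account.balance += event["amount"]


def is_expired(event, ttl_seconds, now=None):
    # Time-dependent: the same event yields a different result depending on
    # when it is processed, so replaying it later is not equivalent.
    now = time.time() if now is None else now
    return now - event["created_at"] > ttl_seconds
```

Note that the deduplicating version only survives a cold restart if `applied_ids` is persisted together with the balance, which is itself more state to keep in sync.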

A rule of thumb when I am designing systems is that a data point should only have one point of persistence ("persistence" here means having a lifetime that extends beyond the uptime of the system itself). Perhaps you have multiple databases, but those databases should not hold redundant (overlapping) pieces of information. This is in the same spirit as "source of truth", but that term tends to imply a single source of truth, which isn't inherently necessary (though in many cases, very much desirable).

Kafka, and message queues or caches like it (e.g. Redis with persistence turned on), break this guarantee: if the persistence isn't perfectly synchronized, then you have, essentially, two points of persistence for the same piece of information, which can (and does) cause synchronization issues, leading you into the famously treacherous territory of cache invalidation problems.
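As a rough illustration of two points of persistence drifting apart, consider the following simplified in-memory model. It is a sketch under stated assumptions, not how any particular Kafka client behaves: `log` stands in for the persisted event log, `db` for a database, and `committed_offset` for the consumer's recorded position; all names are made up for this example. The database write and the offset update are separate steps, so a crash between them leaves the two stores disagreeing about what has been processed.

```python
class CrashBetweenWrites(Exception):
    """Simulated failure after the database write but before the offset update."""


def process(log, db, committed_offset, crash_at=None):
    # Apply every event from the last committed offset onward.
    offset = committed_offset["value"]
    while offset < len(log):
        event = log[offset]
        # Persistence point 1: the database.
        db[event["key"]] = db.get(event["key"], 0) + event["delta"]
        if crash_at == offset:
            raise CrashBetweenWrites(f"crashed after applying offset {offset}")
        # Persistence point 2: the recorded position in the log.
        offset += 1
        committed_offset["value"] = offset


log = [{"key": "stock", "delta": -1}, {"key": "stock", "delta": -1}]
db = {"stock": 10}
committed_offset = {"value": 0}

try:
    process(log, db, committed_offset, crash_at=1)
except CrashBetweenWrites:
    pass  # db already reflects both events, but the offset still points at the second

# Cold restart: the second event is applied again because the offset says it is pending.
process(log, db, committed_offset)
```

After the restart, `db["stock"]` has been decremented three times for two events. Nothing in either store alone reveals that the replayed event had already been applied, which is the kind of ambiguity described above.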

As with most technologies, you can reduce your usage of them to a point that they will work for you with reasonable guarantees - at which point, however, you're probably better off using a simpler technology altogether.