Dead Letter Queue
A separate queue where messages that can't be processed are moved aside for inspection and manual handling.
A dead letter queue (also known as a poison queue) is where messages go when they can't be processed successfully. Instead of retrying forever or silently dropping the message, you move it aside so it doesn't block other messages.
The typical setup combines retries with a dead letter queue. When a message handler fails, the system retries it a few times. If it still fails after all retries, the message is published to the dead letter queue and then acknowledged on the original topic, so the broker stops redelivering it. This prevents a single broken message from blocking the entire subscription.
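The retry-then-move flow can be sketched in plain Go. This is a minimal in-memory illustration, not Watermill's implementation: Message, DeadLetter, validate, and processWithDLQ are hypothetical names, the dead letter queue is a slice, and the boolean return value stands in for "published to the dead letter queue, then acknowledged".

```go
package main

import (
	"errors"
	"fmt"
)

// Message is a hypothetical stand-in for a broker message.
type Message struct {
	ID      string
	Payload string
}

// DeadLetter keeps the failed message together with the last error text.
type DeadLetter struct {
	Msg    Message
	Reason string
}

var errEmptyPayload = errors.New("empty payload")

// validate is a toy handler that rejects messages without a payload.
func validate(m Message) error {
	if m.Payload == "" {
		return errEmptyPayload
	}
	return nil
}

// processWithDLQ makes one attempt plus up to maxRetries retries.
// If every attempt fails, the message is appended to dlq and true is returned.
func processWithDLQ(msg Message, handle func(Message) error, maxRetries int, dlq *[]DeadLetter) bool {
	if maxRetries < 0 {
		maxRetries = 0 // always make at least one attempt
	}
	var lastErr error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		lastErr = handle(msg)
		if lastErr == nil {
			return false // processed successfully
		}
	}
	*dlq = append(*dlq, DeadLetter{Msg: msg, Reason: lastErr.Error()})
	return true
}

func main() {
	var dlq []DeadLetter
	msgs := []Message{{ID: "1", Payload: "ok"}, {ID: "2"}, {ID: "3", Payload: "ok"}}
	for _, m := range msgs {
		processWithDLQ(m, validate, 3, &dlq)
	}
	// Only message 2 ends up in the dead letter queue; 1 and 3 are not blocked.
	for _, d := range dlq {
		fmt.Println(d.Msg.ID, d.Reason)
	}
}
```

A real setup would usually wait between retries (for example with exponential backoff) and publish to a broker topic rather than appending to a slice.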
In Watermill, the PoisonQueue middleware handles this. You can also customize which errors trigger the move: temporary errors (like network timeouts) should be retried, while permanent errors (like invalid data) should go straight to the dead letter queue.
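One way to express the temporary-versus-permanent distinction is a predicate over the handler's error. The sketch below uses only the standard library; ErrInvalidData, ErrTimeout, and shouldDeadLetter are hypothetical names. Watermill's middleware package offers a filter-based variant of the poison queue middleware (PoisonQueueWithFilter) that accepts a predicate of this general shape, but check the library documentation for the exact signature before relying on it.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors used to classify failures.
var (
	ErrInvalidData = errors.New("invalid data")    // permanent: retrying cannot help
	ErrTimeout     = errors.New("network timeout") // temporary: worth retrying
)

// shouldDeadLetter reports whether err is a permanent failure,
// meaning the message should be moved aside instead of retried.
func shouldDeadLetter(err error) bool {
	return errors.Is(err, ErrInvalidData)
}

func main() {
	// Wrapped errors are still classified correctly thanks to errors.Is.
	permanent := fmt.Errorf("decode order: %w", ErrInvalidData)
	temporary := fmt.Errorf("call payments API: %w", ErrTimeout)

	fmt.Println(shouldDeadLetter(permanent)) // true
	fmt.Println(shouldDeadLetter(temporary)) // false
}
```

Classifying by sentinel or typed errors keeps the decision in one place, so handlers only need to wrap the right error.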
Moving messages is the easy part. Managing them afterwards is harder. You need to:
- Inspect the failed messages to understand what went wrong.
- Fix the underlying issue (a bug in the handler, missing data, schema mismatch).
- Requeue the messages back to the original topic for reprocessing, or delete them if they're no longer relevant.
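The steps above can be sketched as a small triage loop. This is an in-memory illustration under stated assumptions: DeadLetter, Action, and drain are hypothetical names, decide represents the human or automated inspection, and publish stands in for whatever publishes back to the original topic.

```go
package main

import "fmt"

// DeadLetter is a hypothetical record stored on the dead letter queue.
type DeadLetter struct {
	ID            string
	OriginalTopic string
	Reason        string
	Payload       string
}

// Action is the decision taken for a dead letter after inspection.
type Action int

const (
	Keep    Action = iota // leave it on the queue for now
	Requeue               // publish it back to its original topic
	Delete                // drop it, it is no longer relevant
)

// drain applies decide to every dead letter and returns the ones that remain.
// A message whose requeue fails stays on the queue so it is not lost.
func drain(dlq []DeadLetter, decide func(DeadLetter) Action, publish func(topic, payload string) error) []DeadLetter {
	var remaining []DeadLetter
	for _, d := range dlq {
		switch decide(d) {
		case Requeue:
			if err := publish(d.OriginalTopic, d.Payload); err != nil {
				remaining = append(remaining, d)
			}
		case Delete:
			// dropped on purpose
		default:
			remaining = append(remaining, d)
		}
	}
	return remaining
}

func main() {
	dlq := []DeadLetter{
		{ID: "1", OriginalTopic: "orders", Reason: "handler bug (now fixed)", Payload: "{}"},
		{ID: "2", OriginalTopic: "orders", Reason: "obsolete event", Payload: "{}"},
	}
	decide := func(d DeadLetter) Action {
		fmt.Println("inspecting", d.ID, "reason:", d.Reason)
		if d.ID == "1" {
			return Requeue
		}
		return Delete
	}
	publish := func(topic, payload string) error {
		fmt.Println("requeued to", topic)
		return nil
	}
	fmt.Println("left on the queue:", len(drain(dlq, decide, publish)))
}
```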
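One trade-off to keep in mind: using a dead letter queue affects message ordering. A message moved to the queue and later requeued will be processed after messages that were originally published behind it. If strict ordering matters, you may need a different strategy, such as pausing processing until the failing message is fixed, at the cost of blocking the messages behind it.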
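A tiny simulation makes the ordering effect concrete. It assumes a single stream of integer message IDs, with failing messages moved to the dead letter queue and requeued only after the rest of the stream has been handled; processOrder is a hypothetical helper written for this illustration.

```go
package main

import "fmt"

// processOrder returns the order in which message IDs end up being handled
// when the IDs in failsFirst fail on the first pass, are moved to the
// dead letter queue, and are requeued after the rest of the stream.
func processOrder(ids []int, failsFirst map[int]bool) []int {
	var handled, dlq []int
	for _, id := range ids {
		if failsFirst[id] {
			dlq = append(dlq, id) // moved aside, later messages continue
			continue
		}
		handled = append(handled, id)
	}
	// Requeued messages are processed last.
	return append(handled, dlq...)
}

func main() {
	// Message 2 fails first, so it is handled after message 3.
	fmt.Println(processOrder([]int{1, 2, 3}, map[int]bool{2: true})) // [1 3 2]
}
```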
Setting up alerts for new messages on the dead letter queue topic is a good practice. You don't want failed messages sitting there unnoticed.
References
- Watermill 1.4 Released (Event-Driven Go Library) — Introduces the universal requeuer component with a SQL-based poison queue for failed messages, and a CLI tool for inspecting, deleting, and moving messages back to the original topic.
- Event-Driven Architecture: The Hard Parts — Discusses dead letter queues and alerting as part of building resilient event-driven systems. Covers moving failed messages to a separate queue for inspection and the tooling needed to review and requeue them.
- Watermill: from a hobby project to 8k stars on GitHub — Mentions Poison Queue as one of the key features built on top of Watermill's high-level API, alongside CQRS and the transactional outbox pattern.
- Watermill