---
title: Work the dead-letter queue
description: Inspect and retry failed incident-webhook deliveries on the Operations Console DLQ page and with argusctl dlq ls and dlq retry, including the drainer-worker caveat.
---

# Work the dead-letter queue



## Goal

Find incident webhooks that failed delivery, fix the underlying cause, and re-arm them for redelivery.

## What lands in the DLQ

When ARGUS sends an incident webhook to the platform's webhook receiver and delivery fails, the event lands in the incident-webhook dead-letter queue instead of being lost. Each row carries:

* the error class
* the retry count
* the last error message
* a payload digest

Typical causes: a broken or rotated webhook secret, a receiver that was down, or a poison-pill payload that fails on every attempt.

## Prerequisites

* A PAT with `events:read` to list DLQ rows. Retrying (replaying) a row is a write, so it also needs `events:emit`.

## Steps (console)

1. Open the DLQ page in the Operations Console at ops.useargus.co (route `#/dlq`). The table lists failed events with their error class, retry count, last error message, and payload digest.
2. Diagnose before retrying: a repeated error class across many rows usually means one broken receiver or secret; a single row failing repeatedly suggests a poison-pill payload.
3. Fix the underlying cause (for example, rotate the broken webhook secret or repair the receiver).
4. Retry the row. The retry action, backed by the replay endpoint (a `POST` on the DLQ row), re-arms the row for redelivery, shows a toast, and updates the row in place.
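The diagnosis in step 2 can also be done offline once you have the failed rows in hand. This is a minimal sketch, assuming you have copied the rows into tab-separated lines of the form `id<TAB>error_class<TAB>retry_count`; that layout, the `dlq_triage` function name, and both thresholds (`SHARED_MIN`, `POISON_MIN`) are hypothetical and not an ARGUS export format or default.

```bash
# Hypothetical triage helper. Input on stdin: one DLQ row per line as
# id<TAB>error_class<TAB>retry_count (an assumed layout, not an ARGUS format).
# Output: id<TAB>error_class<TAB>verdict, in input order.
# SHARED_MIN and POISON_MIN are illustrative thresholds, not ARGUS defaults.
dlq_triage() {
  awk -v shared_min="${SHARED_MIN:-3}" -v poison_min="${POISON_MIN:-5}" '
    BEGIN { FS = "\t" }
    # Remember each row and count how many rows share its error class.
    { id[NR] = $1; cls[NR] = $2; retries[NR] = $3 + 0; per_class[$2]++ }
    END {
      for (i = 1; i <= NR; i++) {
        if (per_class[cls[i]] >= shared_min + 0)
          verdict = "shared-cause"    # fix the receiver or secret, then retry
        else if (retries[i] >= poison_min + 0)
          verdict = "poison-pill"     # inspect error and digest; do not retry blindly
        else
          verdict = "retry-candidate"
        print id[i] "\t" cls[i] "\t" verdict
      }
    }'
}
```

The shared-cause check runs first, mirroring the advice to fix a shared receiver or secret before anything else; only rows whose error class is rare are considered as possible poison pills.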

## Steps (CLI)

```bash
argusctl dlq ls --status pending     # pending dead-letter rows
argusctl dlq retry <id>              # re-arm one row for redelivery
```

`dlq retry` is a write: it supports `--dry-run` and is audited like every other write.
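Because `dlq retry` is a write that supports `--dry-run`, a cautious script can preview each retry before re-arming the row. This is a minimal sketch using only the commands shown above; the `dlq_retry_checked` name and the `ARGUSCTL` override (there so the function can be exercised against a stub) are hypothetical, and placing `--dry-run` before the row id is an assumption about flag order.

```bash
# Hypothetical wrapper: preview a retry with --dry-run, then re-arm the row.
# ARGUSCTL (hypothetical) can point at a stub for testing; defaults to argusctl.
dlq_retry_checked() {
  if [ -z "$1" ]; then
    echo "usage: dlq_retry_checked <id>" >&2
    return 2
  fi
  # Preview only; stop here with the same exit status if the dry run fails.
  "${ARGUSCTL:-argusctl}" dlq retry --dry-run "$1" || return $?
  # The real write, audited like every other write.
  "${ARGUSCTL:-argusctl}" dlq retry "$1"
}
```

For example, `dlq_retry_checked <id>` after fixing the receiver. The real retry still only re-arms the row; delivery waits for the drainer.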

## The drainer caveat

Re-arming a row does **not** deliver it immediately. A retried row is redelivered only when the drainer worker runs. If retried rows stay pending, the drainer has not picked them up yet: the row state is correct (re-armed), but the webhook has not gone out.
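Since re-armed rows wait for the drainer worker, a script that retries rows may want to wait until they leave the pending list. This is a minimal sketch, assuming the row id appears verbatim somewhere in the output of `argusctl dlq ls --status pending`; the `wait_for_drain` name, the `ARGUSCTL` override, and the default attempt count and poll interval are hypothetical.

```bash
# Hypothetical helper: poll the pending list until a re-armed row is gone.
# Args: <id> [attempts, default 10] [seconds between polls, default 30].
# Returns 0 once the id is no longer listed, 1 if it is still pending after
# the last attempt, or the exit status of `dlq ls` if the listing fails.
wait_for_drain() {
  local id="$1" attempts="${2:-10}" interval="${3:-30}" n=0 pending
  while [ "$n" -lt "$attempts" ]; do
    pending=$("${ARGUSCTL:-argusctl}" dlq ls --status pending) || return $?
    # -F matches the id literally; -w avoids hits inside a longer token.
    if ! printf '%s\n' "$pending" | grep -qwF -- "$id"; then
      return 0
    fi
    n=$((n + 1))
    if [ "$n" -lt "$attempts" ]; then
      sleep "$interval"
    fi
  done
  return 1
}
```

For example, `wait_for_drain <id> 20 15` polls up to 20 times, 15 seconds apart. A return of 1 does not mean redelivery failed; it can simply mean the drainer has not run yet. The id match is a rough text search, not a parse of the listing.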

## Verify

* Run `argusctl dlq ls --status pending` after the drainer has run. Rows that were redelivered successfully drop off the pending list.
* Check the retry count on each remaining row: if it went up, the drainer attempted redelivery and it failed again.

## Troubleshooting

* **Retried rows never deliver.** Check that the drainer worker is actually running; re-armed rows wait for it.
* **The same row keeps failing after retries.** Treat it as a poison-pill payload: inspect the last error message and payload digest rather than retrying again.
* **Many rows with the same error class.** Fix the shared cause (receiver or secret) first, then retry; retrying into a still-broken receiver just increments retry counts.
* **403 on retry (exit code 5).** Your PAT can read the queue (`events:read`) but lacks the write scope needed for replay (`events:emit`).
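Because the 403 case has its own exit code, a script can turn it into an actionable message instead of a bare failure. This is a minimal sketch relying only on what this page states (exit code 5 on a 403 from `dlq retry`, and replay needing `events:emit`); the `dlq_retry_or_explain` name and the `ARGUSCTL` override are hypothetical.

```bash
# Hypothetical wrapper: retry one row and explain a 403, which argusctl
# reports as exit code 5. Any other status is passed through unchanged.
dlq_retry_or_explain() {
  local rc=0
  "${ARGUSCTL:-argusctl}" dlq retry "$1" || rc=$?
  if [ "$rc" -eq 5 ]; then
    echo "dlq retry $1: 403; the PAT can read the queue but lacks events:emit, which replay needs" >&2
  fi
  return "$rc"
}
```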


---

For a semantic overview of all documentation, see [/sitemap.md](/sitemap.md)

For an index of all available documentation, see [/llms.txt](/llms.txt)

For agent-facing discovery, including API and MCP surfaces, see [/agents.md](/agents.md)