AgentForge — How to Read This Dashboard
A 5-minute field guide for the CISO, the on-call engineer, and the demo-day audience.
What you're looking at (90-second version)
AgentForge is an adversarial AI security platform: a fleet of cooperating agents that continuously red-team a target LLM application, judge the results with a locked rubric, file vulnerability reports, and propose fixes — all without a human in the inner loop. The dashboard is the platform's ops center, the single screen a hospital CISO can walk up to and answer “is this thing working, and is it getting better?” It shows live agent state, attack throughput, coverage, cost burn, severity trend, and the platform's own bug inbox — six panes across one row plus a topology hero strip on top.
The Topology
What it is
A live, auto-drawn map of every agent in the platform and the messages flowing between them. Each node is a worker; each edge is a typed message channel (campaign, attack_attempt, verdict, vuln_report, patch_suggestion, observe). The view polls /v1/agents/topology every second and is rendered from the live jobs and heartbeats tables — there is no separate “diagram” file anywhere. If a node is missing, the agent is not running.
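As a concrete illustration, the topology payload can be derived directly from heartbeat and message rows. This is a minimal sketch, not the platform's actual implementation: the `heartbeats` and `messages` shapes, the `build_topology` helper, and the 10-second staleness default are all assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

def build_topology(heartbeats, messages, now=None, stale_after=timedelta(seconds=10)):
    """Derive topology nodes and edges from raw rows (assumed shapes).

    heartbeats: {agent_name: last_heartbeat_datetime}
    messages:   iterable of (sender, receiver, kind, sent_at) tuples
    """
    now = now or datetime.now(timezone.utc)
    nodes = [
        {"id": agent, "online": (now - last_seen) <= stale_after}
        for agent, last_seen in heartbeats.items()
    ]
    edges = {}
    for sender, receiver, kind, sent_at in messages:
        key = (sender, receiver, kind)
        edges.setdefault(key, 0)
        # Only traffic in the last 30 seconds counts toward the edge pulse.
        if (now - sent_at) <= timedelta(seconds=30):
            edges[key] += 1
    edge_list = [
        {"from": s, "to": r, "kind": k, "recent_msgs": n}
        for (s, r, k), n in edges.items()
    ]
    return {"nodes": nodes, "edges": edge_list}
```

Because the nodes and edges come straight from live tables, a crashed agent simply disappears from the map — there is no stale diagram to get out of sync.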
Why it's needed
A multi-agent platform without a topology view is opacity behind a REST API. A CISO is not going to read a job table. She needs to see, at a glance, whether the platform is running, where the traffic is concentrated, and where alerts are firing — and she needs to see it in the same visual idiom as the Datadog APM or Honeycomb service maps she already uses. The topology is also the only pane that makes the Watchdog visible as an architectural choice rather than just a database table; it's how we show that the platform watches itself.
What to expect
- A clean left-to-right flow: Orchestrator → Red Team → Target → Judge → Docs Agent → Patch Advisor.
- A curved feedback edge from Target back to Red Team, routed below the main rail — this is how partial successes drive the next mutation.
- The Watchdog sits on a parallel rail below the main flow, with dashed gray observation lines reaching up to every agent it watches. That spatial choice is the architecture: Watchdog observes the bus; it never participates in grading or attacking.
- Node colors: green pulsing = active and emitting heartbeats; amber = degraded (refusal spike, heartbeat lag, dry-well); red = alert (something needs human attention); black = offline.
- Edge animation pulses when traffic is flowing in the last 30 seconds; the small label on each edge shows the message kind and recent msg/sec.
How it meets the requirements
| Requirement | What this pane answers |
|---|---|
| OBS-06 | “What is each agent doing, and in what order did it happen?” — live state per node, plus click-through to the trace flame in LangSmith. |
| ARCHDOC-02 | “How agents communicate — what messages or signals pass between them.” The edges are the messages; each one is labeled by its kind from the bus schema. |
| TS-01 | “Trust boundaries deliberately designed.” Watchdog as a parallel observer rail — visually separated from the grading and attacking lanes — is the architectural answer. |
The Other Five Panes
2 · Coverage Heatmap
A grid of the threat-model categories (rows) by sub-category (columns). Each cell is colored by 7-day pass rate: greener = more attacks resisted, redder = more successful exploits, gray = not yet tested. Refreshed every 5 seconds from coverage_cells. This is the CISO's money-shot — “where are we testing and where are we winning.”
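The cell-coloring rule can be sketched as a pure function of a cell's 7-day results. The `cell_color` helper and the 0.9/0.5 thresholds are illustrative assumptions — the source only specifies the green/red/gray semantics, not the exact cutoffs.

```python
def cell_color(attempts: int, passes: int) -> str:
    """Map a coverage cell's 7-day results to a display color.

    Gray when untested; otherwise shade from green (attacks resisted)
    toward red (attacks succeeded) by pass rate. Thresholds are
    illustrative, not the platform's actual values.
    """
    if attempts == 0:
        return "gray"        # never tested — a coverage gap, not a win
    rate = passes / attempts
    if rate >= 0.9:
        return "green"
    if rate >= 0.5:
        return "amber"
    return "red"
```

Note that gray is deliberately distinct from green: an untested cell is unknown risk, not demonstrated resistance.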
3 · Live Attack Feed
A terminal-style stream of every attack the platform launches and the Judge's verdict, in real time, capped at the latest 200 rows. Streams from /v1/events via SSE, falls back to 1s polling if the SSE channel drops. During an active campaign this scrolls fast — that's the demo-friendly cinematic.
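For reference, the SSE wire format the feed consumes is plain text: `field: value` lines, with a blank line terminating each event. A minimal parser, shown here as an assumed sketch (the real client likely uses a browser `EventSource`, and only the `event` and `data` fields are handled):

```python
def parse_sse(stream_text: str):
    """Parse a raw text/event-stream body into (event, data) pairs.

    Events are separated by blank lines; an event with no explicit
    'event:' field defaults to 'message', per the SSE spec.
    """
    events = []
    event, data_lines = "message", []
    for line in stream_text.splitlines() + [""]:
        if line == "":
            if data_lines:  # blank line dispatches the buffered event
                events.append((event, "\n".join(data_lines)))
            event, data_lines = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
    return events
```

The 1-second polling fallback mentioned above would hit the same endpoint and feed the identical rows into the feed component, so the UI code doesn't care which transport delivered them.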
4 · Cost Burndown
A line chart of token spend per campaign over the last 6 hours, in 5-minute buckets. The dashed line is the kill-switch threshold; the Orchestrator's tick loop halts the campaign automatically when it's crossed (the dashboard just visualizes the line — it doesn't enforce). Reads /v1/cost/burndown.
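The bucketing and threshold check can be sketched as follows. The `burndown` helper and the `(spent_at, usd_cost)` ledger-row shape are assumptions; the point is that the dashboard only computes the buckets and the breach flag, while enforcement lives in the Orchestrator's tick loop.

```python
from datetime import datetime, timedelta, timezone

def burndown(ledger, threshold_usd: float):
    """Bucket spend into 5-minute windows and flag a kill-switch breach.

    ledger: iterable of (spent_at_datetime, usd_cost) tuples (assumed shape).
    Returns ({window_start: spend}, breached) where breached compares the
    cumulative total against the kill-switch threshold.
    """
    buckets = {}
    total = 0.0
    for spent_at, cost in ledger:
        # Floor the timestamp to the start of its 5-minute window.
        floored = spent_at - timedelta(
            minutes=spent_at.minute % 5,
            seconds=spent_at.second,
            microseconds=spent_at.microsecond,
        )
        buckets[floored] = buckets.get(floored, 0.0) + cost
        total += cost
    return buckets, total >= threshold_usd
```

When `breached` is true, the chart crosses the dashed line — and by that point the Orchestrator has already halted the campaign on its own copy of the same data.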
5 · Severity Over Time
A stacked area chart of open vulnerabilities by severity over the last 7 days, in 24-hour buckets. Each layer is one severity class (p0 highest, p3 lowest). The slope of the stack tells you whether the target is getting more or less defensible. Refreshed every 60 seconds from /v1/findings/timeseries.
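The aggregation behind the stack is a simple group-by. This sketch assumes a `(opened_at, severity)` row shape and a `severity_series` helper name; the actual timeseries endpoint may shape its response differently.

```python
from collections import Counter

def severity_series(findings):
    """Aggregate open findings into per-day severity counts for stacking.

    findings: iterable of (opened_at_datetime, severity) tuples with
    severity in 'p0'..'p3' (assumed shape).
    Returns {date: Counter({severity: count})} — one stacked column per day.
    """
    series = {}
    for opened_at, severity in findings:
        day = opened_at.date()
        series.setdefault(day, Counter())[severity] += 1
    return series
```

Each per-day Counter becomes one vertical slice of the stacked area; a shrinking p0/p1 share over successive days is the "getting more defensible" signal the pane is built to show.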
7 · Bug Inbox
Defects in the platform itself — filed by the Watchdog, the gap-review loop, and the drift-check guard. It is read-only: humans approve every fix, because TS-05 says the Patch Advisor never writes. From /v1/bugs?status=open, updated every 30 seconds.
State Colors
| State | Color | What it means |
|---|---|---|
| idle | gray | Agent is alive and emitting heartbeats but has no active jobs in the last 30 seconds. |
| active | green | Heartbeats current and at least one message produced in the last 30 seconds. |
| degraded | amber | Heartbeat lag, refusal-rate spike, dry-well, or other soft-warning signal from the Watchdog. |
| alert | red | Open watchdog alert on this agent — something needs human attention. |
| offline | black | No heartbeat for > the staleness window. Last-seen timestamp on hover. |
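The table above implies a precedence order among the signals. A minimal sketch of that decision, with the `agent_state` helper name, the argument shapes, and the 10-second staleness default all assumed for illustration:

```python
from datetime import timedelta

def agent_state(now, last_heartbeat, last_message, has_alert, has_warning,
                stale_after=timedelta(seconds=10)):
    """Derive a node's display state from the signals in the table above.

    Precedence (illustrative): offline > alert > degraded > active > idle.
    has_alert / has_warning stand in for open Watchdog alerts and
    soft-warning signals respectively.
    """
    if last_heartbeat is None or (now - last_heartbeat) > stale_after:
        return "offline"   # no heartbeat beats everything else
    if has_alert:
        return "alert"
    if has_warning:
        return "degraded"
    if last_message is not None and (now - last_message) <= timedelta(seconds=30):
        return "active"
    return "idle"
```

Putting offline first matters: a dead agent with a stale open alert should read as offline, not as a live alert.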
When You Don't See Activity
The honest gotcha: this is a live demo of a live platform. If no agents are running, every node shows offline and the feed will be empty. If workers are running but no campaign has been fired, the agents will show as idle or active, with heartbeats but no message traffic on the edges. To wake everything up:
```shell
make -C infra demo-up && make -C infra demo-traffic
```
The first command brings the agent fleet online; the second fires a campaign through the bus. Give it ~10 seconds before expecting verdicts.
Where the Data Comes From
Every pane reads from the FastAPI backend on :8701, which in turn reads from a single Postgres instance on :5701. There is no second data store and no derived cache; the dashboard and the Orchestrator see the same source of truth (per OBS-07: the observability layer is the data substrate for the Orchestrator, not just a human dashboard).
| Pane | Endpoint | Postgres tables |
|---|---|---|
| Topology | /v1/agents/topology | jobs, heartbeats, alerts |
| Coverage Heatmap | /v1/coverage | coverage_cells |
| Live Attack Feed | /v1/events (SSE) | attack_attempts, verdicts, vuln_findings |
| Cost Burndown | /v1/cost/burndown | cost_ledger |
| Severity Over Time | /v1/findings/timeseries | vuln_findings |
| Bug Inbox | /v1/bugs?status=open | bug_tickets |
| Status Bar | /v1/agents/topology, /v1/cost/total, /v1/findings | cross-cutting |
For the deeper architecture story (agent inventory, trust boundaries, orchestration strategy) read ARCHITECTURE.md in the repo root. For the precise requirement IDs cited here, see docs/REQUIREMENTS.md §7.