
Raw CLI Logs: How to See What Agents Actually Do

If your agent only looks impressive when the terminal is heavily formatted, you do not have observability. You have theatre.

Last updated: 2026-04-20 · Tested against Python logging docs, Node.js Console, jq Manual 1.6, GNU stdbuf, GNU tee, and OpenTelemetry logs docs


Too many agent CLIs are stage props. They repaint the terminal, hide real work behind spinners, and save the truth for the summary. I found the polished view stopped being useful once I had to reconstruct a failed run from fragments instead of a clear event log.

In our own workflows, raw CLI logs got more useful the moment we stopped asking the terminal to be a dashboard. The stream's job is not to flatter the tool. It is to show, line by line, what the agent is doing.

What stream contract should an agent run follow?

Start with three terms in plain English. stdout is the normal output channel for a CLI. It is where the machine-readable event stream should go. stderr is the separate channel for diagnostics, warnings, and failures. JSONL means JSON Lines: one JSON object per line, so each event can be streamed, parsed, and replayed without waiting for a giant pretty-printed blob (JSON Lines).
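A quick way to see why the one-object-per-line rule matters: each line parses on its own, so a consumer never has to wait for the producer to finish. A minimal sketch in Python, with illustrative sample events of my own:

```python
import json

# One JSON object per line: each event parses independently, so a
# consumer can process the stream as it arrives instead of waiting
# for the whole run to complete.
raw_stream = (
    '{"event": "agent.step.started", "step": "plan"}\n'
    '{"event": "agent.step.completed", "step": "plan"}\n'
)

events = [json.loads(line) for line in raw_stream.splitlines() if line]
print([e["event"] for e in events])
```

The same loop works unchanged against a live pipe or a capture file, which is the whole appeal.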

That split is not just taste. Python's logging cookbook shows how to route different log levels to different streams, and Node's Console supports separate stdout and stderr targets plus colorMode: false when you want less formatting noise (Python Logging Cookbook, Node.js Console). OpenTelemetry's logs data model points in the same direction: structured, correlated events are easier to move, inspect, and join up later (OpenTelemetry Logging).

My rule is blunt:

  1. Put machine events on stdout. Start, step, tool call, retry, completion.
  2. Put diagnostics on stderr. Warnings, stack traces, rate-limit messages, human-readable complaints.
  3. Keep one event per line. No wrapped objects, no decorative banners, no progress hidden inside cursor movement.
  4. Treat pretty output as optional. If you want a polished summary, print it at the end or in a separate viewer.

Pretty terminals optimise for demos. Raw streams optimise for audit.

How do you emit raw CLI logs without terminal theatre?

If you control the agent CLI, make the default stream boring on purpose. Boring is what lets tee, jq, file captures, and replay work without heroic cleanup.

Node logger that keeps events on stdout and diagnostics on stderr (JavaScript)
import { Console } from "node:console";

const log = new Console({
  stdout: process.stdout,
  stderr: process.stderr,
  colorMode: false,
  inspectOptions: { compact: true, depth: 4 },
});

function emitEvent(event, data = {}) {
  process.stdout.write(`${JSON.stringify({ ts: new Date().toISOString(), event, ...data })}\n`);
}

emitEvent("agent.step.started", { step: "plan" });
log.error("rate limit hit; backing off for 2s");
emitEvent("agent.step.completed", { step: "plan" });

If you are in Python, the same idea holds. The logging cookbook supports stream-specific handlers, which is the clean way to keep routine progress off the same channel as errors (Python Logging Cookbook).

Python handlers that split event output from diagnostics (Python)
import json
import logging
import sys
from datetime import datetime, timezone

event_logger = logging.getLogger("agent.events")
event_handler = logging.StreamHandler(sys.stdout)
event_logger.addHandler(event_handler)
event_logger.setLevel(logging.INFO)
event_logger.propagate = False

diag_logger = logging.getLogger("agent.diagnostics")
diag_handler = logging.StreamHandler(sys.stderr)
diag_logger.addHandler(diag_handler)
diag_logger.setLevel(logging.WARNING)
diag_logger.propagate = False

event_logger.info(json.dumps({
    "ts": datetime.now(timezone.utc).isoformat(),
    "event": "agent.step.started",
    "step": "plan",
}))
diag_logger.warning("rate limit hit; backing off for 2s")

That is the contract: one line per event. No multiline object dumps. No ANSI colour soup. No fake progress indicators pretending to be observability.
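The payoff of that contract is replay. Here is a minimal sketch (the `replay` helper and the sample events are mine, not part of any agent CLI) that rebuilds a run timeline from any line-oriented capture and fails loudly on a broken line instead of skipping it:

```python
import io
import json

def replay(stream):
    """Rebuild a run timeline from a raw JSONL event stream.

    A line that fails to parse as a complete JSON object is a
    contract violation worth raising on, not silently dropping.
    """
    timeline = []
    for lineno, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            timeline.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno} is not a complete event: {exc}")
    return timeline

# Works the same on a file handle or an in-memory capture.
capture = io.StringIO(
    '{"event": "agent.step.started", "step": "plan"}\n'
    '{"event": "agent.step.completed", "step": "plan"}\n'
)
steps = [e["event"] for e in replay(capture)]
```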

How do you capture raw CLI logs and still watch the run live?

Use one pane to capture the raw event stream and another to watch diagnostics. tee duplicates standard input to standard output and a file, while stdbuf helps when output gets stuck behind buffering instead of appearing line by line (GNU tee, GNU stdbuf).

  1. Run the agent with line-buffered output.
  2. Write stdout to a raw log file with tee.
  3. Send stderr to its own file.
  4. Inspect the raw stream with jq instead of asking the producer to pretty-print it.

Capture a live agent run without reformatting the stream (Bash)
stdbuf -oL -eL my-agent run 2>diagnostics.log | tee run.jsonl | jq -c -M .

In that command, jq -c -M keeps the output compact and monochrome, which is exactly what you want when you are tailing a live log instead of reintroducing pretty-print noise (jq Manual 1.6).

If you want the split view, open a second pane:

Watch diagnostics separately while stdout stays raw (Bash)
tail -f diagnostics.log

This diagram is the operating model I would keep in my head. The whole point is to stop asking one decorated terminal stream to do three different jobs at once.

Diagram of the raw CLI logs stream contract showing the agent sending JSONL events to stdout, diagnostics to stderr, stdout copied to run.jsonl, and the live stream inspected with jq.

%% File: visuals/raw-cli-logs-stream-contract.mmd
flowchart LR
    A[Agent process] --> B[stdout JSONL events]
    A --> C[stderr diagnostics]
    B --> D[tee run.jsonl]
    D --> E[Live terminal view]
    D --> F[jq -c -M inspection]
    C --> G[diagnostics.log]
Rendered from Mermaid source with the native ZeroLabs diagram container.

If you want the broader operating argument for keeping automation smaller and easier to inspect, read You Don't Need an AI Agent. If you want the adjacent publishing pattern where machine-readable outputs matter, How Claude Published Directly to Labs via MCP is the right companion.

How do you verify that the agent is emitting one line per event?

This is the checkpoint most teams skip. They assume the stream is clean because the terminal looks busy. Busy is not the same as inspectable.

Run a short task and check three things:

  1. The raw log grows while the run is still active.
  2. Each newline is a complete event.
  3. Warnings and failures show up in diagnostics.log, not mixed into the event stream.

Quick checks for line-oriented agent output (Bash)
wc -l run.jsonl
tail -n 5 run.jsonl
tail -n 20 diagnostics.log

If tail -n 5 run.jsonl shows five complete records and each line is self-contained, the contract is working. If you see wrapped objects, partial fragments, or long silent gaps followed by a burst, suspect buffering, TTY-sensitive formatting, or a logger serialising pretty output instead of line output. I usually check stdbuf, tee, and the terminal mode before I blame the model or orchestration layer (GNU stdbuf, GNU tee).
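The silent-gap symptom can also be checked mechanically. A small sketch (the `flush_gaps` helper and the five-second threshold are mine) that scans consecutive timestamps in a capture, assuming events carry the ISO-8601 `ts` field used in the emitter examples above:

```python
import json
from datetime import datetime

def flush_gaps(lines, threshold_s=5.0):
    """Flag suspicious gaps between consecutive event timestamps.

    A healthy line-buffered stream shows steady timestamps; a long
    gap followed by a burst usually means the producer buffered its
    output rather than flushing per event.
    """
    gaps = []
    prev = None
    for line in lines:
        ts = datetime.fromisoformat(json.loads(line)["ts"])
        if prev is not None and (ts - prev).total_seconds() > threshold_s:
            gaps.append((prev.isoformat(), ts.isoformat()))
        prev = ts
    return gaps

lines = [
    '{"ts": "2026-04-20T10:00:00+00:00", "event": "agent.step.started"}',
    '{"ts": "2026-04-20T10:00:01+00:00", "event": "agent.tool.called"}',
    '{"ts": "2026-04-20T10:00:12+00:00", "event": "agent.step.completed"}',
]
gaps = flush_gaps(lines)
print(gaps)
```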

Disabling formatting does make the stream less friendly for casual spectators. That is fine. A run log is not a keynote. I found the polished human summary only works after the run, or in a separate viewer, not in the live feed.

What usually breaks when the run suddenly goes quiet?

If a run suddenly goes quiet, buffering and TTY-sensitive colour behaviour are the first things to check. I found stdbuf and colorMode: false are the fastest sanity checks because they expose whether the producer is the problem before you blame the model or orchestration layer (GNU stdbuf, Node.js Console).

The trap here is organisational as much as technical. Teams tolerate fancy terminal output because it feels more finished. Then the first serious debugging session turns into archaeology. The cleaner default is to ship the raw stream first and earn every extra layer of presentation later.

If your team is building a heavier review loop around agent execution, AI review agents in the content pipeline is a useful next layer. It pairs well with raw event streams because review gets cheaper when the evidence surface is small and honest.

Frequently asked questions

Should agent progress logs go to stdout or stderr?

Put progress events on stdout if they are part of the machine-readable run record. Put diagnostics on stderr if they explain warnings, retries, exceptions, or environmental trouble. The useful test is simple: if another tool should parse it, keep it on stdout; if a human needs to notice it as a problem signal, stderr is the better home.

Is pretty-printed output ever worth it for agent logs?

Yes, but not as the only record. Pretty output is fine for a final summary, a local debug mode, or a separate TUI. It stops being fine when the formatted view becomes the only record of progress. That is when you lose replay, parsing, and any real audit trail.

Why does the agent look quiet until a burst of output appears?

Usually because the stream is buffered or the program changed behaviour when its output stopped going directly to a terminal. Test that first. Run with line buffering, disable colour, and confirm the producer flushes after each event. If the burst disappears, the model was not the issue. The stream contract was.
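On the producer side, the Python fix is to flush after each event instead of trusting the interpreter's default buffering. A minimal sketch, assuming the same event shape as the emitter examples earlier:

```python
import json
import sys
from datetime import datetime, timezone

def emit(event, **data):
    # One line per event, flushed immediately. When stdout is a pipe
    # rather than a TTY, Python block-buffers by default, so without
    # the explicit flush events can sit invisible until process exit.
    record = {"ts": datetime.now(timezone.utc).isoformat(), "event": event, **data}
    sys.stdout.write(json.dumps(record) + "\n")
    sys.stdout.flush()

emit("agent.step.started", step="plan")
```

Running the interpreter with `python -u` (or `PYTHONUNBUFFERED=1`) achieves the same effect process-wide without touching the code.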

Raw CLI logs are not glamorous. That is exactly why they are useful: they make the agent easier to inspect, easier to replay, and much harder to hide behind formatting.


Next step: run one bounded agent task with raw CLI logs enabled, keep the run.jsonl file, and compare it to whatever pretty terminal view you were trusting before. If you want the surrounding ZeroLabs context, continue with You Don't Need an AI Agent and How Claude Published Directly to Labs via MCP.
