Traces
View and analyze traces from your agent runs.
A trace is a complete record of one agent run: every LLM call, tool invocation, retrieval step, reranking, and guardrail check, captured as a tree of spans with their inputs, outputs, timing, and metadata.
When an agent produces a wrong output, misses a tool call, or returns something unexpected, the trace shows you exactly what the model was given, what it decided, and what each step returned.
The Traces page
The Traces page shows every run for a workflow, newest first. Each row is one trace. Click any row to open it. You can customize which columns appear (span name, cost, latency, comments, detections, votes) so the list surfaces the signal that matters most for what you're debugging.
How traces are created
Pass instrumentation keys to neatlogs.init() and calls into those libraries are traced automatically; no other code changes are needed:

```python
import os

import neatlogs

neatlogs.init(
    api_key=os.environ["NEATLOGS_API_KEY"],
    workflow_name="customer-support",
    instrumentations=["langchain", "chromadb"],
)
```

For your own code (agents, pipelines, tool functions), use @neatlogs.span to create spans explicitly:
```python
@neatlogs.span(kind="WORKFLOW")
def handle_request(user_input: str) -> str:
    ...
```

Without a WORKFLOW span at the top of your call stack, library spans are still captured, but they appear as top-level siblings with no parent. Wrapping your entry point with @span(kind="WORKFLOW") gives every trace a clean root.
Span tree
Opening a trace shows the full span hierarchy: every operation nested under the call that triggered it.
The nesting reflects your actual call hierarchy at runtime: WORKFLOW at the root, AGENT spans inside it, LLM and TOOL spans inside those. The tree tells you immediately what triggered what.
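As a sketch of how that nesting arises in code: only @neatlogs.span and the kind values come from this page; the agent and tool function names below are hypothetical.

```python
import neatlogs

@neatlogs.span(kind="TOOL")
def search_orders(query: str) -> list:
    ...  # shows up as a TOOL span, nested under whichever span called it

@neatlogs.span(kind="AGENT")
def support_agent(user_input: str) -> str:
    results = search_orders(user_input)  # the TOOL span nests here
    ...

@neatlogs.span(kind="WORKFLOW")
def handle_request(user_input: str) -> str:
    return support_agent(user_input)  # the AGENT span nests under the WORKFLOW root
```

Because spans mirror the runtime call stack, refactoring which function calls which directly changes the tree you see.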
Each row shows configurable columns: span name, cost, latency, comments, detections, votes. Toggle them with the column picker. The tree panel can be collapsed to give the timeline more room, or unpinned entirely to view the timeline full-width.
Timeline
The timeline maps every span onto a time axis, aligned to the start of the trace. Spans that ran sequentially appear end-to-end; spans that ran in parallel overlap. Gaps are real idle time: the process waiting on a network call, a lock, or an async scheduler.
Each row shows configurable columns alongside the bar: span type, span name, cost, comments, detections, votes.
Four questions this view answers that the tree view can't:
- Which step dominated total wall-clock time? The widest bar.
- Is my async code actually running in parallel? Overlapping bars mean yes. Non-overlapping means your awaits are sequential.
- Is latency concentrated or spread? One long bar vs. many short ones changes the fix.
- Where is idle time? Gaps between bars are time the process spent doing nothing.
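The parallelism question is really about how your awaits are structured. A small asyncio sketch, independent of neatlogs, shows the two shapes that produce end-to-end versus overlapping bars:

```python
import asyncio
import time

async def step(name: str, seconds: float) -> str:
    await asyncio.sleep(seconds)  # stand-in for a network or model call
    return name

async def sequential() -> float:
    # Awaiting one step at a time: bars appear end-to-end on the timeline.
    start = time.perf_counter()
    await step("a", 0.1)
    await step("b", 0.1)
    return time.perf_counter() - start

async def parallel() -> float:
    # gather() runs both steps concurrently: bars overlap on the timeline.
    start = time.perf_counter()
    await asyncio.gather(step("a", 0.1), step("b", 0.1))
    return time.perf_counter() - start
```

If your timeline shows non-overlapping bars where you expected concurrency, look for the first shape: sequential awaits that should have been a gather.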
Span detail
Click any span in the tree or timeline to open its full data panel.
What's shown depends on the span kind:
LLM spans show the full prompt (system message, user message, conversation history) exactly as sent to the model; the completion response; the token breakdown (prompt, completion, total, cache hits); the model name and provider; and latency, both time to first token and total. If you used PromptTemplate, the template and its variable values appear here too.
Tool spans show input.value (the function arguments) and output.value (the return value), plus neatlogs.tool.name, neatlogs.tool.description, and neatlogs.tool.parameters if set.
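A minimal sketch of setting those tool attributes yourself: span.set_attribute() and the attribute keys are from this page, but how you obtain the span handle, and the example values, are assumptions.

```python
def describe_tool_span(span, name: str, description: str, parameters: dict) -> None:
    """Attach the neatlogs.tool.* attributes to a span via set_attribute()."""
    span.set_attribute("neatlogs.tool.name", name)
    span.set_attribute("neatlogs.tool.description", description)
    span.set_attribute("neatlogs.tool.parameters", parameters)
```

Once set, these values render in the span detail panel alongside input.value and output.value.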
Retriever spans show neatlogs.retrieval.query, neatlogs.retrieval.top_k, and neatlogs.retrieval.documents (the retrieved documents rendered as a list).
All spans include any custom neatlogs.* attributes you set via span.set_attribute(), plus start time, end time, duration, span ID, and trace ID.
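A sketch of namespacing your own attributes: set_attribute() and the neatlogs.* prefix are from this page; the helper, the span handle, and the attribute names are illustrative.

```python
def tag_span(span, attributes: dict) -> None:
    """Attach custom attributes under the neatlogs.* namespace."""
    for key, value in attributes.items():
        span.set_attribute(f"neatlogs.{key}", value)

# Usage (hypothetical keys): tag_span(span, {"user_id": "u-123", "plan": "pro"})
```

Attributes set this way become filterable metadata on the span, alongside the built-in timing and ID fields.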
The prompt shown in LLM spans is the exact payload sent to the provider API, after template compilation, after message formatting, after any pre-processing in your code. What you see is what the model saw.
AI assistant
Each trace has a built-in AI assistant with full read access to the run: every span, input, output, token count, and attribute.
Ask it questions about the specific trace you're looking at:
- "Why didn't the agent call the search tool?"
- "What was the model given as context in the second LLM call?"
- "Which step had the highest latency?"
- "What did the retriever return for this query?"
The answers are grounded in the actual span data of the run you're viewing, not a generic product description. Useful for engineers debugging a specific failure and for non-engineers who need direct answers without knowing where to click.
Voting
Thumbs up / down on any span marks its output as correct or incorrect. Votes are stored against the span (not the trace as a whole), visible to everyone on your team, and filterable across the workflow.
The practical use case: build evaluation datasets as you debug. Browse production traces, vote on spans whose outputs matter, then filter by vote to export a labelled set. No separate labelling tool needed.
Votes also surface regressions. If a new prompt version produces more thumbs-down on a particular span kind, it shows up immediately in the filtered view.