Scholialang: an open, vendor-neutral protocol for structured AI agent reasoning traces [R]
Our new startup (Doug Fir Labs) just open-sourced Scholialang, a protocol for turning an agent's reasoning into structured, inspectable, reusable records instead of leaving it buried in a chat transcript.
The problem: when an agent does multi-step work — reads files, runs tools, makes decisions — the actual reasoning ends up as freeform prose in a log. A later session (or a different model) can't reliably pull "the evidence that supported decision X" back out without re-parsing English, and there's no stable way to reference a prior conclusion.
Scholialang gives agents a small typed vocabulary of reasoning "atoms" (Goal, Observation, Evidence, Finding, Deciding, Action, Contradiction, Retract, Concluding, etc.) with stable content-hash IDs, explicit references between atoms, and validator rules. v0.6 adds a content-addressed DAG registry and "lazy preludes", so a later session can pull prior reasoning by hash instead of replaying the whole transcript. The atom format is the same whether it's emitted by Claude, Codex, or a local model.
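To make the atom/registry idea above concrete, here is a toy sketch in Python. It is not the Scholialang API: the `Atom` and `Registry` classes, the `atom_id` method, the truncated SHA-256 over canonical JSON, and the two validator rules are all hypothetical, chosen only to illustrate three ideas from the post: IDs derived from content, explicit references between atoms, and a "prelude" built by walking one decision's ancestors by hash instead of replaying everything.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass

# Atom kinds named in the post. The real vocabulary, field names and
# validator rules are defined by the Scholialang spec, not by this sketch.
KINDS = {"Goal", "Observation", "Evidence", "Finding", "Deciding",
         "Action", "Contradiction", "Retract", "Concluding"}


@dataclass(frozen=True)
class Atom:
    kind: str
    text: str
    refs: tuple[str, ...] = ()  # IDs of earlier atoms this one builds on

    def atom_id(self) -> str:
        # Hypothetical ID scheme: SHA-256 over a canonical JSON encoding
        # (sorted keys, no whitespace), so identical content -> identical ID.
        payload = json.dumps(
            {"kind": self.kind, "text": self.text, "refs": sorted(self.refs)},
            sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


class Registry:
    """Toy content-addressed store; atoms reference each other by ID."""

    def __init__(self) -> None:
        self.atoms: dict[str, Atom] = {}

    def add(self, atom: Atom) -> str:
        # Two illustrative validator rules: the kind must be known, and every
        # ref must already be stored. Edges therefore only point at atoms
        # added earlier, which keeps the graph acyclic.
        if atom.kind not in KINDS:
            raise ValueError(f"unknown atom kind: {atom.kind}")
        missing = [r for r in atom.refs if r not in self.atoms]
        if missing:
            raise ValueError(f"dangling refs: {missing}")
        aid = atom.atom_id()
        self.atoms[aid] = atom  # same content always maps to the same key
        return aid

    def prelude(self, root: str) -> list[Atom]:
        """Return `root` and its ancestors only, parents before children."""
        seen: set[str] = set()
        order: list[Atom] = []

        def visit(aid: str) -> None:
            if aid in seen:
                return
            seen.add(aid)
            for ref in self.atoms[aid].refs:
                visit(ref)
            order.append(self.atoms[aid])

        visit(root)
        return order


if __name__ == "__main__":
    reg = Registry()
    goal = reg.add(Atom("Goal", "Decide whether to upgrade the parser"))
    ev = reg.add(Atom("Evidence", "New parser passes the full test suite", (goal,)))
    reg.add(Atom("Observation", "CI runner was restarted", (goal,)))
    dec = reg.add(Atom("Deciding", "Upgrade the parser", (goal, ev)))
    # Prints Goal, Evidence, Deciding; the unrelated Observation is skipped.
    for atom in reg.prelude(dec):
        print(f"{atom.kind}: {atom.text}")
```

Folding the refs into the hash makes this a Merkle-style DAG: changing an ancestor changes the IDs of everything built on it. Whether Scholialang's own canonical_id behaves this way is not stated in the post.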
Early results — all small pilots, not final benchmarks, pushback welcome:
- Cross-model replay: we gave fresh sessions from three model families (Opus 4.8, Fable 5, GPT-5.5/Codex) a trace with the final decision stripped, and they re-derived the original decision in 135/135 cases. Caveat: the task set was convergent, and cold-start baselines were already high on two of the three models, so read this as a portability signal, not as "beats transcripts."
- Token cost: carrying a compact reasoning prelude instead of the full history cut Session-5 input tokens by ~30–41%, with quality flat in the gated arms (a max-compression mode reaches ~50% savings but gives up a little quality).
- Quality safety: in a 4-arm eval, adding context tooling alone actually lowered answer quality relative to a bare baseline; adding the structured framing on top restored it to baseline parity. Small n, p≈0.07: suggestive, not significant. We're explicitly not claiming that structure makes models smarter.
Code is MIT/Apache, spec is CC-BY, packages are on PyPI, and there are MCP + LSP servers with host recipes for Claude Code / Codex / Ollama.
Would genuinely value critique from people building agent systems or local tooling — especially on the vocabulary, the canonical_id semantics, and whether this should interoperate with OpenTelemetry / existing trace formats instead of being its own thing.
Spec + code: https://scholialang.org · https://github.com/dougfirlabs