We gave an LLM a structural graph of a codebase before exploring. It used 54% MORE context than without one. Paper + explanation inside [R]

We published a small technical paper this week documenting something that surprised us:

In a controlled A/B benchmark on a production multi-repository TypeScript workspace (25 sections, 3,250 files), the arm equipped with a section-scoped structural graph (Blueprint — built from Universal Ctags + ast-grep + BM25) used 63,541 provider-billed input tokens. The arm without it used 41,327.
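As a quick check of the headline figure, the relative increase can be computed directly from the two billed totals above. This is plain arithmetic on the post's own numbers. The helper name `relative_increase` is ours, not the paper's.

```python
def relative_increase(treatment: int, control: int) -> float:
    """Relative change of treatment over control (0.5 means +50%)."""
    return (treatment - control) / control

with_graph = 63_541     # Blueprint arm, provider-billed input tokens
without_graph = 41_327  # baseline arm, provider-billed input tokens

delta = with_graph - without_graph
print(delta, f"{relative_increase(with_graph, without_graph):.1%}")  # → 22214 53.8%
```

53.8% rounds to the 54% quoted in the title.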

Same model (Kimi K2.6), same provider (OpenRouter), same task, same prescribed tool order.

The model with the graph explored more thoroughly: 6 tool-call turns vs 5, more internal function names surfaced, and deeper coverage. Without the map, it explored conservatively and stopped sooner.

Our interpretation: the cost of structural understanding and the cost of execution context are separable problems. The graph costs ~6,500 tokens, which bounds structural overhead. Execution context is determined by exploration depth, and exploration depth increases when the model has navigational confidence.

We also documented post-turn tool-result summarisation (95–98% compression on individual read_file results before history persistence) as a separate mechanism for the execution layer.
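The post doesn't say how the summaries are produced. Here is a minimal sketch of the "summarise after the turn, then persist" step. It assumes a purely heuristic summariser that keeps only declaration-like TypeScript lines, and it measures size in characters as a rough proxy for tokens. `ToolResult`, `summarise`, `compression`, and `persist_turn` are hypothetical names, not the paper's implementation.

```python
import re
from dataclasses import dataclass

# Hypothetical heuristic: treat lines that start like a TypeScript declaration as worth keeping.
DECL = re.compile(r"^\s*(export\s+)?(async\s+)?(function|class|interface|type|const)\b")

@dataclass
class ToolResult:
    tool: str
    content: str

def summarise(result: ToolResult) -> ToolResult:
    """Hypothetical summariser: only read_file results are compressed; others pass through."""
    if result.tool != "read_file":
        return result
    lines = result.content.splitlines()
    kept = [line.strip() for line in lines if DECL.match(line)]
    header = f"[summary of read_file: {len(lines)} lines]"
    return ToolResult(result.tool, "\n".join([header, *kept]))

def compression(original: str, summary: str) -> float:
    """Fraction of characters removed (characters as a rough stand-in for tokens)."""
    if not original:
        return 0.0
    return 1 - len(summary) / len(original)

def persist_turn(history: list[ToolResult], results: list[ToolResult]) -> None:
    """After a turn ends, summarise each tool result and only then append it to history."""
    history.extend(summarise(r) for r in results)

# Usage on a contrived file body: one exported function plus 200 filler lines.
history: list[ToolResult] = []
body = "export function parseConfig(raw: string) {\n" + "  // ...\n" * 200 + "}\n"
raw = ToolResult("read_file", body)
persist_turn(history, [raw])
print(f"{compression(raw.content, history[0].content):.0%}")
```

The design point is the ordering. The full result is available to the model during the turn in which it was fetched, and only the compressed form is carried forward in history.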

Honest limitations: single task type (read-only exploration), single run per arm, no statistical significance claimed.

Full paper (Zenodo): https://zenodo.org/records/20381860 (DOI: 10.5281/zenodo.20381860)

Happy to discuss methodology, the separability framing, or the counterintuitive result.

submitted by /u/Altruistic_Night_327