We gave an LLM a structural graph of a codebase before it started exploring. It used 54% MORE context than it did without one. Paper + explanation inside [R]
We published a small technical paper this week documenting something that surprised us:
In a controlled A/B benchmark on a production multi-repository TypeScript workspace (25 sections, 3,250 files), the arm equipped with a section-scoped structural graph (Blueprint — built from Universal Ctags + ast-grep + BM25) used 63,541 provider-billed input tokens. The arm without it used 41,327.
Same model (Kimi K2.6), same provider (OpenRouter), same task, same prescribed tool order.
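For anyone checking the headline figure, here is a quick sketch of the arithmetic using the two token counts quoted above. The function and variable names are ours, chosen for illustration only.

```python
def relative_increase(treatment: int, baseline: int) -> float:
    """Fractional increase of `treatment` over `baseline`."""
    return (treatment - baseline) / baseline


# Provider-billed input tokens reported in the post.
with_graph = 63_541
without_graph = 41_327

extra_tokens = with_graph - without_graph
pct = relative_increase(with_graph, without_graph) * 100

print(f"extra tokens: {extra_tokens}, increase: {pct:.1f}%")
```

The difference is 22,214 tokens, which rounds to the 54% in the title.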
The model with the graph explored more thoroughly — 6 tool-call turns vs 5, more internal function names surfaced, deeper coverage. Without the map, it explored conservatively and stopped sooner.
Our interpretation: the cost of structural understanding and the size of the execution context are separable problems. The graph costs ~6,500 tokens and bounds the structural overhead. Execution context is determined by exploration depth, which increased in this run when the model had navigational confidence.
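To make the separability framing concrete, here is a toy model, not the accounting used in the paper. It assumes every turn resends a fixed overhead plus all tool results accumulated so far. The function name and all numbers in the usage lines are hypothetical and are not taken from the benchmark.

```python
def billed_input_tokens(fixed_overhead: int, tool_result_sizes: list[int]) -> int:
    """Total input tokens billed across turns in a resend-everything toy model.

    Assumption (ours): the prompt for each turn is the fixed overhead plus
    every tool result already in history. The fixed term stands in for
    structural cost; the growing history stands in for execution context.
    """
    total = 0
    history = 0
    for size in tool_result_sizes:
        total += fixed_overhead + history  # prompt sent for this turn
        history += size                    # result joins history for later turns
    return total


# Hypothetical values: same fixed overhead, one extra turn of exploration.
five_turns = billed_input_tokens(1_000, [2_000] * 5)
six_turns = billed_input_tokens(1_000, [2_000] * 6)
print(five_turns, six_turns)
```

In this toy model the fixed term grows linearly with the number of turns, while the history term grows with both the number of turns and the size of each result, so one extra turn of deeper exploration can outweigh a bounded structural cost.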
We also documented post-turn tool-result summarisation (95–98% compression on individual read_file results before history persistence) as a separate mechanism for the execution layer.
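A minimal sketch of what post-turn summarisation could look like, assuming the model sees the full tool result during the current turn and only a shortened version is written to history. The truncating summariser, the `persist_turn` helper, and the history record shape are all hypothetical stand-ins, not the mechanism described in the paper.

```python
def summarise_tool_result(content: str, max_chars: int = 200) -> str:
    """Hypothetical summariser: keep a head excerpt and note the original size."""
    if len(content) <= max_chars:
        return content
    head = content[:max_chars]
    return f"{head}\n[truncated: {len(content)} chars originally]"


def persist_turn(history: list[dict], tool_name: str, result: str) -> float:
    """Append the (possibly summarised) result to history.

    Returns the fraction of characters saved for this result.
    Only read_file results are summarised in this sketch.
    """
    summary = summarise_tool_result(result) if tool_name == "read_file" else result
    history.append({"tool": tool_name, "content": summary})
    return 1 - len(summary) / len(result) if result else 0.0


history: list[dict] = []
saved = persist_turn(history, "read_file", "x" * 10_000)
print(f"saved {saved:.1%} of characters")
```

A real summariser would presumably keep semantically useful content (symbols, signatures, key lines) rather than a raw head excerpt, and would measure savings in tokens rather than characters.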
Honest limitations: single task type (read-only exploration), single run per arm, no statistical significance claimed.
Full paper (Zenodo): https://zenodo.org/records/20381860 (DOI: 10.5281/zenodo.20381860)
Happy to discuss methodology, the separability framing, or the counterintuitive result.