From Machine Learning

BEAM 100K memory benchmark: CSM vs Hindsight local artifact comparison [R]

I’m looking for feedback on a local agent-memory benchmark comparison, especially from people who care about evaluation methodology.

I built an open-source R&D memory system called Context Swarm Memory (CSM). It uses bounded read-only memory shards, query routing, probe/recall/synthesis, cited packets, and explicit Committer-gated writes.

The current comparison is against the accepted local Hindsight artifact on BEAM 100K:

  • CSM: 0.757573 AMB score, 342 / 400 correct
  • Hindsight: 0.733658 AMB score, 326 / 400 correct
  • CSM uses 38.2% fewer answer-visible context tokens than Hindsight
  • CSM is slower: 29.23s average retrieval vs 6.38s for Hindsight
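
To put the numbers above in perspective, here is a minimal sketch that derives the raw accuracy rates, the latency ratio, and an unpaired two-proportion z-test from the quoted counts. It is not part of CSM or BEAM; the function name `two_proportion_z` is hypothetical. The test treats the two runs as independent samples, which is an assumption: if both systems answered the same 400 questions, a paired test on per-question outcomes would be more appropriate, but those outcomes are not given in this post.

```python
import math

# Figures quoted in the post.
N_QUESTIONS = 400
CSM_CORRECT = 342
HINDSIGHT_CORRECT = 326
CSM_LATENCY_S = 29.23
HINDSIGHT_LATENCY_S = 6.38


def two_proportion_z(correct_a: int, correct_b: int, n: int) -> tuple[float, float]:
    """Unpaired two-proportion z-test with equal sample sizes (hypothetical helper).

    Returns the z statistic and the two-sided p-value under the pooled
    null hypothesis that both systems have the same accuracy.
    """
    p_a = correct_a / n
    p_b = correct_b / n
    pooled = (correct_a + correct_b) / (2 * n)
    std_err = math.sqrt(pooled * (1 - pooled) * (2 / n))
    z = (p_a - p_b) / std_err
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


csm_accuracy = CSM_CORRECT / N_QUESTIONS
hindsight_accuracy = HINDSIGHT_CORRECT / N_QUESTIONS
latency_ratio = CSM_LATENCY_S / HINDSIGHT_LATENCY_S
z_stat, p_value = two_proportion_z(CSM_CORRECT, HINDSIGHT_CORRECT, N_QUESTIONS)

print(f"CSM accuracy:       {csm_accuracy:.3f}")
print(f"Hindsight accuracy: {hindsight_accuracy:.3f}")
print(f"Latency ratio:      {latency_ratio:.2f}x")
print(f"z = {z_stat:.2f}, two-sided p = {p_value:.3f}")
```

On these counts the gap is 16 questions out of 400 (4 percentage points), and the unpaired test on its own does not reach the conventional 0.05 threshold. Publishing per-question correctness for both systems would allow a paired analysis such as McNemar's test, which is usually more sensitive for this kind of same-question comparison.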

I want to be precise about the claim:

This is not an official leaderboard claim, and it is not a BEAM 10M claim. It is a comparison between committed, locally accepted artifacts at the 100K scale. The next step should be independent replication or official chart acceptance.

Repo:
https://github.com/muhamadjawdatsalemalakoum/context-swarm-memory

Evidence and reproducibility notes:
https://muhamadjawdatsalemalakoum.github.io/context-swarm-memory/

The main question: what would make this comparison scientifically stronger before it is presented as a serious agent-memory result?

submitted by /u/keonakoum