1 min readfrom Machine Learning

Evaluating long-term memory limits in stateless LLM chatbots — feedback needed [D]

Hi all,

I’m working on a research project exploring how stateless LLM-based chatbots handle long conversations and whether important earlier information is still reliably retained over time.

My idea is to:

  • Run a chatbot using an LLM API without any external memory system
  • Introduce key facts early in a long conversation
  • Continue with many unrelated messages (hundreds of turns)
  • Later test whether the model can still correctly recall those facts at different intervals

I’m planning to measure recall accuracy and how it changes as the conversation grows.

Before I go deeper, I’d really appreciate feedback on:

  • Is this a valid way to evaluate long-context memory limits?
  • Are there better benchmarks or methods already used for this?
  • What metrics would make this more rigorous and convincing?

Any suggestions or criticism are welcome. I’m trying to make the evaluation as solid as possible before building it out.

Thanks!

submitted by /u/QuietAccountant4237
[link] [comments]

Want to read more?

Check out the full article on the original site

View original article

Tagged with

#rows.com
#natural language processing for spreadsheets
#generative AI for data analysis
#cloud-based spreadsheet applications
#Excel alternatives for data analysis
#real-time data collaboration
#financial modeling with spreadsheets
#real-time collaboration
#spreadsheet API integration
#LLM
#Long-term memory
#Chatbot
#Evaluation
#Stateless
#Long context
#Recall accuracy
#Memory limits
#Conversation
#Fact recall
#Metrics