
[R] What 1000+ Harness Experiments Taught Me About Self-Improving Agents

I recently wanted to see whether an AI agent could self-improve a harness to solve terminal bench tasks. It’s possible for an AI agent to propose a meaningful one-time change to the harness, but after experimenting with this for a couple of weeks, I think continuous self-improvement is mostly an experiment-systems problem. The system needs a way to decide which kinds of improvements can safely compound.

Turns out there are a lot of parallels to coding-agent customization (e.g. SKILLS.md) too.

I wrote up my experience building such a system here, including the successful and failed attempts along the way and how I approached the self-improvement loop. It's not intended as a benchmark claim, but more as a systems/research writeup.

https://www.henrypan.com/blog/2026-05-25-self-improvement-harness/

submitted by /u/Megadragon9
