From Machine Learning

Cleo: trying to fit full analyst behavior in a 2B model [P]

Hello all!

Half of all industrial "chatbots" are just text-to-SQL models in a trenchcoat (and the other half RAG!). I wanted to explore just how small you could make these models if you trained, evaluated, and ran inference in the exact same structured harness, leading to Cleo: a Qwen3.5-2B-Base finetune.

Currently, some features of Cleo that are only possible or useful in a unified harness are:

  • Training on the exact same gather, repair, and answer contract it uses at inference time
  • Searching over candidate queries with live execution evidence, not just model likelihood
  • Co-designing the model contract, SQL safety layer, dialect handling, timeouts, and clarification behavior as one system
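
To make the second bullet concrete, here is a minimal sketch of ranking candidate queries by execution evidence before model likelihood. It is not Cleo's actual code: `is_read_only`, `pick_candidate`, `demo`, the `orders` table, and the log-probability values are all hypothetical, and SQLite stands in for whatever database the harness really targets.

```python
import sqlite3


def is_read_only(sql: str) -> bool:
    """Crude safety check (assumption): allow one SELECT/WITH statement only."""
    stripped = sql.strip().rstrip(";").strip()
    if not stripped or ";" in stripped:
        return False
    first_word = stripped.split(None, 1)[0].lower()
    return first_word in ("select", "with")


def pick_candidate(conn, candidates):
    """Pick the best (sql, rows) pair, or None if no candidate is usable.

    candidates: list of (sql, model_logprob) pairs (hypothetical format).
    """
    scored = []
    for sql, logprob in candidates:
        if not is_read_only(sql):
            continue  # rejected by the safety layer, never executed
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error:
            continue  # execution failure counts as evidence against the query
        # Execution evidence (a non-empty result) outranks model likelihood.
        scored.append((len(rows) > 0, logprob, sql, rows))
    if not scored:
        return None
    best = max(scored, key=lambda s: (s[0], s[1]))
    return best[2], best[3]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "EU", 10.0), (2, "US", 25.0), (3, "EU", 5.0)],
    )
    candidates = [
        # Most likely under the (made-up) scores, but the column does not exist.
        ("SELECT SUM(amount) FROM orders WHERE region = 'EU'", -0.2),
        # Not read-only, so it is filtered out before execution.
        ("DROP TABLE orders", -0.5),
        # Least likely, but the only one that actually runs.
        ("SELECT SUM(total) FROM orders WHERE region = 'EU'", -1.1),
    ]
    return pick_candidate(conn, candidates)
```

In this toy run the lowest-likelihood candidate is selected because it is the only one that both passes the safety check and executes. A real harness would presumably need stronger evidence than "returned rows", plus timeouts and dialect handling, which this sketch leaves out.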

Everything is completely open-source, including the harness, model, and datasets.

GitHub: https://github.com/Dreeseaw/cleo

Hugging Face model: https://huggingface.co/dreeseaw/cleo

PS: If you're also resource-constrained and trying to do RL like me, I would highly recommend experimenting with ECHO: https://arxiv.org/abs/2605.24517

submitted by /u/Dreeseaw