I created an LLM post-training method called RPS (Regressive Plasticity Schedule). Preliminary results show that it improved Qwen3-8b's program synthesis reliability compared with an equal-learning-rate baseline. [R]
RPS is inspired by neuroscience. As humans, we learn basic skills as children, when neuroplasticity is high, and advanced skills as teens and adults, when it is lower. RPS trains a model in two stages. In stage 1, the model is trained on easy data with a high learning rate. In stage 2, it is trained on hard data with 10% of the stage 1 learning rate. RPS is essentially a combination of two existing ideas: curriculum learning and learning rate decay.
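To make the two-stage setup concrete, here is a minimal sketch of the schedule in plain Python. It is not the actual training code: the names `Stage`, `two_stage_schedule`, and `learning_rate_at` are hypothetical, and the stage 1 learning rate of `1e-4` is a placeholder, since the post does not state the real value. Only the 10% ratio and the easy/hard split come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    data: str             # which data split the stage trains on
    learning_rate: float


def two_stage_schedule(stage1_lr: float, stage2_ratio: float = 0.1) -> list[Stage]:
    """Return two stages: easy data at stage1_lr, hard data at a scaled rate."""
    return [
        Stage("stage 1", "easy", stage1_lr),
        Stage("stage 2", "hard", stage1_lr * stage2_ratio),
    ]


def learning_rate_at(step: int, stage1_steps: int, schedule: list[Stage]) -> float:
    """Piecewise-constant rate: stage 1 value before the switch, stage 2 value after."""
    if step < stage1_steps:
        return schedule[0].learning_rate
    return schedule[1].learning_rate


# Placeholder value (assumption): the post does not give the stage 1 learning rate.
rps = two_stage_schedule(stage1_lr=1e-4)
# EPS baseline as described in the post: equal learning rate in both stages.
eps = two_stage_schedule(stage1_lr=1e-4, stage2_ratio=1.0)

for stage in rps:
    print(f"{stage.name}: {stage.data} data, lr={stage.learning_rate:.1e}")
```

In this sketch the only difference between RPS and the EPS baseline is `stage2_ratio`, so the comparison isolates the learning rate drop while both runs share the same easy-then-hard data order.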
ARC-AGI 1 public eval scores (base model: Qwen3-8b):
RPS: 4%
EPS (equal learning rate in both stages): 2.4%

Program synthesis stats (program executions without error):
RPS: 1145/1200
EPS: 870/1200
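As a quick check on the execution counts, the short snippet below converts them to rates. The counts are taken from the post; the helper name `error_free_rate` is hypothetical.

```python
TOTAL_RUNS = 1200
error_free_runs = {"RPS": 1145, "EPS": 870}  # counts reported in the post


def error_free_rate(ok: int, total: int = TOTAL_RUNS) -> float:
    """Fraction of program executions that finished without an error."""
    return ok / total


for method, ok in error_free_runs.items():
    print(f"{method}: {ok}/{TOTAL_RUNS} = {error_free_rate(ok):.1%}")

# Absolute gap between the two methods, as a fraction of all runs.
gap = error_free_rate(error_free_runs["RPS"]) - error_free_rate(error_free_runs["EPS"])
print(f"gap: {gap * 100:.1f} percentage points")
```

That works out to roughly 95.4% error-free executions for RPS versus 72.5% for EPS, a gap of about 22.9 percentage points.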
https://iamjasonfeng.blogspot.com/2026/05/regressive-plasticity-schedule.html