Backprop-free Pong: PC + distributional Hebbian plasticity vs. PPO: 57% vs. 59%, ~1500 lines from scratch [P]
Wanted to see how close a fully bio-plausible agent could get to PPO on Pong.
Setup
- Custom Pong environment (pygame, no gym)
- PPO baseline: paper-faithful, from scratch
- Hebbian agent: PPO policy replaced with Hebbian value estimation, with engineered features → 61%
- BioAgent: Predictive Coding for feature learning + distributional Hebbian plasticity for value (Dabney et al. 2020) → 57%
Zero backprop anywhere in the pipeline.
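For readers unfamiliar with the distributional idea, here is a minimal sketch of a Dabney-style value population. It is not the code from the repo. It assumes each value unit applies a local, error-driven update with its own ratio of positive to negative learning rates, so the population spreads out over the reward distribution instead of collapsing to the mean. The names `make_population`, `update`, and `demo`, the bimodal ±1 reward, and all constants are hypothetical and chosen only for illustration.

```python
import random

def make_population(n_units):
    # Hypothetical setup: unit i gets asymmetry tau_i in (0, 1).
    # High tau -> weights positive errors more -> "optimistic" unit.
    taus = [(i + 1) / (n_units + 1) for i in range(n_units)]
    values = [0.0] * n_units
    return taus, values

def update(values, taus, reward, base_lr=0.02):
    # Each unit uses only its own prediction error (a local rule,
    # no gradient passed between units).
    for i, tau in enumerate(taus):
        delta = reward - values[i]
        lr = base_lr * (tau if delta > 0 else 1.0 - tau)
        values[i] += lr * delta
    return values

def demo(seed=0, steps=20000):
    # Toy reward: +1 or -1 with equal probability (assumed, not Pong).
    rng = random.Random(seed)
    taus, values = make_population(5)
    for _ in range(steps):
        reward = 1.0 if rng.random() < 0.5 else -1.0
        update(values, taus, reward)
    return taus, values

if __name__ == "__main__":
    taus, values = demo()
    for tau, v in zip(taus, values):
        print(f"tau={tau:.2f}  value={v:+.2f}")
```

A scalar estimator would settle near 0 on this reward and lose the fact that outcomes are bimodal. The population keeps pessimistic units below zero and optimistic units above it, which is the extra information a distributional code carries.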
Key observations
- The 2-point gap (57% vs. 59%) is real but small. The bottleneck wasn't the lack of backprop; it was catastrophic forgetting under non-stationary opponent dynamics during self-play.
- Distributional value encoding (à la Dabney) helped stability vs. a scalar Hebbian baseline, but not enough to match PPO under self-play.
- Self-play exposed the plasticity–stability dilemma hard: Hebbian rules that adapt fast forget fast. This is the real wall for bio-plausible RL in non-stationary settings.
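The "adapt fast, forget fast" trade-off can be shown with a toy example. The sketch below is not the self-play setup from the repo. It assumes a single scalar estimate tracked with a delta rule, first trained on one target (a hypothetical "opponent A" with target +1), then briefly on another ("opponent B", target -1). The function names and step counts are made up for illustration.

```python
def train(value, target, lr, steps):
    # Plain local delta rule: move the estimate toward the target.
    for _ in range(steps):
        value += lr * (target - value)
    return value

def forgetting_demo(lr, steps_a=200, steps_b=20):
    # Phase 1: long exposure to hypothetical opponent A (target +1).
    v = train(0.0, 1.0, lr, steps_a)
    # Phase 2: short exposure to hypothetical opponent B (target -1).
    v = train(v, -1.0, lr, steps_b)
    # Return (error on A, error on B) after the switch.
    return abs(1.0 - v), abs(-1.0 - v)

if __name__ == "__main__":
    for lr in (0.5, 0.01):
        err_a, err_b = forgetting_demo(lr)
        print(f"lr={lr}: error on A={err_a:.3f}, error on B={err_b:.3f}")
```

With the fast rate the estimate tracks B almost perfectly and has lost everything it knew about A. With the slow rate it still remembers A but has barely adapted to B. No single learning rate fixes both in this toy, which is the same dilemma at small scale.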
Not claiming novelty in the architecture; this is a from-scratch exploration of whether bio-plausible rules can handle a real RL task. Short answer: yes, mostly, with one clear failure mode.
Code: github.com/nilsleut/Biologically-Plausible-RL-Plays-Pong
Happy to answer questions about the PC implementation, the Hebbian value estimator, or the self-play setup.