NOML-NOML: hierarchical TD3 + anchor policy for flight control [P]

I built a custom RL algorithm for continuous flight control and open-sourced it. Sharing here in case the structural ideas are useful for anyone doing continuous control where one action axis dominates.

I've been training continuous control on a 6-DoF flight sim with a six-channel action space (pitch/roll/yaw/throttle/brake/fire) and kept hitting the same wall: vanilla TD3 would peak, then collapse into pitch oscillation and never recover. I tried reward shaping for a while before concluding the problem was structural, not in the reward. NOML is what came out of that.

Three structural changes on top of a standard TD3 skeleton:

  • Anchor policy — the action is anchor + delta·gate, where the anchor is a fixed safe action (wings level, MIL throttle). The policy literally cannot fully forget how to fly straight; the worst a collapsed policy can do is fall back to the anchor.
  • Hierarchical actor — three MLPs with independent optimizers (pitch → roll → rest), so a roll-side gradient update can't corrupt the pitch head. This is what actually killed the oscillation for me.
  • Mirror learning — left-right symmetry means every transition can be mirrored into a free second sample. 2× data when env steps are the bottleneck.
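
To make the anchor and mirror ideas concrete, here is a minimal sketch in plain Python. It is not taken from the NOML repo. The names `compose_action`, `mirror_transition`, `ANCHOR`, `ACTION_MIRROR` and `obs_mirror` are hypothetical, the throttle value 0.8 for "MIL" is an illustrative guess, the per-axis gate in [0, 1] is an assumption, and so is the choice that a left-right flip negates only roll and yaw.

```python
# Action layout from the post: pitch, roll, yaw, throttle, brake, fire.
AXES = ("pitch", "roll", "yaw", "throttle", "brake", "fire")

# Assumed anchor: wings level, throttle at an illustrative "MIL" value of 0.8.
ANCHOR = (0.0, 0.0, 0.0, 0.8, 0.0, 0.0)

# Assumed mirror signs: a left-right flip negates roll and yaw only.
ACTION_MIRROR = (1.0, -1.0, -1.0, 1.0, 1.0, 1.0)


def compose_action(delta, gate, anchor=ANCHOR, low=-1.0, high=1.0):
    """Return anchor + delta * gate per axis, clipped to [low, high].

    delta is the policy's offset and gate scales how much of it is applied
    (assumed here to be one value in [0, 1] per axis).
    """
    return [
        min(high, max(low, a + d * g))
        for a, d, g in zip(anchor, delta, gate)
    ]


def _flip(values, signs):
    """Multiply each value by its mirror sign (+1 keeps it, -1 negates it)."""
    return [v * s for v, s in zip(values, signs)]


def mirror_transition(obs, action, reward, next_obs, done, obs_mirror):
    """Build the left-right mirrored copy of one transition.

    obs_mirror is a per-feature sign vector; it is hypothetical and depends
    on how the observation is laid out. Reward and done are unchanged.
    """
    return (
        _flip(obs, obs_mirror),
        _flip(action, ACTION_MIRROR),
        reward,
        _flip(next_obs, obs_mirror),
        done,
    )
```

With the gate at zero on every axis, `compose_action` returns the anchor regardless of `delta`, which is the fallback property described above. In a replay buffer you would store both the original transition and the output of `mirror_transition`, provided the environment really is left-right symmetric for the features you flip.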

One thing that surprised me and goes against the usual advice: my best results came with exploration noise effectively off. On this task adding Gaussian action noise mostly just shook the stick and hurt. The anchor+gate structure seems to provide enough of the "fall back to safe behavior" role that noise usually plays.
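
For reference, "exploration noise off" can be read as setting the noise scale to zero in the usual TD3 behavior policy. The sketch below is an assumption about how that switch might look, not code from the repo; `explore` and `sigma` are hypothetical names, and it covers only the behavior-side noise, not TD3's target policy smoothing.

```python
import random


def explore(policy_action, sigma=0.0, low=-1.0, high=1.0, rng=random):
    """Add zero-mean Gaussian noise per axis, then clip to [low, high].

    sigma=0.0 (the default here) leaves the policy action untouched apart
    from clipping, which corresponds to exploration noise being off.
    """
    noisy = []
    for a in policy_action:
        noise = rng.gauss(0.0, sigma) if sigma > 0.0 else 0.0
        noisy.append(min(high, max(low, a + noise)))
    return noisy
```

Calling `explore(action)` with the default returns the deterministic action, while `explore(action, sigma=0.1)` gives the standard noisy variant for comparison runs.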

Code (Apache 2.0), full writeup, and a test video are here: https://github.com/9138noms/NOML

https://www.youtube.com/watch?v=ZNn6wo_PX8Y

submitted by /u/9138NOMS
