I Trained an AI to Beat Final Fight… Here’s What Happened [P]


Hey everyone,

I’ve been experimenting with Behavior Cloning on a classic arcade game (Final Fight), and I wanted to share the results and get some feedback from the community.

The setup is fairly simple: I trained an agent purely from demonstrations (no reward shaping initially), then evaluated how far it could go in the first stage. I also plan to extend this with GAIL + PPO to see how much performance improves beyond imitation.
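
For context, the BC step itself is just supervised learning on (obs, action) pairs. Here's a minimal sketch of the idea (the architecture, shapes, and names are my own simplification, not the exact code in the repo). One detail that matters: since the actions are MultiBinary (several buttons can be held at once), the loss is multi-label BCE rather than softmax cross-entropy:

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    NUM_BUTTONS = 12  # assumed button count; match your emulator core

    class BCPolicy(nn.Module):
        """Small CNN mapping stacked frames to one logit per button."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(4, 32, 8, stride=4), nn.ReLU(),  # 4 stacked 84x84 frames
                nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
                nn.Flatten(),
                nn.Linear(64 * 9 * 9, 512), nn.ReLU(),  # 9x9 is the conv output for 84x84
                nn.Linear(512, NUM_BUTTONS),
            )

        def forward(self, obs):
            return self.net(obs)

    def train_bc(obs, actions, epochs=10):
        """obs: (N, 4, 84, 84) float tensor; actions: (N, NUM_BUTTONS) float 0/1 tensor."""
        policy = BCPolicy()
        opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
        loss_fn = nn.BCEWithLogitsLoss()  # multi-label: buttons are independent
        loader = DataLoader(TensorDataset(obs, actions), batch_size=256, shuffle=True)
        for _ in range(epochs):
            for batch_obs, batch_act in loader:
                loss = loss_fn(policy(batch_obs), batch_act)
                opt.zero_grad()
                loss.backward()
                opt.step()
        return policy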

A couple of interesting challenges came up:

  • Action space remapping (MultiBinary → emulator input) — first sketch after this list
  • Trajectory alignment issues (obs/action offset bugs 😅) — noted in the third sketch
  • LSTM policy behaving differently under evaluation vs. manual rollout — second sketch
  • Managing rollouts efficiently without loading everything into memory — third sketch
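
On the remapping point: the policy head emits a MultiBinary vector, but the emulator core expects buttons in its own fixed order, so the remap boils down to a precomputed permutation. A sketch with a hypothetical button layout (the real order depends on the core):

    import numpy as np

    # Hypothetical orders: what the policy was trained with vs. what the core expects.
    POLICY_BUTTONS = ["B", "A", "UP", "DOWN", "LEFT", "RIGHT", "START", "SELECT"]
    EMULATOR_BUTTONS = ["UP", "DOWN", "LEFT", "RIGHT", "A", "B", "START", "SELECT"]

    # Precompute the index map once: emulator slot i reads policy slot PERM[i].
    PERM = np.array([POLICY_BUTTONS.index(b) for b in EMULATOR_BUTTONS])

    def remap_action(policy_action: np.ndarray) -> np.ndarray:
        """Reorder a MultiBinary action vector into the emulator's button order."""
        return policy_action[PERM]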
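
On the LSTM discrepancy: in my experience the usual culprit is recurrent-state bookkeeping; if evaluation resets (or fails to carry) the hidden state differently from the manual rollout, the policy effectively sees a different history. A sketch of the loop I'd expect, assuming a gymnasium-style env and a hypothetical policy.act(obs, hidden) that returns the next state:

    import torch

    def evaluate(env, policy, episodes=5):
        """Carry the LSTM state across steps; reset it only at episode boundaries."""
        for _ in range(episodes):
            obs, _ = env.reset()
            hidden = None  # fresh recurrent state per episode
            done = False
            while not done:
                obs_t = torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0)
                with torch.no_grad():
                    action, hidden = policy.act(obs_t, hidden)  # hypothetical API
                obs, _, terminated, truncated, _ = env.step(action)
                done = terminated or truncated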
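
And on the memory point: storing demonstrations as .npy files and opening them with np.load(..., mmap_mode="r") lets a Dataset index frames lazily instead of materializing everything up front. A sketch (file names and the obs/action alignment convention are assumptions on my part), which is also where I'd pin down the off-by-one alignment bugs:

    import numpy as np
    from torch.utils.data import Dataset

    class LazyDemoDataset(Dataset):
        """Index demonstration frames from disk instead of loading full arrays."""
        def __init__(self, obs_path="obs.npy", act_path="actions.npy"):  # assumed names
            # mmap_mode="r" keeps the arrays on disk; pages are read on access.
            self.obs = np.load(obs_path, mmap_mode="r")
            self.actions = np.load(act_path, mmap_mode="r")
            # Alignment convention assumed here: actions[t] is taken after seeing obs[t].
            assert len(self.obs) == len(self.actions), "obs/action length mismatch"

        def __len__(self):
            return len(self.obs)

        def __getitem__(self, idx):
            # Copy so samples are real in-memory arrays, not memmap views.
            return self.obs[idx].copy(), self.actions[idx].copy()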

The agent can already make some progress, but still struggles with consistency and survival.

I’d love to hear thoughts on:

  • Improving BC performance with limited trajectories
  • Best practices for transitioning BC → PPO (my current warm-start idea is sketched below)
  • Handling partial observability in these environments
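
On the BC → PPO question, the plan I'm leaning toward (a sketch, not something I've validated) is warm-starting: copy the BC weights into the PPO actor where the architectures line up, then fine-tune with RL, possibly with a KL penalty toward the BC policy so it doesn't immediately forget the demonstrations. The ppo_policy.actor layout below is invented; adapt it to whatever your PPO implementation actually exposes:

    def warm_start_actor(bc_policy, ppo_policy):
        """Copy BC weights into the PPO actor before fine-tuning."""
        result = ppo_policy.actor.load_state_dict(bc_policy.state_dict(), strict=False)
        # strict=False tolerates the value head and other RL-only params;
        # inspect what didn't transfer instead of assuming a clean match.
        print("missing:", result.missing_keys)
        print("unexpected:", result.unexpected_keys)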

Here’s the code if you want to see the full process and results:
https://github.com/paulo101977/notebooks-rl/tree/main/final_fight

Any feedback is very welcome!

submitted by /u/AgeOfEmpires4AOE4


Tagged with

#Behavior Cloning
#Final Fight
#agent
#demonstrations
#reward shaping
#GAIL
#PPO
#action space remapping
#MultiBinary
#emulator input
#trajectory alignment
#LSTM policy
#evaluation