Reinforcement Learning

RobustWalker trains a neural network policy with Proximal Policy Optimization (PPO) to control the Unitree Go1 quadruped robot using only proprioceptive sensing (no cameras or LiDAR). The robot learns to walk robustly on rough terrain and to recover from external disturbances.
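To make "proprioceptive only" concrete, here is a minimal sketch of how such a policy input could be assembled. The specific signals (gyroscope rates, gravity direction from IMU roll/pitch, joint angles relative to a standing pose, joint velocities, previous action) and the helper names `projected_gravity` and `build_observation` are assumptions for illustration, a common choice in legged-locomotion work, not the project's confirmed observation space. The only Go1-specific fact used is its 12 actuated joints.

```python
from __future__ import annotations

import math
from typing import Sequence

NUM_JOINTS = 12  # Go1: 4 legs x 3 actuated joints (hip, thigh, calf)


def projected_gravity(roll: float, pitch: float) -> list[float]:
    """Unit gravity vector expressed in the robot's base frame.

    Derived from IMU roll/pitch (radians, ZYX Euler convention). Yaw does not
    affect it, so it gives a heading-invariant tilt signal.
    """
    return [
        math.sin(pitch),
        -math.sin(roll) * math.cos(pitch),
        -math.cos(roll) * math.cos(pitch),
    ]


def build_observation(
    base_ang_vel: Sequence[float],       # gyroscope reading, rad/s (3 values)
    roll: float,                         # IMU roll, rad
    pitch: float,                        # IMU pitch, rad
    joint_pos: Sequence[float],          # joint encoder angles, rad (12 values)
    default_joint_pos: Sequence[float],  # nominal standing pose, rad (12 values)
    joint_vel: Sequence[float],          # joint velocities, rad/s (12 values)
    last_action: Sequence[float],        # previous policy output (12 values)
) -> list[float]:
    """Concatenate proprioceptive signals into one flat policy input (hypothetical layout)."""
    if len(base_ang_vel) != 3:
        raise ValueError("base_ang_vel must have 3 components")
    for name, vec in (
        ("joint_pos", joint_pos),
        ("default_joint_pos", default_joint_pos),
        ("joint_vel", joint_vel),
        ("last_action", last_action),
    ):
        if len(vec) != NUM_JOINTS:
            raise ValueError(f"{name} must have {NUM_JOINTS} entries")

    obs = list(base_ang_vel)
    obs += projected_gravity(roll, pitch)
    # Joint angles relative to the standing pose keep inputs centred near zero.
    obs += [q - q0 for q, q0 in zip(joint_pos, default_joint_pos)]
    obs += list(joint_vel)
    obs += list(last_action)
    return obs  # 3 + 3 + 12 + 12 + 12 = 42 values
```

Feeding back the previous action gives the policy a little memory of its own commands, which helps when it cannot observe the terrain directly.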
Quadruped locomotion on uneven terrain without vision sensors is hard because the policy cannot perceive the ground ahead; it must instead generalize across varied physical conditions: ground friction, payload mass, motor strength, and unexpected external pushes.
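A minimal sketch of per-episode domain randomization over the four conditions listed above. The numeric ranges, the uniform sampling, the horizontal-only push, and the names `PhysicsParams` and `sample_params` are all hypothetical choices for illustration; the project's actual ranges and distributions are not stated.

```python
from __future__ import annotations

import math
import random
from dataclasses import dataclass

# Hypothetical sampling ranges, for illustration only (not the project's values).
FRICTION_RANGE = (0.3, 1.25)       # ground friction coefficient
PAYLOAD_RANGE_KG = (0.0, 3.0)      # extra mass added to the trunk
MOTOR_STRENGTH_RANGE = (0.9, 1.1)  # multiplier on commanded joint torques
PUSH_FORCE_RANGE_N = (0.0, 50.0)   # magnitude of a horizontal push


@dataclass(frozen=True)
class PhysicsParams:
    friction: float
    payload_kg: float
    motor_strength: float
    push_force_xy_n: tuple[float, float]


def sample_params(rng: random.Random) -> PhysicsParams:
    """Draw one set of physics parameters, e.g. at the start of an episode."""
    magnitude = rng.uniform(*PUSH_FORCE_RANGE_N)
    heading = rng.uniform(0.0, 2.0 * math.pi)  # push direction in the horizontal plane
    return PhysicsParams(
        friction=rng.uniform(*FRICTION_RANGE),
        payload_kg=rng.uniform(*PAYLOAD_RANGE_KG),
        motor_strength=rng.uniform(*MOTOR_STRENGTH_RANGE),
        push_force_xy_n=(magnitude * math.cos(heading), magnitude * math.sin(heading)),
    )


# One independently seeded generator per parallel environment.
env_rngs = [random.Random(seed) for seed in range(4)]
episode_params = [sample_params(rng) for rng in env_rngs]
```

Sampling once per episode reset, with each parallel worker seeded differently, means every environment in a vectorized batch experiences different physics, so a single gradient update already mixes many conditions.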
Applied domain randomization during training over ground friction, payload mass, motor strength, and external push forces. Used a multi-objective reward function that balances forward velocity, energy efficiency, and stability. Trained the policy with PPO in Stable-Baselines3, using vectorized parallel environments to collect experience.
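A minimal sketch of what a reward balancing forward velocity, energy efficiency, and stability could look like. The specific terms (a Gaussian-style velocity-tracking kernel, absolute mechanical power summed over joints, squared body tilt and tilt rates), the weights, and the names `RewardWeights` and `compute_reward` are assumptions for illustration, not the project's actual reward.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights for illustration; real values need tuning.
    velocity: float = 1.0
    energy: float = 0.002
    stability: float = 0.5


def compute_reward(
    base_vel_x: float,                 # forward base velocity, m/s
    target_vel_x: float,               # desired forward velocity, m/s
    joint_torques: Sequence[float],    # N*m
    joint_vel: Sequence[float],        # rad/s
    roll: float,                       # rad
    pitch: float,                      # rad
    base_ang_vel_xy: Sequence[float],  # roll/pitch rates, rad/s
    weights: RewardWeights = RewardWeights(),
    sigma: float = 0.25,               # width of the velocity-tracking kernel
) -> tuple[float, dict[str, float]]:
    """Return the scalar reward and its per-objective components."""
    if len(joint_torques) != len(joint_vel):
        raise ValueError("joint_torques and joint_vel must have the same length")
    # Forward velocity: 1.0 at the target, decaying smoothly away from it.
    velocity_term = math.exp(-((base_vel_x - target_vel_x) ** 2) / sigma)
    # Energy: absolute mechanical power summed over joints.
    power = sum(abs(tau * qd) for tau, qd in zip(joint_torques, joint_vel))
    # Stability: penalize body tilt and tilting rates.
    tilt = roll**2 + pitch**2 + sum(w * w for w in base_ang_vel_xy)
    terms = {
        "velocity": weights.velocity * velocity_term,
        "energy": -weights.energy * power,
        "stability": -weights.stability * tilt,
    }
    return sum(terms.values()), terms
```

In a Gymnasium-style environment, a function like this would typically be called inside `step()`, returning the scalar as the reward and the component dictionary in `info` for logging. PPO in Stable-Baselines3 only optimizes the scalar, so the weights alone decide how the three objectives trade off.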