Training
Basic training
Launch a training run with Weights & Biases logging:
puffer train puffer_drive --wandb --wandb-project "pufferdrive"
Environment configurations
Default configuration (Waymo maps)
The default settings in drive.ini are optimized for:
- Training in thousands of Waymo maps
- Short episodes (91 steps)
Carla maps configuration
For training agents to drive indefinitely in larger Carla maps, we recommend modifying drive.ini as follows:
[env]
goal_speed = 10.0 # Target speed in m/s at the goal. Lower values discourage excessive speeding
goal_behavior = 1 # 0: respawn, 1: generate_new_goals, 2: stop
goal_target_distance = 30.0 # Distance to new goal when using generate_new_goals
# Episode settings
episode_length = 300 # Increase for longer episode horizon
resample_frequency = 100000 # No resampling needed (there are only a few Carla maps)
termination_mode = 0 # 0: terminate at episode_length, 1: terminate after all agents reset
# Map settings
map_dir = "resources/drive/binaries"
num_maps = 2 # Number of Carla maps you're training in
this should give a good starting point. With these settings, you’ll need about 2-3 billion steps to get an agent that reaches most of it’s goals (> 95%) and has a combined collsion / off-road rate of 3 % per episode of 300 steps in town 1 and 2, which can be found here. Before launching your experiment, run drive.py with the folder to the Carla towns to process them to binaries, then ensure the map_dir above is pointed to these binaries.
Note
The default training hyperparameters work well for both configurations and typically don’t need adjustment.
Note
The checkpoint at
resources/drive/puffer_drive_weights_carla_town12.binis an agent trained on Carla town 01 and 02 with these settings. This is the one used in the interactive demo.
Controlled experiments
Aside from train and sweep, we support a third mode for running controlled experiments over lists of values:
puffer controlled_exp puffer_drive --wandb --wandb-project "pufferdrive2.0_carla" --tag speed
Define parameter sweeps in drive.ini:
[controlled_exp.env.goal_speed]
values = [10, 20, 30]
This will launch separate training runs for each value in the list, which cab be useful for:
- Hyperparameter tuning
- Architecture search
- Running multiple random seeds
- Ablation studies