PufferDrive simulator guide

A high-performance autonomous driving simulator in C with Python bindings.

Entry point: pufferlib/ocean/drive/drive.py wraps pufferlib/ocean/drive/drive.h

Configuration

Basic settings

Parameter           Default  Description
num_maps            -        Map binaries to load
num_agents          32       Policy-controlled agents (max 64)
episode_length      91       Steps per episode
resample_frequency  910      Steps between map resampling
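
A minimal construction sketch with these settings. The import path follows the drive.py entry point above, but the class name and keyword arguments are assumptions, not a verified API:

```python
# Hedged sketch: constructing the environment with the basic settings
# above. Class and keyword names are assumptions based on the drive.py
# entry point, not a verified API.
from pufferlib.ocean.drive.drive import Drive  # hypothetical class name

env = Drive(
    num_agents=32,           # policy-controlled agents (max 64)
    episode_length=91,       # matches Waymo log length
    resample_frequency=910,  # resample maps every ~10 episodes
)
obs, info = env.reset(seed=0)
```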

Tip

Set episode_length = 91 to match Waymo log length for single-goal tasks. Use longer episodes (e.g., 200+) with goal_behavior=1 for multi-goal driving.

Control modes

  • control_vehicles: Only vehicles
  • control_agents: All agent types (vehicles, cyclists, pedestrians)
  • control_wosac: WOSAC evaluation mode (controls all valid agents, ignoring the expert flag and the start-to-goal distance filter)
  • control_sdc_only: Self-driving car only

Note

control_vehicles filters out agents marked as “expert” and those too close to their goal (<2m). For full WOMD evaluation, use control_wosac.

Important

Agent Dynamics: The simulator supports three types of agents:

  1. Policy-Controlled: Stepped by your model’s actions.
  2. Experts: Stepped using ground-truth log trajectories.
  3. Static: Remain frozen in place.

In the simulator, agents not selected for policy control are treated as Static by default. To make them follow their Expert trajectories, you must set mark_as_expert=true for those agents in the map JSON files. This is especially important for control_sdc_only, so that the environment behaves realistically around the policy-controlled agent.

Init modes

  • create_all_valid (Default): Initializes every valid agent present in the map file. This includes policy-controlled agents, experts (if marked), and static agents.

  • create_only_controlled: Initializes only the agents that are directly controlled by the policy.

Note

In create_only_controlled mode, the environment will contain no static or expert agents. Only the policy-controlled agents will exist.

Goal behaviors

Three modes determine what happens when an agent reaches its goal:

Mode 0 (Respawn) - Default:

  • Agent teleports back to starting position
  • Other agents removed from environment (prevents post-respawn collisions)
  • Useful for maximizing environment interaction per episode

Mode 1 (Generate new) - Multi-goal:

  • Agent receives a new goal sampled from the road network
  • Can complete multiple goals per episode
  • Tests long-horizon driving competence

Mode 2 (Stop):

  • Agent stops in place after reaching goal
  • Episode continues until episode_length
  • Simplest setting for evaluation

Important

Goal behavior fundamentally changes what “success” means:

  • Mode 0/2 (single goal): Success = reaching the one goal without collision/off-road
  • Mode 1 (multi-goal): Success = completing ≥X% of sampled goals cleanly

Config files: pufferlib/config/ocean/drive.ini (loaded first), then pufferlib/config/default.ini

Episode flow

  1. Initialize: Load maps, select agents, set start positions
  2. Step loop (until episode_length):
    • Move expert replay agents (if they exist)
    • Apply policy actions to controlled agents
    • Update simulator
    • Check collisions
    • Assign rewards
    • Handle goal completion/respawns
    • Compute observations
  3. End: Log metrics, reset
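
A pseudo-Python sketch of this flow; function names are illustrative, not the actual API in drive.h:

```python
# Illustrative sketch of the episode flow above; names are not the
# real C API in drive.h.
def run_episode(env, policy, episode_length=91):
    obs = env.reset()                    # 1. initialize: maps, agents, starts
    for _ in range(episode_length):      # 2. step loop
        actions = policy(obs)            # controlled agents only; experts
                                         # are replayed from logged trajectories
        obs, rewards, infos = env.step(actions)  # collisions, rewards,
                                                 # goal handling done inside
    return infos                         # 3. end: metrics logged, env resets
```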

Note

Maps are resampled every resample_frequency steps (~10 episodes with default settings) to increase map diversity.

Caution

No early termination: with the default settings, episodes always run to episode_length regardless of goal completion or collisions.

Actions

Discrete actions

  • Classic: 91 options (7 accel × 13 steer)
    • Accel: [-4.0, -2.67, -1.33, 0.0, 1.33, 2.67, 4.0] m/s²
    • Steer: 13 values from -1.0 to 1.0
  • Jerk: 12 options (4 long × 3 lat)
    • Long jerk: [-15, -4, 0, 4] m/s³
    • Lat jerk: [-4, 0, 4] m/s³

Note

Discrete actions are decoded as: action_idx → (accel_idx, steer_idx) using division and modulo.
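
A sketch of that div/mod decoding for the classic action space. The ordering (accel-major vs. steer-major) is an assumption; check drive.h for the actual convention:

```python
# Decode a flat classic action index into (accel_idx, steer_idx).
# The accel-major ordering here is an assumption.
NUM_ACCEL, NUM_STEER = 7, 13          # classic: 91 = 7 x 13 actions

def decode_classic(action_idx: int) -> tuple[int, int]:
    accel_idx = action_idx // NUM_STEER
    steer_idx = action_idx % NUM_STEER
    return accel_idx, steer_idx

assert decode_classic(90) == (6, 12)  # last action -> max indices
```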

Continuous actions

  • 2D Box [-1, 1]
  • Classic: Scaled to ±4 m/s² accel, ±1 steer
  • Jerk: Asymmetric long (brake -15, accel +4), symmetric lat (±4)
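
One possible reading of the asymmetric jerk scaling, sketched below; the exact mapping in drive.h may differ:

```python
# Hedged sketch of scaling a [-1, 1] continuous action to physical
# units for the jerk model: asymmetric longitudinal (brake -15,
# accel +4 m/s^3), symmetric lateral (+/-4 m/s^3).
def scale_jerk(long_a: float, lat_a: float) -> tuple[float, float]:
    long_jerk = long_a * 4.0 if long_a >= 0.0 else long_a * 15.0
    lat_jerk = lat_a * 4.0
    return long_jerk, lat_jerk
```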

Dynamics models

Classic (bicycle model):

  • Integrates accel/steer with dt=0.1s
  • Wheelbase = 60% of vehicle length
  • Standard kinematic bicycle model
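
A sketch of one standard kinematic bicycle update using the dt and wheelbase stated above; the integration order in the C code may differ:

```python
import math

# Standard kinematic bicycle step (one possible reading of the
# classic dynamics): dt = 0.1 s, wheelbase = 60% of vehicle length.
def bicycle_step(x, y, heading, speed, accel, steer, length, dt=0.1):
    wheelbase = 0.6 * length
    x += speed * math.cos(heading) * dt
    y += speed * math.sin(heading) * dt
    heading += speed / wheelbase * math.tan(steer) * dt
    speed += accel * dt
    return x, y, heading, speed
```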

Jerk (physics-based):

  • Integrates jerk → accel → velocity → pose
  • Steering limited to ±0.55 rad
  • Speed clipped to [0, 20] m/s
  • More realistic comfort and control constraints
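
An illustrative sketch of the jerk → accel → velocity chain with the clipping constants from this doc. How lateral jerk maps to the steering angle is not specified here, so this sketch treats it as a steering-angle rate, which is an assumption:

```python
# Sketch of the jerk integration chain; lateral-jerk-to-steering
# mapping is an assumption, and the clip constants come from this doc.
def jerk_step(speed, accel, steering, long_jerk, lat_jerk, dt=0.1):
    accel += long_jerk * dt                            # jerk -> accel
    speed = min(max(speed + accel * dt, 0.0), 20.0)    # clip to [0, 20] m/s
    steering += lat_jerk * dt                          # assumption
    steering = min(max(steering, -0.55), 0.55)         # clip to +/-0.55 rad
    return speed, accel, steering
```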

Important

Jerk dynamics adds 3 extra observation features (steering angle, long accel, lat accel) compared to classic.

Observations

Size

  • Classic: 1848 floats = 7 (ego) + 217 (partners) + 1624 (roads)
  • Jerk: 1851 floats = 10 (ego) + 217 (partners) + 1624 (roads)

Where partners = (MAX_AGENTS - 1) agents × 7 features = 31 × 7 = 217, and roads = 232 segments × 7 features = 1624
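
A sketch of splitting the flat classic observation into its three blocks; the ego/partners/roads ordering is an assumption based on the size breakdown above:

```python
import numpy as np

# Split the flat 1848-float classic observation into its blocks.
# Layout order (ego, partners, roads) is an assumption.
EGO, N_PARTNERS, N_ROADS, FEAT = 7, 31, 232, 7

def split_obs(obs: np.ndarray):
    ego = obs[:EGO]                                  # 7 floats
    partners = obs[EGO:EGO + N_PARTNERS * FEAT]      # 217 floats
    roads = obs[EGO + N_PARTNERS * FEAT:]            # 1624 floats
    return ego, partners.reshape(N_PARTNERS, FEAT), roads.reshape(N_ROADS, FEAT)
```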

Important

All observations are in the ego vehicle’s reference frame (agent-centric) and are normalized. Positions rotate with the agent’s heading.
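
A small sketch of the world-to-ego transform this implies: translate by the ego position, then rotate by the negative ego heading:

```python
import math

# World -> ego-frame transform implied by the note above.
def to_ego_frame(px, py, ego_x, ego_y, ego_heading):
    dx, dy = px - ego_x, py - ego_y
    c, s = math.cos(-ego_heading), math.sin(-ego_heading)
    return c * dx - s * dy, s * dx + c * dy
```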

Ego features (ego frame)

Classic (7): goal_x, goal_y, speed, width, length, collision_flag, respawn_flag

Jerk adds (3): steering_angle, long_accel, lat_accel

Partner features (up to MAX_AGENTS - 1 agents, 7 each)

rel_x, rel_y, width, length, heading_cos, heading_sin, speed

  • Within 50m of ego
  • Active agents first, then static experts
  • Zero-padded if fewer agents

Tip

Partner heading is encoded as (cos, sin) of relative angle to avoid discontinuities at ±π.

Road features (up to 232 segments, 7 each)

mid_x, mid_y, length, width, dir_cos, dir_sin, type

  • Retrieved from 21×21 grid (5m cells, ~105m × 105m area)
  • Types: ROAD_LANE=0, ROAD_LINE=1, ROAD_EDGE=2
  • Pre-cached for efficiency
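
A sketch of the grid lookup this describes: map the ego position to a cell, then scan the 21×21 neighborhood. Cell indexing and origin handling are assumptions:

```python
# Spatial-grid neighborhood scan; indexing details are assumptions.
GRID_CELL_SIZE, VISION_RANGE = 5.0, 21

def visible_cells(ego_x: float, ego_y: float):
    cx, cy = int(ego_x // GRID_CELL_SIZE), int(ego_y // GRID_CELL_SIZE)
    half = VISION_RANGE // 2
    for gx in range(cx - half, cx + half + 1):
        for gy in range(cy - half, cy + half + 1):
            yield gx, gy  # look up pre-cached road segments for this cell
```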

Note

Road observations use a spatial grid with 5m cells. The 21×21 vision range gives ~105m visibility in all directions.

Rewards & metrics

Per-step rewards

  • Vehicle collision: -1.0
  • Off-road: -1.0
  • Goal reached: +1.0 (or +0.25 after respawn in mode 0)
  • Jerk penalty (classic only): -0.0002 × Δv/dt

Tip

Goal completion requires both distance < goal_radius (default 2m) AND speed ≤ goal_speed.
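
A sketch of the core per-step reward terms and the goal test from the tip above. The goal_speed default is hypothetical, and the jerk penalty (classic dynamics) is omitted for brevity:

```python
# Core per-step reward terms; goal_speed default is hypothetical.
def step_reward(collided, offroad, dist_to_goal, speed, respawned,
                goal_radius=2.0, goal_speed=1.0):
    reward = -1.0 * collided - 1.0 * offroad
    if dist_to_goal < goal_radius and speed <= goal_speed:
        reward += 0.25 if respawned else 1.0  # reduced post-respawn (mode 0)
    return reward
```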

Episode metrics

Core metrics

  • score - Aggregate success metric (threshold-based; see the sketch after this list):

    • Single-goal setting (modes 0, 2): Binary 1.0 if goal reached cleanly
      • Mode 0 (respawn): No collision/off-road before first goal (post-respawn collisions ignored)
      • Mode 2 (stop): No collision/off-road throughout entire episode
    • Multi-goal setting (mode 1): Fractional based on completion rate with no collisions throughout episode:
      • 1 goal: ≥99% required
      • 2 goals: ≥50% required
      • 3-4 goals: ≥80% required
      • 5+ goals: ≥90% required
  • collision_rate - Fraction of agents with ≥1 vehicle collision this episode

  • offroad_rate - Fraction of agents with ≥1 off-road event this episode

  • completion_rate - Fraction of goals reached this episode

  • lane_alignment_rate - Fraction of time agents spent aligned with lane headings
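
As referenced in the score bullet above, here is a hedged sketch of the multi-goal threshold logic; the exact comparison in the C code may differ:

```python
# Multi-goal (mode 1) score thresholds listed above; exact logic
# in the C implementation may differ.
def multi_goal_score(goals_reached: int, goals_sampled: int,
                     any_collision_or_offroad: bool) -> float:
    if any_collision_or_offroad or goals_sampled == 0:
        return 0.0
    required = {1: 0.99, 2: 0.50, 3: 0.80, 4: 0.80}.get(
        goals_sampled, 0.90)  # 5+ goals: 90%
    return 1.0 if goals_reached / goals_sampled >= required else 0.0
```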

In-depth metrics

  • avg_collisions_per_agent - Mean collision count per agent (captures repeated collisions)

  • avg_offroad_per_agent - Mean off-road count per agent (captures repeated off-road events)

Note

The “rate” metrics are binary per-agent flags (did it happen at all?), while the “avg_per_agent” metrics count total occurrences. An agent that collided three times contributes 1 to collision_rate but 3 to avg_collisions_per_agent.

  • goals_reached_this_episode - Total goals completed across all agents

  • goals_sampled_this_episode - Total goals assigned (>1 in multi-goal mode)

Metrics interpretation by goal behavior

Metric           Respawn (0)                                   Multi-Goal (1)                           Stop (2)
score            Reached goal before any collision/off-road?   Reached X% of goals with no collisions?  Reached goal with no collisions?
completion_rate  Reached the goal?                             Fraction of sampled goals reached        Reached the goal?
goals_reached    Always ≤1                                     Can be >1                                Always ≤1
collision_rate   Any collision before first goal?              Any collision in episode?                Any collision in episode?

Warning

Respawn mode (0) scoring: Score only considers collisions/off-road events that occurred before reaching the first goal. Post-respawn collisions do not disqualify the agent from receiving a score of 1.0.

Warning

Respawn mode (0) side effect: After respawn, all other agents are removed from the environment. This means vehicle collisions become impossible post-respawn, but off-road events can still occur.

Source files

C core

  • drive.h: Main simulator (stepping, observations, collisions)
  • drive.c: Demo and testing
  • binding.c: Python interface
  • visualize.c: Raylib renderer
  • drivenet.h: C inference network

Python

  • drive.py: Gymnasium wrapper
  • torch.py: Neural network (ego/partner/road encoders → actor/critic)

Neural network

Three MLP encoders (ego, partners, roads) → concatenate → actor/critic heads

  • Partner and road outputs are max-pooled (permutation invariant)
  • Discrete actions: logits per dimension
  • Continuous actions: Gaussian (mean + std)
  • Optional LSTM wrapper for recurrence
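
A minimal PyTorch sketch of this layout, not the actual torch.py implementation; hidden sizes and the single discrete head are illustrative:

```python
import torch
import torch.nn as nn

# Illustrative encoder layout: three MLP encoders, max-pool over
# entities, concatenate, then actor/critic heads.
class DrivePolicy(nn.Module):
    def __init__(self, hidden=128, n_actions=91):
        super().__init__()
        self.ego_enc = nn.Sequential(nn.Linear(7, hidden), nn.ReLU())
        self.partner_enc = nn.Sequential(nn.Linear(7, hidden), nn.ReLU())
        self.road_enc = nn.Sequential(nn.Linear(7, hidden), nn.ReLU())
        self.actor = nn.Linear(3 * hidden, n_actions)
        self.critic = nn.Linear(3 * hidden, 1)

    def forward(self, ego, partners, roads):
        # max-pool over the entity dimension -> permutation invariance
        p = self.partner_enc(partners).max(dim=1).values
        r = self.road_enc(roads).max(dim=1).values
        h = torch.cat([self.ego_enc(ego), p, r], dim=-1)
        return self.actor(h), self.critic(h)
```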

Tip

The architecture is modular - you can easily swap out encoders or add new observation types without changing the policy head.

Constants reference

Warning

These constants are hardcoded in the C implementation. Changing them requires recompiling.

Limits

  • MAX_AGENTS = 32 (compile-time, can be overridden with -DMAX_AGENTS=64)
  • MAX_ROAD_OBSERVATIONS = 232
  • TRAJECTORY_LENGTH = 91
  • MIN_DISTANCE_TO_GOAL = 2.0 m (agents closer than this won’t be controlled)

Spatial

  • GRID_CELL_SIZE = 5.0 m
  • VISION_RANGE = 21 cells (~105m × 105m)
  • Partner observation range: 50m

Physics

  • DEFAULT_DT = 0.1 s
  • Jerk long clip: [-15, 4] m/s³
  • Jerk lat clip: [-4, 4] m/s³
  • Steering limit: [-0.55, 0.55] rad (~31.5°)
  • Speed clip (jerk): [0, 20] m/s

Normalization

  • MAX_SPEED = 100 m/s
  • MAX_VEH_LEN = 30 m
  • MAX_VEH_WIDTH = 15 m
  • MAX_ROAD_SEGMENT_LENGTH = 100 m

Note

Normalization scales are chosen to map reasonable driving scenarios to ~[-1, 1] range for neural network stability.
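
A sketch of the feature scaling these constants imply; the position scale (reusing the 50m partner observation radius) is an assumption:

```python
# Feature scaling implied by the normalization constants above;
# the position divisor is an assumption.
MAX_SPEED, MAX_VEH_LEN, MAX_VEH_WIDTH = 100.0, 30.0, 15.0
PARTNER_RADIUS = 50.0  # from the partner observation range above

def normalize_partner(rel_x, rel_y, width, length, speed):
    return (rel_x / PARTNER_RADIUS, rel_y / PARTNER_RADIUS,
            width / MAX_VEH_WIDTH, length / MAX_VEH_LEN,
            speed / MAX_SPEED)
```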


Version: PufferDrive v2.0