Data
PufferDrive consumes map binaries generated from multiple data sources, including Waymo Open Motion Dataset (WOMD) JSON files, ScenarioMax, and CARLA. This page covers how to obtain data and convert it into the binary format expected by the simulator.
Download options
- pufferdrive_womd_train: 10k scenarios from the Waymo Open Motion training dataset.
- pufferdrive_womd_val: 10k scenarios from the Waymo Open Motion validation dataset.
- Additional compatible sources: ScenarioMax exports JSON in the same format.
- Included CARLA maps: Ready-to-use CARLA maps live in data_utils/carla/carla_py123d.
Download via Hugging Face
Install the CLI once:
uv pip install -U "huggingface_hub[cli]"
Download:
huggingface-cli download daphne-cornelisse/pufferdrive_womd_train \
--repo-type dataset \
--local-dir data/processed/training
Place raw JSON files under data/processed/training (default location read by the conversion script).
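Before converting, it can help to confirm that the JSON files actually landed where the conversion script looks. The following is a minimal sketch using only the standard library; the helper name list_scenario_jsons is hypothetical, and it assumes the files sit directly under the directory rather than in subfolders.

```python
from pathlib import Path


def list_scenario_jsons(data_dir="data/processed/training"):
    """Return the sorted .json files directly under data_dir.

    Hypothetical helper, not part of PufferDrive. It assumes the JSON
    files are not nested in subdirectories. Returns an empty list when
    the directory does not exist.
    """
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir() if p.is_file() and p.suffix == ".json")
```

Calling len(list_scenario_jsons()) from the repository root gives a quick count of the scenarios available for conversion.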
Convert JSON to map binaries
The conversion script writes compact .bin maps to resources/drive/binaries:
python pufferlib/ocean/drive/drive.py
Notes:
- The script iterates every JSON file in data/processed/training and emits map_XXX.bin files.
- resources/drive/binaries/map_000.bin ships with the repo for quick smoke tests; generate additional bins for training/eval.
- If you want to point at a different dataset location or limit the number of maps, adjust process_all_maps in pufferlib/ocean/drive/drive.py before running.
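The input-to-output mapping described in these notes can be previewed without running the converter. This is a rough sketch, not the logic of process_all_maps itself: the function name plan_conversion and its max_maps parameter are hypothetical, and the three-digit zero padding is inferred from the map_000.bin file name.

```python
from pathlib import Path


def plan_conversion(input_dir="data/processed/training",
                    output_dir="resources/drive/binaries",
                    max_maps=None):
    """Pair each input JSON with a numbered .bin path; nothing is converted.

    Hypothetical helper. Files are taken in sorted order, which is an
    assumption; the real script may enumerate them differently.
    """
    json_files = sorted(Path(input_dir).glob("*.json"))
    if max_maps is not None:
        json_files = json_files[:max_maps]
    out_root = Path(output_dir)
    # Zero-padded to three digits, matching the map_000.bin naming pattern.
    return [(src, out_root / f"map_{i:03d}.bin") for i, src in enumerate(json_files)]
```

Printing the returned pairs is a cheap way to check a custom dataset location or map limit before editing drive.py.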
Map binary format reference
The simulator reads the compact binary layout produced by save_map_binary in pufferlib/ocean/drive/drive.py and parsed by load_map_binary in pufferlib/ocean/drive/drive.h:
- Header: sdc_track_index (int), num_tracks_to_predict (int) followed by that many track_index ints, num_objects (int), num_roads (int).
- Objects (vehicles/pedestrians/cyclists): For each object, the writer stores scenario_id (unique_map_id passed to load_map), type (1 vehicle, 2 pedestrian, 3 cyclist), id, array_size (TRAJECTORY_LENGTH = 91), positions x/y/z[91], velocities vx/vy/vz[91], heading[91], valid[91], and scalars width/length/height, goalPosition (x, y, z), mark_as_expert (int). Missing trajectory entries are zero-padded by the converter.
- Road elements: Each road entry stores scenario_id, a remapped type (4 lane, 5 road line, 6 road edge, 7 stop sign, 8 crosswalk, 9 speed bump, 10 driveway), id, array_size (#points), then x/y/z arrays of that length and scalars width/length/height, goalPosition, mark_as_expert. save_map_binary also simplifies long polylines (len(geometry) > 10 and type <= 16) with a 0.1 area threshold to keep files small.
- Control hints: tracks_to_predict and mark_as_expert influence which agents are controllable (control_mode in the simulator) versus replayed as experts or static actors (set_active_agents in drive.h).
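To make the header layout concrete, here is an illustrative round trip using Python's struct module. It is a sketch under stated assumptions, not the code in save_map_binary or load_map_binary: it assumes each int is a 4-byte little-endian signed integer, and the function names pack_header, unpack_header, and pad_trajectory are hypothetical. The padding helper also truncates over-long inputs, which the source does not specify.

```python
import struct

TRAJECTORY_LENGTH = 91  # value stated in the format reference


def pack_header(sdc_track_index, tracks_to_predict, num_objects, num_roads):
    """Serialize the header fields in the documented order.

    Assumption: 4-byte little-endian signed ints ("<i").
    """
    n = len(tracks_to_predict)
    return (
        struct.pack("<ii", sdc_track_index, n)
        + struct.pack(f"<{n}i", *tracks_to_predict)
        + struct.pack("<ii", num_objects, num_roads)
    )


def unpack_header(buf, offset=0):
    """Parse a header from buf; return (fields, offset just past the header)."""
    sdc_track_index, n = struct.unpack_from("<ii", buf, offset)
    offset += 8
    tracks = list(struct.unpack_from(f"<{n}i", buf, offset))
    offset += 4 * n
    num_objects, num_roads = struct.unpack_from("<ii", buf, offset)
    offset += 8
    fields = {
        "sdc_track_index": sdc_track_index,
        "tracks_to_predict": tracks,
        "num_objects": num_objects,
        "num_roads": num_roads,
    }
    return fields, offset


def pad_trajectory(values, length=TRAJECTORY_LENGTH):
    """Zero-pad a per-step series to a fixed length.

    Truncating longer inputs is an assumption made for this sketch.
    """
    values = list(values)[:length]
    return values + [0.0] * (length - len(values))
```

The returned offset marks where the per-object records would begin, so a fuller reader could continue from there. Check the actual field widths against drive.h before relying on this layout.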
Refer to Simulator for how the binaries are consumed during resets, observation construction, and reward logging.
Verifying data availability
- After conversion, ls resources/drive/binaries | head should show numbered .bin files.
- If you see Required directory resources/drive/binaries/map_000.bin not found during training, rerun the conversion or check paths.
- With binaries in place, run puffer train puffer_drive from Getting Started as a smoke test that the build, data, and bindings are wired together.
- To inspect the binary output, convert a single JSON file with load_map(<json>, <id>, <output_path>) inside drive.py.
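The first two checks above can also be scripted. The sketch below is a hypothetical helper (check_binaries is not part of the repository) that reports how many .bin files exist and whether map_000.bin is among them.

```python
from pathlib import Path


def check_binaries(bin_dir="resources/drive/binaries", required="map_000.bin"):
    """Summarize the .bin files in bin_dir.

    Hypothetical helper. Returns the file count, whether the required
    file is present, and the first ten names (like `ls | head`).
    """
    root = Path(bin_dir)
    names = sorted(p.name for p in root.glob("*.bin")) if root.is_dir() else []
    return {
        "count": len(names),
        "has_required": required in names,
        "first": names[:10],
    }
```

If has_required is False, the training run is likely to hit the missing-file error quoted above, so rerun the conversion or fix the path first.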
Interactive scenario editor
See Interactive scenario editor for a browser-based workflow to inspect, edit, and export Waymo/ScenarioMax JSON into the .bin format consumed by the simulator.
Generate CARLA agent trajectories
The agent trajectories in the provided CARLA maps are procedurally generated assuming a general velocity range without a valid initial state (no collision/offroad). The repository uses an external submodule, pyxodr, for CARLA XODR processing.
To generate your own CARLA agent trajectories, install the submodules and developer requirements (editable install) before running the generator:
git submodule update --init --recursive
python -m pip install -e . -r requirements-dev.txt
Run the generator script. Important optional args:
- --num_objects: how many agents to initialize in a map (default: map-dependent)
- --num_data_per_map: number of data files to generate per map
- --avg_speed: controls the gap between subsequent points in the trajectory
python data_utils/carla/generate_carla_agents.py --num_objects 32 --num_data_per_map 8 --avg_speed 2
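When sweeping over several settings, the same command can be assembled programmatically. This is a small sketch: build_generator_cmd is a hypothetical helper, and it only uses the three flags listed above. Flags left as None are omitted so the script falls back to its own defaults.

```python
import sys


def build_generator_cmd(num_objects=None, num_data_per_map=None,
                        avg_speed=None, python=sys.executable):
    """Build the argv list for the CARLA agent generator script.

    Hypothetical helper. It only assembles the command; run it from the
    repository root, for example with subprocess.run(cmd, check=True).
    """
    cmd = [python, "data_utils/carla/generate_carla_agents.py"]
    options = (
        ("--num_objects", num_objects),
        ("--num_data_per_map", num_data_per_map),
        ("--avg_speed", avg_speed),
    )
    for flag, value in options:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd
```

Passing the list form to subprocess.run avoids shell quoting issues when values come from a loop or a config file.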
There is also a visualizer for inspecting initial agent positions on the map:
python data_utils/carla/plot.py
Notes:
- The base CARLA maps that agents are spawned on live under data_utils/carla/carla_py123d, and the CARLA XODRs live at data/CarlaXODRs, where they are used with the pyxodr submodule for XODR parsing and agent trajectory generation.
- If you encounter missing binary or map errors, ensure the submodule was initialized and the required packages from requirements-dev.txt are installed.