How we generate real-time synthetic LiDAR for simulation and production datasets
Real-time synthetic LiDAR aligned to RGB and labels (stencil IDs), with GPU-resident reconstruction from render passes for dataset-scale throughput and closed-loop simulation.

How we generate synthetic LiDAR data
SiRLab targets real-time simulation and dataset-scale LiDAR generation, where throughput and alignment dominate. We design for faster-than-real-time headroom to accommodate closed-loop tasks such as control, inference, and hardware-in-the-loop (HIL). RGB images, labels (stencil IDs), and LiDAR returns are derived from the same frame, so alignment is inherent rather than enforced post hoc.
LiDAR reconstruction is a data-parallel geometry workload: each beam is independent and the computation repeats across beams. GPU compute (compute shaders or CUDA kernels) maps one beam per thread, uses on-chip shared memory for scan lookups, and relies on texture units where UV and label (stencil ID) lookups use point sampling and depth uses bilinear sampling.
CPU collision raycasting (Unreal collision/physics raycasts) has a different bottleneck profile. Even with strong acceleration structures (BVH or similar), cost rises with triangle count at high frame rates. Hybrid CPU-GPU-CPU paths add transfer and synchronization overhead that often dominates at dataset scale.
For that reason, we keep the full LiDAR pipeline on the GPU and read back only finalized results. Depth capture, sampling, reconstruction, filtering, labeling (stencil IDs), and noise remain in GPU memory. Readback occurs only after compute completion, followed by a memcpy into shared memory for any client to consume. The objective is deterministic, realistic, render-truth LiDAR with stable throughput for production datasets and real-time simulation.
Design goals for dataset-scale LiDAR and real-time simulation
- Throughput beyond real-time under a fixed timestep
- Headroom for closed-loop simulation workloads (control, inference, HIL)
- Alignment by construction (RGB, labels (stencil IDs), LiDAR from the same frame)
- Deterministic scan patterns and noise seeded by sim step
- No game thread stalls (staged readback, optional sync)
- Clean payload contracts for downstream tooling and validation
One frame, three aligned sensors
RGB + labels (stencil IDs) → depth → LiDAR, all from the same frame




Side-by-side alignment check
RGB and LiDAR overlays from the same render frame


Render-pass availability in modern 3D game engines
Modern 3D game engines expose render-pass outputs at frame rate. These outputs provide LiDAR inputs without extra scene traversal. In a customizable, physically grounded pipeline, those outputs are computed as part of normal rendering. Well-optimized Unreal scenes and performance-tuned games can sustain ~120 FPS; in that regime, simulated LiDAR can in principle run at the same cadence.
Render-pass outputs used for LiDAR
RGB plus depth and material buffers emitted every frame





Throughput defines real-time constraints
Simulation timing is not only about dataset export. Real-time headroom is required for closed-loop tasks such as control, inference, and hardware-in-the-loop (HIL) I/O. Accelerated modes are used when the goal is throughput. Throughput is measured as simulated seconds per wall-clock second, and the target is sustained T > 1 whenever the scenario allows it.
If a 3D engine sustains 120 Hz, the render-pass outputs already exist at that cadence. In that regime, the dominant costs are the LiDAR reconstruction compute shader and the GPU-to-CPU readback.
Those costs scale primarily with sensor configuration. Horizontal FOV determines whether capture uses one or three depth views (and their size). Horizontal and vertical channel counts set the point count, which sets compute thread count and readback length. Other overheads are approximately constant; integrating LiDAR as a render-pipeline stage adds a small fixed pass, whereas an asynchronous path must keep resources alive longer and often introduces extra copies.
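To make the scaling concrete, here is a small sizing sketch. The float32 Nx4 layout and 128-byte header come from the payload contract described later in this article; the sensor dimensions are hypothetical:

```python
def lidar_payload_bytes(h_channels: int, v_channels: int,
                        header_bytes: int = 128) -> int:
    # Point count sets both the compute thread count and the readback length.
    points = h_channels * v_channels
    # Payload is a float32 Nx4 array (XYZ + packed W), preceded by the header.
    return header_bytes + points * 4 * 4

# Hypothetical 360-degree sensor: 1800 horizontal steps x 64 channels.
size = lidar_payload_bytes(1800, 64)  # 115200 points, ~1.8 MB per frame
```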
Development and profiling use a mid-range desktop baseline (RTX 3060-class GPU with 8 GB, mainstream CPU, 16 GB RAM), and we treat 50-70 FPS as a conservative operating point. Higher-end workstations push the same pipeline well beyond that range. Throughput scales directly with available GPU headroom.
Sustained throughput above 1x requires:
- Fixed timestep for determinism
- Sensors that do not stall the game thread
- Readback staged so the GPU is never forced to wait on the CPU
- Output written asynchronously while the simulation advances
If any sensor blocks the game thread, throughput collapses.
In practice, the export client writes artifacts (including the public Hugging Face previews) through a background worker queue, so disk I/O can lag without throttling the simulation loop. On the same mid-range development desktop, this decoupling keeps the simulation loop stable.
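That decoupling follows a standard producer-consumer pattern. A minimal illustration, not the SiRLab implementation (names and paths are hypothetical):

```python
import os
import queue
import tempfile
import threading

def start_recorder(q: "queue.Queue") -> threading.Thread:
    # Background worker drains artifacts so disk I/O never throttles the sim loop.
    def worker() -> None:
        while True:
            item = q.get()
            if item is None:  # shutdown sentinel
                break
            path, payload = item
            with open(path, "wb") as f:
                f.write(payload)
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t

# The simulation loop enqueues and advances immediately; it never blocks on disk.
q: "queue.Queue" = queue.Queue()
recorder = start_recorder(q)
out_path = os.path.join(tempfile.mkdtemp(), "frame_00000000.bin")
q.put((out_path, b"\x00" * 16))
q.put(None)
recorder.join()
```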
Why CPU collision-raycast LiDAR does not scale
CPU collision-raycast LiDAR (Unreal collision/physics raycasts) is a reasonable prototype, but it is structurally mismatched to dataset-scale workloads.
1) Scaling characteristics
Cost grows with rays, frames, and scene complexity. The work runs on CPU threads already dedicated to simulation and physics. Even with strong acceleration structures (BVH or similar), this is cache-unfriendly triangle traversal repeated at high frequency.
2) Representation mismatch (collision vs rendered geometry)
Raycasts operate on collision proxies, not the render mesh. That creates a persistent divergence between what the camera sees and what LiDAR returns, which becomes systematic error at scale.
In production scenes, dense foliage can reach tens of thousands of instances. Adding per-blade or per-mesh collision is usually avoided because it multiplies authoring and physics cost without meaningful interaction value. When interaction is required, teams typically use coarse proxies (spheres/capsules) or disable collision entirely. As a result, collision geometry diverges from the rendered surface by design, not by mistake.
Collision can track the underlying geometry, but it cannot see alpha-masked cards or render-time deformation, so the rendered surface still diverges.
Render-truth surface vs collision proxy
Same asset; drag to compare the rendered surface against the physics proxy


3) GPU-only geometry effects (vertex displacement and tessellation)
Vertex displacement is a material-driven per-vertex deformation applied in the vertex shader at render time. Tessellation subdivides and refines geometry on the GPU after CPU collision has already been resolved. CPU collision raycasts cannot observe these GPU-only deformations, while a depth-based pipeline captures them by sampling the rendered surface.
Rail ballast is a representative case: fine surface relief is authored for rendering using displacement maps and tessellation, while collision is simplified to a coarse visibility proxy. The visual surface deforms at sub-triangle scale, so render-truth depth and LiDAR follow the displaced geometry while collision stays static.
Ballast case study: render-time displacement vs collision proxy
The first panel compares the displaced render surface to its collision proxy; the second shows LiDAR alignment on RGB from the same scene.
Render-time displacement: ballast render vs proxy
Depth/LiDAR follow the displaced render surface; collision stays simplified




To show the sensor-level consequence (not just the proxy mismatch), the next sequence compares RGB, depth, and LiDAR on the same wind-deformed render surface.
Foliage case study: render-truth LiDAR under GPU-driven deformation
GPU-driven deformation (vertex displacement, tessellation, displacement maps) is applied at render time and is not represented by collision proxies. These panels isolate foliage where wind deformation displaces the render mesh. Depth and LiDAR are derived from the rendered surface, preserving alignment under deformation; collision proxies remain static and are not shown.
GPU-deformed foliage: render-truth sensor alignment
RGB, depth, and LiDAR remain co-registered on the deformed surface
4) Render-only phenomena (thin geometry and particulate media)
Thin foliage and alpha-masked surfaces are often rendered as fine geometry while their collision is simplified or omitted. Collision rays intersect the proxy, not the rendered micro-structure, producing biased returns. The same mismatch applies to particulate media (rain, spray, smoke) that are rendered but not represented by collision primitives. Taken together, collision-based LiDAR under-represents rendered reality and introduces systematic bias in dataset labels (stencil IDs).
5) Material and texture data (surface properties)
The render pipeline exposes material textures that encode surface properties such as normals, roughness, and displacement. These signals are not available in the physics scene, yet they are critical if LiDAR intensity or scattering is modeled beyond pure geometry. A GPU path can condition the LiDAR response on these textures, whereas collision-only tracing cannot access them.
Render-pass inputs for shader-based LiDAR
Depth and material buffers from the same frame




6) CPU reconstruction from render buffers (hybrid path)
A CPU pipeline can reconstruct LiDAR from depth and material buffers (e.g., depth, normals, roughness). This avoids collision raycasts and improves render-truth fidelity. It is also attractive for rapid, low-complexity implementations because reconstruction can run on the CPU (for example in Python/NumPy or C++) without authoring parallel HLSL/compute or CUDA kernels. However, it still requires GPU rendering plus readback of large, uncompressed render textures each frame, often across multiple passes. At dataset scale, the GPU-to-CPU transfer and synchronization dominate, and the CPU becomes the limiting stage for LiDAR beam reconstruction and labeling. Pure math operations (such as adding noise) are comparatively cheap, but labeling requires extra texture lookups. This hybrid path can be useful for offline analysis or low-rate preview generation, yet it does not preserve real-time throughput. The scalable alternative is to keep reconstruction and filtering on the GPU and read back only the compact point payload.
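As a sketch of this hybrid path, a minimal NumPy reconstruction from a single pinhole depth buffer might look as follows. Intrinsics and buffer contents are hypothetical; a real pipeline would also sample label and material buffers:

```python
import numpy as np

def depth_to_points(depth: np.ndarray, fx: float, fy: float,
                    cx: float, cy: float) -> np.ndarray:
    # depth: (H, W) forward depth in meters from a pinhole view.
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth   # lateral offset
    y = (v - cy) / fy * depth   # vertical offset
    z = depth                   # forward depth
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Toy case: a flat wall 10 m ahead of a 4x4 view; every point has z = 10.
pts = depth_to_points(np.full((4, 4), 10.0), fx=2.0, fy=2.0, cx=1.5, cy=1.5)
```

Every frame, the depth (and any material) textures must be read back before this code can run, which is exactly the transfer cost the paragraph above identifies.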
7) Transfer overhead in hybrid pipelines
Attempting to "speed it up with GPU" often makes the data path worse. Hybrid CPU-GPU-CPU pipelines add transfer and synchronization overhead that erodes throughput and increases jitter.
Observed failure modes
- Throughput collapse as ray counts or scene detail increase
- Render starvation when CPU work delays the render thread
- Alignment drift from collision proxy divergence
- Operational overhead from manual collision authoring and maintenance
CPU vs GPU resource usage (per frame)
Why CPU collision raycasts saturate the game/physics threads and stall the render loop
Collision LiDAR vs shader-based LiDAR
What changes when LiDAR stays on the GPU
| Criteria | Collision LiDAR (CPU collision raycasts) | Shader-based LiDAR (GPU) |
|---|---|---|
| Real-time at scale | No | Yes |
| Matches what the camera sees | Approximate | Yes |
| Multi-sensor alignment | Fragile | By design |
| Asset setup & maintenance | High | Low |
| Labels (stencil IDs) | Manual | Built-in |
| Best suited | Prototypes | Production datasets |
Inside the shader-based LiDAR pipeline
This section describes the GPU-resident sensor pipeline as a fixed data contract between the engine and the client. The goal is deterministic, aligned LiDAR at dataset scale with bounded synchronization. The data path runs from render-pass outputs to a shared-memory payload, then into the client recorder on a fixed tick.
1) Render-pass capture and sensor-aligned inputs
Depth is captured from the same frame as RGB and labels (stencil IDs), using render-pass outputs that already exist at frame rate. If a single view cannot cover the horizontal field of view, capture is split into three views so the full span is covered. At 360°, this becomes three 120° views, still from the same frame as RGB and labels (stencil IDs).
The reconstruction stage consumes a packed float texture where forward depth is preserved at full precision and label (stencil ID) and intensity channels are co-packed for later decoding.
2) Scan lookup and GPU reconstruction
The scan lookup is generated once per sensor configuration from angle tables and intrinsics.
Each beam maps to a single UV, so per-frame trig is avoided and the compute shader performs direct texture fetches.
For each horizontal and vertical index, the ray is projected with Xp = tan(theta) * fx + cx and Yp = tan(phi) * fy / cos(theta), then normalized by render target size to store u and v.
Out-of-range channels are zeroed, and U and V are packed into a single float texture.
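Under those formulas, the lookup generation can be sketched in NumPy. This is a CPU-side illustration of the one-time precomputation; the production version packs the result into a float texture on the GPU, and the mini-sensor dimensions here are hypothetical:

```python
import numpy as np

def build_scan_lookup(thetas, phis, fx, fy, cx, width, height):
    # Generated once per sensor configuration; removes per-frame trigonometry.
    th, ph = np.meshgrid(thetas, phis, indexing="ij")  # (H, V) beam grid
    xp = np.tan(th) * fx + cx                  # Xp = tan(theta) * fx + cx
    yp = np.tan(ph) * fy / np.cos(th)          # Yp = tan(phi) * fy / cos(theta)
    u = xp / width                             # normalize by render target size
    v = yp / height
    valid = (u >= 0) & (u < 1) & (v >= 0) & (v < 1)
    # Out-of-range channels are zeroed; U and V form a two-channel table.
    return np.stack([np.where(valid, u, 0.0), np.where(valid, v, 0.0)], axis=-1)

# Hypothetical mini-sensor: 3 horizontal x 2 vertical beams, 128x128 target.
uv = build_scan_lookup(np.array([-0.3, 0.0, 0.3]), np.array([0.0, 0.2]),
                       fx=100.0, fy=100.0, cx=64.0, width=128, height=128)
```

The center beam (theta = 0, phi = 0) lands at u = cx / width, as expected for a principal-point-centered projection.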

3) Payload contract and header semantics
Reconstruction runs one thread per beam, uses point sampling for UV and label (stencil ID) lookups, and uses bilinear sampling for depth when labels (stencil IDs) are stable. The payload is a float32 Nx4 array where XYZ are left-handed centimeters and W packs stencil in the upper 8 bits and intensity in the lower 16 bits. Here, the term label refers to the stencil ID from the render pass.
Each frame is preceded by a compact 128-byte header with:
- Magic and version identifiers
- Frame info including point count and payload sim step
- Sensor pose in left-handed centimeters and quaternions in xyzw
- Intrinsics with FOV, channel counts, range, and noise
- A CRC over intrinsics for integrity checks
This contract is stable across clients and enables deterministic parsing and validation.
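A client-side decode of the W channel might look like this. The sketch assumes W stores the packed value as an exactly representable float integer (stencil plus intensity together occupy 24 bits, which fits in a float32 significand); the authoritative bit convention is defined by the payload contract itself:

```python
import numpy as np

def decode_w(w: np.ndarray):
    # W packs the stencil ID in the upper 8 bits and intensity in the lower 16.
    packed = w.astype(np.uint64)  # assumption: W holds the packed integer exactly
    stencil = (packed >> 16) & 0xFF
    intensity = packed & 0xFFFF
    return stencil, intensity

# Round trip: stencil 7, intensity 1234 packed as 7 * 65536 + 1234.
w = np.array([7 * 65536 + 1234], dtype=np.float32)
stencil, intensity = decode_w(w)
```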
4) Shared-memory staging, tick sync, and client decoding
The engine publishes each frame into a shared memory pool with per-actor ring slots, while a tick sync block coordinates pacing and exposes the simulation step.
On the client side, the tick manager waits for completion, then the memory pool reader fetches the newest slot with read_next_simple.
Render-based sensors use a frame offset of 1, so payload_simstep can differ from the core tick and is used for stopping and alignment.
Parsed frames are converted to right-handed meters and queued to the background recorder, which writes npz frames and preview point clouds.
```
for each sim tick:
    wait_tick_complete()
    for each lidar actor:
        raw = mem_pool.read_next_simple(actor_id)
        frame = parse_lidar_payload(raw)
        points = frame.points_sensor
        enqueue_recording(actor_id, frame, points, frame.payload_simstep)
```
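The conversion from left-handed centimeters to right-handed meters can be sketched as follows. The specific axis flip shown (negating Y) is one common Unreal-style convention and is an assumption here, not a statement of the SiRLab contract:

```python
import numpy as np

def to_right_handed_meters(points_cm_lh: np.ndarray) -> np.ndarray:
    # Input: Nx3 XYZ in left-handed centimeters (engine frame).
    pts = points_cm_lh.astype(np.float64) / 100.0  # cm -> m
    pts[:, 1] *= -1.0  # flip Y: one common left- to right-handed convention (assumption)
    return pts

sample = np.array([[100.0, 200.0, 300.0]])  # 1 m, 2 m, 3 m in engine units
converted = to_right_handed_meters(sample)  # [[1.0, -2.0, 3.0]]
```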
Shader-based LiDAR pipeline
GPU capture, compute, staged readback
Why it stays fast
Throughput is governed by parallelism, memory locality, and synchronization. The pipeline is structured to maximize GPU occupancy while minimizing CPU-GPU coupling.
- Parallel structure: each beam is independent, so the compute shader scales with channel count.
- Texture locality: depth is already in texture memory; sampling uses GPU caches and hardware interpolation.
- Precomputation: UVs are generated once per sensor configuration, removing per-frame trigonometry.
- Decoupled readback: a ring buffer isolates GPU completion from the game thread; only finished slots are consumed.
- Asynchronous I/O: recording runs off the main loop to preserve deterministic tick pacing.
These mechanisms keep the simulator in the regime where dataset export remains bounded by GPU throughput rather than CPU stalls.
Data contract and client pipeline
The LiDAR payload is defined as a stable binary contract for real-time transfer. The header provides a compact frame summary, sensor pose, LiDAR intrinsics, a layout identifier, and a CRC for integrity. The payload layout is float4 with XYZ in left-handed centimeters and W packing stencil and intensity. Noise is seeded by the sim step so scans are repeatable across runs.
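Seeding noise by sim step makes per-scan noise repeatable across runs. A minimal sketch (the sigma value and Gaussian model are illustrative assumptions):

```python
import numpy as np

def range_noise(sim_step: int, n_points: int, sigma_cm: float = 2.0) -> np.ndarray:
    # Seeding by sim step means the same tick reproduces the same noise in every run.
    rng = np.random.default_rng(sim_step)
    return rng.normal(0.0, sigma_cm, size=n_points)

# Same sim step -> identical noise; different step -> a different draw.
a = range_noise(42, 1000)
b = range_noise(42, 1000)
```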
Public previews include a README with schema notes for manifest.json and point clouds.
Preview artifacts are generated by the same client pipeline that consumes the shared-memory payloads.
Typical outputs per LiDAR sensor:
```
lidar/<sensor_name>/
    intrinsics.json
    frame_schema.json
    frames_index.parquet
    frames/
        00000000.npz
        ...
    preview/
        *.pclbin or *.pclz
    manifest.json
    README.md
labels/labels.json
```
Inspection artifacts
Previews are available in the Data Hub and the Hugging Face Space viewer.
Preview repos provide raw files and README schema notes for manifest.json, point clouds, and RGB/label (stencil ID) pairs.
These previews are for inspection; full sequences and commercial access are available on request via huggingface.co/sirlab-ai.
Validation protocol
Minimum checks:
- Alignment: LiDAR projected onto RGB/labels (stencil IDs) is stable at edges.
- Determinism: repeat runs → matching point counts and poses across frames.
- Throughput: report simulated seconds per wall-clock second.
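The determinism check can be automated by diffing two runs frame by frame. A minimal sketch over in-memory frames (the dict layout with `points` and `pose` keys is hypothetical; in practice these would come from the exported npz files):

```python
import numpy as np

def runs_match(frames_a, frames_b, atol: float = 0.0) -> bool:
    # Deterministic runs must yield matching point counts and poses per frame.
    if len(frames_a) != len(frames_b):
        return False
    for fa, fb in zip(frames_a, frames_b):
        if fa["points"].shape != fb["points"].shape:
            return False
        if not np.allclose(fa["pose"], fb["pose"], atol=atol):
            return False
    return True

# Two identical toy runs pass the check.
run = [{"points": np.zeros((4, 4)), "pose": np.eye(4)}]
ok = runs_match(run, run)
```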
Conclusion
Shader-based LiDAR is a GPU-resident reconstruction pipeline. It uses same-frame depth and label (stencil ID) images to produce real-time simulated point clouds aligned with RGB. Deterministic scan patterns and staged readback avoid CPU stalls, preserving throughput-stable headroom for closed-loop workloads and production dataset generation.
SiRLab integrates capture, reconstruction, and client export end-to-end to deliver simulation-grade LiDAR streams and datasets at scale. Commercial licensing and build partnerships are available for production deployments.
Ready for render-derived LiDAR in your stack?
Get aligned LiDAR, RGB, and labels without blowing your frame budget. We’ll show you the pipeline and ship a pilot sequence you can validate.