🚗 DriveQA: Passing the Driving Knowledge Test

Boston University¹   Washington University in St. Louis²
ICCV 2025
*Equal Contribution

Can LLMs Pass a Driving Knowledge Test?
We introduce a multimodal dataset to evaluate the traffic rule-following capabilities of MLLMs.

DriveQA Teaser

DriveQA: A comprehensive multimodal benchmark that evaluates and improves MLLMs' driving knowledge through 474K QA pairs covering traffic rules, signs, and right-of-way scenarios.

Abstract

If a Large Language Model (LLM) were to take a driving knowledge test today, would it pass? Beyond standard spatial and visual question-answering (QA) tasks on current autonomous driving benchmarks, driving knowledge tests require a complete understanding of all traffic rules, signage, and right-of-way principles. To pass this test, human drivers must discern various edge cases that rarely appear in real-world datasets. In this work, we present DriveQA, an extensive open-source text and vision-based benchmark that exhaustively covers traffic regulations and scenarios. Through our experiments using DriveQA, we show that (1) state-of-the-art LLMs and Multimodal LLMs (MLLMs) perform well on basic traffic rules but exhibit significant weaknesses in numerical reasoning and complex right-of-way scenarios, traffic sign variations, and spatial layouts, (2) fine-tuning on DriveQA improves accuracy across multiple categories, particularly in regulatory sign recognition and intersection decision-making, (3) controlled variations in DriveQA-V provide insights into model sensitivity to environmental factors such as lighting, perspective, distance, and weather conditions, and (4) pretraining on DriveQA enhances downstream driving task performance, leading to improved results on real-world datasets such as nuScenes and BDD, while also demonstrating that models can internalize text and synthetic traffic knowledge to generalize effectively across downstream QA tasks.
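To make the evaluation protocol concrete, below is a minimal sketch of how a multiple-choice DriveQA-style entry could be loaded and scored per category. The JSON field names (question, options, answer, category) and the model.answer wrapper are illustrative assumptions, not the released data format or API.

import json

def evaluate(qa_path, model):
    # Per-category accuracy on multiple-choice QA pairs. The record
    # schema used here is an assumed example, not the released format.
    correct, total = {}, {}
    with open(qa_path) as f:
        records = json.load(f)
    for r in records:
        cat = r["category"]  # e.g., "Right-of-Way and Lane Selection"
        prompt = r["question"] + "\n" + "\n".join(
            f"({k}) {v}" for k, v in r["options"].items())
        pred = model.answer(prompt)  # hypothetical MLLM wrapper
        correct[cat] = correct.get(cat, 0) + int(pred == r["answer"])
        total[cat] = total.get(cat, 0) + 1
    return {c: correct[c] / total[c] for c in total}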



📊 Large-Scale Benchmark

474K QA pairs
26K text-based questions
448K vision-based tasks
220 traffic sign types

🔍 Comprehensive Evaluation

19 question categories
Controlled variations
Explanations included
Real + synthetic data

🚗 Real-World Transfer

Improved nuScenes performance
Better BDD-OIA results
Sim-to-real generalization
Downstream task gains


Dataset Statistics


DriveQA-T
Distribution of Question Types in DriveQA-T. The benchmark covers five key domains and 19 sub-class types.
DriveQA-WordCloud
Word Cloud of Questions in DriveQA. The figure summarizes the frequency of terms appearing in the DriveQA benchmark.

Main Results


Text QA Results
Challenging Categories on DriveQA-T. We show results on the 10 most difficult question types. Limits: Speed and Distance Limits; Alcohol: Blood Alcohol Limits and DUI Laws; Passing: Passing Rules and Lane Usage in Restricted Situations; Penalties: Driver's License Penalties; Parking: Parking and Wheel Positioning; Highway: Passing Rules and Lane Usage on Highways; Turning: Turning Rules; Signs: Traffic Signs and Signals; Headlight: Headlight Usage; Intersection: Right-of-Way and Lane Selection. Average summarizes all 19 question types. The top method is marked in green and the second best in light green.
Image QA Results
Summarized Results on DriveQA-V. We show model performance (accuracy %) on VQA. The dataset is divided into two main categories: intersections and signs (further broken down by camera perspective and sign type).
Image QA Results
Role of Difficult Questions and Distractors. Accuracy degradation on a hard subset of DriveQA-T and on a challenging set of DriveQA-V with negative sampling exposes the limitations of current models, including GPT-4o, in accurately understanding complex traffic rules and signs.
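One plausible way to build such a challenging option set is negative sampling over embedding similarity, so the wrong choices are signs that look or read like the correct one. The sketch below is a sketch under stated assumptions: the embed function (any image or text encoder returning L2-normalized vectors) and the candidate sign list are placeholders, not the paper's actual pipeline.

import numpy as np

def hard_distractors(target, candidates, embed, k=3):
    # Return the k candidate signs most similar to the target, to be
    # used as hard negatives in a multiple-choice question.
    # Assumes embed() returns L2-normalized vectors, so the dot
    # product below equals cosine similarity.
    t = embed(target)
    scored = sorted(((float(np.dot(t, embed(c))), c)
                     for c in candidates if c != target), reverse=True)
    return [c for _, c in scored[:k]]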

Sim-to-Real Transferability and Downstream Task Performance


Sim-to-Real Generalization
Sim-to-Real Generalization. We pre-train on synthetic DriveQA (DQA) and evaluate on real-world Mapillary images. The Mapillary dataset comprises challenging scenarios with varied traffic sign placements, occlusions, and illumination conditions.
End-to-End Trajectory Planning Results on nuScenes
End-to-End Trajectory Planning Results on nuScenes. We compute the L2 error at different prediction horizons (1s, 2s, and 3s). Lower L2 error indicates that knowledge learned from our DriveQA (DQA) dataset transfers from simulation to real-world driving tasks.
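For reference, the L2 planning metric can be computed as below. The 2 Hz waypoint rate follows the common nuScenes convention; whether the error is averaged over all waypoints up to each horizon (as here) or taken only at the horizon timestep varies across evaluation protocols, so treat this as one common variant rather than the paper's exact implementation.

import numpy as np

def l2_at_horizons(pred, gt, hz=2.0, horizons=(1.0, 2.0, 3.0)):
    # pred, gt: (T, 2) arrays of planned / ground-truth (x, y) waypoints.
    # Returns the mean Euclidean error over all waypoints up to each horizon.
    dists = np.linalg.norm(pred - gt, axis=1)
    return {h: float(dists[: int(h * hz)].mean()) for h in horizons}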
Evaluation on BDD-OIA Dataset
Evaluation on BDD-OIA Dataset. We report mean F1 score (mF1) and overall F1 score (F1 all) for both action and explanation tasks. The results show that fine-tuning on DriveQA improves performance on both tasks.
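The two F1 variants can be read as macro- and micro-averaged multi-label F1: mF1 averages per-class scores, while F1 all pools every (sample, class) decision. The sketch below uses scikit-learn with toy labels; whether this exactly matches the BDD-OIA protocol is an assumption.

import numpy as np
from sklearn.metrics import f1_score

# Toy multi-label targets/predictions for 4 actions (forward, stop, left, right).
y_true = np.array([[1, 0, 0, 1], [0, 1, 0, 0], [1, 0, 1, 0]])
y_pred = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 1, 0]])

mf1 = f1_score(y_true, y_pred, average="macro")     # mean of per-class F1
f1_all = f1_score(y_true, y_pred, average="micro")  # pooled over all decisions
print(f"mF1 = {mf1:.3f}, F1 all = {f1_all:.3f}")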

BibTeX

@inproceedings{wei2025driveqa,
  title={DriveQA: Passing the Driving Knowledge Test},
  author={Wei, Maolin and Liu, Wanzhou and Ohn-Bar, Eshed},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2025}
}

Please cite DriveQA if you find it helpful!