
Awesome World Models for Autonomous Driving

A curated list of World Model papers for Autonomous Driving.

If you find any missing papers, feel free to create a pull request, open an issue, or email me / Qi Wang. Contributions in any form that make this list more comprehensive are welcome. 📣📣📣

If you find this repository useful, please consider giving us a star 🌟.

Feel free to share this list with others! 🥳🥳🥳

Workshop & Challenge

Papers

Original world model paper

  • Using Occupancy Grids for Mobile Robot Perception and Navigation [Paper]

Technical blog or video

  • Yann LeCun: A Path Towards Autonomous Machine Intelligence [Paper] [Video]
  • CVPR'23 WAD Keynote - Ashok Elluswamy, Tesla [Video]
  • Wayve Introducing GAIA-1: A Cutting-Edge Generative AI Model for Autonomy [blog]

    World models are the basis for the ability to predict what might happen next, which is fundamentally important for autonomous driving. They can act as a learned simulator, or a mental “what if” thought experiment for model-based reinforcement learning (RL) or planning. By incorporating world models into our driving models, we can enable them to understand human decisions better and ultimately generalise to more real-world situations.
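
To make the "learned simulator" idea in the quote above concrete, below is a minimal Python sketch of a world model used for "what if" planning: a model predicts the next state and reward, and a planner scores imagined action sequences entirely inside the model. Everything here (the `ToyWorldModel` class, `plan_actions`, the random linear dynamics) is a hypothetical illustration under simplifying assumptions, not the method of GAIA-1 or any paper listed below.

```python
# Minimal sketch: a world model as a learned simulator for planning.
# The dynamics here are a random toy stand-in for a trained model.
import numpy as np

rng = np.random.default_rng(0)

class ToyWorldModel:
    """Hypothetical stand-in for a learned model: s' = tanh(W_s s + W_a a), r = w · s'."""
    def __init__(self, state_dim=8, action_dim=2):
        self.W_s = rng.normal(scale=0.3, size=(state_dim, state_dim))
        self.W_a = rng.normal(scale=0.3, size=(state_dim, action_dim))
        self.w_r = rng.normal(size=state_dim)

    def step(self, state, action):
        """Predict the next state and a scalar reward from (state, action)."""
        next_state = np.tanh(self.W_s @ state + self.W_a @ action)
        reward = float(self.w_r @ next_state)
        return next_state, reward

def plan_actions(model, state, horizon=10, n_candidates=256, action_dim=2):
    """Random-shooting planner: imagine rollouts, keep the best first action."""
    best_return, best_action = -np.inf, None
    for _ in range(n_candidates):
        actions = rng.uniform(-1, 1, size=(horizon, action_dim))
        s, total = state.copy(), 0.0
        for a in actions:  # the "what if" rollout happens entirely in the model
            s, r = model.step(s, a)
            total += r
        if total > best_return:
            best_return, best_action = total, actions[0]
    return best_action, best_return

model = ToyWorldModel()
s0 = rng.normal(size=8)
action, ret = plan_actions(model, s0)
print(f"chosen first action: {action}, imagined return: {ret:.2f}")
```

Real systems replace the toy dynamics with a large learned video or latent-state model, and the random-shooting loop with gradient-based or CEM planners, or with a policy trained inside imagined rollouts (as in the Dreamer line of work below).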

Survey

  • A Survey on Multimodal Large Language Models for Autonomous Driving. WACVW 2024 [Paper] [Code]
  • Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond. arXiv 2024.5 [Paper] [Code]
  • World Models for Autonomous Driving: An Initial Survey. arXiv 2024.3 [Paper]

2024

  • [ViDAR] Visual Point Cloud Forecasting enables Scalable Autonomous Driving. CVPR 2024 [Paper] [Code]
  • [GenAD] Generalized Predictive Model for Autonomous Driving. CVPR 2024 [Paper] [Data]
  • [Cam4DOcc] Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications. CVPR 2024 [Paper] [Code]
  • [Drive-WM] Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving. CVPR 2024 [Paper] [Code]
  • [DriveWorld] DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving. CVPR 2024 [Paper]
  • [Panacea] Panacea: Panoramic and Controllable Video Generation for Autonomous Driving. CVPR 2024 [Paper] [Code]
  • [MagicDrive] MagicDrive: Street View Generation with Diverse 3D Geometry Control. ICLR 2024 [Paper] [Code]
  • [Copilot4D] Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion. ICLR 2024 [Paper]
  • [SafeDreamer] SafeDreamer: Safe Reinforcement Learning with World Models. ICLR 2024 [Paper] [Code]
  • [DriveSim] Probing Multimodal LLMs as World Models for Driving. arXiv 2024.5 [Paper] [Code]
  • [RoboDreamer] RoboDreamer: Learning Compositional World Models for Robot Imagination. arXiv 2024.4 [Paper] [Code]
  • [LidarDM] LidarDM: Generative LiDAR Simulation in a Generated World. arXiv 2024.4 [Paper] [Code]
  • [3D-VLA] 3D-VLA: A 3D Vision-Language-Action Generative World Model. arXiv 2024.3 [Paper]
  • [DriveDreamer-2] DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation. arXiv 2024.3 [Paper] [Code]
  • [Think2Drive] Think2Drive: Efficient Reinforcement Learning by Thinking in Latent World Model for Quasi-Realistic Autonomous Driving. arXiv 2024.2 [Paper]

2023

  • [TrafficBots] TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction. ICRA 2023 [Paper] [Code]
  • [WoVoGen] WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation. arXiv 2023.12 [Paper] [Code]
  • [CTT] Categorical Traffic Transformer: Interpretable and Diverse Behavior Prediction with Tokenized Latent. arXiv 2023.11 [Paper]
  • [OccWorld] OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving. arXiv 2023.11 [Paper] [Code]
  • [MUVO] MUVO: A Multimodal Generative World Model for Autonomous Driving with Geometric Representations. arXiv 2023.11 [Paper]
  • [ADriver-I] ADriver-I: A General World Model for Autonomous Driving. arXiv 2023.11 [Paper]
  • [DrivingDiffusion] DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model. arXiv 2023.10 [Paper] [Code]
  • [GAIA-1] GAIA-1: A Generative World Model for Autonomous Driving. arXiv 2023.9 [Paper]
  • [DriveDreamer] DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving. arXiv 2023.9 [Paper] [Code]
  • [UniWorld] UniWorld: Autonomous Driving Pre-training via World Models. arXiv 2023.8 [Paper] [Code]

2022

  • [MILE] Model-Based Imitation Learning for Urban Driving. NeurIPS 2022 [Paper] [Code]
  • [Iso-Dream] Iso-Dream: Isolating and Leveraging Noncontrollable Visual Dynamics in World Models. NeurIPS 2022 Spotlight [Paper] [Code]
  • [Symphony] Symphony: Learning Realistic and Diverse Agents for Autonomous Driving Simulation. ICRA 2022 [Paper]
  • Hierarchical Model-Based Imitation Learning for Planning in Autonomous Driving. IROS 2022 [Paper]
  • [SEM2] Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model. NeurIPS 2022 workshop [Paper]

Other World Model Papers

2024

  • [3D-VLA] 3D-VLA: A 3D Vision-Language-Action Generative World Model. ICML 2024 [Paper] [Code]
  • [Genie] Genie: Generative Interactive Environments. DeepMind [Paper] [Blog]
  • [Sora] Video generation models as world simulators. OpenAI [Technical report]
  • [IWM] Learning and Leveraging World Models in Visual Representation Learning. Meta AI [Paper]
  • [V-JEPA] V-JEPA: Video Joint Embedding Predictive Architecture. Meta AI [Blog] [Paper] [Code]
  • [Newton] Newton™ – a first-of-its-kind foundation model for understanding the physical world. Archetype AI [Blog]
  • [MAMBA] MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning. ICLR 2024 [Paper] [Code]
  • [Compete and Compose] Compete and Compose: Learning Independent Mechanisms for Modular World Models. arXiv 2024.4 [Paper]
  • [MagicTime] MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators. arXiv 2024.4 [Paper] [Code]
  • [Dreaming of Many Worlds] Dreaming of Many Worlds: Learning Contextual World Models Aids Zero-Shot Generalization. arXiv 2024.3 [Paper] [Code]
  • [ManiGaussian] ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation. arXiv 2024.3 [Paper] [Code]
  • [LWM] World Model on Million-Length Video And Language With RingAttention. arXiv 2024.2 [Paper] [Code]
  • Planning with an Ensemble of World Models. OpenReview [Paper]
  • [WorldDreamer] WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens. arXiv 2024.1 [Paper] [Code]

2023

  • [IRIS] Transformers are Sample-Efficient World Models. ICLR 2023 Oral [Paper] [Torch Code]
  • [STORM] STORM: Efficient Stochastic Transformer based World Models for Reinforcement Learning. NeurIPS 2023 [Paper] [Torch Code]
  • [TWM] Transformer-based World Models Are Happy with 100k Interactions. ICLR 2023 [Paper] [Torch Code]
  • [Dynalang] Learning to Model the World with Language. arXiv 2023.8 [Paper] [JAX Code]
  • [CoWorld] Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning. arXiv 2023.5 [Paper]
  • [DreamerV3] Mastering Diverse Domains through World Models. arXiv 2023.1 [Paper] [JAX Code] [Torch Code]

2022

  • [DreamerPro] DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations. ICML 2022 [Paper] [TF Code]
  • Deep Hierarchical Planning from Pixels. NeurIPS 2022 [Paper] [TF Code]

2021

  • [DreamerV2] Mastering Atari with Discrete World Models. ICLR 2021 [Paper] [TF Code]

2020

  • [DreamerV1] Dream to Control: Learning Behaviors by Latent Imagination. ICLR 2020 [Paper] [TF Code] [Torch Code]
  • [Plan2Explore] Planning to Explore via Self-Supervised World Models. ICML 2020 [Paper] [TF Code] [Torch Code]

2018

  • World Models. NeurIPS 2018 Oral [Paper]