[0:03] End-to-end autonomous driving has become a promising direction for self-driving cars. Today on DRIVE Labs, we'll explore an end-to-end autonomous driving solution from NVIDIA that learns a neural network planner on top of bird's-eye-view, or BEV, features. End-to-end autonomous driving refers to a holistic approach in which a system takes in raw sensor data from cameras, radar, and lidar and directly outputs vehicle controls. Unlike traditional systems that rely on modular designs with separate components, such as detection, tracking, prediction, planning, and control, end-to-end driving aims to streamline the process and avoid a deep, cascaded pipeline from perception to planning.

NVIDIA's end-to-end driving model combines detection, tracking, prediction, and planning into a single network with a minimalistic design. The planning input comes directly from the BEV feature map, which is generated from sensor data such as camera and lidar. This streamlined approach reflects the disruption seen with data-driven approaches in intelligent voice systems, where tasks like speech recognition and response generation are integrated into a cohesive end-to-end learning process.

In particular, our end-to-end driving model uses ego queries, which are learnable embeddings that cross-attend the BEV features. The refined ego query then outputs the planned trajectory through a multi-layer perceptron, or MLP, layer. This simple design challenges the conventional assumption that a complex, cascaded system is required for effective autonomous driving planning. The simplicity and efficiency of the design offer both the flexibility to deploy across platforms and the scalability to take on larger datasets, and it achieves outstanding results by directly tapping into BEV features for planning. The solution also provides a universal framework for enhancing machine learning-based planning with rule-based planners.
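To make the planner design concrete, here is a minimal PyTorch sketch of the idea described above: a learnable ego query cross-attends a flattened BEV feature map, and an MLP head decodes the refined query into trajectory waypoints. All module names, embedding sizes, and the planning horizon are illustrative assumptions, not NVIDIA's actual implementation.

```python
import torch
import torch.nn as nn

class TrajectoryPlanner(nn.Module):
    """Sketch of a BEV-feature planner: an ego query cross-attends the
    BEV features; an MLP decodes the result into waypoints."""

    def __init__(self, embed_dim=256, num_heads=8, horizon=8):
        super().__init__()
        # Learnable ego query embedding (one query per sample).
        self.ego_query = nn.Parameter(torch.randn(1, 1, embed_dim))
        # Cross-attention: query = ego query, key/value = BEV features.
        self.cross_attn = nn.MultiheadAttention(
            embed_dim, num_heads, batch_first=True
        )
        # MLP head: refined query -> (x, y) waypoints over the horizon.
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, horizon * 2),
        )
        self.horizon = horizon

    def forward(self, bev_feats):
        # bev_feats: (B, H*W, C) flattened BEV feature map.
        b = bev_feats.size(0)
        q = self.ego_query.expand(b, -1, -1)              # (B, 1, C)
        refined, _ = self.cross_attn(q, bev_feats, bev_feats)
        traj = self.mlp(refined.squeeze(1))               # (B, horizon*2)
        return traj.view(b, self.horizon, 2)              # (B, T, 2)

planner = TrajectoryPlanner()
bev = torch.randn(2, 50 * 50, 256)  # dummy 50x50 BEV grid, batch of 2
out = planner(bev)
print(out.shape)  # torch.Size([2, 8, 2])
```

The point of the sketch is the minimalism: no explicit detection, tracking, or prediction heads feed the planner; the trajectory is read straight out of the BEV representation.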
Using multi-target hydra distillation as a pivotal strategy, the method employs multiple specialized teachers to learn trajectories that align with various simulation-based metrics. This integration ensures the model not only mimics human driving behaviors but also adheres to traffic rules and safety standards, addressing the limitations of traditional imitation learning.

NVIDIA's end-to-end autonomous driving solutions deliver new capabilities and a higher level of performance. This enabled our team to recently win the CVPR 2024 End-to-End Driving at Scale Challenge and receive the Innovation Award. The demands of advanced AI for autonomous vehicles are staggering, both in terms of AI expertise and the infrastructure needed to support the latest generative AI and end-to-end models. We are proud of the NVIDIA Research team's ability to develop end-to-end autonomous driving technologies that will enable safer and more human-like driving experiences in urban scenarios.

Building an advanced end-to-end stack requires two key elements. First, extending driving data across diverse scenarios. At NVIDIA, we emphasize the need to collect the right data, finding the corner cases that will refine existing benchmarks. Omniverse Cloud APIs for AV simulation offer a sophisticated environment that allows researchers and developers to conveniently generate realistic scenarios through physics-based simulation, ensuring that autonomous vehicles can navigate real-world complexities safely and efficiently. Second, you need powerful computing hardware to handle the immense computational demands of transformer-based AV models. NVIDIA built DRIVE AGX Thor on the Blackwell GPU architecture to run advanced algorithms like the end-to-end driving model we discussed today. To learn more, please visit our GitHub page and check out NVIDIA's work at CVPR 2024.
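The multi-target distillation idea described earlier can be sketched as a combined training objective: an imitation term pulls the planned trajectory toward the human-driven log, while one distillation head per teacher metric (for example, collision avoidance or rule compliance) learns simulation-derived scores over a set of candidate trajectories. The function below is a hypothetical illustration of that structure; the metric names, score shapes, and loss choices are assumptions, not the published method.

```python
import torch
import torch.nn.functional as F

def hydra_distillation_loss(pred_traj, human_traj, pred_scores, teacher_scores):
    """Combine an imitation loss with per-metric distillation losses.

    pred_traj, human_traj: (B, T, 2) planned vs. logged waypoints.
    pred_scores, teacher_scores: dicts mapping a metric name to (B, K)
    scores over K candidate trajectories; teacher scores come from
    simulation-based evaluation and lie in [0, 1].
    """
    # Imitation term: match the human trajectory.
    imitation = F.l1_loss(pred_traj, human_traj)
    # Distillation terms: one specialized teacher per metric.
    distill = sum(
        F.binary_cross_entropy_with_logits(pred_scores[m], teacher_scores[m])
        for m in teacher_scores
    )
    return imitation + distill

# Dummy usage with random tensors and a single hypothetical metric.
torch.manual_seed(0)
pred = torch.randn(4, 8, 2)
human = torch.randn(4, 8, 2)
pred_s = {"collision": torch.randn(4, 16)}      # student logits
teach_s = {"collision": torch.rand(4, 16)}      # simulated soft targets
loss = hydra_distillation_loss(pred, human, pred_s, teach_s)
```

Summing per-metric losses is what lets the model absorb signals that pure imitation cannot provide, such as "this candidate would collide in simulation," without changing the planner's architecture.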

End-to-End Autonomous Driving: A Bird’s-Eye View - DRIVE Labs Ep. 35
NVIDIA