How Zoox Uses Computer Vision To Advance Its Self-Driving Technology

Zoox
24 Nov 2020 · 06:22

Summary

TL;DR: In this video, Sarah, the Senior Director of Perception at Zoox, explains how the company's computer vision system enables its autonomous vehicles to navigate complex urban environments like San Francisco and Las Vegas. The system uses real-time camera feeds, along with lidar and radar, to detect and classify objects such as pedestrians and vehicles, together with their attributes. Sarah covers key features such as pedestrian skeleton detection, intent prediction, and vehicle signals, all of which help the autonomous system drive safely by anticipating human actions and adapting to dynamic scenarios.

Takeaways

  • 🚗 Zoox uses a sophisticated computer vision system as part of its autonomous driving software, enabling vehicles to navigate complex urban environments.
  • 👀 The perception stack, which includes camera, lidar, and radar data, acts as the 'eyes' of the vehicle, providing detailed information about the surrounding environment.
  • 📦 Neural networks process camera feeds to detect objects, create 2D bounding boxes, and perform instance segmentation, identifying which pixels belong to each object.
  • 🌍 The system classifies each pixel semantically, distinguishing between pedestrians, cars, sidewalks, roads, and vegetation, and predicts depth even though cameras can't directly measure distance.
  • 🚶 Zoox’s software can identify pedestrian attributes such as standing or walking, which helps predict their intentions and plan vehicle movements accordingly.
  • 🧍‍♀️ Key points on pedestrians' skeletons, like hands, elbows, and knees, are detected and used for advanced tasks such as tracking, gesture recognition, and predicting movement intent.
  • 🚦 The system recognizes vehicle attributes like indicator lights, brake lights, and hazard lights, improving prediction of other drivers' actions.
  • 🚨 Emergency vehicles are detected and classified, including when their lights are on, ensuring the vehicle adapts its behavior, such as pulling over or stopping.
  • 🚪 Open car doors are detected, alerting the vehicle to potential hazards and enabling it to react appropriately to avoid accidents.
  • 🖐️ Gesture detection is used to understand pedestrians' intentions, such as signaling for the vehicle to stop or proceed, which aids in safer navigation and decision-making.
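
To make the takeaways above concrete, here is a minimal sketch of what a per-object perception output could look like: a semantic class, a 2D bounding box, an instance mask, and a set of attributes, combined with a per-pixel depth map to give one distance estimate per object. The `Detection` class, the `object_depth` helper, and the choice of the median are illustrative assumptions, not Zoox's actual data structures or method.

```python
from dataclasses import dataclass, field
from statistics import median
from typing import List, Optional, Set, Tuple


@dataclass
class Detection:
    """Hypothetical container for one detected object (not Zoox's real type)."""
    label: str                       # semantic class, e.g. "pedestrian" or "car"
    box: Tuple[int, int, int, int]   # 2D bounding box: x_min, y_min, x_max, y_max (pixels)
    mask: List[List[bool]]           # instance mask: True where a pixel belongs to the object
    attributes: Set[str] = field(default_factory=set)  # e.g. {"walking"}


def object_depth(det: Detection, depth_map: List[List[float]]) -> Optional[float]:
    """Median predicted depth over the object's mask pixels, or None if the mask is empty."""
    values = [
        depth_map[r][c]
        for r, row in enumerate(det.mask)
        for c, inside in enumerate(row)
        if inside
    ]
    return median(values) if values else None


# Toy 3x4 "image": background at 30 m, a pedestrian at roughly 8 to 9 m.
depth_map = [
    [30.0, 30.0, 30.0, 30.0],
    [30.0, 8.0, 8.5, 30.0],
    [30.0, 8.2, 9.0, 30.0],
]
mask = [
    [False, False, False, False],
    [False, True, True, False],
    [False, True, True, False],
]
ped = Detection("pedestrian", (1, 1, 2, 2), mask, {"walking"})
print(ped.label, sorted(ped.attributes), object_depth(ped, depth_map))
```

Using the mask rather than the whole bounding box keeps background pixels out of the depth estimate, which is one reason instance segmentation is useful alongside boxes.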

Q & A

  • What is the role of the perception stack in Zoox's autonomous driving system?

    -The perception stack serves as the 'eyes' of the driving system, processing real-time inputs from cameras, lidar, and radar sensors to understand the environment around the vehicle. This information helps the vehicle plan and drive safely.

  • How does the computer vision system handle object detection?

    -The computer vision system uses neural networks to detect objects by computing a 2D bounding box around each object, performing instance segmentation (identifying which pixels belong to the object), and determining the object's semantic class (e.g., pedestrian, car, road).

  • How does the system estimate depth if cameras alone can't measure distances?

    -The system uses neural networks to predict a depth value for each pixel in the scene, since cameras do not measure distance directly.

  • What is skeleton detection and how is it used in the system?

    -Skeleton detection identifies the positions of key points on a pedestrian's body (e.g., hands, elbows, knees). This data is used for higher-level tasks such as tracking, gesture detection, and predicting pedestrian intent.

  • What kind of higher-level signals does the system detect from pedestrians?

    -The system detects signals like whether pedestrians are standing or walking, their gestures (e.g., waving to signal the vehicle to stop or go), and distractions such as whether they are looking at their phones. These signals help predict pedestrian behavior.

  • How does the system recognize different types of pedestrians and their attributes?

    -The system classifies pedestrians based on their appearance, pose, and actions, such as whether they are construction workers, riding scooters, pushing strollers, or looking at phones. More than 30 different attributes are tracked to enhance behavior prediction.

  • What vehicle attributes are important for the perception system to classify?

    -The system classifies vehicle attributes like brake lights, indicator lights, reverse lights, hazard lights, and emergency lights. These signals help predict the intentions of other drivers on the road.

  • How does the system respond to emergency vehicles?

    -The system detects emergency vehicles such as ambulances and police cars and identifies whether their emergency lights are on. This allows the vehicle to adjust its behavior, such as pulling over or stopping.

  • What challenge do open doors on parked vehicles present to the system, and how is it handled?

    -Open doors on parked vehicles can pose a hazard, as they may indicate a person is about to step into the road. The system detects open doors and adjusts the vehicle's driving path to avoid potential collisions.

  • What kind of gestures does the system detect, and how do they influence driving behavior?

    -The system detects gestures such as pedestrians holding up their hand to indicate they want the vehicle to stop or waving to signal the vehicle to continue. These gestures are strongly correlated with pedestrian behavior and help the vehicle make decisions.
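
As a rough illustration of how skeleton key points can feed gesture detection, the sketch below flags a raised hand when a wrist sits clearly above the shoulder on the same side. This rule of thumb, the joint names, and the `margin` parameter are assumptions made for the example; the video does not describe how Zoox's system classifies gestures, and a production system would most likely use learned models over time rather than a single-frame rule.

```python
from typing import Dict, Tuple

# Joint name -> (x, y) in image pixels. The y axis grows downward, as in most image formats.
Keypoints = Dict[str, Tuple[float, float]]


def hand_raised(kp: Keypoints, margin: float = 10.0) -> bool:
    """Return True if either wrist is at least `margin` pixels above its shoulder."""
    for side in ("left", "right"):
        wrist = kp.get(f"{side}_wrist")
        shoulder = kp.get(f"{side}_shoulder")
        if wrist is None or shoulder is None:
            continue  # joint not detected (e.g. occluded); skip this side
        if wrist[1] < shoulder[1] - margin:
            return True
    return False


# Hypothetical key points: right hand held up, as if asking the vehicle to stop.
stop_pose = {
    "left_shoulder": (110.0, 150.0), "left_wrist": (100.0, 260.0),
    "right_shoulder": (150.0, 150.0), "right_wrist": (160.0, 90.0),
}
# Hypothetical key points: both arms hanging down.
neutral_pose = {
    "left_shoulder": (110.0, 150.0), "left_wrist": (100.0, 260.0),
    "right_shoulder": (150.0, 150.0), "right_wrist": (160.0, 260.0),
}

print(hand_raised(stop_pose), hand_raised(neutral_pose))
```

The `margin` guards against small key-point jitter triggering the gesture; missing joints are skipped rather than treated as evidence either way.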
