Mapping the 3D World to an Image - 5 Minutes with Cyrill
Summary
TL;DR: This video explains the process of mapping a point from the 3D world to a 2D camera image. Using the central projection model, the mapping is described by a projection matrix that ties together several coordinate systems: world, camera, image plane, and sensor. The video covers the camera's location, rotation, intrinsic parameters, and lens distortion, all of which contribute to the transformation. While the transformation is not invertible in general, it can be partially reversed, and full 3D coordinates can be recovered by combining multiple camera images. The discussion emphasizes how this approach is used in computer vision for 3D point recovery.
Takeaways
- 😀 The script explains how a point in 3D space is mapped onto a 2D camera image using the central projection model.
- 😀 The central projection model assumes that all light rays pass through a single point, the projection center, in a pinhole camera system.
- 😀 The transformation from 3D coordinates to 2D pixel coordinates is expressed through the equation x = P X in homogeneous coordinates, where P is the projection matrix and X is the 3D point.
- 😀 The transformation involves four coordinate systems: world coordinates, camera coordinates, image plane coordinates, and sensor frame coordinates.
- 😀 The camera's position and orientation in 3D space are described using a location vector (x₀) and a rotation matrix.
- 😀 The 3D-to-2D mapping process involves a loss of one dimension since a 3D point is projected onto a 2D plane.
- 😀 Camera intrinsics, such as focal length and sensor size, are encoded in the calibration matrix and play a crucial role in the transformation.
- 😀 The transformation process is not easily reversible because the 3D point is mapped to a 2D coordinate, meaning there’s a loss of information.
- 😀 While a full inversion is not possible, a partial inversion is: each pixel corresponds to a direction in space, i.e., a viewing ray along a straight line.
- 😀 By combining multiple camera views from different locations, 3D coordinates of points can be recovered by finding the intersection of two or more lines in space.
Q & A
What is the central projection model used to describe the camera?
-The central projection model, also known as the pinhole camera model, describes how 3D points in the world are mapped onto a 2D camera image. In this model, there is a single point called the projection center, and all rays of light pass through this point to form the image.
What is the equation used to map a 3D point to a 2D pixel location?
-The transformation is written as x = P X, where x is the pixel coordinate in homogeneous form, P is the projection matrix, and X is the 3D point in the world coordinate system.
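As a minimal sketch of this equation (with an invented projection matrix rather than one from the video), the mapping is a single matrix-vector product in homogeneous coordinates:

```python
import numpy as np

# Illustrative 3x4 projection matrix (made-up values).
P = np.array([
    [800.0,   0.0, 320.0, 0.0],
    [  0.0, 800.0, 240.0, 0.0],
    [  0.0,   0.0,   1.0, 0.0],
])

X = np.array([0.5, -0.2, 4.0, 1.0])  # 3D world point, homogeneous

x = P @ X                # homogeneous image coordinates (3-vector)
u, v = x[:2] / x[2]      # dehomogenize to get the pixel location
print(u, v)              # -> 420.0 200.0
```

Dividing by the third homogeneous coordinate is the step that discards the depth of the point, which is why the mapping loses one dimension.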
What are the four coordinate systems involved in the mapping process?
-The four coordinate systems involved are: the world coordinate system, the camera coordinate system, the image plane coordinate system, and the sensor frame (which describes the pixel location).
What does the projection matrix P consist of?
-The projection matrix P is a 3x4 matrix that combines the extrinsic parameters (the rotation matrix and the camera location) with the intrinsic parameters (the calibration matrix describing the camera's internal properties).
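A sketch of this composition, assuming the common decomposition P = K R [I | -x₀] (calibration matrix K, world-to-camera rotation R, projection center x₀) with placeholder values:

```python
import numpy as np

K = np.array([[800.0,   0.0, 320.0],   # intrinsics (calibration matrix)
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

theta = np.deg2rad(10.0)                # example: rotate 10 deg about z
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])

x0 = np.array([1.0, 0.5, -2.0])         # projection center in world coords

# [I | -x0] shifts the world so the projection center becomes the origin;
# R then rotates into the camera frame; K maps onto the sensor.
P = K @ R @ np.hstack([np.eye(3), -x0.reshape(3, 1)])
print(P.shape)                          # -> (3, 4)
```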
How is the camera location described in the world coordinate system?
-The camera location in the world coordinate system is described by the position of the projection center, denoted x₀, given by its x, y, and z coordinates. In addition, a rotation matrix specifies the orientation of the camera in the world.
What are the camera's intrinsic parameters?
-The intrinsic parameters of the camera, typically encoded in a calibration matrix, describe internal properties such as the camera constant (the distance from the projection center to the image plane) and the placement of the camera's sensor chip on the image plane (the principal point).
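One common five-parameter form of the calibration matrix, as used in photogrammetry texts (the numbers here are invented for illustration):

```python
import numpy as np

c      = 800.0          # camera constant: projection center to image plane, in pixels
xH, yH = 320.0, 240.0   # principal point: where the optical axis meets the sensor
m      = 0.0            # scale difference between the x and y pixel axes
s      = 0.0            # shear of the pixel axes (usually ~0 for digital cameras)

K = np.array([[c,   s,           xH],
              [0.0, c * (1 + m), yH],
              [0.0, 0.0,         1.0]])
```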
What does the transformation from the world to the pixel coordinate achieve?
-The transformation from the world coordinate system to the pixel coordinate system results in the mapping of a 3D world point to a 2D pixel location. However, this transformation cannot be easily inverted because it loses one dimension during the projection.
Why is it impossible to fully invert the transformation from 3D to 2D?
-The transformation cannot be fully inverted because mapping a 3D point to a 2D plane results in a loss of one dimension. Multiple 3D points can map to the same 2D pixel coordinate, meaning we cannot recover the exact 3D location of a point, only the direction in which it lies.
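A sketch of this partial inversion, assuming the calibration (K, R, x₀) is known; the pixel maps back to a viewing ray rather than a single point (placeholder values throughout):

```python
import numpy as np

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0,   0.0,   1.0]])
R  = np.eye(3)                      # world-to-camera rotation
x0 = np.zeros(3)                    # projection center in world coordinates

x = np.array([420.0, 200.0, 1.0])   # observed pixel, homogeneous

# Direction of the viewing ray in world coordinates (defined up to scale).
d = R.T @ np.linalg.inv(K) @ x

# Every point X(lam) = x0 + lam * d projects onto the same pixel x,
# which is exactly the information lost in the 3D-to-2D mapping.
for lam in (1.0, 4.0):
    print(x0 + lam * d)
```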
How can the 3D coordinates of a point be recovered?
-The 3D coordinates of a point can be recovered by using multiple camera images taken from different locations. Each observed pixel defines a straight line (viewing ray) in space, and the point where these lines intersect gives the location of the 3D point.
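A minimal triangulation sketch (one simple option, not necessarily the video's method): find the point closest to both viewing rays via a small least-squares problem:

```python
import numpy as np

def triangulate_two_rays(x0_a, d_a, x0_b, d_b):
    """Midpoint of the shortest segment between two rays x0 + lam * d.

    For noise-free rays that truly intersect this is the exact
    intersection; with real measurements it is a least-squares compromise.
    """
    A = np.stack([d_a, -d_b], axis=1)                    # 3x2 system
    lam = np.linalg.lstsq(A, x0_b - x0_a, rcond=None)[0]
    return 0.5 * ((x0_a + lam[0] * d_a) + (x0_b + lam[1] * d_b))

# Two cameras at different locations observing the same point (made-up rays).
x0_a, d_a = np.zeros(3),               np.array([ 0.125, -0.05, 1.0])
x0_b, d_b = np.array([1.0, 0.0, 0.0]), np.array([-0.125, -0.05, 1.0])

print(triangulate_two_rays(x0_a, d_a, x0_b, d_b))   # -> [ 0.5 -0.2  4. ]
```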
What does the term 'degrees of freedom' refer to in the context of camera transformation?
-The term 'degrees of freedom' refers to the number of independent parameters required to describe the full transformation from the world coordinate system to the pixel coordinate system. The linear model has 11 parameters: 6 for the camera's extrinsics (3 for location, 3 for rotation) and 5 for the intrinsics (the calibration matrix), plus additional non-linear parameters for lens distortion.
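To make the count concrete, here is a sketch that builds the linear part of P from exactly 11 scalars, assuming Euler angles for the rotation and the five-parameter calibration matrix from above; lens distortion would add its non-linear parameters on top:

```python
import numpy as np

def projection_from_params(x0, omega, phi, kappa, c, xH, yH, m, s):
    """11 degrees of freedom: x0 (3) + rotation angles (3) + intrinsics (5)."""
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(omega), -np.sin(omega)],
                   [0, np.sin(omega),  np.cos(omega)]])
    Ry = np.array([[ np.cos(phi), 0, np.sin(phi)],
                   [0, 1, 0],
                   [-np.sin(phi), 0, np.cos(phi)]])
    Rz = np.array([[np.cos(kappa), -np.sin(kappa), 0],
                   [np.sin(kappa),  np.cos(kappa), 0],
                   [0, 0, 1]])
    R = Rz @ Ry @ Rx                          # 3 rotation parameters
    K = np.array([[c, s, xH],
                  [0, c * (1 + m), yH],
                  [0, 0, 1]])                 # 5 intrinsic parameters
    return K @ R @ np.hstack([np.eye(3), -np.asarray(x0, float).reshape(3, 1)])

P = projection_from_params(x0=[1.0, 0.5, -2.0], omega=0.0, phi=0.0,
                           kappa=0.1, c=800.0, xH=320.0, yH=240.0,
                           m=0.0, s=0.0)
```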