Introduction to Stereo Imaging -- Theory

Let us consider a simplified approach to the mathematics of the problem in order to aid understanding of the tasks involved.
We will consider a setup using two cameras in stereo; other methods that involve stereo are similar.
Let's consider a simplified optical setup:

Fig. 5 A simplified stereo imaging system
Fig. 5 shows:

Two cameras with their optical axes parallel and separated by a distance d.
The line connecting the camera lens centres is called the baseline.
Let the baseline be perpendicular to the line of sight of the cameras.
Let the x axis of the three-dimensional world coordinate system be parallel to the baseline.
Let the origin O of this system be mid-way between the lens centres.

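Consider a point (x,y,z), in three-dimensional world coordinates, on an object.
Let this point have image coordinates $(x_l, y_l)$ and $(x_r, y_r)$ in the left and right image planes of the respective cameras.
Let f be the focal length of both cameras, the perpendicular distance between the lens centre and the image plane. Then by similar triangles:

\begin{eqnarray}
\frac{x_l}{f} & = & \frac{x + d/2}{z} \\
\frac{x_r}{f} & = & \frac{x - d/2}{z} \\
\frac{y_l}{f} & = & \frac{y_r}{f} = \frac{y}{z}
\end{eqnarray}

Solving for (x,y,z) gives:

\begin{eqnarray}
x & = & \frac{d(x_l + x_r)}{2(x_l - x_r)} \\
y & = & \frac{d(y_l + y_r)}{2(x_l - x_r)} \\
z & = & \frac{d f}{x_l - x_r}
\end{eqnarray}

The quantity $x_l - x_r$, which appears in each of the above equations, is called the disparity.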
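
As a minimal sketch of the similar-triangles relations above, the following Python functions recover (x,y,z) from a pair of matched image points. It assumes the parallel-axis geometry described in the text (cameras at x = -d/2 and x = +d/2), that the left and right image coordinates are written (xl, yl) and (xr, yr), and that image coordinates, f and d share the same length unit. The function names and the forward model `project` are illustrative and not part of the original notes.

```python
def triangulate(xl, yl, xr, yr, f, d):
    """Recover world coordinates (x, y, z) from matched image points.

    Assumes xl/f = (x + d/2)/z, xr/f = (x - d/2)/z and yl/f = yr/f = y/z.
    """
    disparity = xl - xr
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = d * f / disparity
    x = d * (xl + xr) / (2.0 * disparity)
    y = d * (yl + yr) / (2.0 * disparity)
    return x, y, z


def project(x, y, z, f, d):
    """Forward model used to check the inverse: world point -> both image points."""
    xl = f * (x + d / 2.0) / z
    xr = f * (x - d / 2.0) / z
    yl = yr = f * y / z
    return xl, yl, xr, yr
```

For instance, projecting the point (0.1, 0.2, 2.0) with f = 0.05 and d = 0.3 and passing the result to `triangulate` returns the original point up to rounding.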
There are several practical problems with this setup:

Depth can be measured accurately for near objects but not for far away objects. Normally, d and f are fixed, so distance is inversely proportional to disparity. Disparity can only be measured in pixel differences, and for distant objects a one-pixel change in disparity corresponds to a large change in depth.
Disparity is proportional to the camera separation d. This implies that, for a fixed error in determining the disparity, the accuracy of depth determination will increase with d.
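
To make the first problem concrete, the sketch below computes how far the depth estimate jumps when the measured disparity changes by a single pixel. It assumes the relation z = d f / disparity for the parallel-axis geometry; the focal length, baseline and pixel size are hypothetical values chosen only for illustration.

```python
def depth_from_disparity_pixels(disparity_px, f, d, pixel_size):
    """Depth z = d*f / disparity, with the disparity given in whole pixels."""
    return d * f / (disparity_px * pixel_size)


def depth_step(disparity_px, f, d, pixel_size):
    """Depth change when the measured disparity drops by one pixel."""
    near = depth_from_disparity_pixels(disparity_px, f, d, pixel_size)
    far = depth_from_disparity_pixels(disparity_px - 1, f, d, pixel_size)
    return far - near


# Hypothetical rig: 8 mm lens, 10 micron pixels, 100 mm baseline (all in metres).
F, D, PIXEL = 0.008, 0.10, 10e-6

for disparity_px in (160, 40, 10):
    z = depth_from_disparity_pixels(disparity_px, F, D, PIXEL)
    dz = depth_step(disparity_px, F, D, PIXEL)
    print(f"disparity {disparity_px:4d} px: z = {z:6.2f} m, one-pixel step = {dz:.3f} m")
```

With these assumed values the one-pixel step grows from a few millimetres at 0.5 m to most of a metre at 8 m. Increasing D raises the disparity at every depth and so shrinks the step.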

However, as the camera separation becomes large, difficulties arise in correlating the two camera images.
In order to measure the depth of a point it must be visible to both cameras and we must also be able to identify this point in both images.
As the camera separation increases so do the differences in the scene as recorded by each camera.
Thus it becomes increasingly difficult to match corresponding points in the images.
This problem is known as the stereo correspondence problem.

Methods of Acquisition

Laser Ranging Systems
Laser ranging works on the principle that the surface of the object reflects laser light back towards a receiver which then measures the time (or phase difference) between transmission and reception in order to calculate the depth.
Most laser rangefinders:

Work at long distances (greater than )
Consequently their depth resolution is inadequate for detailed vision tasks.
Shorter range systems exist but still have an inadequate depth resolution (1cm at best) for most practical industrial vision purposes.
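
The time and phase measurements mentioned above can be turned into range as in the sketch below. It assumes an idealised sensor with light travelling at the vacuum speed; the function names and the amplitude-modulated model used for the phase case are illustrative assumptions rather than a description of any particular rangefinder.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def range_from_time(round_trip_seconds):
    """Pulsed ranging: the light travels out and back, so halve the path."""
    return C * round_trip_seconds / 2.0


def range_from_phase(phase_radians, modulation_hz):
    """Phase-shift ranging with an amplitude-modulated beam.

    Unambiguous only within half a modulation wavelength, C / (2 * modulation_hz).
    """
    return C * phase_radians / (4.0 * math.pi * modulation_hz)


def timing_resolution_for(depth_resolution_m):
    """Round-trip timing precision needed to resolve a given depth step."""
    return 2.0 * depth_resolution_m / C
```

Under these assumptions, `timing_resolution_for(0.01)` is roughly 67 picoseconds, which gives some feeling for why fine depth resolution is hard to reach with time-of-flight measurement.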

Structured Light Methods
Basic idea:

Project patterns of light (grids, stripes, elliptical patterns etc.) onto an object.
Surface shapes are then deduced from the distortions of the patterns that are produced on the object's surface.
Knowing relevant camera and projector geometry, depth can be inferred by triangulation.
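
One common way to carry out the triangulation step for a projected stripe is to intersect the camera's viewing ray with the plane of light leaving the projector. The sketch below assumes a pinhole camera at the origin looking along z and a light plane described by a point and a normal; these representations and the function names are assumptions made for illustration only.

```python
def intersect_ray_with_plane(origin, direction, plane_point, plane_normal):
    """Return the 3D point where a camera ray meets a projected light plane.

    The ray is origin + t * direction; the plane is all points p with
    dot(plane_normal, p - plane_point) == 0.
    """
    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    denom = dot(plane_normal, direction)
    if abs(denom) < 1e-12:
        raise ValueError("ray is parallel to the light plane")
    offset = [p - o for p, o in zip(plane_point, origin)]
    t = dot(plane_normal, offset) / denom
    return tuple(o + t * di for o, di in zip(origin, direction))


def pixel_ray(u, v, f):
    """Direction of the viewing ray through image point (u, v), pinhole camera at the origin."""
    return (u, v, f)
```

Each stripe position gives a different plane, so sweeping or coding the pattern yields one depth value per illuminated pixel.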

Many methods have been developed using this approach.
Major advantage -- simple to use.
Major disadvantage -- low spatial resolution, since patterns become sparser with distance.
Some close range (4cm) sensors exist with good depth resolution (around 0.05mm) but have a very narrow field of view and a close range of operation.

Moire Fringe Methods
The essence of the method is that a grating is projected onto an object and an image is formed in the plane of some reference grating as shown in Fig. 6.
The image then interferes with the reference grating to form Moire fringe contour patterns which appear as dark and light stripes, as demonstrated by Fig. 7. Analysis of the patterns then gives accurate descriptions of changes in depth and hence shape.

Fig. 6 A moire projection system

Fig. 7 Moire fringe patterns
NOTE: Ambiguities arise in interrogating the fringe patterns.

It is not possible to determine whether adjacent contours are higher or lower in depth.
This can be resolved by moving one of the gratings and taking multiple Moire images.
The reference grating can also be omitted and its effect simulated in software.
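
One standard way of exploiting several images taken with a shifted grating is four-step phase shifting, sketched below. This is a swapped-in illustration and not necessarily the scheme intended in these notes. It assumes an idealised cosine fringe model and grating shifts of exactly 0, 90, 180 and 270 degrees; the function names and default values are hypothetical.

```python
import math


def fringe_intensity(phase, shift, background=0.5, contrast=0.4):
    """Idealised fringe model: I = background + contrast * cos(phase + shift)."""
    return background + contrast * math.cos(phase + shift)


def recover_phase(i0, i1, i2, i3):
    """Four-step phase shifting with grating shifts of 0, 90, 180 and 270 degrees.

    i3 - i1 = 2 * contrast * sin(phase) and i0 - i2 = 2 * contrast * cos(phase),
    so atan2 returns the signed phase in (-pi, pi].
    """
    return math.atan2(i3 - i1, i0 - i2)
```

A single image cannot separate a phase of +0.7 from -0.7, because the cosine takes the same value for both, whereas the four shifted images recover the sign. The sign of the phase is what tells whether a contour is higher or lower.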

Moire fringe methods are capable of producing very accurate depth data (resolution to within about 10 microns) but the methods have certain drawbacks.

Methods are relatively computationally expensive.
Surfaces at a large angle are sometimes unmeasurable -- the fringes become too dense.

Shape from Shading Methods
Methods based on shape from shading employ photometric stereo techniques to produce depth measurements.
Using a single camera, two or more images are taken of an object in a fixed position but under different lighting conditions.
By studying the changes in brightness over a surface and employing constraints on the orientation of surfaces, certain depth information may be calculated.
Methods based on these techniques are not suited for general three-dimensional depth data acquisition:

Methods are highly sensitive to the illumination and surface reflectance properties of objects present in the scene.
Methods only work well on objects with uniform surface texture.
It is difficult to infer absolute depth, and only surface orientation is easily inferred.
Methods are mostly used when it is desired to extract surface shape information.
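
As a rough illustration of why surface orientation is the natural output of these methods, the sketch below recovers a surface normal from three images of the same pixel. It assumes a Lambertian surface (brightness proportional to the cosine of the angle between the normal and the light), three known, non-coplanar light directions and no shadows; these assumptions and all names are illustrative and are not taken from the notes.

```python
def solve3(a, b):
    """Solve the 3x3 linear system a @ x = b by Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det(a)
    if abs(d) < 1e-12:
        raise ValueError("light directions must not be coplanar")
    solution = []
    for col in range(3):
        replaced = [[b[r] if c == col else a[r][c] for c in range(3)] for r in range(3)]
        solution.append(det(replaced) / d)
    return solution


def normal_and_albedo(lights, intensities):
    """Lambertian model: intensity_k = albedo * dot(light_k, normal).

    Solving for g = albedo * normal gives the albedo as |g| and the unit
    surface normal as g / |g|.
    """
    g = solve3(lights, intensities)
    albedo = sum(v * v for v in g) ** 0.5
    return [v / albedo for v in g], albedo
```

The result is a normal per pixel. Turning normals into absolute depth needs a further integration step and a known reference depth, which is consistent with the limitation noted above.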

Passive Stereoscopic Methods
Stereoscopy, as a technique for measuring range by triangulation to selected locations in a scene imaged by two cameras, has already been introduced -- further details on general stereo configurations can be found in books.
The primary computational problem of stereoscopy is to find the correspondence of various points in the two images.
This requires:

Reliable extraction of certain features (such as edges or points) from both images
Matching of corresponding features between images.
Both of these tasks are non-trivial and computationally complex.
Passive stereo may not produce depth maps within a reasonable time.
The depth data produced is typically sparse since high level features, such as edges, are used rather than points.
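
To give a feeling for the matching task, the sketch below searches along one image row for the best match to a small window, using the sum of absolute differences (SAD) between intensities. This is an area-based matcher, used here only as a simple stand-in for the feature-based matching described above. It assumes the parallel-axis geometry (so matches lie on the same row) and rows of equal length; all names and parameters are hypothetical.

```python
def match_along_row(left_row, right_row, x_left, half_window, max_disparity):
    """Find the disparity that best matches a window around x_left.

    Uses the sum of absolute differences (SAD) between intensity windows.
    For the parallel-axis geometry the match in the right image lies at
    x_left - disparity on the same row. Both rows must have equal length.
    """
    best_disparity, best_cost = None, float("inf")
    for disparity in range(max_disparity + 1):
        x_right = x_left - disparity
        if x_right - half_window < 0 or x_left + half_window >= len(left_row):
            continue  # window would fall outside one of the rows
        cost = sum(
            abs(left_row[x_left + k] - right_row[x_right + k])
            for k in range(-half_window, half_window + 1)
        )
        if cost < best_cost:
            best_disparity, best_cost = disparity, cost
    return best_disparity, best_cost
```

Even this toy version has to test every candidate disparity for every pixel, and it can be fooled by repeated or featureless regions, which hints at the cost and unreliability mentioned in the text.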

NOTE:

Finding and accurately locating features in each image can be hard.
Care is needed not to introduce errors.
Depth measurements are accurate to a few millimetres.
One such passive stereo vision system is TINA developed at Sheffield University.

Active Stereoscopic Methods
The problems of passive stereoscopic techniques may be overcome by

Illuminating the scene with a strong source of light (in the form of a point or line of light) which can be observed by both cameras.
Known corresponding points provided in each image.
Depth maps can then be produced by sweeping the light source across the whole scene.
Laser light source typically employed.
Active stereo can only be applied in controlled environments -- industrial applications.

Our Active Stereo Vision System

This Section describes the active stereoscopic subsystem which provides the three-dimensional data to our system for automatically inspecting mechanical parts.
NOTE: Whilst this Section considers some specific active stereo problems, many of the other issues discussed are not specific to any particular three-dimensional data acquisition technique, and will be of general interest.
The main components of the Vision System are illustrated by the schematic diagram in Fig. 8.

Fig. 8 Schematic diagram of vision system
The vision system consists of:

a matched pair of high sensitivity CCD cameras,
a laser scanner,
all mounted on an optical bench to reduce vibration.

Initially the cameras of the system must be calibrated in order to:

determine their 3D position relative to some world coordinates,
determine the focal length and lens distortion of each camera (and its lens etc.).
Camera Calibration is described in my book.

Depth maps are extracted from the scene by:

Moving the laser stripe across the scene to obtain a series of vertical columns of pixels.
Triangulating the pixels to give the required dense depth map. The depth of a point is measured as the distance from one of the cameras, chosen as the master camera.
Knowing the relevant geometry and optical properties of the cameras, the depth map is constructed using the following method:

Fig. 9 Measuring a depth value

1. For each vertical stripe of laser light, form an image of the stripe in the pair of frames, one from each camera.
2. For each row in the master camera image, search until the stripe is found at point P(i,j), say.
3. Form a three-dimensional line l passing through the centre of the master camera and P(i,j).
4. Construct the epipolar line, which is the projection of the line l into the image formed by the other camera. Do this by projecting two arbitrary points of l into the image and constructing a line between the two projected points.
5. Search along the epipolar line for the laser stripe. If it is found at a point Q, proceed to Step 6.
6. Find the point P' on line l which corresponds to Q. Calculate the (x,y,z) coordinates of P', and store the z value at position (i,j) corresponding to x and y in the depth map.

The position of the point P' is easily found by projecting a line l' from the centre of the secondary camera passing through Q. The intersection of the lines l and l' gives the coordinates of P'.
The depth map is formed by using a world coordinate system fixed on the master camera with its origin at
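
The final intersection step can be sketched as follows. With real measurements the line l from the master camera and the line from the secondary camera through Q rarely meet exactly, so this sketch returns the midpoint of their shortest connecting segment. That choice, the representation of each line as a centre plus a direction, and the function name are assumptions made for illustration and are not taken from the system described.

```python
def closest_point_between_lines(c1, d1, c2, d2):
    """Midpoint of the shortest segment joining two 3D lines.

    The first line is c1 + s * d1 (master camera centre and viewing
    direction); the second is c2 + t * d2 (secondary camera). If the
    lines truly intersect, the midpoint is the intersection itself.
    """
    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    w = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("lines are parallel")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return tuple((u + v) / 2.0 for u, v in zip(p1, p2))
```

The z component of the returned point is the value that would be stored at position (i,j) of the depth map.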

Fig. 10 Depth Map/Image Overlay

The 3D Image - Depth Maps

The simplest and most convenient way of representing and storing the depth measurements taken from a scene is a depth map.
A depth map is a two-dimensional array where the x and y distance information corresponds to the rows and columns of the array as in an ordinary image, and the corresponding depth readings (z values) are stored in the array's elements (pixels).
A depth map is like a grey scale image except that the z information (float, 32 bits) replaces the intensity information.
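
A minimal sketch of such a structure is given below, using Python's standard `array` module with single-precision floats (4 bytes each on common platforms). The class name, the row-major layout and the method names are illustrative choices, not a format defined in these notes.

```python
from array import array


class DepthMap:
    """Row-major 2D grid of z values stored as single-precision floats."""

    def __init__(self, rows, cols, fill=0.0):
        self.rows, self.cols = rows, cols
        self.z = array("f", [fill] * (rows * cols))

    def set(self, i, j, depth):
        """Store the depth reading for row i, column j."""
        self.z[i * self.cols + j] = depth

    def get(self, i, j):
        """Return the depth reading for row i, column j."""
        return self.z[i * self.cols + j]

    def size_in_bytes(self):
        """Storage used by the z values alone."""
        return len(self.z) * self.z.itemsize
```

Indexing by (i,j) mirrors an ordinary image, so existing image-processing loops carry over with depth in place of intensity.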

Fig. 3 Artificial depth maps

Fig. 4 Real depth maps

Why use 3D data?

A 3D image has many advantages over its 2D counterpart:

Explicit Geometry

2D images give only limited information about the physical shape and size of an object in a scene.
3D images express the geometry in terms of three-dimensional coordinates.

e.g. the size (and shape) of an object in a scene can be straightforwardly computed from its three-dimensional coordinates.
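
For example, given a list of (x,y,z) points measured on an object, its extent along each axis and the distance between any two surface points follow directly. The helper names below are hypothetical, and the axis-aligned bounding box is only one simple notion of size.

```python
import math


def bounding_box_size(points):
    """Extent of a set of (x, y, z) points along each world axis."""
    xs, ys, zs = zip(*points)
    return (max(xs) - min(xs), max(ys) - min(ys), max(zs) - min(zs))


def distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.dist(p, q)
```

The same quantities cannot be read off a single 2D image without knowing the distance to the object.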

Recent technological advances (e.g. in camera optics, CCD cameras and laser rangefinders) have made the production of reliable and accurate three-dimensional depth data possible.
Consequently many three-dimensional data acquisition systems have been developed.