Segmentation refers to the process of extracting the desired object (or objects) of interest from the background in an image or data volume. A variety of techniques are used to do this, ranging from the simple (such as thresholding and masking) to the complex (such as edge/boundary detection, region growing, and clustering algorithms). Segmentation can be aided through manual intervention or handled automatically through software algorithms. It can be performed before building the 3-D reconstruction, by processing the images in the image stack, or after the 3-D model has been formed.
Examples of simple forms of segmentation that can be used with confocal data include thresholding and masking.
Thresholding involves limiting the intensity values within an individual image or the entire image stack to a certain bounded range (or ranges). For example, since each pixel in an 8-bit greyscale confocal image (with values from 0 [black] to 255 [white]) corresponds to the fluorescence intensity at a point within the specimen, pixels with lower values represent areas of weaker fluorescence while pixels with higher values represent brighter regions. It may be decided that all pixels below a certain value do not contribute significantly to the object(s) of interest and hence can be eliminated. This can be done by scanning the image(s) one pixel at a time, keeping each pixel if it is above the selected intensity value, or setting it to 0 (black) if it is below that value. In a similar manner, thresholding can also be used to eliminate non-consecutive ranges of intensities while preserving the regions containing the intensities of interest.
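The per-pixel scan described above can be sketched with NumPy arrays standing in for the image stack (the data below are hypothetical; array names are illustrative):

```python
import numpy as np

# Hypothetical 8-bit greyscale confocal stack: 3 slices of 4x4 pixels.
rng = np.random.default_rng(0)
stack = rng.integers(0, 256, size=(3, 4, 4), dtype=np.uint8)

def threshold_stack(stack, low):
    """Keep pixels at or above `low`; set dimmer pixels to 0 (black)."""
    out = stack.copy()
    out[out < low] = 0
    return out

def band_threshold(stack, keep_ranges):
    """Keep only intensities falling in the given (lo, hi) ranges,
    eliminating the non-consecutive ranges in between."""
    keep = np.zeros(stack.shape, dtype=bool)
    for lo, hi in keep_ranges:
        keep |= (stack >= lo) & (stack <= hi)
    return np.where(keep, stack, 0).astype(stack.dtype)

cleaned = threshold_stack(stack, 50)
banded = band_threshold(stack, [(50, 100), (200, 255)])
```

In practice the same boolean-mask operation is applied slice by slice or to the whole volume at once, as here.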
Masking is a procedure whereby one or more enclosed regions of an image (or of the image stack) are defined for processing. This can be done either by manually tracing around the regions of interest (e.g. with a mouse in a graphics application) or by an automated routine. An easy (and useful) application of this is to use a 2-D stacked projection of the image stack to define the image mask. The stacked projection is a single image that represents the sum of all of the images in the image stack (such projections can usually be generated automatically by the software supplied with the LSCM). If the object of interest has a closed, continuous surface (such as that of a neuron), the stacked projection defines the absolute boundaries of the object in 2-D. A mask can be formed either by manually tracing around the boundaries of the object(s) of interest in the stacked projection or by absolute thresholding (making all intensities above a certain value white and all below it black). The mask can then be applied to the entire image stack, such that regions falling within the mask selection area are preserved, whereas areas outside this region are eliminated (e.g. set to 0 [black]). After the mask has been applied, thresholding and image filtering methods can be used to help remove the remaining undesired regions.
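The projection-then-mask procedure can be sketched on a hypothetical toy stack (a small bright "object" in otherwise black slices; all names and values are illustrative):

```python
import numpy as np

# Hypothetical stack: a bright 2x2 "object" in the middle of each slice.
stack = np.zeros((3, 6, 6), dtype=np.uint8)
stack[:, 2:4, 2:4] = 180

# Stacked projection: the sum of all slices (clipped back to 8 bits).
projection = stack.astype(np.uint16).sum(axis=0).clip(0, 255).astype(np.uint8)

# Absolute thresholding of the projection defines the 2-D mask:
# white (True) above the cutoff, black (False) below it.
mask = projection > 100

# Apply the mask to every slice: pixels outside the mask become 0 (black).
masked_stack = np.where(mask[None, :, :], stack, 0)
```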

Introduction to Image Segmentation

What is Image Segmentation?

Image segmentation is a partitioning of an image into related sections or regions. These regions may be later associated with informational labels, but the segmentation process simply gives each region a generic label (region 1, region 2, etc.). In the context of Earth remote sensing, informational labels would generally be a ground cover type or land use category. The regions consist of groupings of multispectral or hyperspectral image pixels that have similar data feature values. These data feature values may be the multispectral or hyperspectral data values themselves and/or they may be derived features such as band ratios or textural features.
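The generic-labeling step (region 1, region 2, etc.) can be sketched as a simple 4-connected flood fill over a toy image of feature values, grouping touching pixels with identical values (all data here are hypothetical; real segmenters use similarity criteria over multispectral features rather than exact equality):

```python
from collections import deque

# Toy "image" of feature values; pixels join a region if they touch
# (4-connectivity) and share the same value.
img = [
    [1, 1, 0, 0],
    [1, 0, 0, 2],
    [0, 0, 2, 2],
]

def label_regions(img):
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for sr in range(h):
        for sc in range(w):
            if labels[sr][sc]:
                continue
            next_label += 1          # generic label: region 1, 2, ...
            labels[sr][sc] = next_label
            q = deque([(sr, sc)])
            while q:
                r, c = q.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < h and 0 <= nc < w
                            and not labels[nr][nc]
                            and img[nr][nc] == img[r][c]):
                        labels[nr][nc] = next_label
                        q.append((nr, nc))
    return labels, next_label

labels, n = label_regions(img)
```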
Figure 1 shows an RGB representation of a Landsat Thematic Mapper (TM) scene from over St. Charles, Maryland. This scene was an early evaluation scene taken within a couple of weeks of the launch of Landsat-4 (launched July 16, 1982). Figure 2 shows a JavaScript animation of a five-level hierarchical segmentation of the Landsat TM scene displayed in Figure 1. The finest level of detail has 126 regions, and the ensuing coarser segmentations have 70 regions, 42 regions, 12 regions and 4 regions, respectively. (You can't necessarily discern all of the regions in the finer segmentations, as some of the regions are only a few pixels in size.) The regions colored with shades of green roughly correspond to wooded areas. The regions colored with shades of turquoise roughly correspond to grassy areas (mixed with residential at the coarser levels). The regions colored with shades of yellow correspond to roads, residential areas, shopping centers, etc. The regions colored with shades of blue correspond to water, and the red and pink areas correspond roughly to agricultural fields (with some mixing with grassy areas). Finally, the white to gray areas correspond to bare soil (gravel pits, landfill, plowed fields, construction areas, etc.). An approach to obtaining a better labeling of the various regions is discussed below in the section on the "Region Labeling Tool".

Figure 1. RGB Representation of a Landsat TM scene.

Simple Classifiers

There are at least two ways to approach the design of a classifier:
  1. Hypothesize a plausible solution and adjust it to fit the problem

  2. Create a mathematical model of the problem and derive an optimal classifier

The first method is more intuitive, is frequently used in practice, and is the approach that we shall take. We start with a very simple solution, analyze its characteristics, identify its weaknesses, and complicate it only as necessary.


It frequently happens that a given class is not homogeneous, but is composed of a number of distinct subclasses. In the example shown above, there are obviously three different kinds of letters in the "A" class, and the average or mean feature vector may not represent any one subclass, let alone all of them. In designing the classifier, it would make sense to have three categories A1, A2 and A3, and say that the input is an "A" if it matches A1 or A2 or A3. In general, if we know that a class contains k subclasses, we can design a two-stage classifier, in which we first assign a feature vector x to a subclass, and then OR the results to identify the class.
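Such a two-stage classifier can be sketched with nearest-mean matching in stage one (the subclass means and test points below are hypothetical):

```python
import math

# Hypothetical subclass mean feature vectors (2-D) for class "A"
# (three subclasses) and class "B" (one subclass).
subclass_means = {
    "A1": (0.0, 0.0), "A2": (5.0, 0.0), "A3": (0.0, 5.0),
    "B1": (10.0, 10.0),
}
subclass_to_class = {"A1": "A", "A2": "A", "A3": "A", "B1": "B"}

def classify(x):
    """Stage 1: assign x to the nearest subclass mean.
    Stage 2: OR the subclass results, i.e. report the parent class."""
    nearest = min(subclass_means,
                  key=lambda s: math.dist(x, subclass_means[s]))
    return subclass_to_class[nearest]
```

A point close to any one of A1, A2 or A3 is reported simply as "A".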

The problem of finding subclasses in a set of examples from a given class is called unsupervised learning. The problem is easiest when the feature vectors for examples in a subclass are close together and form a cluster. We will consider four popular methods for finding clusters:
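The four methods are not reproduced here, but one widely used clustering procedure, k-means, can be sketched as follows (hypothetical 2-D feature vectors; whether k-means is among the four intended methods is not stated in this excerpt):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: alternate assigning points to the nearest mean
    and recomputing each cluster's mean."""
    rng = random.Random(seed)
    means = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda j: sum((a - b) ** 2
                                      for a, b in zip(p, means[j])))
            clusters[i].append(p)
        for j, c in enumerate(clusters):
            if c:  # keep the old mean if a cluster goes empty
                means[j] = tuple(sum(v) / len(c) for v in zip(*c))
    return means, clusters

# Two well-separated groups of 2-D feature vectors (hypothetical data).
pts = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
       (9.0, 9.1), (9.2, 9.0), (9.1, 9.2)]
means, clusters = kmeans(pts, 2)
```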

Feature Extraction

A Pattern Recognition system is composed of
  1. Pre-processing
  2. Feature Extraction
  3. Classification

Feature Extraction is a crucial step in Pattern Recognition. It is responsible for measuring features of objects in an image.
In this experiment we have a binary image with different objects. The feature used in this illustrative example is the first invariant moment. It measures the spread of pixels from the centroid of the object.
Original binary image

Labeling is an intermediate step in feature extraction. It allows individual measurement of the objects. The maximum pixel value of the labeled image shown below gives the number of objects: 28. Note that there are three very small objects that cannot be seen at first glance.
Labeled image

Based on the first invariant moment of each object, it is possible to plot the graph below. We can count 13 objects with small values, which correspond to the ring screws. There are 9 large values, corresponding to the nails and tee-pins. There are also 3 objects whose values for this attribute are close to zero; these correspond to the three small noise dots in the image.
Region number by first invariant moment
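The measurement described above can be sketched as the first Hu invariant moment, phi_1 = eta_20 + eta_02, computed from normalized central moments (a minimal implementation on hypothetical pixel lists; the experiment's actual code is not shown in the text):

```python
def first_invariant_moment(pixels):
    """First Hu invariant for one labeled object.
    pixels: list of (row, col) coordinates belonging to the object."""
    n = len(pixels)
    rbar = sum(r for r, _ in pixels) / n
    cbar = sum(c for _, c in pixels) / n
    mu20 = sum((r - rbar) ** 2 for r, _ in pixels)
    mu02 = sum((c - cbar) ** 2 for _, c in pixels)
    # Normalization: eta_pq = mu_pq / mu_00^((p+q)/2 + 1), with mu_00 = n
    # for a binary object, so phi_1 = (mu20 + mu02) / n^2.
    return (mu20 + mu02) / n ** 2

# A compact 2x2 blob vs. an elongated 1x4 blob (hypothetical objects):
compact = [(0, 0), (0, 1), (1, 0), (1, 1)]
elongated = [(0, 0), (0, 1), (0, 2), (0, 3)]
```

The elongated object's pixels are spread farther from its centroid, so it scores higher, which is how the moment separates nails from ring screws.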

Feature Vectors

It frequently happens that we can measure a fixed set of d features for any object or event that we want to classify. For example, we might always be able to measure
x1 = area
x2 = perimeter
...
xd = arc_length / straight_line_distance
In this case, we can think of our feature set as a feature vector x, where x is the d-dimensional column vector

    x = [x1, x2, ..., xd]^T

Equivalently, we can think of x as being a point in a d-dimensional feature space. By this process of feature measurement, we can represent an object or event abstractly as a point in feature space.
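The idea of representing objects as points in feature space can be sketched with hypothetical measurements (here d = 3: area, perimeter, and the arc-length ratio above; all values are made up):

```python
import math

# Each object is reduced to a d-dimensional feature vector.
obj_a = (120.0, 44.0, 1.05)   # hypothetical object A
obj_b = (118.0, 46.0, 1.10)   # similar to A
obj_c = (40.0, 90.0, 2.60)    # quite different

def distance(x, y):
    """Euclidean distance between two points in feature space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```

Objects that are similar as objects end up close together as points, which is what makes nearest-mean and clustering methods work.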

Robust Analysis of Feature Spaces: Color Image Segmentation by Dorin Comaniciu and Peter Meer

Department of Electrical and Computer Engineering
Rutgers University, Piscataway, NJ 08855, USA

A general technique for the recovery of significant image features is presented. The technique is based on the mean shift algorithm, a simple nonparametric procedure for estimating density gradients. Drawbacks of the current methods (including robust clustering) are avoided. Feature space of any nature can be processed, and as an example, color image segmentation is discussed. The segmentation is completely autonomous, only its class is chosen by the user. Thus, the same program can produce a high quality edge image, or provide, by extracting all the significant colors, a preprocessor for content-based query systems. A 512x512 color image is analyzed in less than 10 seconds on a standard workstation. Gray level images are handled as color images having only the lightness coordinate.
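The core of the mean shift procedure can be sketched in one dimension: repeatedly move a point to the mean of the samples inside a window around it, which climbs the density gradient toward a mode. (Illustrative only; the paper's segmenter operates in a color feature space, and the samples below are hypothetical.)

```python
def mean_shift(samples, start, radius=1.5, iters=50):
    """Move `start` to the mean of nearby samples until it converges
    on a local density mode."""
    x = start
    for _ in range(iters):
        window = [s for s in samples if abs(s - x) <= radius]
        new_x = sum(window) / len(window)
        if abs(new_x - x) < 1e-6:
            break
        x = new_x
    return x

# Samples drawn around two modes, near 0 and near 10 (hypothetical).
samples = [-0.2, 0.0, 0.1, 0.3, 9.8, 10.0, 10.1, 10.3]
```

Starting points on either side of the gap converge to their own mode, with no parametric model of the density required.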

Position Estimation of Micro-Rovers using a Spherical Coordinate Transform Color Segmenter

This work addresses position estimation of a micro-rover mobile robot as a larger robot tracks it through large spaces with unstructured lighting. We use the Spherical Coordinate Transform color segmenter commonly used in medical applications. Data was collected from 50 images taken in five types of lighting: fluorescent, tungsten, daylight lamp, natural daylight indoors and outdoors. The results show that average pixel error was 1.5, with an average error in distance estimation of 6.3 cm. The size of the error did not vary greatly with the type of lighting. In addition to giving segmentation results comparable to stereo triangulation, our approach has other advantages including low computational complexity O(n^2) and lightweight, inexpensive hardware.
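One common formulation of the spherical coordinate transform maps an RGB vector to its length (which carries brightness) plus two angles (which carry chromaticity); separating the two is what makes such a segmenter comparatively insensitive to lighting. The sketch below uses that common formulation, which is not necessarily the exact transform used in the paper:

```python
import math

def sct(r, g, b):
    """Spherical coordinate transform of an RGB triple:
    returns (length, angle_A, angle_B)."""
    length = math.sqrt(r * r + g * g + b * b)
    if length == 0:
        return 0.0, 0.0, 0.0
    angle_a = math.acos(b / length)
    denom = length * math.sin(angle_a)
    # min() guards against rounding pushing the ratio above 1.
    angle_b = math.acos(min(1.0, r / denom)) if denom > 1e-12 else 0.0
    return length, angle_a, angle_b

# A color and a twice-as-bright version of the same color:
dim = sct(60, 30, 90)
bright = sct(120, 60, 180)
```

The two angles are identical for the dim and bright versions; only the length changes, so a segmenter thresholding on the angles tolerates brightness variation.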

Examples of color segmentation in different lighting conditions. For each condition, four images are shown: the original, the SCT segmentation, the HSI segmentation, and the RGB threshold segmentation. The conditions are:
  1. Fluorescent lighting
  2. Tungsten lighting
  3. Daylight lamp (halogen with blue filter)
  4. Indoor sunlight
  5. Outdoor sunlight