# Segmentation Glossary

## Classification

In this experiment we use the same images used in the Feature Extraction lesson.
Original binary image

Labeled image

The first invariant moment is extracted from each object and plotted below. The background is treated as object 0, with its feature value set to -1.
Region number by first invariant moment

Picking one sample of each object and assigning a class to it:
```
class   object             object number   feature value
---------------------------------------------------------
0       background          0              -1.00
1       nail                1               0.93
2       ring screw          2               0.30
3       tee-pin             5               1.92
4       small dot noise    18               0.00
```

The Minimum Distance Classify operator assigns each object to the class whose sample is closest in feature space. The distance metric commonly used is the Euclidean distance.
The result of the classification is a table that relates each object to its class. Using the same technique as in the Area Measurement and Display lesson, we can assign a class value to the pixels of each object. This makes it possible to visualize the result of the classification.
Classified image
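
The pixel assignment step can be sketched as a lookup from object number to class number. This is a minimal illustration, not the operator used in the lesson: the tiny labeled image, the `object_to_class` table and the function name `paint_classes` are all hypothetical.

```python
# Hypothetical 4x6 labeled image: 0 is the background, 1..3 are object numbers.
labeled = [
    [0, 1, 1, 0, 2, 2],
    [0, 1, 0, 0, 2, 0],
    [0, 0, 0, 3, 0, 0],
    [0, 0, 3, 3, 0, 0],
]

# Hypothetical classifier output: object number -> class number.
object_to_class = {0: 0, 1: 1, 2: 2, 3: 1}


def paint_classes(labeled, object_to_class):
    """Replace every object number in the image by its class number."""
    return [[object_to_class[obj] for obj in row] for row in labeled]


classified = paint_classes(labeled, object_to_class)
for row in classified:
    print(row)
```

Objects 1 and 3 share class 1 in this made-up table, so their pixels end up with the same value in the classified image.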

Two objects were misclassified. Both are tee-pins:
```
object   feature value
----------------------
 3       1.01
12       1.41
```

This is consistent with minimum-distance classification, because both feature values are closer to the nail sample (0.93) than to the tee-pin sample (1.92). There are several ways to address this, such as choosing a more representative sample value or adding a second feature.
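
The following sketch reproduces the two misclassifications with the sample values from the table above. It assumes a single feature, so the Euclidean distance reduces to an absolute difference; the function name `classify` is illustrative and is not the name of the operator.

```python
# Class samples from the table above: class number -> (name, feature value).
samples = {
    0: ("background", -1.00),
    1: ("nail", 0.93),
    2: ("ring screw", 0.30),
    3: ("tee-pin", 1.92),
    4: ("small dot noise", 0.00),
}


def classify(feature, samples):
    """Return the class whose sample feature value is nearest to `feature`."""
    # With one feature, Euclidean distance is just the absolute difference.
    return min(samples, key=lambda c: abs(feature - samples[c][1]))


# The two tee-pin objects reported as misclassified.
for obj, feature in [(3, 1.01), (12, 1.41)]:
    cls = classify(feature, samples)
    print(obj, feature, samples[cls][0])
```

Both objects come out as nails. With these samples the nail and tee-pin classes meet at the midpoint (0.93 + 1.92) / 2 = 1.425, so any tee-pin whose feature value falls below that is assigned to the nail class.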

## Area Distribution

Determining the area distribution of cells in an image is a common operation in image analysis. Image analysis refers to operations performed on images (2D data) to make quantitative measurements that describe the image. That is, we want to extract certain features of an image. A feature is an attribute (primitive) that is used to make decisions about objects in an image. Some features are natural and are defined by the visual appearance of the image. Others, called artificial features, result from operations performed on the image.

The procedures applied in image analysis are application-oriented: what works well for one application may not be suitable for another. An important point to remember is that processing the data cannot increase the information content of the original data.
A procedure to calculate the area distribution of cells is presented here. The binary image for our experiment is shown below.
Binary image of cells

The first step is to label the image, a process in which each connected region is assigned a unique value. The labeling algorithm runs in raster-scan order, so the regions are numbered sequentially from the top-left to the bottom-right of the image. The maximum value in the labeled image gives the total number of connected regions in the image.
Labeled image
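
A minimal sketch of raster-scan labeling is given here for illustration. It assumes 4-connectivity and uses a flood fill from the first pixel of each region found during the scan; the source does not state which connectivity or algorithm the labeling operator uses, and the function name `label_regions` and the test image are hypothetical.

```python
from collections import deque


def label_regions(binary):
    """Label 4-connected foreground regions in raster-scan order.

    Returns the labeled image and the number of regions found.
    """
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] == 0 or labels[r][c] != 0:
                continue
            # First unlabeled foreground pixel of a new region.
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and binary[ny][nx] != 0 and labels[ny][nx] == 0:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Hypothetical binary image with four separate regions.
binary = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
]
labels, count = label_regions(binary)
for row in labels:
    print(row)
print("regions:", count)
```

Because a new label is issued only when the scan meets an unlabeled foreground pixel, the largest label in the result equals the number of regions.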

The histogram of the labeled image gives the area, in pixels, of each connected region. From the histogram we observe that the largest region has an area of 1359 pixels. NOTE: the background is also a connected region, but it does not carry useful information for this experiment. Make sure your histogram skips the background, i.e. region 0. Also, by obtaining statistics of the labeled image, we observe that there are 105 distinct regions.
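
As a rough sketch of this step, the histogram of a labeled image can be computed by counting how often each label occurs and then discarding label 0. The small labeled image and the function name `region_areas` are hypothetical and are not the data of this experiment.

```python
from collections import Counter


def region_areas(labels):
    """Histogram of a labeled image: region number -> pixel count.

    Region 0 (the background) is dropped from the result.
    """
    histogram = Counter(value for row in labels for value in row)
    histogram.pop(0, None)  # skip the background
    return dict(histogram)


# Hypothetical labeled image with four regions.
labels = [
    [1, 1, 0, 0, 2],
    [1, 0, 0, 2, 2],
    [0, 0, 0, 0, 0],
    [0, 3, 3, 0, 4],
]
areas = region_areas(labels)
print(areas)
print("largest area:", max(areas.values()))
print("number of regions:", len(areas))
```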

We can then take the histogram of the histogram, that is, the histogram of the region areas. This allows us to group cells into different classes. By modifying the bin size parameter of the histogram calculation, we can determine the number of regions that fall within each group. For instance, if we set the bin size parameter to 400, the new histogram provides the number of regions that fall in the following area ranges or groups:

```
class   area        n. of regions
----------------------------------
0          0- 399   88
1        400- 799   15
2        800-1199    1
3       1200-1599    1
----------------------------------
```
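
The grouping can be sketched with integer division by the bin size, so that class k covers areas from k * 400 to k * 400 + 399. The list of areas is invented for illustration (only the value 1359 appears in the text), and `area_distribution` is a hypothetical name.

```python
from collections import Counter


def area_distribution(areas, bin_size):
    """Count regions per area class; class k covers k*bin_size .. (k+1)*bin_size - 1."""
    return Counter(area // bin_size for area in areas)


# Hypothetical region areas in pixels; only 1359 comes from the text.
areas = [12, 250, 399, 400, 650, 801, 1359]
bin_size = 400
dist = area_distribution(areas, bin_size)

for k in sorted(dist):
    low, high = k * bin_size, (k + 1) * bin_size - 1
    print(k, low, high, dist[k])
```

Changing `bin_size` changes the number and width of the classes without touching the labeled image.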

## Decision Boundaries

In general, a pattern classifier carves up (or tessellates or partitions) the feature space into volumes called decision regions. All feature vectors in a decision region are assigned to the same category. The decision regions are often simply connected, but they can be multiply connected as well, consisting of two or more non-touching regions.

The decision regions are separated by surfaces called the decision boundaries. These separating surfaces represent points where there are ties between two or more categories.

For a minimum-distance classifier, the decision boundaries are the points that are equally distant from two or more of the templates. With a Euclidean metric, the decision boundary between Region i and Region j lies on the line or plane that is the perpendicular bisector of the segment from m_i to m_j, the templates of the two categories. Analytically, these linear boundaries are a consequence of the fact that the discriminant functions are linear. (With the Mahalanobis metric, the decision boundaries are in general quadratic surfaces, such as ellipsoids, paraboloids or hyperboloids.)
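
To see why the Euclidean boundaries are linear, expand the squared distance: ||x - m||^2 = ||x||^2 - 2 m.x + ||m||^2. The term ||x||^2 is the same for every template, so choosing the nearest template is equivalent to choosing the largest g(x) = m.x - 0.5 ||m||^2, which is linear in x. The sketch below checks this numerically; the three 2D templates are hypothetical values chosen for illustration.

```python
def discriminant(x, m):
    """Linear discriminant g(x) = m.x - 0.5 * ||m||^2 for template m."""
    dot = sum(xi * mi for xi, mi in zip(x, m))
    return dot - 0.5 * sum(mi * mi for mi in m)


def squared_distance(x, m):
    """Squared Euclidean distance between x and template m."""
    return sum((xi - mi) ** 2 for xi, mi in zip(x, m))


# Hypothetical templates in a 2D feature space.
templates = [(0.0, 0.0), (4.0, 0.0), (0.0, 2.0)]


def nearest_by_distance(x):
    return min(range(len(templates)), key=lambda k: squared_distance(x, templates[k]))


def nearest_by_discriminant(x):
    return max(range(len(templates)), key=lambda k: discriminant(x, templates[k]))


for x in [(1.0, 0.5), (3.0, 1.0), (0.5, 3.0)]:
    print(x, nearest_by_distance(x), nearest_by_discriminant(x))

# A point on the perpendicular bisector of templates 0 and 1 (the line x = 2)
# gives equal discriminant values, i.e. a tie between the two categories.
on_bisector = (2.0, -5.0)
print(discriminant(on_bisector, templates[0]), discriminant(on_bisector, templates[1]))
```

Both rules pick the same template for each test point, and the tie on the line x = 2 is what places the boundary on the perpendicular bisector.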

Nearest-template decision boundaries

How well the classifier works depends upon how closely the input patterns to be classified resemble the templates. In the example sketched below, the correspondence is very close, and one can anticipate excellent performance. However, things are not always this good in practice, and one should understand the limitations of simple classifiers.