Adaptive Thresholding


Brief Description


Thresholding is used to segment an image by setting all pixels whose intensity values are above a threshold to a foreground value and all the remaining pixels to a background value.
Whereas the conventional thresholding operator uses a global threshold for all pixels, adaptive thresholding changes the threshold dynamically over the image. This more sophisticated version of thresholding can accommodate changing lighting conditions in the image, e.g. those occurring as a result of a strong illumination gradient or shadows.

How It Works


Adaptive thresholding typically takes a grayscale or color image as input and, in the simplest implementation, outputs a binary image representing the segmentation. For each pixel in the image, a threshold has to be calculated. If the pixel value is below the threshold it is set to the background value, otherwise it assumes the foreground value.
There are two main approaches to finding the threshold: (i) the Chow and Kaneko approach and (ii) local thresholding. The assumption behind both methods is that smaller image regions are more likely to have approximately uniform illumination, thus being more suitable for thresholding. Chow and Kaneko divide an image into an array of overlapping subimages and then find the optimum threshold for each subimage by investigating its histogram. The threshold for each single pixel is found by interpolating the results of the subimages. The drawback of this method is that it is computationally expensive and, therefore, not appropriate for real-time applications.
An alternative approach to finding the local threshold is to statistically examine the intensity values of the local neighborhood of each pixel. The statistic which is most appropriate depends largely on the input image. Simple and fast functions include the mean of the intensity distribution over the local neighborhood W,

$$T = \operatorname{mean}(W)$$

the median value,

$$T = \operatorname{median}(W)$$

or the mean of the minimum and maximum values,

$$T = \frac{\min(W) + \max(W)}{2}$$

The size of the neighborhood has to be large enough to cover sufficient foreground and background pixels, otherwise a poor threshold is chosen. On the other hand, choosing regions which are too large can violate the assumption of approximately uniform illumination. This method is less computationally intensive than the Chow and Kaneko approach and produces good results for some applications.
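As a rough illustration of local-statistic thresholding, the following sketch (assuming an 8-bit grayscale image held in a NumPy array and using SciPy's uniform_filter and median_filter to compute the local statistics; the function names are ours, purely for illustration) compares each pixel against the mean or median of its neighborhood.

    import numpy as np
    from scipy.ndimage import median_filter, uniform_filter

    def adaptive_threshold_mean(image, size=7, C=0):
        """Pixel becomes foreground (255) if it exceeds the mean of its
        size x size neighborhood minus the constant C."""
        img = image.astype(np.float32)
        local_mean = uniform_filter(img, size=size)   # mean of the local neighborhood
        return np.where(img > local_mean - C, 255, 0).astype(np.uint8)

    def adaptive_threshold_median(image, size=7, C=0):
        """Same idea, using the local median as the statistic instead."""
        img = image.astype(np.float32)
        local_median = median_filter(img, size=size)
        return np.where(img > local_median - C, 255, 0).astype(np.uint8)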

Guidelines for Use


Like global thresholding, adaptive thresholding is used to separate desirable foreground image objects from the background based on the difference in pixel intensities of each region. Global thresholding uses a fixed threshold for all pixels in the image and therefore works only if the intensity histogram of the input image contains neatly separated peaks corresponding to the desired subject(s) and background(s). Hence, it cannot deal with images containing, for example, a strong illumination gradient.
Local adaptive thresholding, on the other hand, selects an individual threshold for each pixel based on the range of intensity values in its local neighborhood. This allows for thresholding of an image whose global intensity histogram doesn't contain distinctive peaks.
A task well suited to local adaptive thresholding is segmenting text from the image
son1


Because this image contains a strong illumination gradient, global thresholding produces a very poor result, as can be seen in
son1thr1



Using the mean of a 7×7 neighborhood, adaptive thresholding yields
son1adp1


The method succeeds in the area surrounding the text because there are enough foreground and background pixels in the local neighborhood of each pixel; i.e. the mean value lies between the intensity values of foreground and background and therefore separates them easily. Along the margins, however, the mean of the local area is not suitable as a threshold, because the range of intensity values within the local neighborhood is very small and its mean is close to the value of the center pixel.
The situation can be improved if the threshold employed is not the mean, but (mean-C), where C is a constant. Using this statistic, all pixels which exist in a uniform neighborhood (e.g. along the margins) are set to background. The result for a 7×7 neighborhood and C=7 is shown in
son1adp2


and for a 75×75 neighborhood and C=10 in
son1adp3


The larger window yields the poorer result, because it is more adversely affected by the illumination gradient. Also note that the latter is more computationally intensive than thresholding using the smaller window.
The result of using the median instead of the mean can be seen in
son1adp4


(The neighborhood size for this example is 7×7 and C = 4). The result shows that, in this application, the median is a less suitable statistic than the mean.
Consider another example image containing a strong illumination gradient
wdg3


This image cannot be segmented with a global threshold, as shown in
wdg3thr1


where a threshold of 80 was used. However, since the image contains a large object, adaptive thresholding is also difficult to apply. Using (mean - C) as a local threshold, we obtain
wdg3adp1


with a 7×7 window and C = 4, and
wdg3adp2


with a 140×140 window and C = 8. All pixels which belong to the object but do not have any background pixels in their neighborhood are set to background. The latter image shows a much better result than that achieved with a global threshold, but it is still missing some pixels in the center of the object. In many applications, computing the mean of a neighborhood (for each pixel!) whose size is of the order 140×140 may take too much time. In this case, the more complex Chow and Kaneko approach to adaptive thresholding would be more successful.
If your image processing package does not contain an adaptive threshold operator, you can simulate the effect with the following steps:
  1. Convolve the image with a suitable statistical operator, e.g. the mean or median.
  2. Subtract the original from the convolved image.
  3. Threshold the difference image with C.
  4. Invert the thresholded image.
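A minimal sketch of these four steps (assuming a NumPy array image and SciPy's uniform_filter for the local mean; the function name is ours) might look as follows. Pixels darker than their local mean by more than C come out black, everything else white.

    import numpy as np
    from scipy.ndimage import uniform_filter

    def simulated_adaptive_threshold(image, size=7, C=7):
        smoothed = uniform_filter(image.astype(np.float32), size=size)  # 1. convolve with the mean
        diff = smoothed - image.astype(np.float32)                      # 2. subtract original from convolved image
        binary = diff > C                                               # 3. threshold the difference with C
        return np.where(binary, 0, 255).astype(np.uint8)                # 4. invert the thresholded image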

Exercises


  1. In the above example using
    son1


    why does the mean produce a better result than the median? Can you think of any example where the median is more appropriate?
  2. Think of an appropriate statistic for finding dark cracks on a light object using adaptive thresholding.
  3. If you want to recover text from an image with a strong illumination gradient, how does the local thresholding method relate to the technique of removing the illumination gradient using pixel subtraction? Compare the results achieved with adaptive thresholding, pixel subtraction and pixel division.

References


E. Davies Machine Vision: Theory, Algorithms and Practicalities, Academic Press, 1990, pp 91 - 96.
R. Gonzalez and R. Woods Digital Image Processing, Addison-Wesley Publishing Company, 1992, pp 443 - 452.
A. Jain Fundamentals of Digital Image Processing, Prentice-Hall, 1989, p 408.

Contrast Stretching



Common Names: Contrast stretching, Normalization

Brief Description


Contrast stretching (often called normalization) is a simple image enhancement technique that attempts to improve the contrast in an image by `stretching' the range of intensity values it contains to span a desired range of values, e.g. the full range of pixel values that the image type concerned allows. It differs from the more sophisticated histogram equalization in that it can only apply a linear scaling function to the image pixel values. As a result the `enhancement' is less harsh. (Most implementations accept a graylevel image as input and produce another graylevel image as output.)

How It Works


Before the stretching can be performed it is necessary to specify the upper and lower pixel value limits over which the image is to be normalized. Often these limits will just be the minimum and maximum pixel values that the image type concerned allows. For example for 8-bit graylevel images the lower and upper limits might be 0 and 255. Call the lower and the upper limits a and b respectively.
The simplest sort of normalization then scans the image to find the lowest and highest pixel values currently present in the image. Call these c and d. Then each pixel P is scaled using the following function:
$$P_{\mathrm{out}} = (P_{\mathrm{in}} - c)\left(\frac{b - a}{d - c}\right) + a$$

The problem with this is that a single outlying pixel with either a very high or very low value can severely affect the value of c or d and this could lead to very unrepresentative scaling. Therefore a more robust approach is to first take a histogram of the image, and then select c and d at, say, the 5th and 95th percentile in the histogram (that is, 5% of the pixels in the histogram will have values lower than c, and 5% of the pixels will have values higher than d). This prevents outliers from affecting the scaling so much.
Another common technique for dealing with outliers is to use the intensity histogram to find the most popular intensity level in an image (i.e. the histogram peak) and then define a cutoff fraction which is the minimum fraction of this peak magnitude below which data will be ignored. In other words, all intensity levels with histogram counts below this cutoff fraction will be discarded (driven to intensity value 0) and the remaining range of intensities will be expanded to fill out the full range of the image type under consideration.
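As a rough sketch (assuming an 8-bit grayscale NumPy array; the function name and the percentile defaults are ours), a percentile-based contrast stretch could be written as:

    import numpy as np

    def contrast_stretch(image, a=0, b=255, low_pct=5, high_pct=95):
        """Linearly map the intensity range [c, d], chosen from percentiles of
        the histogram, onto [a, b]; values outside [c, d] are clipped."""
        img = image.astype(np.float32)
        c, d = np.percentile(img, (low_pct, high_pct))   # robust choice of c and d
        if d == c:                                       # flat image: nothing to stretch
            return np.full(image.shape, a, dtype=np.uint8)
        out = (img - c) * (b - a) / (d - c) + a
        return np.clip(out, a, b).astype(np.uint8)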
Some implementations also work with color images. In this case all the channels will be stretched using the same offset and scaling in order to preserve the correct color ratios.

Guidelines for Use


Normalization is commonly used to improve the contrast in an image without distorting relative graylevel intensities too significantly.
We begin by considering an image
wom1


which can easily be enhanced by the simplest of contrast stretching implementations because the intensity histogram forms a tight, narrow cluster between the graylevel intensity values 79 and 136, as shown in
wom1hst1


After contrast stretching, using a simple linear interpolation between c = 79 and d = 136, we obtain
wom1str1


Compare the histogram of the original image with that of the contrast-stretched version
wom1hst2



While this result is a significant improvement over the original, the enhanced image itself still appears somewhat flat. Histogram equalizing the image increases contrast dramatically, but yields an artificial-looking result
wom1heq1


In this case, we can achieve better results by contrast stretching the image over a narrower range of graylevel values from the original image. For example, by setting the cutoff fraction parameter to 0.03, we obtain the contrast-stretched image
wom1str2


and its corresponding histogram
wom1hst3


Note that this operation has effectively spread out the information contained in the original histogram peak (thus improving contrast in the interesting face regions) by pushing those intensity levels to the left of the peak down the histogram x-axis towards 0. Setting the cutoff fraction to a high value, e.g. 0.8, yields the contrast stretched image
wom1str3


As shown in the histogram
wom1hst4


most of the information to the left of the peak in the original image is mapped to 0 so that the peak can spread out even further and begin pushing values to its right up to 255.
As an example of an image which is more difficult to enhance, consider
moo2


which shows a low contrast image of a lunar surface.
The image
moo2hst2


shows the intensity histogram of this image. Note that only part of the y-axis has been shown for clarity. The minimum and maximum values in this 8-bit image are 0 and 255 respectively, and so straightforward normalization to the range 0 - 255 produces absolutely no effect. However, we can enhance the picture by ignoring all pixel values outside the 1st and 99th percentiles, and only applying contrast stretching to those pixels in between. The outliers are simply forced to either 0 or 255 depending upon which side of the range they lie on.

moo2str1


shows the result of this enhancement. Notice that the contrast has been significantly improved. Compare this with the corresponding enhancement achieved using histogram equalization.
Normalization can also be used when converting from one image type to another, for instance from floating point pixel values to 8-bit integer pixel values. As an example the pixel values in the floating point image might run from 0 to 5000. Normalizing this range to 0-255 allows easy conversion to 8-bit integers. Obviously some information might be lost in the compression process, but the relative intensities of the pixels will be preserved.
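A minimal sketch of such a type conversion (the function name is ours; it assumes a NumPy array of arbitrary range):

    import numpy as np

    def to_uint8(image):
        """Normalize an arbitrary-range (e.g. floating point) image to 0 - 255
        and convert it to 8-bit integers, preserving relative intensities."""
        img = image.astype(np.float64)
        c, d = img.min(), img.max()
        if d == c:
            return np.zeros(image.shape, dtype=np.uint8)
        return ((img - c) * 255.0 / (d - c)).astype(np.uint8)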

Exercises



  1. Derive the scaling formula given above from the parameters a, b, c and d.
  2. Suppose you had to normalize an 8-bit image to one in which the pixel values were stored as 4-bit integers. What would be a suitable destination range (i.e. the values of a and b)?
  3. Contrast-stretch the image
    sap1


    (You must begin by selecting suitable values for c and d.) Next, edge-detect (i.e. using the Sobel, Roberts Cross or Canny edge detector) both the original and the contrast stretched version. Does contrast stretching increase the number of edges which can be detected?
  4. Imagine you have an image taken in low light levels and which, as a result, has low contrast. What are the advantages of using contrast stretching to improve the contrast, rather than simply scaling the image by a factor of, say, three?

References


E. Davies Machine Vision: Theory, Algorithms and Practicalities, Academic Press, 1990, pp 26 - 27, 79 - 99.
A. Jain Fundamentals of Digital Image Processing, Prentice-Hall, 1989, Chap. 7, p 235.
D. Vernon Machine Vision, Prentice-Hall, 1991, p 45.

Histogram Equalization



Common Names: Histogram Modeling, Histogram Equalization

Brief Description


Histogram modeling techniques (e.g. histogram equalization) provide a sophisticated method for modifying the dynamic range and contrast of an image by altering that image such that its intensity histogram has a desired shape. Unlike contrast stretching, histogram modeling operators may employ non-linear and non-monotonic transfer functions to map between pixel intensity values in the input and output images. Histogram equalization employs a monotonic, non-linear mapping which re-assigns the intensity values of pixels in the input image such that the output image contains a uniform distribution of intensities (i.e. a flat histogram). This technique is used in image comparison processes (because it is effective in detail enhancement) and in the correction of non-linear effects introduced by, say, a digitizer or display system.

How It Works


Histogram modeling is usually introduced using continuous, rather than discrete, process functions. Therefore, we suppose that the images of interest contain continuous intensity levels (in the interval [0,1]) and that the transformation function f which maps an input image A(x,y) onto an output image B(x,y) is continuous within this interval. Further, it will be assumed that the transfer law (which may also be written in terms of intensity density levels, e.g. D_B = f(D_A)) is single-valued and monotonically increasing (as is the case in histogram equalization) so that it is possible to define the inverse law D_A = f^{-1}(D_B). An example of such a transfer function is illustrated in Figure 1.



Figure 1 A histogram transformation function.


All pixels in the input image with densities in the region D_A to D_A + dD_A will have their pixel values re-assigned such that they assume an output pixel density value in the range from D_B to D_B + dD_B. The surface areas h_A(D_A) dD_A and h_B(D_B) dD_B will therefore be equal, yielding:

$$h_B(D_B) = \frac{h_A(D_A)}{dD_B / dD_A}$$

where D_A = f^{-1}(D_B).
This result can be written in the language of probability theory if the histogram h is regarded as a continuous probability density function p describing the distribution of the (assumed random) intensity levels:

$$p_B(D_B) = \frac{p_A(D_A)}{dD_B / dD_A}$$

In the case of histogram equalization, the output probability densities should all be an equal fraction of the maximum number of intensity levels in the input image D_M (where the minimum level considered is 0). The transfer function (or point operator) necessary to achieve this result is simply:

$$\frac{dD_B}{dD_A} = D_M \, p_A(D_A)$$

Therefore,

$$D_B = f(D_A) = D_M \int_0^{D_A} p_A(u) \, du = D_M \, F_A(D_A)$$

where F_A(D_A) is simply the cumulative probability distribution (i.e. cumulative histogram) of the original image. Thus, an image which is transformed using its cumulative histogram yields an output histogram which is flat!
A digital implementation of histogram equalization is usually performed by defining a transfer function of the form:

$$f(D_A) = \max\left(0, \operatorname{round}\left(\frac{D_M \, n_k}{N}\right) - 1\right)$$

where N is the number of image pixels and n_k is the number of pixels at intensity level k or less.
In the digital implementation, the output image will not necessarily be fully equalized and there may be `holes' in the histogram (i.e. unused intensity levels). These effects are likely to decrease as the number of pixels and intensity quantization levels in the input image are increased.
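A minimal sketch of the discrete implementation (assuming an 8-bit grayscale NumPy array; the function name is ours) follows the transfer function above directly:

    import numpy as np

    def histogram_equalize(image, levels=256):
        """Equalize a grayscale image using its cumulative histogram as a
        lookup table, following the transfer function described above."""
        hist, _ = np.histogram(image, bins=levels, range=(0, levels))
        cum = hist.cumsum()                     # n_k: number of pixels at level k or less
        N = image.size                          # total number of pixels
        D_M = levels - 1                        # maximum intensity level
        lut = np.maximum(0, np.round(D_M * cum / N) - 1).astype(np.uint8)
        return lut[image]                       # apply the transfer function to every pixel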

Guidelines for Use


To illustrate the utility of histogram equalization, consider
moo2


which shows an 8-bit grayscale image of the surface of the moon. The histogram
moo2hst2


confirms what we can see by visual inspection: this image has poor dynamic range. (Note that we can view this histogram as a description of pixel probability densities by simply scaling the vertical axis by the total number of image pixels and normalizing the horizontal axis using the number of intensity density levels (i.e. 256). However, the shape of the distribution will be the same in either case.)
In order to improve the contrast of this image, without affecting the structure (i.e. geometry) of the information contained therein, we can apply the histogram equalization operator. The resulting image is
moo2heq1


and its histogram is shown
moo2hst1


Note that the histogram is not flat (as in the examples from the continuous case) but that the dynamic range and contrast have been enhanced. Note also that when equalizing images with narrow histograms and relatively few gray levels, increasing the dynamic range has the adverse effect of increasing visual graininess. Compare this result with that produced by the linear contrast stretching operator
moo2str1



In order to further explore the transformation defined by the histogram equalization operator, consider the image of the Scott Monument in Edinburgh, Scotland
bld1


Although the contrast on the building is acceptable, the sky region is represented almost entirely by light pixels. This causes most pixels in the histogram
bld1hst1


to be pushed into a narrow peak in the upper graylevel region. The histogram equalization operator defines a mapping based on the cumulative histogram
bld1cuh1


which results in the image
bld1heq1


While histogram equalization has enhanced the contrast of the sky regions in the image, the picture now looks artificial because there is very little variety in the middle graylevel range. This occurs because the transfer function is based on the shallow slope of the cumulative histogram in the middle graylevel regions (i.e. intensity density levels 100 - 230) and causes many pixels from this region in the original image to be mapped to similar graylevels in the output image.
We can improve on this if we define a mapping based on a sub-section of the image which contains a better distribution of intensity densities from the low and middle range graylevels. If we crop the image so as to isolate a region which contains more building than sky
bld1crp1


we can then define a histogram equalization mapping for the whole image based on the cumulative histogram
bld1cuh2


of this smaller region. Since the cropped image contains a more even distribution of dark and light pixels, the slope of the transfer function is steeper and smoother, and the contrast of the resulting image
bld1heq2


is more natural. This idea of defining mappings based upon particular sub-sections of the image is taken up by another class of operators which perform Local Enhancements as discussed below.

Common Variants


Histogram Specification
Histogram equalization is limited in that it is capable of producing only one result: an image with a uniform intensity distribution. Sometimes it is desirable to be able to control the shape of the output histogram in order to highlight certain intensity levels in an image. This can be accomplished by the histogram specification operator, which maps a given intensity distribution a(x,y) into a desired distribution c(x,y) using a histogram equalized image b(x,y) as an intermediate stage.
The first step in histogram specification is to specify the desired output density function and write a transformation g(c). If g^{-1}(c) is single-valued (which is true when there are no unfilled levels in the specified histogram or errors in the process of rounding off g^{-1}(c) to the nearest intensity level), then c = g^{-1}(b) defines a mapping from the equalized levels of the original image, b = f(a). It is possible to combine these two transformations such that the image need not be histogram equalized explicitly:

$$c = g^{-1}(f(a))$$
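A rough sketch of a discrete version (the function name is ours; it matches the histogram of one NumPy image to that of a reference image by composing the image's equalizing transform f with the inverse of the reference's transform g):

    import numpy as np

    def histogram_specify(image, reference, levels=256):
        """Remap `image` so that its histogram approximates that of `reference`."""
        def cdf(img):
            hist, _ = np.histogram(img, bins=levels, range=(0, levels))
            c = hist.cumsum().astype(np.float64)
            return c / c[-1]
        f = cdf(image)        # equalizing transform of the input image
        g = cdf(reference)    # equalizing transform of the desired distribution
        # For each input level a, pick the smallest c with g(c) >= f(a), i.e. c = g^-1(f(a))
        lut = np.searchsorted(g, f).clip(0, levels - 1).astype(np.uint8)
        return lut[image]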

Local Enhancements
The histogram processing methods discussed above are global in the sense that they apply a transformation function whose form is based on the intensity level distribution of an entire image. Although this method can enhance the overall contrast and dynamic range of an image (thereby making certain details more visible), there are cases in which enhancement of details over small areas (i.e. areas whose contribution to the total number of image pixels has a negligible influence on the global transform) is desired. The solution in these cases is to derive a transformation based upon the intensity distribution in the local neighborhood of every pixel in the image.
The histogram processes described above can be adapted for local enhancement. The procedure involves defining a neighborhood around each pixel and, using the histogram characteristics of this neighborhood, deriving a transfer function which maps that pixel into an output intensity level. This is performed for each pixel in the image. (Since moving across rows or down columns only adds one new pixel to the local histogram, updating the histogram from the previous calculation with new data introduced at each motion is possible.) Local enhancement may also define transforms based on pixel attributes other than the histogram, e.g. intensity mean (to control variance) and variance (to control contrast) are common.
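A naive sketch of local histogram equalization (recomputing the full neighborhood histogram at every pixel, which is slow but illustrates the idea; practical versions update the histogram incrementally as the window slides; the function name is ours):

    import numpy as np

    def local_histogram_equalize(image, size=31, levels=256):
        """Remap each pixel using the cumulative histogram of its own
        size x size neighborhood."""
        half = size // 2
        padded = np.pad(image, half, mode='reflect')
        out = np.empty_like(image)
        for y in range(image.shape[0]):
            for x in range(image.shape[1]):
                window = padded[y:y + size, x:x + size]
                hist, _ = np.histogram(window, bins=levels, range=(0, levels))
                cum = hist.cumsum()
                out[y, x] = (levels - 1) * cum[image[y, x]] // window.size
        return out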

Exercises


  1. Suppose that you have a 128×128 square pixel image with an 8 gray level intensity range, within which the lighter intensity levels predominate as shown in the table below. A) Sketch the histogram (number of pixels vs gray level) to describe this distribution. B) How many pixels/gray levels would there be in an equalized version of this histogram? C) Apply the discrete transformation described above and plot the new (equalized) histogram. (How well does the histogram approximate a uniform distribution of intensity values?)
     -------------------------------
    | Gray Level | Number of Pixels |
    |------------+------------------|
    | 0          | 34               |
    |------------+------------------|
    | 1          | 50               |
    |------------+------------------|
    | 2          | 500              |
    |------------+------------------|
    | 3          | 1500             |
    |------------+------------------|
    | 4          | 2700             |
    |------------+------------------|
    | 5          | 4500             |
    |------------+------------------|
    | 6          | 4000             |
    |------------+------------------|
    | 7          | 3100             |
     -------------------------------

  2. Suppose you have equalized an image once. Show that a second pass of histogram equalization will produce exactly the same result as the first.
  3. Interpreting images derived by means of a non-monotonic or non-continuous mapping can be difficult. Describe the effects of the following transfer functions:
    (a) f has a horizontal plateau,
    (b) f contains a vertical jump,
    (c) f has a negative slope.
    (Hint: it can be useful to sketch the curve, as in Figure 1, and then map a few points from histogram A to histogram B.)
  4. Apply local histogram equalization to the image
    bld1


    Compare this result with those derived by means of the global transfer function shown in the above examples.
  5. Apply global and local histogram equalization to the montage image
    soi1


    Compare your results.

References


R. Boyle and R. Thomas Computer Vision: A First Course, Blackwell Scientific Publications, 1988, pp 35 - 41.
R. Gonzalez and R. Woods Digital Image Processing, Addison-Wesley Publishing Company, 1992, Chap. 4.
A. Jain Fundamentals of Digital Image Processing, Prentice-Hall, 1989, pp 241 - 243.
A. Marion An Introduction to Image Processing, Chapman and Hall, 1991, Chap. 6.

Logical AND/NAND

Brief Description


AND and NAND are examples of logical operators having the truth-tables shown in Figure 1.



Figure 1 Truth-tables for AND and NAND.


As can be seen, the output values of NAND are simply the inverse of the corresponding output values of AND.
The AND (and similarly the NAND) operator typically takes two binary or integer graylevel images as input, and outputs a third image whose pixel values are just those of the first image, ANDed with the corresponding pixels from the second. A variation of this operator takes just a single input image and ANDs each pixel with a specified constant value in order to produce the output.

How It Works


The operation is performed straightforwardly in a single pass. It is important that all the input pixel values being operated on have the same number of bits in them or unexpected things may happen. Where the pixel values in the input images are not simple 1-bit numbers, the AND operation is normally (but not always) carried out individually on each corresponding bit in the pixel values, in bitwise fashion.
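In NumPy terms (a sketch, assuming two 8-bit images of identical size; the function names are ours), the bitwise variant is simply:

    import numpy as np

    def and_images(image1, image2):
        """Bitwise AND of two images of the same size and integer dtype."""
        return np.bitwise_and(image1, image2)

    def and_constant(image, constant):
        """Bitwise AND of every pixel with a constant, e.g. 128 (10000000 binary)."""
        return np.bitwise_and(image, constant)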

Guidelines for Use


The most obvious application of AND is to compute the intersection of two images. We illustrate this with an example where we want to detect those objects in a scene which did not move between two images, i.e. which are at the same pixel positions in the first and the second image. We illustrate this example using
scr3


and
scr4


If we simply AND the two graylevel images in a bitwise fashion we obtain
scr3and1


Although we wanted the moved object to disappear from the resulting image, it appears twice, at its old and at its new position. The reason is that the object has rather low pixel values (similar to a logical 0) whereas the background has high values (similar to a logical 1). However, we normally associate an object with logical 1 and the background with logical 0, so we have actually ANDed the negatives of the two images, which is equivalent to NORing them. To obtain the desired result we have to invert the images before ANDing them, as was done in
scr3and2


Now, only the object which has the same position in both images is highlighted. However, ANDing two graylevel images might still cause problems, as it is not guaranteed that ANDing two high pixel values in a bitwise fashion yields a high output value (for example, 128 AND 127 yields 0). To avoid these problems, it is best to produce binary versions of the grayscale images using thresholding.
scr3thr1


and
scr4thr1


are the thresholded versions of the above images and
scr3and3


is the result of ANDing their negatives.
Although ANDing worked well for the above example, it runs into problems in a scene like
pap1


Here, we have two objects, the average intensity of one being higher than the background and that of the other being lower. Hence, we cannot produce a binary image containing both objects using simple thresholding. As can be seen in the following images, ANDing the grayscale images is not successful either. If, in the second scene, the light part was moved, as in
pap2


then the result of ANDing the two images is
pap1and1


It shows the desired effect of attenuating the moved object. However, if the second scene is instead like
pap3


where the dark object was moved, we obtain
pap1and2


Here, the old and the new positions of the dark object are visible.
In general, applying the AND operator (or other logical operators) to two images in order to detect differences or similarities between them is most appropriate if they are binary or can be converted into binary format using thresholding.
As with other logical operators, AND and NAND are often used as sub-components of more complex image processing tasks. One of the common uses for AND is for masking. For example, suppose we wish to selectively brighten a small region of
car1


to highlight a particular car. There are many ways of doing this and we illustrate just one. First a paint program is used to identify the region to be highlighted. In this case we set the region to black as shown in
car1msk1


This image can then be thresholded to just select the black region, producing the mask shown in
car1thr1


The mask image has a pixel value of 255 (11111111 binary) in the region that we are interested in, and zero pixels (00000000 binary) elsewhere. This mask is then bitwise ANDed with the original image to just select out the region that will be highlighted. This produces
car1and1


Finally, we brighten this image by scaling it by a factor of 1.1, dim the original image using a scale factor of 0.8, and then add the two images together to produce
car1add1
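A compact sketch of this masking recipe (assuming 8-bit NumPy arrays, where `mask` holds 255 inside the region of interest and 0 elsewhere; the function name and the use of clipping are ours):

    import numpy as np

    def highlight_region(image, mask):
        """AND the image with the mask, brighten the masked copy by 1.1,
        dim the whole original by 0.8, and add the two together."""
        region = np.bitwise_and(image, mask).astype(np.float32)    # select the region of interest
        combined = 1.1 * region + 0.8 * image.astype(np.float32)   # brightened region + dimmed original
        return np.clip(combined, 0, 255).astype(np.uint8)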



AND can also be used to perform so-called bit-slicing on an 8-bit image. To determine the influence of one particular bit on an image, it is ANDed in a bitwise fashion with a constant number, where the relevant bit is set to 1 and the remaining 7 bits are set to 0. For example, to obtain bit-plane 8 (corresponding to the most significant bit) of
ape1


we AND the image with 128 (10000000 binary) and threshold the output at a pixel value of 1. The result, shown in
ape1and8


is equivalent to thresholding the image at a value of 128. Images
ape1and7



ape1and6


and
ape1and4


correspond to bit-planes 7, 6 and 4. The images show that most image information is contained in the higher (more significant) bits, whereas the less significant bits contain some of the finer details and noise. The image
ape1and1


shows bit-plane 1.
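A minimal sketch of bit-plane extraction (the function name is ours; it assumes an 8-bit NumPy array and numbers the planes 1 to 8 from least to most significant):

    import numpy as np

    def bit_plane(image, plane):
        """Extract one bit-plane of an 8-bit image as a binary 0/255 image."""
        mask = 1 << (plane - 1)                     # e.g. plane 8 -> 128 (10000000 binary)
        return np.where(np.bitwise_and(image, mask) > 0, 255, 0).astype(np.uint8)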

Exercises



  1. NAND
    cir2


    and
    cir3


    Compare the result with the result of ANDing the negatives of the two input images.
  2. AND
    scr3thr1


    and
    scr4thr1


    as well as the negatives of
    pap1


    and
    pap2


    Compare the results with the ones obtained in the previous section.
  3. Extract all 8 bit planes from
    pen1


    and
    str1


    Comment on the number of visually significant bits in each image.
  4. What would be the effect of ANDing an 8-bit graylevel image with a constant value of 240 (11110000 in binary)? Why might you want to do this?
  5. What would be the effect of ANDing an 8-bit graylevel image with a constant value of 15 (00001111 in binary)? Why might you want to do this? Try this out on
    bal1


    and comment on what you see.

References


E. Davies Machine Vision: Theory, Algorithms and Practicalities, Academic Press, 1990, Chap. 2.
R. Gonzalez and R. Woods Digital Image Processing, Addison-Wesley Publishing Company, 1992, pp 47 - 51, 171 - 172.
A. Jain Fundamentals of Digital Image Processing, Prentice-Hall, 1989, pp 239 - 240.
B. Horn Robot Vision, MIT Press, 1986, pp 47 - 48.

Logical OR/NOR



Common Names: OR, NOR

Brief Description


OR and NOR are examples of logical operators having the truth-tables shown in Figure 1.



Figure 1 Truth-tables for OR and NOR.


As can be seen, the output values of NOR are simply the inverses of the corresponding output values of OR.
The OR (and similarly the NOR) operator typically takes two binary or graylevel images as input, and outputs a third image whose pixel values are just those of the first image, ORed with the corresponding pixels from the second. A variation of this operator takes just a single input image and ORs each pixel with a specified constant value in order to produce the output.

How It Works


The operation is performed straightforwardly in a single pass. It is important that all the input pixel values being operated on have the same number of bits in them or unexpected things may happen. Where the pixel values in the input images are not simple 1-bit numbers, the OR operation is normally (but not always) carried out individually on each corresponding bit in the pixel values, in bitwise fashion.

Guidelines for Use


We can illustrate the function of the OR operator using
scr3


and
scr4


The images show a scene with two objects, one of which was moved between the exposures. We can use OR to compute the union of the images, i.e. highlighting all pixels which represent an object either in the first or in the second image. First, we threshold the images, since the process is simplified by using binary input. If we OR the resulting images
scr3thr1


and
scr4thr1


we obtain
scr3or2


This image shows only the position of the object which was at the same location in both input images. The reason is that the objects are represented by logical 0 and the background by logical 1. Hence, we actually OR the background, which is equivalent to NANDing the objects. To get the desired result, we first have to invert the input images before ORing them. Then, we obtain
scr3or1


Now, the output shows the position of the stationary object as well as that of the moved object.
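A brief sketch of this invert-then-OR workflow (assuming 8-bit NumPy arrays in which the objects are darker than the threshold; the function name is ours):

    import numpy as np

    def union_of_objects(image1, image2, threshold):
        """Threshold and invert both images so the dark objects become logical 1,
        then OR the binary results to obtain the union of the objects."""
        obj1 = image1 < threshold
        obj2 = image2 < threshold
        return np.where(np.logical_or(obj1, obj2), 255, 0).astype(np.uint8)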
As with other logical operators, OR and NOR are often used as sub-components of more complex image processing tasks. OR is often used to merge two images together. Suppose we want to overlay
wdg2


with its histogram, shown in
wdg2hst1


First, an image editor is used to enlarge the histogram image until it is the same size as the grayscale image as shown in
wdg2hst2


Then, simply ORing the two gives
wdg2or1


The performance in this example is quite good, because the images contain very distinct graylevels. If we proceed in the same way with
bld1


we obtain
bld1or1


Now, it is difficult to see the characters of the histogram (which have high pixel values) at places where the original image also has high values. Compare the result with that described under XOR.
Note that there is no problem of overflowing pixel values with the OR operator, as there is with the addition operator.
ORing is usually safest when at least one of the images is binary, i.e. the pixel values are 0000... and 1111... only. The problem with ORing other combinations of integers is that the output result can fluctuate wildly with a small change in input values. For instance 127 ORed with 128 gives 255, whereas 127 ORed with 126 gives 127.

Exercises



  1. NOR
    cir2


    and
    cir3


    and AND their negatives. Compare the results.
  2. Why can't you use thresholding to produce a binary image containing both objects of
    pap2


    and
    pap3


    ? Use graylevel ORing to combine the two images. Can you detect all the locations of the objects in the two images? What changes if you invert the images before combining them?
  3. In the example above, how could you make the histogram appear in black instead of white? Try it.
  4. Summarize the conditions under which you would use OR to combine two images rather than, say, addition or blending.

References


R. Gonzalez and R. Woods Digital Image Processing, Addison-Wesley Publishing Company, 1992, pp 47 - 51, 171 - 172.
E. Davies Machine Vision: Theory, Algorithms and Practicalities, Academic Press, 1990, Chap. 2.
B. Horn Robot Vision, MIT Press, 1986, pp 47 - 48.

Logical XOR/XNOR



Common Names: XOR, XNOR, EOR, ENOR

Brief Description


XOR and XNOR are examples of logical operators having the truth-tables shown in Figure 1.



Figure 1 Truth-tables for XOR and XNOR.


The XOR function is true if exactly one of the input values is true, and false otherwise. XOR stands for eXclusive OR. As can be seen, the output values of XNOR are simply the inverse of the corresponding output values of XOR.
The XOR (and similarly the XNOR) operator typically takes two binary or graylevel images as input, and outputs a third image whose pixel values are just those of the first image, XORed with the corresponding pixels from the second. A variation of this operator takes a single input image and XORs each pixel with a specified constant value in order to produce the output.

How It Works


The operation is performed straightforwardly in a single pass. It is important that all the input pixel values being operated on have the same number of bits in them, or unexpected things may happen. Where the pixel values in the input images are not simple 1-bit numbers, the XOR operation is normally (but not always) carried out individually on each corresponding bit in the pixel values, in bitwise fashion.

Guidelines for Use


We illustrate the function of XOR using
scr3


and
scr4


Since logical operators work more reliably with binary input we first threshold the two images, thus obtaining
scr3thr1


and
scr4thr1


Now, we can use XOR to detect changes in the images, since pixels which didn't change output 0 and pixels which did change result in 1. The image
scr3xor1


shows the result of XORing the thresholded images. We can see the old and the new position of the moved object, whereas the stationary object almost disappeared from the image. Due to the effects of noise, we can still see some pixels around the boundary of the stationary object, i.e. pixels whose values in the original image were close to the threshold.
In a scene like
pap1


it is not possible to apply a threshold in order to obtain a binary image, since one of the objects is lighter than the background whereas the other one is darker. However, we can combine two grayscale images by XORing them in a bitwise fashion.
pap3


shows a scene where the dark object was moved and in
pap2


the light object changed its position. XORing each of them with the initial image yields
pap1xor1


and
pap1xor2


respectively. In both cases, the moved part appears at the old as well as at the new location and the stationary object almost disappears. This technique is based on the assumption that XORing two similar gray values produces a low output, whereas two distinct inputs yield a high output. However, this is not always true; e.g. XORing 127 and 128 yields 255. These effects can be seen at the boundary of the stationary object, where the pixels have an intermediate graylevel and might, due to noise, differ slightly between two of the images. Hence, we can see a line of high values around the stationary object. A similar problem is that the output for the moved pen is much higher than that for the moved piece of paper, although the contrast between their intensities and that of the background is roughly the same. Because of these problems it is often better to use image subtraction or image division for change detection.
As with other logical operators, XOR and XNOR are often used as sub-components of more complex image processing tasks. XOR has the interesting property that if we XOR A with B to get Q, then the bits of Q are the same as A where the corresponding bit from B is zero, but they are of the opposite value where the corresponding bit from B is one. So for instance using binary notation, 1010 XORed with 1100 gives 0110. For this reason, B could be thought of as a bit-reversal mask. Since the operator is symmetric, we could just as well have treated A as the mask and B as the original.
Extending this idea to images, it is common to see an 8-bit XOR image mask containing only the pixel values 0 (00000000 binary) and 255 (11111111 binary). When this is XORed pixel-by-pixel with an original image it reverses the bits of pixel values where the mask is 255, and leaves them as they are where the mask is zero. The pixels with reversed bits normally `stand out' against their original color and so this technique is often used to produce a cursor that is visible against an arbitrary colored background. The other advantage of using XOR like this is that to undo the process (for instance when the cursor moves away), it is only necessary to repeat the XOR using the same mask and all the flipped pixels will become unflipped. Therefore it is not necessary to explicitly store the original colors of the pixels affected by the mask. Note that the flipped pixels are not always visible against their unflipped color --- light pixels become dark pixels and dark pixels become light pixels, but middling gray pixels become middling gray pixels!
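A minimal sketch of such a reversible cursor (the function name is ours; it assumes an 8-bit grayscale NumPy array and a square cursor):

    import numpy as np

    def toggle_cursor(image, x, y, size=8):
        """XOR a small square with 255 (11111111 binary), reversing the bits of
        the pixels underneath. Calling the function again with the same
        arguments restores the original pixels."""
        out = image.copy()
        out[y:y + size, x:x + size] = np.bitwise_xor(out[y:y + size, x:x + size], 255)
        return out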
The image
wdg2


shows a simple graylevel image. Suppose that we wish to overlay this image with its histogram shown in
wdg2hst1


so that the two can be compared easily. One way is to use XOR. We first use an image editor to enlarge the histogram until it is the same size as the first image. The result is shown in
wdg2hst2


To perform the overlay we simply XOR this image with the first image in bitwise fashion to produce
wdg2xor1


Here, the text is quite easy to read, because the original image consists of large and rather light or rather dark areas. If we proceed in the same way with
bld1


we obtain
bld1xor1


Note how the writing is dark against light backgrounds and light against dark backgrounds and hardly visible against gray backgrounds. Compare the result with that described under OR. In fact XORing is not particularly good for producing easy to read text on gray backgrounds --- we might do better just to add a constant offset to the image pixels that we wish to highlight (assuming wraparound under addition overflow) --- but it is often used to quickly produce highlighted pixels where the background is just black and white or where legibility is not too important.

Exercises


  1. XOR
    cir2


    and
    cir3


    Compare the result with the output of XORing their negatives. Do you see the same effect as for other logical operators?
  2. Use the technique discussed above to produce a cursor on
    fce1


    Place the cursor at different locations of the image and examine the performance on a background with high, low, intermediate and mixed pixel values.

References


R. Gonzalez and R. Woods Digital Image Processing, Addison-Wesley Publishing Company, 1992, pp 47 - 51.
E. Davies Machine Vision: Theory, Algorithms and Practicalities, Academic Press, 1990, Chap. 2.
B. Horn Robot Vision, MIT Press, 1986, pp 47 - 48.

Thresholding



Common Names: Threshold, Density slicing

Brief Description


In many vision applications, it is useful to be able to separate out the regions of the image corresponding to objects in which we are interested, from the regions of the image that correspond to background. Thresholding often provides an easy and convenient way to perform this segmentation on the basis of the different intensities or colors in the foreground and background regions of an image.
In addition, it is often useful to be able to see what areas of an image consist of pixels whose values lie within a specified range, or band of intensities (or colors). Thresholding can be used for this as well.

How It Works


The input to a thresholding operation is typically a grayscale or color image. In the simplest implementation, the output is a binary image representing the segmentation. Black pixels correspond to background and white pixels correspond to foreground (or vice versa). In simple implementations, the segmentation is determined by a single parameter known as the intensity threshold. In a single pass, each pixel in the image is compared with this threshold. If the pixel's intensity is higher than the threshold, the pixel is set to, say, white in the output. If it is less than the threshold, it is set to black.
In more sophisticated implementations, multiple thresholds can be specified, so that a band of intensity values can be set to white while everything else is set to black. For color or multi-spectral images, it may be possible to set different thresholds for each color channel, and so select just those pixels within a specified cuboid in RGB space. Another common variant is to set to black all those pixels corresponding to background, but leave foreground pixels at their original color/intensity (as opposed to forcing them to white), so that that information is not lost.
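A short sketch of both variants (the function names are ours, assuming an 8-bit grayscale NumPy array):

    import numpy as np

    def threshold(image, T):
        """Single-threshold segmentation: pixels above T become white (255)."""
        return np.where(image > T, 255, 0).astype(np.uint8)

    def band_threshold(image, low, high):
        """Density slicing: only pixels whose values lie in [low, high] become white."""
        inside = (image >= low) & (image <= high)
        return np.where(inside, 255, 0).astype(np.uint8)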

Guidelines for Use


Not all images can be neatly segmented into foreground and background using simple thresholding. Whether or not an image can be correctly segmented this way can be determined by looking at an intensity histogram of the image. We will consider just a grayscale histogram here, but the extension to color is trivial.
If it is possible to separate out the foreground of an image on the basis of pixel intensity, then the intensity of pixels within foreground objects must be distinctly different from the intensity of pixels within the background. In this case, we expect to see a distinct peak in the histogram corresponding to foreground objects such that thresholds can be chosen to isolate this peak accordingly. If such a peak does not exist, then it is unlikely that simple thresholding will produce a good segmentation. In this case, adaptive thresholding may be a better answer.
Figure 1 shows some typical histograms along with suitable choices of threshold.



Figure 1 A) shows a classic bi-modal intensity distribution. This image can be successfully segmented using a single threshold T1. B) is slightly more complicated. Here we suppose the central peak represents the objects we are interested in and so threshold segmentation requires two thresholds: T1 and T2. In C), the two peaks of a bi-modal distribution have run together and so it is almost certainly not possible to successfully segment this image using a single global threshold


The histogram for image
wdg2


is
wdg2hst1


This shows a nice bi-modal distribution --- the lower peak represents the object and the higher one represents the background. The picture can be segmented using a single threshold at a pixel intensity value of 120. The result is shown in
wdg2thr3



The histogram for image
wdg3


is
wdg3hst1


Due to the severe illumination gradient across the scene, the peaks corresponding to foreground and background have run together and so simple thresholding does not give good results. Images
wdg3thr1


and
wdg3thr2


show the resulting bad segmentations for single threshold values of 80 and 120 respectively (reasonable results can be achieved by using adaptive thresholding on this image).
Thresholding is also used to filter the output of or input to other operators. For instance, in the former case, an edge detector like Sobel will highlight regions of the image that have high spatial gradients. If we are only interested in gradients above a certain value (i.e. sharp edges), then thresholding can be used to just select the strongest edges and set everything else to black. As an example,
wdg2sob2


was obtained by first applying the Sobel operator to
wdg2


to produce
wdg2sob1


and then thresholding this using a threshold value of 60.
Thresholding can be used as preprocessing to extract an interesting subset of image structures which will then be passed along to another operator in an image processing chain. For example, image
cel4


shows a slice of brain tissue containing nervous cells (i.e. the large gray blobs, with darker circular nuclei in the middle) and glia cells (i.e. the isolated, small, black circles). We can threshold this image so as to map all pixel values between 0 and 150 in the original image to foreground (i.e. 255) values in the binary image, and leave the rest to go to background, as in
cel4thr1


The resultant image can then be connected-components-labeled in order to count the total number of cells in the original image, as in
cel4lab1


If we wanted to know how many nerve cells there are in the original image, we might try applying a double threshold in order to select out just the pixels which correspond to nerve cells (and therefore have middle level grayscale intensities) in the original image. (In remote sensing and medical terminology, such thresholding is usually called density slicing.) Applying a threshold band of 130 - 150 yields
cel4thr2


While most of the foreground of the resulting image corresponds to nerve cells, the foreground features are so disconnected (because nerve cell nuclei map to background intensity values along with the glia cells) that we cannot apply connected components labeling. Alternatively, we might obtain a better assessment of the number of nerve cells by investigating some attributes (e.g. size, as measured by a distance transform) of the binary image containing both whole nerve cells and glia. In reality, sophisticated modeling and/or pattern matching is required to segment such an image.

Exercises



  1. How would you set up the lighting for a simple scene containing just flat metal parts viewed from above so as to ensure the best possible segmentation using simple thresholding?
  2. In medical imagery of certain mouse nervous tissue, healthy cells assume a medium graylevel intensity, while dead cells become dense and black. The images
    cla3



    clb3


    and
    clc3


    were each taken on a different day during an experiment which sought to quantify cell death. Investigate the intensity histogram of these images and choose a threshold which allows you to segment out the dead cells. Then use connected components labeling to count the number of dead cells on each day of the experiment.
  3. Thresholding is often used in applications such as remote sensing where it is desirable to select out, from an image, those regions whose pixels lie within a specified range of pixel values. For instance, it might be known that wheat fields give rise to a particular range of intensities (in some spectral band) that is fairly unusual elsewhere. In the multi-spectral image
    aer1


    assume that wheat fields are visible as yellow patches. Construct a set of thresholds for each color channel which allow you to segment out the wheat fields (note, you may need to reset your display).
  4. How should the intensity threshold be chosen so that a small change in this threshold value causes as little change as possible to the resulting segmentation? Think about what the intensity histogram must look like at the threshold value.
  5. Discuss whether you expect thresholding to be of much use in segmenting natural scenes.

References


E. Davies Machine Vision: Theory, Algorithms and Practicalities, Academic Press, 1990, Chap. 4.
R. Gonzalez and R. Woods Digital Image Processing, Addison-Wesley Publishing Company, 1992, Chap. 7.
D. Vernon Machine Vision, Prentice-Hall, 1991, pp 49 - 51, 86 - 89.