Adaptive Thresholding


Brief Description


Thresholding is used to segment an image by setting all pixels whose intensity values are above a threshold to a foreground value and all the remaining pixels to a background value.
Whereas the conventional thresholding operator uses a global threshold for all pixels, adaptive thresholding changes the threshold dynamically over the image. This more sophisticated version of thresholding can accommodate changing lighting conditions in the image, e.g. those occurring as a result of a strong illumination gradient or shadows.

How It Works


Adaptive thresholding typically takes a grayscale or color image as input and, in the simplest implementation, outputs a binary image representing the segmentation. For each pixel in the image, a threshold has to be calculated. If the pixel value is below the threshold it is set to the background value, otherwise it assumes the foreground value.
There are two main approaches to finding the threshold: (i) the Chow and Kaneko approach and (ii) local thresholding. The assumption behind both methods is that smaller image regions are more likely to have approximately uniform illumination, thus being more suitable for thresholding. Chow and Kaneko divide an image into an array of overlapping subimages and then find the optimum threshold for each subimage by investigating its histogram. The threshold for each single pixel is found by interpolating the results of the subimages. The drawback of this method is that it is computationally expensive and, therefore, not appropriate for real-time applications.
An alternative approach to finding the local threshold is to statistically examine the intensity values of the local neighborhood of each pixel. The statistic which is most appropriate depends largely on the input image. Simple and fast functions include the mean of the intensity distribution over the local neighborhood W,

$$T = \operatorname{mean}(W)$$

the median value,

$$T = \operatorname{median}(W)$$

or the mean of the minimum and maximum values,

$$T = \frac{\min(W) + \max(W)}{2}$$

The size of the neighborhood has to be large enough to cover sufficient foreground and background pixels, otherwise a poor threshold is chosen. On the other hand, choosing regions which are too large can violate the assumption of approximately uniform illumination. This method is less computationally intensive than the Chow and Kaneko approach and produces good results for some applications.
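As a rough illustration of local-statistic thresholding, the following sketch (assuming an 8-bit grayscale image held in a NumPy array and using SciPy's uniform_filter and median_filter to compute the local statistics; the function names are ours, purely for illustration) compares each pixel against the mean or median of its neighborhood.

    import numpy as np
    from scipy.ndimage import median_filter, uniform_filter

    def adaptive_threshold_mean(image, size=7, C=0):
        """Pixel becomes foreground (255) if it exceeds the mean of its
        size x size neighborhood minus the constant C."""
        img = image.astype(np.float32)
        local_mean = uniform_filter(img, size=size)   # mean of the local neighborhood
        return np.where(img > local_mean - C, 255, 0).astype(np.uint8)

    def adaptive_threshold_median(image, size=7, C=0):
        """Same idea, using the local median as the statistic instead."""
        img = image.astype(np.float32)
        local_median = median_filter(img, size=size)
        return np.where(img > local_median - C, 255, 0).astype(np.uint8)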

Guidelines for Use


Like global thresholding, adaptive thresholding is used to separate desirable foreground image objects from the background based on the difference in pixel intensities of each region. Global thresholding uses a fixed threshold for all pixels in the image and therefore works only if the intensity histogram of the input image contains neatly separated peaks corresponding to the desired subject(s) and background(s). Hence, it cannot deal with images containing, for example, a strong illumination gradient.
Local adaptive thresholding, on the other hand, selects an individual threshold for each pixel based on the range of intensity values in its local neighborhood. This allows for thresholding of an image whose global intensity histogram doesn't contain distinctive peaks.
A task well suited to local adaptive thresholding is segmenting text from the image
son1


Because this image contains a strong illumination gradient, global thresholding produces a very poor result, as can be seen in
son1thr1



Using the mean of a 7×7 neighborhood, adaptive thresholding yields
son1adp1


The method succeeds in the area surrounding the text because there are enough foreground and background pixels in the local neighborhood of each pixel; i.e. the mean value lies between the intensity values of foreground and background and therefore separates them easily. Along the margins, however, the mean of the local area is not suitable as a threshold, because the range of intensity values within the local neighborhood is very small and its mean is close to the value of the center pixel.
The situation can be improved if the threshold employed is not the mean, but (mean-C), where C is a constant. Using this statistic, all pixels which exist in a uniform neighborhood (e.g. along the margins) are set to background. The result for a 7×7 neighborhood and C=7 is shown in
son1adp2


and for a 75×75 neighborhood and C=10 in
son1adp3


The larger window yields the poorer result, because it is more adversely affected by the illumination gradient. Also note that the latter is more computationally intensive than thresholding using the smaller window.
The result of using the median instead of the mean can be seen in
son1adp4


(The neighborhood size for this example is 7×7 and C = 4). The result shows that, in this application, the median is a less suitable statistic than the mean.
Consider another example image containing a strong illumination gradient
wdg3


This image cannot be segmented with a global threshold, as shown in
wdg3thr1


where a threshold of 80 was used. However, since the image contains a large object, adaptive thresholding is also difficult to apply. Using (mean - C) as a local threshold, we obtain
wdg3adp1


with a 7×7 window and C = 4, and
wdg3adp2


with a 140×140 window and C = 8. All pixels which belong to the object but do not have any background pixels in their neighborhood are set to background. The latter image shows a much better result than that achieved with a global threshold, but it is still missing some pixels in the center of the object. In many applications, computing the mean of a neighborhood (for each pixel!) whose size is of the order 140×140 may take too much time. In this case, the more complex Chow and Kaneko approach to adaptive thresholding would be more successful.
If your image processing package does not contain an adaptive threshold operator, you can simulate the effect with the following steps:
  1. Convolve the image with a suitable statistical operator, e.g. the mean or median.
  2. Subtract the original from the convolved image.
  3. Threshold the difference image with C.
  4. Invert the thresholded image.
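A minimal sketch of these four steps (assuming a NumPy array image and SciPy's uniform_filter for the local mean; the function name is ours) might look as follows. Pixels darker than their local mean by more than C come out black, everything else white.

    import numpy as np
    from scipy.ndimage import uniform_filter

    def simulated_adaptive_threshold(image, size=7, C=7):
        smoothed = uniform_filter(image.astype(np.float32), size=size)  # 1. convolve with the mean
        diff = smoothed - image.astype(np.float32)                      # 2. subtract original from convolved image
        binary = diff > C                                               # 3. threshold the difference with C
        return np.where(binary, 0, 255).astype(np.uint8)                # 4. invert the thresholded image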

Exercises


  1. In the above example using
    son1


    why does the mean produce a better result than the median? Can you think of any example where the median is more appropriate?
  2. Think of an appropriate statistic for finding dark cracks on a light object using adaptive thresholding.
  3. If you want to recover text from an image with a strong illumination gradient, how does the local thresholding method relate to the technique of removing the illumination gradient using pixel subtraction? Compare the results achieved with adaptive thresholding, pixel subtraction and pixel division.

References


E. Davies Machine Vision: Theory, Algorithms and Practicalities, Academic Press, 1990, pp 91 - 96.
R. Gonzalez and R. Woods Digital Image Processing, Addison-Wesley Publishing Company, 1992, pp 443 - 452.
A. Jain Fundamentals of Digital Image Processing, Prentice-Hall, 1989, p 408.

Contrast Stretching



Common Names: Contrast stretching, Normalization

Brief Description


Contrast stretching (often called normalization) is a simple image enhancement technique that attempts to improve the contrast in an image by `stretching' the range of intensity values it contains to span a desired range of values, e.g. the full range of pixel values that the image type concerned allows. It differs from the more sophisticated histogram equalization in that it can only apply a linear scaling function to the image pixel values. As a result the `enhancement' is less harsh. (Most implementations accept a graylevel image as input and produce another graylevel image as output.)

How It Works


Before the stretching can be performed it is necessary to specify the upper and lower pixel value limits over which the image is to be normalized. Often these limits will just be the minimum and maximum pixel values that the image type concerned allows. For example for 8-bit graylevel images the lower and upper limits might be 0 and 255. Call the lower and the upper limits a and b respectively.
The simplest sort of normalization then scans the image to find the lowest and highest pixel values currently present in the image. Call these c and d. Then each pixel P is scaled using the following function:
$$P_{\mathrm{out}} = (P_{\mathrm{in}} - c)\left(\frac{b - a}{d - c}\right) + a$$

The problem with this is that a single outlying pixel with either a very high or very low value can severely affect the value of c or d and this could lead to very unrepresentative scaling. Therefore a more robust approach is to first take a histogram of the image, and then select c and d at, say, the 5th and 95th percentile in the histogram (that is, 5% of the pixels in the histogram will have values lower than c, and 5% of the pixels will have values higher than d). This prevents outliers from affecting the scaling so much.
Another common technique for dealing with outliers is to use the intensity histogram to find the most popular intensity level in an image (i.e. the histogram peak) and then define a cutoff fraction which is the minimum fraction of this peak magnitude below which data will be ignored. In other words, all intensity levels with histogram counts below this cutoff fraction will be discarded (driven to intensity value 0) and the remaining range of intensities will be expanded to fill out the full range of the image type under consideration.
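As a rough sketch (assuming an 8-bit grayscale NumPy array; the function name and the percentile defaults are ours), a percentile-based contrast stretch could be written as:

    import numpy as np

    def contrast_stretch(image, a=0, b=255, low_pct=5, high_pct=95):
        """Linearly map the intensity range [c, d], chosen from percentiles of
        the histogram, onto [a, b]; values outside [c, d] are clipped."""
        img = image.astype(np.float32)
        c, d = np.percentile(img, (low_pct, high_pct))   # robust choice of c and d
        if d == c:                                       # flat image: nothing to stretch
            return np.full(image.shape, a, dtype=np.uint8)
        out = (img - c) * (b - a) / (d - c) + a
        return np.clip(out, a, b).astype(np.uint8)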
Some implementations also work with color images. In this case all the channels will be stretched using the same offset and scaling in order to preserve the correct color ratios.

Guidelines for Use


Normalization is commonly used to improve the contrast in an image without distorting relative graylevel intensities too significantly.
We begin by considering an image
wom1


which can easily be enhanced by the simplest of contrast stretching implementations because the intensity histogram forms a tight, narrow cluster between the graylevel intensity values 79 and 136, as shown in
wom1hst1


After contrast stretching, using a simple linear interpolation between c = 79 and d = 136, we obtain
wom1str1


Compare the histogram of the original image with that of the contrast-stretched version
wom1hst2



While this result is a significant improvement over the original, the enhanced image itself still appears somewhat flat. Histogram equalizing the image increases contrast dramatically, but yields an artificial-looking result
wom1heq1


In this case, we can achieve better results by contrast stretching the image over a narrower range of graylevel values from the original image. For example, by setting the cutoff fraction parameter to 0.03, we obtain the contrast-stretched image
wom1str2


and its corresponding histogram
wom1hst3


Note that this operation has effectively spread out the information contained in the original histogram peak (thus improving contrast in the interesting face regions) by pushing those intensity levels to the left of the peak down the histogram x-axis towards 0. Setting the cutoff fraction to a high value, e.g. 0.8, yields the contrast stretched image
wom1str3


As shown in the histogram
wom1hst4


most of the information to the left of the peak in the original image is mapped to 0 so that the peak can spread out even further and begin pushing values to its right up to 255.
As an example of an image which is more difficult to enhance, consider
moo2


which shows a low contrast image of a lunar surface.
The image
moo2hst2


shows the intensity histogram of this image. Note that only part of the y-axis has been shown for clarity. The minimum and maximum values in this 8-bit image are 0 and 255 respectively, and so straightforward normalization to the range 0 - 255 produces absolutely no effect. However, we can enhance the picture by ignoring all pixel values outside the 1st and 99th percentiles, and only applying contrast stretching to those pixels in between. The outliers are simply forced to either 0 or 255 depending upon which side of the range they lie on.

moo2str1


shows the result of this enhancement. Notice that the contrast has been significantly improved. Compare this with the corresponding enhancement achieved using histogram equalization.
Normalization can also be used when converting from one image type to another, for instance from floating point pixel values to 8-bit integer pixel values. As an example the pixel values in the floating point image might run from 0 to 5000. Normalizing this range to 0-255 allows easy conversion to 8-bit integers. Obviously some information might be lost in the compression process, but the relative intensities of the pixels will be preserved.
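A minimal sketch of such a type conversion (the function name is ours; it assumes a NumPy array of arbitrary range):

    import numpy as np

    def to_uint8(image):
        """Normalize an arbitrary-range (e.g. floating point) image to 0 - 255
        and convert it to 8-bit integers, preserving relative intensities."""
        img = image.astype(np.float64)
        c, d = img.min(), img.max()
        if d == c:
            return np.zeros(image.shape, dtype=np.uint8)
        return ((img - c) * 255.0 / (d - c)).astype(np.uint8)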

Exercises



  1. Derive the scaling formula given above from the parameters a, b, c and d.
  2. Suppose you had to normalize an 8-bit image to one in which the pixel values were stored as 4-bit integers. What would be a suitable destination range (i.e. the values of a and b)?
  3. Contrast-stretch the image
    sap1


    (You must begin by selecting suitable values for c and d.) Next, edge-detect (i.e. using the Sobel, Roberts Cross or Canny edge detector) both the original and the contrast stretched version. Does contrast stretching increase the number of edges which can be detected?
  4. Imagine you have an image taken in low light levels and which, as a result, has low contrast. What are the advantages of using contrast stretching to improve the contrast, rather than simply scaling the image by a factor of, say, three?

References


E. Davies Machine Vision: Theory, Algorithms and Practicalities, Academic Press, 1990, pp 26 - 27, 79 - 99.
A. Jain Fundamentals of Digital Image Processing, Prentice-Hall, 1989, Chap. 7, p 235.
D. Vernon Machine Vision, Prentice-Hall, 1991, p 45.

Histogram Equalization



Common Names: Histogram Modeling, Histogram Equalization

Brief Description


Histogram modeling techniques (e.g. histogram equalization) provide a sophisticated method for modifying the dynamic range and contrast of an image by altering that image such that its intensity histogram has a desired shape. Unlike contrast stretching, histogram modeling operators may employ non-linear and non-monotonic transfer functions to map between pixel intensity values in the input and output images. Histogram equalization employs a monotonic, non-linear mapping which re-assigns the intensity values of pixels in the input image such that the output image contains a uniform distribution of intensities (i.e. a flat histogram). This technique is used in image comparison processes (because it is effective in detail enhancement) and in the correction of non-linear effects introduced by, say, a digitizer or display system.

How It Works


Histogram modeling is usually introduced using continuous, rather than discrete, process functions. Therefore, we suppose that the images of interest contain continuous intensity levels (in the interval [0,1]) and that the transformation function f which maps an input image A(x,y) onto an output image B(x,y) is continuous within this interval. Further, it will be assumed that the transfer law (which may also be written in terms of intensity density levels, e.g. D_B = f(D_A)) is single-valued and monotonically increasing (as is the case in histogram equalization) so that it is possible to define the inverse law D_A = f^{-1}(D_B). An example of such a transfer function is illustrated in Figure 1.



Figure 1 A histogram transformation function.


All pixels in the input image with densities in the region D_A to D_A + dD_A will have their pixel values re-assigned such that they assume an output pixel density value in the range from D_B to D_B + dD_B. The surface areas h_A(D_A) dD_A and h_B(D_B) dD_B will therefore be equal, yielding:

$$h_B(D_B) = \frac{h_A(D_A)}{dD_B / dD_A}$$

where D_A = f^{-1}(D_B).
This result can be written in the language of probability theory if the histogram h is regarded as a continuous probability density function p describing the distribution of the (assumed random) intensity levels:

$$p_B(D_B) = \frac{p_A(D_A)}{dD_B / dD_A}$$

In the case of histogram equalization, the output probability densities should all be an equal fraction of the maximum number of intensity levels in the input image D_M (where the minimum level considered is 0). The transfer function (or point operator) necessary to achieve this result is simply:

$$\frac{dD_B}{dD_A} = D_M \, p_A(D_A)$$

Therefore,

$$D_B = f(D_A) = D_M \int_0^{D_A} p_A(u) \, du = D_M \, F_A(D_A)$$

where F_A(D_A) is simply the cumulative probability distribution (i.e. cumulative histogram) of the original image. Thus, an image which is transformed using its cumulative histogram yields an output histogram which is flat!
A digital implementation of histogram equalization is usually performed by defining a transfer function of the form:

$$f(D_A) = \max\left(0, \operatorname{round}\left(\frac{D_M \, n_k}{N}\right) - 1\right)$$

where N is the number of image pixels and n_k is the number of pixels at intensity level k or less.
In the digital implementation, the output image will not necessarily be fully equalized and there may be `holes' in the histogram (i.e. unused intensity levels). These effects are likely to decrease as the number of pixels and intensity quantization levels in the input image are increased.
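A minimal sketch of the discrete implementation (assuming an 8-bit grayscale NumPy array; the function name is ours) follows the transfer function above directly:

    import numpy as np

    def histogram_equalize(image, levels=256):
        """Equalize a grayscale image using its cumulative histogram as a
        lookup table, following the transfer function described above."""
        hist, _ = np.histogram(image, bins=levels, range=(0, levels))
        cum = hist.cumsum()                     # n_k: number of pixels at level k or less
        N = image.size                          # total number of pixels
        D_M = levels - 1                        # maximum intensity level
        lut = np.maximum(0, np.round(D_M * cum / N) - 1).astype(np.uint8)
        return lut[image]                       # apply the transfer function to every pixel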

Guidelines for Use


To illustrate the utility of histogram equalization, consider
moo2


which shows an 8-bit grayscale image of the surface of the moon. The histogram
moo2hst2


confirms what we can see by visual inspection: this image has poor dynamic range. (Note that we can view this histogram as a description of pixel probability densities by simply scaling the vertical axis by the total number of image pixels and normalizing the horizontal axis using the number of intensity density levels (i.e. 256). However, the shape of the distribution will be the same in either case.)
In order to improve the contrast of this image, without affecting the structure (i.e. geometry) of the information contained therein, we can apply the histogram equalization operator. The resulting image is
moo2heq1


and its histogram is shown
moo2hst1


Note that the histogram is not flat (as in the examples from the continuous case) but that the dynamic range and contrast have been enhanced. Note also that when equalizing images with narrow histograms and relatively few gray levels, increasing the dynamic range has the adverse effect of increasing visual graininess. Compare this result with that produced by the linear contrast stretching operator
moo2str1



In order to further explore the transformation defined by the histogram equalization operator, consider the image of the Scott Monument in Edinburgh, Scotland
bld1


Although the contrast on the building is acceptable, the sky region is represented almost entirely by light pixels. This causes most pixels in the histogram
bld1hst1


to be pushed into a narrow peak in the upper graylevel region. The histogram equalization operator defines a mapping based on the cumulative histogram
bld1cuh1


which results in the image
bld1heq1


While histogram equalization has enhanced the contrast of the sky regions in the image, the picture now looks artificial because there is very little variety in the middle graylevel range. This occurs because the transfer function is based on the shallow slope of the cumulative histogram in the middle graylevel regions (i.e. intensity density levels 100 - 230) and causes many pixels from this region in the original image to be mapped to similar graylevels in the output image.
We can improve on this if we define a mapping based on a sub-section of the image which contains a better distribution of intensity densities from the low and middle range graylevels. If we crop the image so as to isolate a region which contains more building than sky
bld1crp1


we can then define a histogram equalization mapping for the whole image based on the cumulative histogram
bld1cuh2


of this smaller region. Since the cropped image contains a more even distribution of dark and light pixels, the slope of the transfer function is steeper and smoother, and the contrast of the resulting image
bld1heq2


is more natural. This idea of defining mappings based upon particular sub-sections of the image is taken up by another class of operators which perform Local Enhancements as discussed below.

Common Variants


Histogram Specification
Histogram equalization is limited in that it is capable of producing only one result: an image with a uniform intensity distribution. Sometimes it is desirable to be able to control the shape of the output histogram in order to highlight certain intensity levels in an image. This can be accomplished by the histogram specification operator, which maps a given intensity distribution a(x,y) into a desired distribution c(x,y) using a histogram equalized image b(x,y) as an intermediate stage.
The first step in histogram specification is to specify the desired output density function and write a transformation g(c). If g^{-1}(c) is single-valued (which is true when there are no unfilled levels in the specified histogram or errors in the process of rounding off g^{-1}(c) to the nearest intensity level), then c = g^{-1}(b) defines a mapping from the equalized levels of the original image, b = f(a). It is possible to combine these two transformations such that the image need not be histogram equalized explicitly:

$$c = g^{-1}(f(a))$$
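A rough sketch of a discrete version (the function name is ours; it matches the histogram of one NumPy image to that of a reference image by composing the image's equalizing transform f with the inverse of the reference's transform g):

    import numpy as np

    def histogram_specify(image, reference, levels=256):
        """Remap `image` so that its histogram approximates that of `reference`."""
        def cdf(img):
            hist, _ = np.histogram(img, bins=levels, range=(0, levels))
            c = hist.cumsum().astype(np.float64)
            return c / c[-1]
        f = cdf(image)        # equalizing transform of the input image
        g = cdf(reference)    # equalizing transform of the desired distribution
        # For each input level a, pick the smallest c with g(c) >= f(a), i.e. c = g^-1(f(a))
        lut = np.searchsorted(g, f).clip(0, levels - 1).astype(np.uint8)
        return lut[image]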

Local Enhancements
The histogram processing methods discussed above are global in the sense that they apply a transformation function whose form is based on the intensity level distribution of an entire image. Although this method can enhance the overall contrast and dynamic range of an image (thereby making certain details more visible), there are cases in which enhancement of details over small areas (i.e. areas whose contribution to the total number of image pixels has a negligible influence on the global transform) is desired. The solution in these cases is to derive a transformation based upon the intensity distribution in the local neighborhood of every pixel in the image.
The histogram processes described above can be adapted for local enhancement. The procedure involves defining a neighborhood around each pixel and, using the histogram characteristics of this neighborhood, deriving a transfer function which maps that pixel into an output intensity level. This is performed for each pixel in the image. (Since moving across rows or down columns only adds one new pixel to the local histogram, updating the histogram from the previous calculation with new data introduced at each motion is possible.) Local enhancement may also define transforms based on pixel attributes other than the histogram, e.g. intensity mean (to control variance) and variance (to control contrast) are common.
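A naive sketch of local histogram equalization (recomputing the full neighborhood histogram at every pixel, which is slow but illustrates the idea; practical versions update the histogram incrementally as the window slides; the function name is ours):

    import numpy as np

    def local_histogram_equalize(image, size=31, levels=256):
        """Remap each pixel using the cumulative histogram of its own
        size x size neighborhood."""
        half = size // 2
        padded = np.pad(image, half, mode='reflect')
        out = np.empty_like(image)
        for y in range(image.shape[0]):
            for x in range(image.shape[1]):
                window = padded[y:y + size, x:x + size]
                hist, _ = np.histogram(window, bins=levels, range=(0, levels))
                cum = hist.cumsum()
                out[y, x] = (levels - 1) * cum[image[y, x]] // window.size
        return out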

Exercises


  1. Suppose that you have a 128×128 square pixel image with an 8 gray level intensity range, within which the lighter intensity levels predominate as shown in the table below. A) Sketch the histogram (number of pixels vs gray level) to describe this distribution. B) How many pixels/gray levels would there be in an equalized version of this histogram? C) Apply the discrete transformation described above and plot the new (equalized) histogram. (How well does the histogram approximate a uniform distribution of intensity values?)
     -------------------------------
    | Gray Level | Number of Pixels |
    |------------+------------------|
    | 0          | 34               |
    |------------+------------------|
    | 1          | 50               |
    |------------+------------------|
    | 2          | 500              |
    |------------+------------------|
    | 3          | 1500             |
    |------------+------------------|
    | 4          | 2700             |
    |------------+------------------|
    | 5          | 4500             |
    |------------+------------------|
    | 6          | 4000             |
    |------------+------------------|
    | 7          | 3100             |
     -------------------------------

  2. Suppose you have equalized an image once. Show that a second pass of histogram equalization will produce exactly the same result as the first.
  3. Interpreting images derived by means of a non-monotonic or non-continuous mapping can be difficult. Describe the effects of the following transfer functions:
    (a) f has a horizontal plateau,
    (b) f contains a vertical jump,
    (c) f has a negative slope.
    (Hint: it can be useful to sketch the curve, as in Figure 1, and then map a few points from histogram A to histogram B.)
  4. Apply local histogram equalization to the image
    bld1


    Compare this result with those derived by means of the global transfer function shown in the above examples.
  5. Apply global and local histogram equalization to the montage image
    soi1


    Compare your results.

References


R. Boyle and R. Thomas Computer Vision: A First Course, Blackwell Scientific Publications, 1988, pp 35 - 41.
R. Gonzalez and R. Woods Digital Image Processing, Addison-Wesley Publishing Company, 1992, Chap. 4.
A. Jain Fundamentals of Digital Image Processing, Prentice-Hall, 1989, pp 241 - 243.
A. Marion An Introduction to Image Processing, Chapman and Hall, 1991, Chap. 6.

Logical AND/NAND

Brief Description


AND and NAND are examples of logical operators having the truth-tables shown in Figure 1.



Figure 1 Truth-tables for AND and NAND.


As can be seen, the output values of NAND are simply the inverse of the corresponding output values of AND.
The AND (and similarly the NAND) operator typically takes two binary or integer graylevel images as input, and outputs a third image whose pixel values are just those of the first image, ANDed with the corresponding pixels from the second. A variation of this operator takes just a single input image and ANDs each pixel with a specified constant value in order to produce the output.

How It Works


The operation is performed straightforwardly in a single pass. It is important that all the input pixel values being operated on have the same number of bits in them or unexpected things may happen. Where the pixel values in the input images are not simple 1-bit numbers, the AND operation is normally (but not always) carried out individually on each corresponding bit in the pixel values, in bitwise fashion.
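In NumPy terms (a sketch, assuming two 8-bit images of identical size; the function names are ours), the bitwise variant is simply:

    import numpy as np

    def and_images(image1, image2):
        """Bitwise AND of two images of the same size and integer dtype."""
        return np.bitwise_and(image1, image2)

    def and_constant(image, constant):
        """Bitwise AND of every pixel with a constant, e.g. 128 (10000000 binary)."""
        return np.bitwise_and(image, constant)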

Guidelines for Use


The most obvious application of AND is to compute the intersection of two images. We illustrate this with an example where we want to detect those objects in a scene which did not move between two images, i.e. which are at the same pixel positions in the first and the second image. We illustrate this example using
scr3


and
scr4


If we simply AND the two graylevel images in a bitwise fashion we obtain
scr3and1


Although we wanted the moved object to disappear from the resulting image, it appears twice, at its old and at its new position. The reason is that the object has rather low pixel values (similar to a logical 0) whereas the background has high values (similar to a logical 1). However, we normally associate an object with logical 1 and the background with logical 0, so we have actually ANDed the negatives of the two images, which is equivalent to NORing them. To obtain the desired result we have to invert the images before ANDing them, as was done in
scr3and2


Now, only the object which has the same position in both images is highlighted. However, ANDing two graylevel images might still cause problems, as it is not guaranteed that ANDing two high pixel values in a bitwise fashion yields a high output value (for example, 128 AND 127 yields 0). To avoid these problems, it is best to produce binary versions of the grayscale images using thresholding.
scr3thr1


and
scr4thr1


are the thresholded versions of the above images and
scr3and3


is the result of ANDing their negatives.
Although ANDing worked well for the above example, it runs into problems in a scene like
pap1


Here, we have two objects, the average intensity of one being higher than the background and that of the other being lower. Hence, we cannot produce a binary image containing both objects using simple thresholding. As can be seen in the following images, ANDing the grayscale images is not successful either. If, in the second scene, the light part was moved, as in
pap2


then the result of ANDing the two images is
pap1and1


It shows the desired effect of attenuating the moved object. However, if the second scene is instead like
pap3


where the dark object was moved, we obtain
pap1and2


Here, the old and the new positions of the dark object are visible.
In general, applying the AND operator (or other logical operators) to two images in order to detect differences or similarities between them is most appropriate if they are binary or can be converted into binary format using thresholding.
As with other logical operators, AND and NAND are often used as sub-components of more complex image processing tasks. One of the common uses for AND is for masking. For example, suppose we wish to selectively brighten a small region of
car1


to highlight a particular car. There are many ways of doing this and we illustrate just one. First a paint program is used to identify the region to be highlighted. In this case we set the region to black as shown in
car1msk1


This image can then be thresholded to just select the black region, producing the mask shown in
car1thr1


The mask image has a pixel value of 255 (11111111 binary) in the region that we are interested in, and zero pixels (00000000 binary) elsewhere. This mask is then bitwise ANDed with the original image to just select out the region that will be highlighted. This produces
car1and1


Finally, we brighten this image by scaling it by a factor of 1.1, dim the original image using a scale factor of 0.8, and then add the two images together to produce
car1add1
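A compact sketch of this masking recipe (assuming 8-bit NumPy arrays, where `mask` holds 255 inside the region of interest and 0 elsewhere; the function name and the use of clipping are ours):

    import numpy as np

    def highlight_region(image, mask):
        """AND the image with the mask, brighten the masked copy by 1.1,
        dim the whole original by 0.8, and add the two together."""
        region = np.bitwise_and(image, mask).astype(np.float32)    # select the region of interest
        combined = 1.1 * region + 0.8 * image.astype(np.float32)   # brightened region + dimmed original
        return np.clip(combined, 0, 255).astype(np.uint8)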



AND can also be used to perform so-called bit-slicing on an 8-bit image. To determine the influence of one particular bit on an image, it is ANDed in a bitwise fashion with a constant number, where the relevant bit is set to 1 and the remaining 7 bits are set to 0. For example, to obtain bit-plane 8 (corresponding to the most significant bit) of
ape1


we AND the image with 128 (10000000 binary) and threshold the output at a pixel value of 1. The result, shown in
ape1and8


is equivalent to thresholding the image at a value of 128. Images
ape1and7



ape1and6


and
ape1and4


correspond to bit-planes 7, 6 and 4. The images show that most image information is contained in the higher (more significant) bits, whereas the less significant bits contain some of the finer details and noise. The image
ape1and1


shows bit-plane 1.
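A minimal sketch of bit-plane extraction (the function name is ours; it assumes an 8-bit NumPy array and numbers the planes 1 to 8 from least to most significant):

    import numpy as np

    def bit_plane(image, plane):
        """Extract one bit-plane of an 8-bit image as a binary 0/255 image."""
        mask = 1 << (plane - 1)                     # e.g. plane 8 -> 128 (10000000 binary)
        return np.where(np.bitwise_and(image, mask) > 0, 255, 0).astype(np.uint8)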

Exercises



  1. NAND
    cir2


    and
    cir3


    Compare the result with the result of ANDing the negatives of the two input images.
  2. AND
    scr3thr1


    and
    scr4thr1


    as well as the negatives of
    pap1


    and
    pap2


    Compare the results with the ones obtained in the previous section.
  3. Extract all 8 bit planes from
    pen1


    and
    str1


    Comment on the number of visually significant bits in each image.
  4. What would be the effect of ANDing an 8-bit graylevel image with a constant value of 240 (11110000 in binary)? Why might you want to do this?
  5. What would be the effect of ANDing an 8-bit graylevel image with a constant value of 15 (00001111 in binary)? Why might you want to do this? Try this out on
    bal1


    and comment on what you see.

References


E. Davies Machine Vision: Theory, Algorithms and Practicalities, Academic Press, 1990, Chap. 2.
R. Gonzalez and R. Woods Digital Image Processing, Addison-Wesley Publishing Company, 1992, pp 47 - 51, 171 - 172.
A. Jain Fundamentals of Digital Image Processing, Prentice-Hall, 1989, pp 239 - 240.
B. Horn Robot Vision, MIT Press, 1986, pp 47 - 48.

Logical OR/NOR



Common Names: OR, NOR

Brief Description


OR and NOR are examples of logical operators having the truth-tables shown in Figure 1.



Figure 1 Truth-tables for OR and NOR.


As can be seen, the output values of NOR are simply the inverses of the corresponding output values of OR.
The OR (and similarly the NOR) operator typically takes two binary or graylevel images as input, and outputs a third image whose pixel values are just those of the first image, ORed with the corresponding pixels from the second. A variation of this operator takes just a single input image and ORs each pixel with a specified constant value in order to produce the output.

How It Works


The operation is performed straightforwardly in a single pass. It is important that all the input pixel values being operated on have the same number of bits in them or unexpected things may happen. Where the pixel values in the input images are not simple 1-bit numbers, the OR operation is normally (but not always) carried out individually on each corresponding bit in the pixel values, in bitwise fashion.

Guidelines for Use


We can illustrate the function of the OR operator using
scr3


and
scr4


The images show a scene with two objects, one of which was moved between the exposures. We can use OR to compute the union of the images, i.e. highlighting all pixels which represent an object either in the first or in the second image. First, we threshold the images, since the process is simplified by using binary input. If we OR the resulting images
scr3thr1


and
scr4thr1


we obtain
scr3or2


This image shows only the position of the object which was at the same location in both input images. The reason is that the objects are represented by logical 0 and the background by logical 1. Hence, we actually OR the background, which is equivalent to NANDing the objects. To get the desired result, we first have to invert the input images before ORing them. Then, we obtain
scr3or1


Now, the output shows the position of the stationary object as well as that of the moved object.
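A brief sketch of this invert-then-OR workflow (assuming 8-bit NumPy arrays in which the objects are darker than the threshold; the function name is ours):

    import numpy as np

    def union_of_objects(image1, image2, threshold):
        """Threshold and invert both images so the dark objects become logical 1,
        then OR the binary results to obtain the union of the objects."""
        obj1 = image1 < threshold
        obj2 = image2 < threshold
        return np.where(np.logical_or(obj1, obj2), 255, 0).astype(np.uint8)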
As with other logical operators, OR and NOR are often used as sub-components of more complex image processing tasks. OR is often used to merge two images together. Suppose we want to overlay
wdg2


with its histogram, shown in
wdg2hst1


First, an image editor is used to enlarge the histogram image until it is the same size as the grayscale image as shown in
wdg2hst2


Then, simply ORing the two gives
wdg2or1


The performance in this example is quite good, because the images contain very distinct graylevels. If we proceed in the same way with
bld1


we obtain
bld1or1


Now, it is difficult to see the characters of the histogram (which have high pixel values) at places where the original image also has high values. Compare the result with that described under XOR.
Note that there is no problem of overflowing pixel values with the OR operator, as there is with the addition operator.
ORing is usually safest when at least one of the images is binary, i.e. the pixel values are 0000... and 1111... only. The problem with ORing other combinations of integers is that the output result can fluctuate wildly with a small change in input values. For instance 127 ORed with 128 gives 255, whereas 127 ORed with 126 gives 127.

Exercises



  1. NOR
    cir2


    and
    cir3


    and AND their negatives. Compare the results.
  2. Why can't you use thresholding to produce a binary image containing both objects of
    pap2


    and
    pap3


    ? Use graylevel ORing to combine the two images. Can you detect all the locations of the objects in the two images? What changes if you invert the images before combining them?
  3. In the example above, how could you make the histogram appear in black instead of white? Try it.
  4. Summarize the conditions under which you would use OR to combine two images rather than, say, addition or blending.

References


R. Gonzalez and R. Woods Digital Image Processing, Addison-Wesley Publishing Company, 1992, pp 47 - 51, 171 - 172.
E. Davies Machine Vision: Theory, Algorithms and Practicalities, Academic Press, 1990, Chap. 2.
B. Horn Robot Vision, MIT Press, 1986, pp 47 - 48.

Logical XOR/XNOR



Common Names: XOR, XNOR, EOR, ENOR

Brief Description


XOR and XNOR are examples of logical operators having the truth-tables shown in Figure 1.



Figure 1 Truth-tables for XOR and XNOR.


The XOR function is true if exactly one of the input values is true, and false otherwise. XOR stands for eXclusive OR. As can be seen, the output values of XNOR are simply the inverse of the corresponding output values of XOR.
The XOR (and similarly the XNOR) operator typically takes two binary or graylevel images as input, and outputs a third image whose pixel values are just those of the first image, XORed with the corresponding pixels from the second. A variation of this operator takes a single input image and XORs each pixel with a specified constant value in order to produce the output.

How It Works


The operation is performed straightforwardly in a single pass. It is important that all the input pixel values being operated on have the same number of bits in them, or unexpected things may happen. Where the pixel values in the input images are not simple 1-bit numbers, the XOR operation is normally (but not always) carried out individually on each corresponding bit in the pixel values, in bitwise fashion.

Guidelines for Use


We illustrate the function of XOR using
scr3


and
scr4


Since logical operators work more reliably with binary input we first threshold the two images, thus obtaining
scr3thr1


and
scr4thr1


Now, we can use XOR to detect changes in the images, since pixels which didn't change output 0 and pixels which did change result in 1. The image
scr3xor1


shows the result of XORing the thresholded images. We can see the old and the new position of the moved object, whereas the stationary object almost disappeared from the image. Due to the effects of noise, we can still see some pixels around the boundary of the stationary object, i.e. pixels whose values in the original image were close to the threshold.
In a scene like
pap1


it is not possible to apply a threshold in order to obtain a binary image, since one of the objects is lighter than the background whereas the other one is darker. However, we can combine two grayscale images by XORing them in a bitwise fashion.
pap3


shows a scene where the dark object was moved and in
pap2


the light object changed its position. XORing each of them with the initial image yields
pap1xor1


and
pap1xor2


respectively. In both cases, the moved part appears at the old as well as at the new location and the stationary object almost disappears. This technique is based on the assumption that XORing two similar gray values produces a low output, whereas two distinct inputs yield a high output. However, this is not always true; e.g. XORing 127 and 128 yields 255. These effects can be seen at the boundary of the stationary object, where the pixels have an intermediate graylevel and might, due to noise, differ slightly between two of the images. Hence, we can see a line of high values around the stationary object. A similar problem is that the output for the moved pen is much higher than that for the moved piece of paper, although the contrast between their intensities and that of the background is roughly the same. Because of these problems it is often better to use image subtraction or image division for change detection.
As with other logical operators, XOR and XNOR are often used as sub-components of more complex image processing tasks. XOR has the interesting property that if we XOR A with B to get Q, then the bits of Q are the same as A where the corresponding bit from B is zero, but they are of the opposite value where the corresponding bit from B is one. So for instance using binary notation, 1010 XORed with 1100 gives 0110. For this reason, B could be thought of as a bit-reversal mask. Since the operator is symmetric, we could just as well have treated A as the mask and B as the original.
Extending this idea to images, it is common to see an 8-bit XOR image mask containing only the pixel values 0 (00000000 binary) and 255 (11111111 binary). When this is XORed pixel-by-pixel with an original image it reverses the bits of pixel values where the mask is 255, and leaves them as they are where the mask is zero. The pixels with reversed bits normally `stand out' against their original color and so this technique is often used to produce a cursor that is visible against an arbitrary colored background. The other advantage of using XOR like this is that to undo the process (for instance when the cursor moves away), it is only necessary to repeat the XOR using the same mask and all the flipped pixels will become unflipped. Therefore it is not necessary to explicitly store the original colors of the pixels affected by the mask. Note that the flipped pixels are not always visible against their unflipped color --- light pixels become dark pixels and dark pixels become light pixels, but middling gray pixels become middling gray pixels!
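A minimal sketch of such a reversible cursor (the function name is ours; it assumes an 8-bit grayscale NumPy array and a square cursor):

    import numpy as np

    def toggle_cursor(image, x, y, size=8):
        """XOR a small square with 255 (11111111 binary), reversing the bits of
        the pixels underneath. Calling the function again with the same
        arguments restores the original pixels."""
        out = image.copy()
        out[y:y + size, x:x + size] = np.bitwise_xor(out[y:y + size, x:x + size], 255)
        return out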
The image
wdg2


shows a simple graylevel image. Suppose that we wish to overlay this image with its histogram shown in
wdg2hst1


so that the two can be compared easily. One way is to use XOR. We first use an image editor to enlarge the histogram until it is the same size as the first image. The result is shown in
wdg2hst2


To perform the overlay we simply XOR this image with the first image in bitwise fashion to produce
wdg2xor1


Here, the text is quite easy to read, because the original image consists of large and rather light or rather dark areas. If we proceed in the same way with
bld1


we obtain
bld1xor1


Note how the writing is dark against light backgrounds and light against dark backgrounds and hardly visible against gray backgrounds. Compare the result with that described under OR. In fact XORing is not particularly good for producing easy to read text on gray backgrounds --- we might do better just to add a constant offset to the image pixels that we wish to highlight (assuming wraparound under addition overflow) --- but it is often used to quickly produce highlighted pixels where the background is just black and white or where legibility is not too important.

Exercises


  1. XOR
    cir2


    and
    cir3


    Compare the result with the output of XORing their negatives. Do you see the same effect as for other logical operators?
  2. Use the technique discussed above to produce a cursor on
    fce1


    Place the cursor at different locations of the image and examine the performance on a background with high, low, intermediate and mixed pixel values.

References


R. Gonzalez and R. Woods Digital Image Processing, Addison-Wesley Publishing Company, 1992, pp 47 - 51.
E. Davies Machine Vision: Theory, Algorithms and Practicalities, Academic Press, 1990, Chap. 2.
B. Horn Robot Vision, MIT Press, 1986, pp 47 - 48.

Thresholding



Common Names: Threshold, Density slicing

Brief Description


In many vision applications, it is useful to be able to separate out the regions of the image corresponding to objects in which we are interested, from the regions of the image that correspond to background. Thresholding often provides an easy and convenient way to perform this segmentation on the basis of the different intensities or colors in the foreground and background regions of an image.
In addition, it is often useful to be able to see what areas of an image consist of pixels whose values lie within a specified range, or band of intensities (or colors). Thresholding can be used for this as well.

How It Works


The input to a thresholding operation is typically a grayscale or color image. In the simplest implementation, the output is a binary image representing the segmentation. Black pixels correspond to background and white pixels correspond to foreground (or vice versa). In simple implementations, the segmentation is determined by a single parameter known as the intensity threshold. In a single pass, each pixel in the image is compared with this threshold. If the pixel's intensity is higher than the threshold, the pixel is set to, say, white in the output. If it is less than the threshold, it is set to black.
In more sophisticated implementations, multiple thresholds can be specified, so that a band of intensity values can be set to white while everything else is set to black. For color or multi-spectral images, it may be possible to set different thresholds for each color channel, and so select just those pixels within a specified cuboid in RGB space. Another common variant is to set to black all those pixels corresponding to background, but leave foreground pixels at their original color/intensity (as opposed to forcing them to white), so that that information is not lost.
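A short sketch of both variants (the function names are ours, assuming an 8-bit grayscale NumPy array):

    import numpy as np

    def threshold(image, T):
        """Single-threshold segmentation: pixels above T become white (255)."""
        return np.where(image > T, 255, 0).astype(np.uint8)

    def band_threshold(image, low, high):
        """Density slicing: only pixels whose values lie in [low, high] become white."""
        inside = (image >= low) & (image <= high)
        return np.where(inside, 255, 0).astype(np.uint8)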

Guidelines for Use


Not all images can be neatly segmented into foreground and background using simple thresholding. Whether or not an image can be correctly segmented this way can be determined by looking at an intensity histogram of the image. We will consider just a grayscale histogram here, but the extension to color is trivial.
If it is possible to separate out the foreground of an image on the basis of pixel intensity, then the intensity of pixels within foreground objects must be distinctly different from the intensity of pixels within the background. In this case, we expect to see a distinct peak in the histogram corresponding to foreground objects such that thresholds can be chosen to isolate this peak accordingly. If such a peak does not exist, then it is unlikely that simple thresholding will produce a good segmentation. In this case, adaptive thresholding may be a better answer.
Figure 1 shows some typical histograms along with suitable choices of threshold.



Figure 1 A) shows a classic bi-modal intensity distribution. This image can be successfully segmented using a single threshold T1. B) is slightly more complicated. Here we suppose the central peak represents the objects we are interested in and so threshold segmentation requires two thresholds: T1 and T2. In C), the two peaks of a bi-modal distribution have run together and so it is almost certainly not possible to successfully segment this image using a single global threshold


The histogram for image
wdg2


is
wdg2hst1


This shows a nice bi-modal distribution --- the lower peak represents the object and the higher one represents the background. The picture can be segmented using a single threshold at a pixel intensity value of 120. The result is shown in
wdg2thr3



The histogram for image
wdg3


is
wdg3hst1


Due to the severe illumination gradient across the scene, the peaks corresponding to foreground and background have run together and so simple thresholding does not give good results. Images
wdg3thr1


and
wdg3thr2


show the resulting bad segmentations for single threshold values of 80 and 120 respectively (reasonable results can be achieved by using adaptive thresholding on this image).
Thresholding is also used to filter the output of or input to other operators. For instance, in the former case, an edge detector like Sobel will highlight regions of the image that have high spatial gradients. If we are only interested in gradients above a certain value (i.e. sharp edges), then thresholding can be used to just select the strongest edges and set everything else to black. As an example,
wdg2sob2


was obtained by first applying the Sobel operator to
wdg2


to produce
wdg2sob1


and then thresholding this using a threshold value of 60.
Thresholding can be used as preprocessing to extract an interesting subset of image structures which will then be passed along to another operator in an image processing chain. For example, image
cel4


shows a slice of brain tissue containing nervous cells (i.e. the large gray blobs, with darker circular nuclei in the middle) and glia cells (i.e. the isolated, small, black circles). We can threshold this image so as to map all pixel values between 0 and 150 in the original image to foreground (i.e. 255) values in the binary image, and leave the rest to go to background, as in
cel4thr1


The resultant image can then be connected-components-labeled in order to count the total number of cells in the original image, as in
cel4lab1


If we wanted to know how many nerve cells there are in the original image, we might try applying a double threshold in order to select out just the pixels which correspond to nerve cells (and therefore have middle level grayscale intensities) in the original image. (In remote sensing and medical terminology, such thresholding is usually called density slicing.) Applying a threshold band of 130 - 150 yields
cel4thr2


While most of the foreground of the resulting image corresponds to nerve cells, the foreground features are so disconnected (because nerve cell nuclei map to background intensity values along with the glia cells) that we cannot apply connected components labeling. Alternatively, we might obtain a better assessment of the number of nerve cells by investigating some attributes (e.g. size, as measured by a distance transform) of the binary image containing both whole nerve cells and glia. In reality, sophisticated modeling and/or pattern matching is required to segment such an image.

Exercises



  1. How would you set up the lighting for a simple scene containing just flat metal parts viewed from above so as to ensure the best possible segmentation using simple thresholding?
  2. In medical imagery of certain mouse nervous tissue, healthy cells assume a medium graylevel intensity, while dead cells become dense and black. The images
    cla3



    clb3


    and
    clc3


    were each taken on a different day during an experiment which sought to quantify cell death. Investigate the intensity histogram of these images and choose a threshold which allows you to segment out the dead cells. Then use connected components labeling to count the number of dead cells on each day of the experiment.
  3. Thresholding is often used in applications such as remote sensing where it is desirable to select out, from an image, those regions whose pixels lie within a specified range of pixel values. For instance, it might be known that wheat fields give rise to a particular range of intensities (in some spectral band) that is fairly unusual elsewhere. In the multi-spectral image
    aer1


    assume that wheat fields are visible as yellow patches. Construct a set of thresholds for each color channel which allow you to segment out the wheat fields (note, you may need to reset your display).
  4. How should the intensity threshold be chosen so that a small change in this threshold value causes as little change as possible to the resulting segmentation? Think about what the intensity histogram must look like at the threshold value.
  5. Discuss whether you expect thresholding to be of much use in segmenting natural scenes.

References


E. Davies Machine Vision: Theory, Algorithms and Practicalities, Academic Press, 1990, Chap. 4.
R. Gonzalez and R. Woods Digital Image Processing, Addison-Wesley Publishing Company, 1992, Chap. 7.
D. Vernon Machine Vision, Prentice-Hall, 1991, pp 49 - 51, 86 - 89.