Automated Image Retrieval Using Color and Texture


Columbia University Technical Report TR# 414-95-20, July, 1995.

1st draft submitted to I.E.E.E. Transactions on Pattern Analysis and Machine Intelligence (PAMI) -- Special Issue on Digital Libraries: Representation and Retrieval, November, 1996.

Automated Image Retrieval Using Color and Texture

John R. Smith and Shih-Fu Chang

Columbia University, Department of Electrical Engineering and

Center for Telecommunications Research,

New York, N.Y. 10027


Abstract -- The growing prevalence of digital images and videos is increasing the need for filters based on visual content. Unfortunately, effective tools for searching for and retrieving images and videos from large archives have not accompanied the proliferation of images and videos in digital form. In addition to the text-based indexes built upon human-supplied annotations, digital image and video libraries require new algorithms for the automated extraction and indexing of relevant image features. In this paper we investigate a system for automated content extraction and database searching based upon color and texture features. These features are important because colors and textures are fundamental characteristics of the content of all images, giving this work general application towards databases of images and videos from a variety of domains.

We present new algorithms for the automated extraction of color and texture information that use binary set representations of color and texture. We put special emphasis on the processes of feature extraction and indexing. We demonstrate that our binary feature sets for texture and color provide excellent query response time while maintaining highly effective discriminability, accuracy in spatial localization and the capability for extraction from compressed data representations.

We present the color and texture set extraction and indexing techniques and contrast them with other approaches. We examine texture and color searching on databases of 500 and 3000 color images, respectively. We then examine the relationship between color and texture in application towards image and video retrieval. We explore the capability of combining the color and texture maps of images to obtain unified feature maps that characterize image regions by both color and texture. In particular, we indicate that by combining modalities we can better capture and index color patterns and characterize real world objects within images and videos. Finally, we examine the nature and performance of image and video database queries that combine color and texture.

Index Terms -- image and video storage and retrieval, image features, texture, color, content-based image query, compressed-domain image retrieval.

I. Introduction

To date, image and video storage and retrieval systems have typically relied on human-supplied textual annotations to enable indexing and searches. The text-based indexes for large image and video archives are time-consuming to create. They necessitate that each image and video scene be analyzed manually by a domain expert so the contents can be described textually. The language-based descriptions, however, can never capture the visual content sufficiently. For example, a description of the overall semantic content of an image does not include an enumeration of all the objects and their characteristics, which may be of interest later. A content mismatch occurs when the information that the domain expert ascertains from an image differs from the information that the user is interested in. A content mismatch is catastrophic in the sense that little can be done to approximate or recover the omitted annotations. In addition, a language mismatch can occur when the user and the domain expert use different languages or phrases. Because text-based matching provides only hit-or-miss type searching, when the user does not specify the right keywords the desired images are unreachable without examining the entire collection.

Fortunately, this bleak situation is improved by allowing the computer to provide support in the domain it is most suited for: management of low-level features. The computer can analyze the images and videos and extract pertinent information such as colors, color patterns, textures and shapes. By automatically extracting these features and constructing the corresponding indexes, the image storage and retrieval system is given tremendous new power. The feature indexes will not supplant the text domain, but rather will enhance it. This allows images and videos to be searched for by queries that use samples of image and video content or that combine keywords and features. For example, the following sample queries are possible: give me images containing an object that looks like this (user provides sample), or give me Bill Clinton wearing a striped tie, or give me a maroon car. When the user provides a sample of content, such as an area of another image, the computer extracts the low-level features of the sample and uses the feature indexes to retrieve items with similar content. In the other queries, the text searches -- such as for car or Bill Clinton -- are modulated with visual features such as maroon and striped. Although the text index must be created manually, the low-level feature indexes are automatically generated and managed by the computer. As such, the capacity to search using visual features is an enhancement of database system capabilities.

In this paper we propose and evaluate algorithms for automated extraction of color and texture information which enable filtering and searching over large collections of digital images and videos. First, we investigate the use of color for organizing and retrieving images and videos from databases. We maintain that color is an intuitive feature for which it is possible to utilize an effective and compact representation. For the indexing of color in image and video collections, the color extraction algorithm identifies arbitrarily shaped regions within images that contain colors participating in specific color sets derived from a color palette. By way of a search over the color sets for each image, the localized color features are extracted and the index for the database is built directly. This allows for a very fast indexing of the image collection. Furthermore, information about the identified regions, such as the size, shape and position, enables a rich variety of queries that specify not only color content, but also spatial relationships and composition of regions. Queries supported by the color set technique include the following examples: give me all images containing...

a.) a large dark green area near top of image, i.e., trees

b.) a yellowish-orange spot surrounded by blue, i.e., a sunset

c.) a region composed of red, white and blue, i.e., a flag

d.) an area with red and white in equal amounts, i.e., a checkered table cloth.

We will explain how these queries can be answered using color sets and/or by specifying the spatial relationships and composition of regions. We examine image database retrieval performance using the color set features in terms of retrieval effectiveness and query response time. We also compare the retrieval performance of the color set approach to that of several color histogram methods by measuring retrieval recall and precision scores on queries of databases of 500 and 3000 color images.

We follow a similar approach for the indexing of texture which uses the spatial/spatial-frequency (s/s-f) information derived from compressed image and video data. From the decomposed images the elements of the texture set correspond to the levels of energy contributions within image s/s-f subbands. Using this information, texture is represented as a binary set corresponding to a pattern of energy distribution across the subbands. This approach obtains the spatially localized and arbitrarily shaped regions of texture within each image which are easily indexed using the binary texture feature space. We formalize the texture set extraction and indexing method and examine the retrieval performance on a database of 500 images.

Finally, we examine the relationship between color and texture in application towards image retrieval. We explore the capability of utilizing the color and texture maps of images to obtain combined feature maps that characterize image regions by both color and texture. This gives the user better accuracy in describing the desired content in terms of color and texture characteristics. For example, if the user is interested in retrieving images of the pyramids in Egypt, the user can specify the color `tan' and choose a brick-like texture to denote the stonework. This will filter out the images that have regions of the color `tan' but a different texture or no texture, for example, imagery from the Grand Canyon. We will see that by combining color and texture modalities we can better characterize real world objects and color patterns within images.

We also take an approach towards feature extraction that allows color and texture information to be extracted from compressed representations of image and video data [7][8]. Feature extraction from the compressed-domain provides great potential for reducing computational complexity because it avoids the costly operation of decompressing the images and videos. This is particularly relevant for large image and video databases which may have hundreds of thousands of image and video items stored in compressed form.

In short, the goal of this work is to propose new algorithms for the extraction and management of the low-level features of color and texture in the application of storage and retrieval of images and videos. We utilize a framework which does not restrict the application to any particular domain of imagery. All algorithms presented are carried out and tested on databases of general images. However, we also recognize that by applying the low-level features to specific and constrained-application domains, we may benefit from domain knowledge to derive potentially more useful semantic information. The image and video storage and retrieval application places new requirements on the feature extraction algorithms necessitating that the research areas of texture and color are reinvestigated. In general, the extraction, representation, discrimination and segmentation of texture and color are research areas that are at once long studied and unresolved. The content-based image and video retrieval system provides opportunity for evaluation of image analysis algorithms along new dimensions. In particular, image and video retrieval system performance puts new emphasis on utilizing feature sets that (1) are low dimensional, (2) lend well to efficient indexing, (3) provide spatial localization, and (4) can be extracted from compressed representations [7][8]. We propose and evaluate new algorithms for color and texture management that satisfy these requirements and improve operation of image and video storage and retrieval systems.

II. Visual Features

IIA. Content-Based Query

Recently, researchers have begun to investigate content-based techniques for indexing images using the features of color, texture and shape. The QBIC (Query By Image Content) system [17][28] supports searching through image databases using texture, color and shape. The system provides two types of content representation: by whole image and by manually outlined objects. Texture and color are computed as global characteristics of whole images and local characteristics of the outlined objects. Since QBIC does not support automated segmentation and extraction of color and texture features, the system does not scale well to large image and video storage and retrieval systems. It will be infeasible for all objects of interest within images to be cut out by humans ahead of time in order to allow indexing of objects and spatially localized data, as required by the QBIC system. On the other hand, the representation of images by global characteristics has major deficiencies. The regional contents of images are not well represented by the global color distribution or texture profile. Typically, the user will be interested in isolated objects or regions within images which global features do not capture. Of particular note in the QBIC system is the utilization of a cross-distance color histogram distance function for discriminating color features. The QBIC team has proposed a technique for bounding this function such that color items may be indexed hierarchically. This is one type of technique for improving query response time [17].

The color indexing of multicolored objects was also explored by Swain and Ballard in [37]. They proposed a histogram intersection formula which, under most conditions, eliminates the influence of background pixels on the match value. In particular, this approach allows the comparison of images based upon only the colors of interest. This method was applied by the authors to the retrieval of known objects imaged against a single colored background. However, the color indexing approach does not satisfy the general problem of indexing images and videos that have autonomous color regions of interest, and which are not restricted to known items. A unified approach for the characterization of color, texture and shape was proposed by Caelli and Reye in [6]. However, the automated segmentation and extraction of regional color, texture and shapes from general imagery is not supported by their algorithm. In [12], Chua et al. proposed a technique by which color objects are semi-automatically extracted and indexed using color pairs. They reported experiments of retrieval on a data set of 100 images and videos. However, their system lacks support for other features. Liu and Picard investigated Wold features for image retrieval [27], which are based on a random-field model for texture. They applied the feature extraction algorithm to retrieval of Brodatz [5] textures and natural scene images. However, the Wold feature set is not directly conducive to compressed-domain feature extraction and the system lacks support for other visual features such as color.

In general, content-based image and video database systems require the following components:

* identification and utilization of intuitive visual features

* effective feature representation and discrimination

* automatic extraction of spatially localized features

* techniques for efficient indexing

The features that are to be used by the computer should correspond directly to routine notions of vision. For example, color, texture, pattern, shape and motion are fairly obvious concepts to most people. But it has been hard to measure exactly how these features are discriminated by humans. Such discriminations are also unique to each individual. Given these difficulties, the computer must somehow utilize a characterization and discrimination of these features that is acceptable to the users -- the performance of which can often be very hard to measure. Furthermore, these features must be automatically isolated within regions of images and videos. This enhances the potential for the user to retrieve images using only a description of the objects or regions within the images. Finally, the feature sets should also lend well to efficient indexing. A major problem of feature-based characterizations of visual data is the high dimensionality of the feature spaces. The feature spaces become increasingly difficult to index efficiently with increased dimensionality. This is an important consideration in the choice of representations of visual features. If the features are properly chosen, they may lend well to a natural hierarchy in indexing, or be constructed from a more advantageous space, such as a binary space which can be efficiently indexed.

Of the many modalities of visual features that are possible, color and texture are perhaps the most intuitive for people. However, extraction and representation of texture and color by computer are still very challenging tasks. Images are derived from the projections of 3-D data onto a 2-D plane. In general, computer image analysis provides only limited information about the original 3-D source. While this prevents the unbounded understanding of images, it also complicates the extraction of low-level features. This is because the appearance of the 2-D projection of real world objects is generally altered by surface texture, lighting, shading, perspective, viewing conditions, object occlusion and viewpoint, all of which generally correspond to information not available in the 2-D image data. What people effortlessly see and classify as specific colors, color patterns or textures are not so obvious to the computer. The knowledge of the 3-D world guides people in the identification of low-level features that are altered by the artifacts of a 2-D view. The lack of a 3-D knowledge-base makes it difficult for the computer to cull the low-level information from the distortion.

Consider the process of seeing color. The human visual system automatically collects isolated points of color in order to mentally construct color solids, patterns and surfaces even under contradictory and distorting information. This involves compensation for lighting, shading and surface properties. Additionally, the perception of color is influenced by surrounding colors and is also highly dependent on viewer adaptation and overall lighting conditions. For the computer to provide an acceptable extraction of color information it must arrive at conclusions similar to those derived by the human visual system. This involves the simulation of the many peculiarities of the human visual system.

Furthermore, in terms of feature representation, image analysis is complicated by the evidence that people tend to be inconsistent in describing visual features. People often use different levels of specificity in different contexts. For example, people would typically describe an apple as being `red', probably implying some type of reddish hue. But in the context of describing the color of a car a person may choose to be more specific, instead using the terms `dark red' or `maroon' to describe the same color. Automated color and texture extraction by computer is performed without the benefit of the user's context or a knowledge-base of real world objects to help guide the feature representation.

Image and video storage and retrieval systems that support content-based retrieval must grapple with the problems of automated image analysis. The complete understanding of images by computer would be a desirable solution to this problem, but is not possible given current technology. However, in the absence of such understanding, automated extraction of properly chosen low-level features greatly enhances the operation of image and video storage and retrieval systems.

IIB. Integration of Visual Features

The perception of images is a very complex process. It has been postulated that nerve cells in the retina immediately break down images into separate components, namely contours, textures and colors. These fragmented pieces of information are then reassembled in the brain into a single coherent image. From the coherent image the information about the scene is realized. It is at this high level that humans will naturally tend to organize images. For example, people organize their family pictures or photo albums based on the semantics of the contents of the photos. People typically do not categorize photos based on the color distributions or texture information. Computers, on the other hand, have been used in certain applications to de-construct images into components of textures, shapes, contours and colors. But they are very limited in their capability of reintegrating these features into coherent images. Without image understanding, a computer or automated system cannot provide the level of image organization that is natural or convenient for people. However, we will see how the low-level features can greatly enhance the operation of image and video database systems.

The content of a visual scene can be analyzed at several levels: the semantic level, object level and the visual feature level, see Fig. 1. The user typically prefers to describe image content at the semantic level. This involves not only the recognition of objects in the scene, but also deductions about locations, actions, identities, etc., as supported by the viewer's knowledge. However, it is impractical for the semantic content of every visual scene within an image or video database to be recorded by humans ahead of time. Recording the semantic content of each image and video scene could be a formidable research task on its own. Alternatively, the database system can support object-level indexing. This requires the identification of all objects within the scene, such as buildings, people, cars, etc. When confined to the specific domains typically involved with computer vision, the computer has been used to automatically extract objects. However, with current technology it is not possible to capture object-level content for general applications. Alternatively, object level descriptions can be supplied by humans. However, it is generally impractical to manually enumerate all objects and their characteristics and spatial relationships within all scenes in large image and video databases.

At the lowest level of image content, the visual feature level, the image content is captured by the color, shape and texture processes within the scene. This requires automatic feature extraction by the computer to collect and organize the visual features. This produces independence from human assistance in construction of the indexes. But this approach shifts the burden of the communication effort to the user who now has the added task of translating the desired semantic query into a visual features query. This is not ideal for the user. But, given the constraints of the system it is far easier for the user to translate cognitively between levels of image content than for the computer to do so. Furthermore, this system can allow the user to supply samples of the desired content, such as areas or objects from other images. The system can also provide graphics tools and interfaces for feature space navigation to enable the user to precisely construct the feature based queries. With improvements in technology the computer will improve its capacity to translate information between levels of content. Such gains will increase the capacity for image analysis and reduce the amount of user effort required in formulating queries.

III. Color Sets

Color indexing is a process by which the images and videos in the database are retrieved on the basis of their color content. There are two general notions of color content: one corresponding to global color distribution, and the other corresponding to regional color information. Indexing images by global color distribution has been achieved by using color histograms [17][37]. This provides a good approach for retrieving images that have similar overall color content. For example, images of beach scenes may have very similar color histograms, and retrieval based on this technique will give good results when finding all images that look like the query image of a beach scene. While this may be useful, it is also very limited. Suppose the user is interested in only one aspect of the beach scene, such as the presence of a colored beach ball. Indexing by global color content does not support this type of query. The presence of a beach ball, which may correspond to a particular pattern of colors within a small spatial region of the image, will be spread out in the global histogram. If the region is not sufficiently large, its presence cannot be distinguished from noise in the color histogram. Histogram distance functions for comparing color histograms typically cannot eliminate the contribution of background colors. Therefore, it is essential to utilize an approach for color indexing that captures regional color information such that objects can be found.

We propose the color set approach to extract spatially localized color information and provide for efficient indexing of the color regions. Color regions are extracted through steps that transform, quantize and filter the image morphologically such that insignificant color information is lost and prominent color regions are emphasized. The color regions remaining in the image are extracted and represented using color sets that define the color content of the regions. We propose a method by which the large single color regions are extracted first, followed by multiple color regions. We utilize binary color sets to represent the color content, which also allows for very efficient indexing. By removing large single color regions first, we also capture color content in a way which is most intuitive to users. The user will primarily use a single color when describing an object. When multiple colors are used, typically they will be limited to two or three colors at a time. While the color set approach allows some multiple color regions to be extracted directly, the remainder can be re-composed at query time by joining single color regions that are in close spatial proximity. In the next sections we formulate the color set approach and apply it to databases of images. We compare the retrieval effectiveness of color sets to that found for using color histograms for the same databases.

IIIA. Color Set Notation

The color set representation is defined as follows:

Assume that each of the possible colors in a color image may be described by a triple (r, g, b) from the 3-D RGB color space. Without loss of generality, assume that T is one of many possible transformations, not necessarily linear, between RGB and another color space denoted HSV. For each (r, g, b) let the triple (h, s, v) represent the transformed color such that,

(EQ 1) (h, s, v) = T(r, g, b),

where T is a conversion from RGB to HSV. Let Q be a quantizer function that maps each value of (h, s, v) to one of a finite number of bins. Then let (h_q, s_q, v_q) represent the quantized color point such that,

(EQ 2) h_q = Q(h), s_q = Q(s), and v_q = Q(v).

Let the vector w represent the point (h_q, s_q, v_q). It follows that w is one of m possible vectors in the quantized color space.

Definition 1: Binary Color Space B_m -- Let B_m be the m-dimensional binary space obtained from the quantized color space such that element c[k] of a vector c in space B_m is obtained from,

(EQ 3) c[k] = 1 if w corresponds to color k, and c[k] = 0 otherwise.

Each axis k in binary space B_m represents one of the m possible colors from the quantized color space. A color set c is defined as a vector in binary space B_m, and represents a selection of colors from the quantized color space.

For example, let the quantized color space have 2 hues, 2 saturations and 2 values. Then B_8 is a binary space with 2 x 2 x 2 = 8 dimensions. Each element in c corresponds to a color from the quantized color space. A color set contains a selection of colors. For example, the color set c = [1 0 0 0 0 1 1 0] corresponds to the selection of three colors from the quantized color space, where the elements with value 1 indicate the selected colors.
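To make the notation concrete, the following sketch (our illustration; the function name and the bit-vector encoding are our own choices, not part of the formal definition) represents a color set in Python as an integer bit vector with one bit per axis of B_m:

def make_color_set(selected_colors, m=8):
    """Return a bit-vector color set over B_m that selects the
    given color indices (bit k set means color k is selected)."""
    c = 0
    for k in selected_colors:
        assert 0 <= k < m
        c |= 1 << k
    return c

# For the 8-dimensional example above (2 hues x 2 saturations x 2 values),
# a color set selecting three colors:
c = make_color_set([0, 5, 6])
print(format(c, "08b"))   # -> 01100001 (elements 0, 5 and 6 set)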

The goal of the color extraction technique is to represent a color image by spatially localized color sets. Using the above transformations this is achieved trivially when each pixel in the color image is represented independently by a color set which has a single non-zero value. However, this representation is not useful in the sense that the content of images is better represented using spatial regions that are substantially larger than one pixel. The color extraction method as outlined next proposes one solution for obtaining a color set representation of a color image that provides for excellent characterization of the image content and allows for color images to be indexed based on the color content of regions. This can be formulated as a design problem which has T and Q as parameters and requires a procedure for connecting color pixels into regions.

IIIB. Color Region Extraction

The color set approach provides simultaneously a technique for extraction and for quick and efficient indexing. The construction of the color set index is similar conceptually to file inversion, but where the locations of the occurrences of color within an image are kept in a list which is ordered by color. This works well for image colors, because the full set of visible colors can be represented without visible distortion using a fixed finite set of colors. Using a file inversion type of query, first the selected colors are identified. Then the system consults the color list and reports on the occurrences of the query colors across the images in the image database. When the color list points not to individual pixels but to regions, and the indexing is based upon color sets containing possibly many colors, the color set approach enables great power in the indexing of color images.

The success of the color set approach relies on a reduction of the dimensionality of the color feature space and the ability to satisfactorily localize color information spatially within images. In short, as illustrated in Fig. 2, this is accomplished by the following means: reduction of the full gamut of colors to a set of manageable size (~100 carefully selected colors). This involves the selection of T and Q. One objective is that unacceptably dissimilar colors are not mapped into the same bins. The quantization also allows higher tolerance for dissimilarity in color lightness and color saturation while reserving the finest quantization for hue. A `colorizing' algorithm paints the color images using the reduced palette and a broad brush. This ensures that the most dominant colors and regions are emphasized while insignificant color information is dropped. The processed images retain a visibly acceptable and compact representation of the color content. After this processing, a conditional search over the sets of colors remaining in the image reveals the spatially localized color regions. The regions that are sufficiently represented by a color set are mapped into the database index to be retrieved through selection of the color set. The next section discusses the process in more detail.

IIIC. Color Space - Transformation and Quantization

The first design parameter involves the selection of color space, or more specifically the transformation T, by which the new color space can be reached from the RGB space. The RGB color format is the most common color format for digital images, primarily to retain compatibility with computer displays. However, the RGB space has the major drawback that it is not perceptually uniform. Because of this, uniform quantization of RGB space gives perceptually redundant bins and perceptual holes in the color space. Furthermore, ordinary distance functions defined in RGB space will be unsatisfactory because perceptual distance is a function of position in RGB space.

Other color spaces, such as CIE-LAB, CIE-LUV and Munsell, offer improved perceptual uniformity [42]. In general they represent with equal emphasis the three color variants that characterize color: hue, lightness and saturation. This separation is attractive because color image processing performed independently on the color channels does not introduce false colors [33]. Furthermore, it is easier to compensate for many artifacts and color distortions. For example, lighting and shading artifacts are typically isolated to the lightness channel. However, these color spaces are often inconvenient due to the inherent non-linearity in forward and reverse transformations with RGB space. For color extraction we utilize the more tractable HSV color space, which has the above-mentioned characteristics and whose transform from RGB is non-linear but easily invertible. The transformation from RGB to HSV [23] is accomplished through the following code segment:

Let (r, g, b) be the tuple describing a color point in RGB space, and let (h, s, v) be the transformed tuple in HSV color space. For (r, g, b) defined such that r, g, b are in [0, 1], (h, s, v) can be obtained from (r, g, b), where h is in [0, 6), and s and v are in [0, 1], as follows:

(EQ 4) v = MAX(r, g, b).

Let delta = v - MIN(r, g, b), then s = delta/v.

Let r1 = (v - r)/delta, g1 = (v - g)/delta, b1 = (v - b)/delta,

then if r == v and g == MIN(r, g, b) then h = 5 + b1,

else if r == v then h = 1 - g1,

else if g == v and b == MIN(r, g, b) then h = 1 + r1,

else if g == v then h = 3 - b1,

else if b == v and r == MIN(r, g, b) then h = 3 + g1,

else h = 5 - r1.
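The code segment above leaves the achromatic cases (v = 0 or delta = 0) implicit. A minimal runnable transcription of EQ 4 in Python, with those cases handled explicitly, might look as follows (our sketch, not code from the original system):

def rgb_to_hsv(r, g, b):
    """Transform (r, g, b) in [0, 1]^3 to (h, s, v) per EQ 4.
    h lies in [0, 6); s and v lie in [0, 1]."""
    v = max(r, g, b)
    delta = v - min(r, g, b)
    if v == 0:                  # pure black: hue and saturation undefined
        return 0.0, 0.0, 0.0
    s = delta / v
    if delta == 0:              # gray: no hue
        return 0.0, 0.0, v
    r1 = (v - r) / delta
    g1 = (v - g) / delta
    b1 = (v - b) / delta
    if r == v:
        h = 5 + b1 if g == min(r, g, b) else 1 - g1
    elif g == v:
        h = 1 + r1 if b == min(r, g, b) else 3 - b1
    else:
        h = 3 + g1 if r == min(r, g, b) else 5 - r1
    return h % 6, s, v

# Sanity check: pure red, green and blue map to h = 0, 2 and 4.
print(rgb_to_hsv(1.0, 0.0, 0.0))   # (0.0, 1.0, 1.0)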

The next issue after color space selection is the quantization Q. The HSV color space can be visualized as a cone. The long axis represents value: blackness to whiteness. Distance from the axis represents saturation: the amount of color present. The angle around the axis is the hue: tint or tone. Quantization of hue requires the most attention. The hue circle consists of the primaries red, green and blue separated by 120 degrees. A circular quantization at 20 degree steps sufficiently separates the hues such that the three primaries and yellow, magenta and cyan are each represented with three sub-divisions. Saturation and value are each quantized to three levels, yielding greater perceptual tolerance along these dimensions. Together with four gray levels, this gives the quantized space 18 x 3 x 3 + 4 = 166 bins, which appears in Fig. 3. The quantized space produces the binary color space B_166, which has 166 dimensions over which color sets may be defined.
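A sketch of one possible implementation of this quantizer is given below. It assumes the 18 x 3 x 3 chromatic bins plus 4 gray bins that make up the 166 total, and takes h in [0, 6) as produced by EQ 4; the saturation threshold used to detect achromatic pixels is our own assumption:

def quantize_hsv(h, s, v, gray_threshold=0.05):
    """Map (h, s, v) (h in [0, 6); s, v in [0, 1]) to one of 166 bins:
    18 hues x 3 saturations x 3 values = 162 chromatic bins, plus
    4 gray bins. gray_threshold is our assumption; the paper does
    not state how near-zero saturation is detected."""
    if s < gray_threshold:             # achromatic: quantize v to 4 grays
        return 162 + min(int(v * 4), 3)
    hq = int(h / 6 * 18) % 18          # 20-degree hue steps
    sq = min(int(s * 3), 2)            # 3 saturation levels
    vq = min(int(v * 3), 2)            # 3 value levels
    return (hq * 3 + sq) * 3 + vq      # chromatic bins 0..161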

IIID. Color Processing

To identify color regions, the images are transformed to the quantized HSV space with 166 color bins. In general, the color processing is not performed on the full size image but rather on a subsampled version such that some color averaging has already taken place. Typically, the subsampled, transformed and quantized images have fewer than 50 colors. Directly after the transformation it is still premature to extract color regions because insignificant color information interferes. We reduce most of this insignificant detail by using a colorizing algorithm. This processing is accomplished using a median filter on each of the HSV channels. The non-linear median filtering in HSV space eliminates outliers and emphasizes prominent color regions while preserving edge information and not introducing false hues.
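As an illustration of this colorizing step, the following sketch median-filters each HSV channel independently using scipy; the window size is our assumption, not a value from the paper:

import numpy as np
from scipy.ndimage import median_filter

def colorize(hsv_image, window=5):
    """Median-filter each channel of an HSV image (H x W x 3 array)
    independently. The median selects existing channel values, so
    no false hues are introduced."""
    out = np.empty_like(hsv_image)
    for ch in range(3):
        out[..., ch] = median_filter(hsv_image[..., ch], size=window)
    return out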

IIID.1 Color region labeling

The next step involves the extraction of the color regions from the images. This is done by systematically selecting from the colors present in the image, one at a time and in multiples, each time generating a bi-level image. The levels correspond to the selected and un-selected pixels for the specified color set. Next follows a sequential labeling algorithm that identifies the isolated regions within the image. The characteristics of each color region are evaluated to determine whether the region will be added to the database. One threshold, ta, is for region size. In our system the region must contain more than ta = 32^2 pels to be significant. This value still allows for sufficiently small regions to be indexed.

If more than one color is represented in the color set we utilize two additional thresholds. The first additional threshold is the absolute contribution of each color. If a color does not contribute at least tb = 32^2 pels to the region, the region is not added. Furthermore, the relative contribution is also measured. Each color must contribute at least tg = 20% of the region area. Notice that this produces a firm limit of 5 colors per color region, although we use only up to 3 colors at a time. If a color region does not pass one of these thresholds then it will not be indexed by that color set. If a region is rejected because one of the colors from the color set is not sufficiently represented, the region still has a chance to be extracted using a reduced color set leaving out the under-represented color. Enforcing the color set thresholds prevents the unnecessary and redundant proliferation of indexed multiple-color regions.
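The following sketch (our illustration) collects the three threshold tests; it assumes each candidate region has been summarized by a per-color pixel count:

TA = 32 ** 2   # minimum region size (= 1024 pels)
TB = 32 ** 2   # minimum absolute contribution per color (pels)
TG = 0.20      # minimum relative contribution per color

def region_is_significant(color_counts):
    """color_counts maps each color in the region's color set to its
    pixel count. Returns True if the region passes ta, tb and tg."""
    area = sum(color_counts.values())
    if area <= TA:
        return False
    if len(color_counts) > 1:          # multiple-color region
        for count in color_counts.values():
            if count < TB or count < TG * area:
                return False
    return True

# Example: a two-color region of 3000 red and 900 white pixels fails,
# because white contributes fewer than tb = 1024 pels.
print(region_is_significant({"red": 3000, "white": 900}))   # False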

Fig. 4 illustrates the process for extraction of single and multiple color regions. As indicated in Fig. 4(b), only the single color regions larger than ta pixels are extracted. For multiple color region extraction, as indicated in Fig. 4(c) and (d), the region size must be larger than ta pixels and all color elements must contribute at least tb pixels and tg% of total region size. Furthermore, the spatial segments of the color region must be connected. Notice that some multicolored regions, such as {a2,a3}, can be retrieved using a multiple-color color set or using spatial composition of single-color color sets. For example, a query defining the multiple-color color set {a2,a3} matches the region. Additionally, a query specifying single color regions {a2} and {a3} such that {a2} is northeast of {a3} also matches the color region. In other cases, such as for multicolored region {a4,a5}, composition from existing color sets does not reconstruct the region {a4,a5}. However, the multiple-color extraction captures this region and allows it to be retrieved using a multiple-color color set. For the test database of 3000 color images, the results of color region extraction are tabulated in Table 1. The statistics indicate that many multiple-color regions are extracted from each image. However, this number does not increase combinatorially with the number of image colors or single color regions present. This proliferation of regions is avoided by using the thresholds for region size and color contribution such that only the most significant color content of images is extracted.

Fig. 5 illustrates the process of color region extraction on the Butterfly color image using the color set approach. It shows the extractions of color regions corresponding to 5 color sets. For each color set, pixels within the image belonging to the color set are highlighted, as shown in Fig. 5(c). After median filtering and size thresholding the remaining highlighted pixels are collected into regions. The color characteristics of the regions are examined according to the thresholds, ta, tb and tg, to determine region significance. After insignificant regions are eliminated, the rest are represented by minimum bounding rectangles that surround each region, and are added to the database index. The information that is retained for each region, namely the color set, region location, region size and image id, is illustrated in Table 2 for the Butterfly color image.

IIID.2 Color image mining

Even with the reasonably small color gamut it is necessary to search systematically for multiple color regions. Otherwise, it will require 2^m passes over the image to test all combinations of m colors. Since we have chosen m = 166, it will be impractical to search over 2^166 color sets. Although typically only ~50 colors appear at a time per image, it is still unreasonable to search over 2^50 possible color sets. To help prune the search, we utilize a heuristic similar to that used for database mining [1]. The algorithm makes multiple passes over each image, expanding only the color sets that meet minimum support constraints. A color set c is explored for an image only if, for all elements k in c, there are at least t0 pixels in the image of the color corresponding to k such that t1 pixels of the color have not yet been allocated to a region. We use t0 = t1 = 32^2. If t0 is not met then c will contain colors that cannot be represented sufficiently by any color region. Exploring this color set and all supersets of it would be futile. If t0 is met while t1 is not, then a color region containing all of the colors in c can alternatively be reconstructed using subsets of c and spatial composition. Therefore, exploration of c and its supersets generates redundant information.
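A sketch of this pruned, level-wise expansion is shown below, in the spirit of the database mining heuristic [1]. The data structures are our own simplification, and the updating of unallocated pixel counts as regions are extracted between passes is elided:

T0 = 32 ** 2   # a color must have at least t0 pixels in the image
T1 = 32 ** 2   # ... of which at least t1 are not yet allocated

def candidate_color_sets(present, unallocated, max_colors=3):
    """Expand color sets level-wise: a set is explored only if every
    member color meets the t0 and t1 support constraints. `present`
    and `unallocated` map color indices to pixel counts."""
    supported = [k for k in present
                 if present[k] >= T0 and unallocated.get(k, 0) >= T1]
    frontier = {frozenset([k]) for k in supported}
    while frontier:
        for c in frontier:
            yield c                      # explore this color set
        frontier = {c | {k}              # grow only surviving sets
                    for c in frontier for k in supported
                    if k not in c and len(c) < max_colors}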

Fig. 6 illustrates an example of the extraction of an American flag in the San Francisco color image. In the extraction process the image is transformed to the quantized color space and processed using the colorizing algorithm. Fig. 6(b) shows the appearance of the processed San Francisco image. The next step searches over colors in the processed image using the approach just described, and extracts color regions from the image. Fig. 6(c) illustrates the pixels that belong to the color set corresponding to the selection of colors red, white and blue. Regions are formed from the pixels in Fig. 6(c) and are evaluated on the basis of the thresholds, ta, tb and tg. For the San Francisco color image, one surviving region has been found that sufficiently contains the colors red, white and blue. The region, corresponding to a U.S. flag, as depicted by the minimum bounding rectangle in Fig. 6(d), is extracted and added to the database index.

IIIE. Color Query -- Spatial Locations

As indicated previously, the extraction of spatially localized features is an extremely important aspect of image indexing. The isolated regions of interest within images should be identified and extracted independently from other regions in the image. For example, an image should be retrieved even when the user can describe only part of the image. When images are represented by a global color histogram, the ability to characterize regions within the image is lost. The color histogram does not contain information about the spatial locations of color information. Since the color set approach retains information about color region location, the absolute and relative spatial locations of color regions can be specified in the query. For example, the user may request images from the database that have a color region with color set c0 that is to the left of a color region with color set c1. The spatial aspect of the query can be handled through several means.

The first requires comparisons at query time, for each image that matches the color part of the query, to determine whether its color regions satisfy the spatial part. This approach requires no dedicated data structure for spatial information. However, there is added computation at query time. Alternatively, the spatial part can be handled by one of several data structures that have been devised for representing and querying spatial information [19][9]. In [9], segmented images are converted into symbolized images which are then represented by 2-D strings from which the original spatial information can be reconstructed. An iconic index can be built from the 2-D strings that allows more direct access to spatial information. Typically such an approach is difficult for feature-based image retrieval because the feature representation of image regions requires distance functions for similarity measures and cannot be represented symbolically. However, since we use binary feature sets, image regions can be represented by symbols that correspond to the values of their binary sets. Similarly, the representation of spatial information in [19] uses symbols for image regions and is suitable for regions represented by color sets.
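For the first, query-time approach, the spatial predicate can be evaluated directly on the stored minimum bounding rectangles. A minimal sketch follows, using centroid comparison as one possible semantics for `left of' (other semantics are equally valid):

def centroid(mbr):
    """Center of a minimum bounding rectangle (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = mbr
    return (x0 + x1) / 2.0, (y0 + y1) / 2.0

def left_of(mbr_a, mbr_b):
    """True if region a lies to the left of region b (centroid test)."""
    return centroid(mbr_a)[0] < centroid(mbr_b)[0]

def satisfies_spatial(regions_c0, regions_c1):
    """Query-time filter: keep an image if some region matching color
    set c0 lies to the left of some region matching color set c1."""
    return any(left_of(a, b) for a in regions_c0 for b in regions_c1)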

IIIF. Color Query -- Color Specification

The user formulates a color query by constructing m-dimensional binary color sets that describe the color contents of the regions that are of interest to the user. For example, the retrieval of the U.S. flag is specified through a color set with red, white and blue. The color sets for the query may be obtained in a number of ways. One way is for the user to provide a sample of the desired content. When the user can identify a region from an image as being similar to the regions of interest, the color set can be computed for the region and used for the query.

Alternatively, the color sets can be determined by the user picking colors from a color chooser. This technique is utilized in the color-based image retrieval system that we created for the World Wide Web (http://www.ctr.columbia.edu/~jrsmith/advent/color_demo.html), see Fig. 7. Here, the user creates the query color sets by picking from color swatches taken from the collection of 166 colors. Each color swatch corresponds directly to an axis in the binary color set space. When the user chooses one of the colors from the swatches, the corresponding element of the query color set is given a value of 1. Through multiple selections of colors from the chooser, a query color set with multiple colors can also be formed. The colors may alternatively be selected by visually navigating through the 3-D HSV color space using sliders, see Fig. 8. Using this interface tool, once the user arrives at the desired color it is transformed and quantized to one of the 166 bins. Again, the corresponding element in the binary color set is given a value of 1. Through the selection of colors the query color set is formed.

The query color sets may alternatively be constructed through textual specification of color. In particular, the Color Naming System (CNS) developed by Berk [3] assigns words to colors that are classified along the dimensions of hue, saturation and lightness. The naming system is divided into levels such that at the lowest level the colors are broadly classified. At successively higher levels, progressively descriptive tags provide finer discrimination between colors. This multilevel decomposition of the color space is perfectly suited for the database application. For example, it allows the user at times to specify the color `orange,' which refers to several elements in the color set that are of orange hue. But when the user wants to specify the color more accurately as `dark orange,' the CNS converts this into a narrower selection of the elements in the color set. Depending on the level of color specificity desired by the user, the CNS converts the text into a range of colors in the color set. In general, the mapping from CNS phrases to colors is from one textual phrase to many colors, and as such is not invertible. However, at the highest level of specificity, each CNS phrase corresponds to one element in the quantized color space. Textual specification of color also provides a convenient way for the user to include color modifiers in a text-based image query. When the user wants to use a text-based query for image and video retrieval, the color modifier can be included in the text. For example, the user can specify a query for a `yellow house' textually. The description of yellow will be converted into a color set and be used as a filter to return only the houses that are yellow.
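A sketch of the text-to-color-set conversion appears below. The name-to-bin table is purely hypothetical -- the actual CNS vocabulary and its mapping to the 166 bins follow Berk [3] -- but it illustrates how a broad term selects many bins while a more specific term selects fewer:

# Hypothetical illustration only: the bin indices are invented.
CNS_TO_BINS = {
    "orange":      {30, 31, 32, 33, 34, 35},   # all orange-hued bins
    "dark orange": {31, 34},                   # narrower selection
}

def text_to_color_set(phrase):
    """Convert a color phrase to a binary color set (bit vector)."""
    c = 0
    for k in CNS_TO_BINS[phrase]:
        c |= 1 << k
    return c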

In response to a color query, the color sets specified by the user are quickly matched to region data. This efficiency results from the fact that the color set space is binary. As such, the color region data that populates the database index can be ordered directly by the values of the m-dimensional binary color sets. Since the data can be ordered, it can be indexed using a tree access structure or hash table. This allows retrieval of color regions without the computation of a distance function. The easiest way for the user to broaden the response for a query is directly through broadening the query color set to include other colors that may be acceptable. It is important for the efficiency of the color set query that the color region data is easily indexed such that query response time is low. Color image retrieval based on binary color sets is significantly faster than the color histogram techniques, which require computation of a distance function.
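Because the color sets are binary, the index reduces to an exact-match lookup, as in the following sketch (our illustration of the idea; the record layout and ranking key are assumptions):

from collections import defaultdict

# Index: color set value -> list of (image_id, mbr, area) records.
color_index = defaultdict(list)

def index_region(color_set, image_id, mbr, area):
    color_index[color_set].append((image_id, mbr, area))

def query(color_set, rank_by_area=True):
    """Exact-match lookup -- no distance function is computed."""
    hits = color_index[color_set]
    return sorted(hits, key=lambda rec: -rec[2]) if rank_by_area else hits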

IV. Color Histograms

Another popular method for characterizing image content is to use color histograms. The color histogram for an image is constructed by counting the number of pixels of each color. Retrieval from image databases using color histograms has been investigated in [17][35][37]. In these studies the developments of the extraction algorithms follow a similar progression: (1) selection of a color space, (2) quantization of the color space, (3) computation of histograms, (4) derivation of the histogram distance function, (5) identification of indexing shortcuts. Each of these steps may be crucial towards developing a successful algorithm. But there has been no consensus about what are the best choices for these parameters.

There are several difficulties with histogram based retrieval. The first of these is the high dimensionality of the color histograms. Even with drastic quantization of the color space, the image histogram feature spaces can occupy over 100 dimensions in real valued space. This high dimensionality means that methods of feature reduction, pre-filtering and hierarchical indexing must be implemented. The large dimensionality also increases the complexity and computation of the distance function. It particularly complicates `cross' distance functions that include the perceptual distance between histogram bins.

Several attempts have been made to improve color histogram performance. In [35], images were segmented into fixed blocks and each block was indexed separately. In this way some blocks may still retain a reasonable characterization of objects of interest. On the other hand, the QBIC system [28] requires manual segmentation of images. In QBIC the color histograms are computed as attributes of the regions that have been outlined manually. This reduces the potential contribution of background and other irrelevant colors but requires extensive human involvement in identification of the indexed data. Automated segmentation of images using color histograms may eventually provide useful results but has not yet been integrated into large image retrieval systems.

IVA. Color Histogram Definition

An image histogram refers to the probability mass function of the image intensities. This is extended for color images to capture the joint probabilities of the intensities of the three color channels. More formally, the color histogram is defined by h(a, b, c) = N * Prob(A = a, B = b, C = c), where A, B and C represent the three color channels and N is the number of pixels in the image. Computationally, the color histogram is formed by discretizing the colors within an image and counting the number of pixels of each color.

Since the typical computer represents color images with up to 2^24 colors, this process generally requires substantial quantization of the color space. The main issues regarding the use of color histograms for indexing involve the choice of color space and quantization of the color space. When a perceptually uniform color space is chosen, uniform quantization may be appropriate. If a non-uniform color space is chosen, then non-uniform quantization may be needed. Often practical considerations, such as compatibility with the workstation display, encourage the selection of uniform quantization and the RGB color space. The color histogram can be thought of as a set of vectors. For gray-scale images these are two-dimensional vectors. One dimension gives the value of the gray-level and the other the count of pixels at the gray-level. For color images the color histograms are composed of 4-D vectors. This makes color histograms very difficult to visualize. There are several lossy approaches for viewing color histograms; one of the easiest is to view separately the histograms of the color channels. This type of visualization does illustrate some of the salient features of the color histogram. For example, Fig. 9 illustrates the channel histograms in RGB and HSV spaces. These plots make apparent the differences in how these spaces distribute color data.
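A minimal sketch of histogram computation under uniform RGB quantization (our illustration; the bin count per channel is arbitrary):

import numpy as np

def color_histogram(image, bins_per_channel=4):
    """Histogram of an RGB image (H x W x 3, uint8): uniformly
    quantize each channel, then count pixels per (a, b, c) bin."""
    q = (image.astype(np.uint16) * bins_per_channel) // 256   # 0..bins-1
    flat = (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]
    return np.bincount(flat.ravel(), minlength=bins_per_channel ** 3)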

IVA.1 Color uniformity

However, the RGB color space is far from perceptually uniform. To obtain a good color representation of the image by uniformly sampling the RGB space, it is necessary to select quantization step sizes fine enough that distinct colors are not assigned to the same bin. The drawback is that such oversampling at the same time produces a larger set of colors than may be needed. The increase in the number of bins in the histogram impacts the performance of database retrieval. Large histograms become computationally unwieldy, especially when distance functions are computed for many items in the database. Furthermore, as we shall see in the next section, finer but not perceptually uniform sampling of colors negatively impacts retrieval effectiveness.

IVB. Color Histogram Discrimination

There are several distance formulas for measuring the similarity of color histograms. In general, techniques for comparing probability distributions, such as the Kolmogorov-Smirnov test, are not appropriate for color histograms. This is because visual perception determines similarity, rather than closeness of the probability distributions. Essentially, the color distance formulas arrive at a measure of similarity between images based on the perception of color content. Three distance formulas that have been used for image retrieval are the histogram euclidean distance, histogram intersection and histogram cross distance.

IVB.1 Histogram euclidean distance

Let h and g represent two color histograms. The euclidean distance between the color histograms h and g can be computed as:

(EQ 5) d^2(h, g) = Σ_a Σ_b Σ_c ( h(a, b, c) - g(a, b, c) )^2

In this distance formula only identical bins in the respective histograms are compared: two different bins may represent perceptually similar colors, but they are not compared in this formula. All bins contribute equally to the distance.

IVB.2 Histogram intersection distance

The color histogram intersection was used for color image retrieval in [37] and [35]. The intersection formula is given by:

(EQ 6) d(h, g) = ( Σ_a Σ_b Σ_c min( h(a, b, c), g(a, b, c) ) ) / min( |h|, |g| ),

where |h| and |g| give the magnitude of each histogram, which is equal to the number of samples. Colors not present in the user's key do not contribute to the intersection distance. This reduces the contribution of background colors. The sum is normalized by the histogram with the fewest samples.

IVB.3 Histogram cross distance

The color histogram cross distance was used by the QBIC system [17][28]. The cross distance formula is given by:

(EQ 7) d(h, g) = (h - g)^t A (h - g),

where A is a matrix of coefficients a(i, j) that represent the perceptual similarity between the colors of bins i and j. The cross distance formula considers the cross-correlation between histogram bins based on the perceptual similarity of the colors represented by the bins. In the case that the quantization of the color space is not perceptually uniform, the cross term contributes the perceptual distance between color bins.
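For concreteness, the three distance functions of EQ 5-7 can be sketched as follows for histograms flattened into 1-D arrays (our transcription; h and g are float arrays of equal length, and A is the bin-similarity matrix of EQ 7):

import numpy as np

def euclidean_distance(h, g):
    """EQ 5: only identical bins are compared."""
    d = h - g
    return float(np.dot(d, d))

def intersection_distance(h, g):
    """EQ 6: normalized by the histogram with fewest samples;
    colors absent from the query contribute nothing."""
    return float(np.minimum(h, g).sum() / min(h.sum(), g.sum()))

def cross_distance(h, g, A):
    """EQ 7: quadratic form; A[i, j] encodes the perceptual
    similarity of the colors in bins i and j."""
    d = h - g
    return float(d @ A @ d)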

V. Color Retrieval Experiments

VA. Retrieval Effectiveness

The retrieval effectiveness and query response times were evaluated using test databases of 500 and 3000 color images. The test data sets were obtained from commercial CDROM photo collections and include images from a variety of subjects including people, places, nature and transportation. We conducted several color retrieval experiments on the test databases. The experiments consisted of first giving the user some familiarity with the full data sets. We then asked the user to form a query by picking any color object or region from any of the images. The user then manually examined all items in the database and assessed the relevance of each image in the database to the query on the basis of perceptual similarity with the query region. Note that the assessments were not based on relevance in cognitive terms, such as being of the same real world object, but rather on a purely perceptual color basis. After several manual query sessions, we performed automated retrievals based on color sets and color histograms and compared the performances to the user's assessments.

VA.1 Measures of retrieval effectiveness

Two metrics for retrieval effectiveness are recall and precision. Recall signifies the proportion of relevant images in the database that are retrieved in response to a query. Precision is the proportion of the retrieved images that are relevant to the query [25]. More precisely, let A be the set of relevant images in the database and B be the set of retrieved images, then recall and precision are defined through conditional probabilities: recall = P(B|A) and precision = P(A|B).

As indicated by the formulas, the measures of recall and precision require that the relevance of each item in the database to the query be established ahead of time. For the experiments, this was done for each query by subjectively assigning one of three values to each image in the database: relevant = 1, partially relevant = 0.5 and not relevant = 0. This value can be interpreted as the probability that the item is relevant to a particular query.
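Under this graded relevance, recall and precision can be computed as in the sketch below (our illustration; the relevance values act as probabilities of relevance):

def recall_precision(relevance, retrieved):
    """relevance: image_id -> 1, 0.5 or 0 (assigned ahead of time).
    retrieved: set of image_ids returned for the query."""
    total_relevant = sum(relevance.values())
    retrieved_relevant = sum(relevance.get(i, 0) for i in retrieved)
    recall = retrieved_relevant / total_relevant
    precision = retrieved_relevant / len(retrieved)
    return recall, precision

# Example: 2 fully relevant and 1 partially relevant image in the
# database; the query returns images 1 and 4.
rel = {1: 1.0, 2: 1.0, 3: 0.5}
print(recall_precision(rel, {1, 4}))   # (0.4, 0.5)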

VB. Color Set Retrieval Effectiveness

The color set query technique was evaluated in two forms: using textual specification of the color query and query through selection of colors from color swatches. In each of the cases, the users' selections of colors were converted to color sets which were used to conduct the searches through the image database. In the case of the text-based color query, the color names entered by the user were first converted into colors from the quantized color space, then the query color sets were formed. In the case of color selection from swatches, since the swatches corresponded directly to color bins in the color sets, the query color sets were formed directly from the users' selections from the swatches.

For each of the color sets queries, the responses to the queries were ordered in three different ways: based on region area, region block size and region count. The orderings do not affect the overall response to a query but determine the order in which the items are presented to the user. The retrieval effectiveness was measured for each of the orderings. The area parameter determines the size of the matched color regions. When ranked by area, the regions of the largest area are returned to the user first. The size parameter determines the size of the minimum bounding rectangle (MBR) surrounding the regions, and when ranked by size, the regions with largest MBR are returned first. The count parameter gives the number of regions per image that match the user's query. When ranked by count the images with the most regions matching the user's query are returned first.

The results of the retrieval experiments on the database of 500 color images appear in Fig. 10. The color set retrieval effectiveness is higher for color swatch queries than for text-based queries, a consequence of the distortion introduced by the CNS mapping used to convert the textual color specification into color sets. In both cases, however, the retrieval precision is quite high. The area and size orderings yield very similar results, while the count ordering is slightly less effective. This indicates that a single large region matching the color set provides a better match than many small matching regions, which corresponds well with the objective of the color set approach: to retrieve images containing regions that most significantly match the user's query. The optimum retrieval effectiveness is indicated at the top of each graph. This ideal curve is derived directly from the probabilities of relevance assigned to the items in the database and gives the maximum possible retrieval scores.

VC. Color Histogram Retrieval Effectiveness

In a manner similar to that for the color set queries, we evaluated the effectiveness of the color histogram approaches on the test set of 500 color images. We first performed a series of experiments that evaluated color histogram retrieval as a function of the distance function and the quantization of the color space. These experiments show very clearly that retrieval effectiveness depends strongly on both choices. In these experiments, the color histograms were computed for fixed 64x64 blocks within the images, and images were returned in order of the block best matching the query histogram. Fig. 11(a) illustrates that the histogram Euclidean distance function was not necessarily improved by finer quantization of the color space. Fig. 11(b) illustrates that for the histogram intersection distance function, retrieval effectiveness improved as the quantization was made finer. Fig. 12(a) illustrates that the performance of the histogram cross distance decreased with finer quantization. These experiments reveal that the interplay among the color space, the quantization and the histogram distance function is a significant determinant of color histogram based image retrieval. The best retrieval score for each distance function appears in Fig. 12(b) for comparison. The histogram intersection distance performs best for color image retrieval, owing to its ability to reduce the contribution of irrelevant colors.
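For concreteness, the three histogram distance functions can be sketched as follows; this is a simplified rendering in which the histograms are assumed normalized and the bin-similarity matrix A used by the cross distance is an assumed input, not specified by the paper:

```python
import numpy as np

def euclidean_dist(h1, h2):
    """Histogram Euclidean distance: d = sum_i (h1[i] - h2[i])^2."""
    return float(np.sum((h1 - h2) ** 2))

def intersection_dist(h1, h2):
    """Histogram intersection expressed as a distance:
    1 - sum_i min(h1[i], h2[i]) / sum_i h2[i].
    Colors absent from the query contribute nothing, which is what lets
    this measure suppress irrelevant background colors."""
    return 1.0 - float(np.sum(np.minimum(h1, h2)) / np.sum(h2))

def cross_dist(h1, h2, A):
    """Quadratic-form (cross) distance: d = (h1 - h2)^T A (h1 - h2),
    where A[i][j] encodes the perceptual similarity of color bins i and j.
    The choice of A is a modeling decision left open here."""
    d = h1 - h2
    return float(d @ A @ d)
```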

We also compared color histogram retrieval using histograms derived from the whole image against histograms computed for fixed blocks within the image. Comparing the query histogram to many blocks per image is much more computationally intensive, but it gives more flexibility in localizing the desired regions within matched images. We evaluated color histogram queries on the databases of 500 and 3000 color images, comparing the query histogram either to the whole image or to each block individually. Fig. 13 illustrates that retrieval effectiveness improves in the fixed-block case when the Euclidean distance function is used. For the histogram intersection function, however, there is little improvement on the already high performance. This shows that the histogram intersection distance diminishes the contribution of background colors regardless of region size, whereas the Euclidean distance performs better on the smaller blocks because the influence of background colors is reduced.
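A sketch of the fixed-block matching, assuming a histogram() helper that bins a block into the quantized color space and one of the distance functions above:

```python
import numpy as np

BLOCK = 64  # fixed block size used in the experiments above

def best_block_distance(image, query_hist, histogram, dist):
    """Score an image by the block whose histogram best matches the query.

    `histogram` bins a block into the quantized color space and `dist` is
    a histogram distance function; both are assumed, not specified here.
    """
    h, w = image.shape[:2]
    best = np.inf
    for y in range(0, h - BLOCK + 1, BLOCK):
        for x in range(0, w - BLOCK + 1, BLOCK):
            d = dist(histogram(image[y:y + BLOCK, x:x + BLOCK]), query_hist)
            best = min(best, d)
    return best
```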

VD. Comparison of Color Retrieval Techniques

Both the color set and color histogram retrieval techniques were compared using queries on the databases of 500 and 3000 images. The queries were formulated from regional samples removed from images; both single and multiple color regions were used. As illustrated in Fig. 14, the color set technique shows much better performance than the color histogram approaches. The plots also make apparent one characteristic of the color set approach: the recall in response to a query is bounded. In other words, there is a point beyond which no further items are retrieved from the index for a particular query color set. The color histogram experiments show no such limitation, because there all database items were ordered using the histogram distance measures. In practical application, however, the histogram distance is thresholded so that only a subset of the items in the database is returned. With the color set approach, a threshold is inherent in the color set representation of color regions; this retrieval threshold appears as the truncation of the color set retrieval plots in Fig. 14.

We also evaluated the queries on the basis of query response time; the results appear in Table 3. The times were measured for queries performed on the image databases of 500 and 3000 color images using an SGI Onyx workstation. The response time of the color set queries is negligible: a fraction of a second. The histogram approaches are significantly slower because of the distance function computation. Histogram intersection queries return in a few seconds when comparisons are made only to whole images; when comparisons are made to individual blocks within images, the response times increase further although, as indicated in Fig. 13, this does not substantially improve the retrieval score. The Euclidean distance function gives the slowest query response time, typically measured in minutes. We also note that the histogram query response times grow with the number of items in the database, whereas the color set query response times increase only negligibly with database size.

VE. Color Set Retrieval Summary

The color retrieval experiments indicate that color sets perform well for image retrieval based on color. The color set extraction algorithm captures the regional color information within the images, and the extracted regions are efficiently indexed by the color sets. Compared to the color histogram techniques, the color sets show better retrieval effectiveness in response to a user's query, have significantly better query response time and provide spatially localized information. The extraction of color set regions from images is fully automatic. Furthermore, color set extraction is performed on low resolution images, where the partial averaging of image colors helps to identify prominent regions. This also makes the approach well suited to extraction from compressed images and videos, since a low resolution version of the image is typically produced by the spatial-frequency decompositions used for image and video compression. The color set technique thus provides a very powerful method for retrieving images and videos from large digital storage and retrieval systems.

However, color is only one facet of image content that may be of interest to the user, and color features are not relevant for gray-scale images. While color alone will often be the significant determinant in evaluating perceptual similarity, the user is typically interested in regions and objects that are identified by additional features. The better the database system allows the user to represent the items of interest, the better job the system can do in finding the relevant items. Texture is another important aspect of vision: it defines an element of visual perception that is not captured by color extraction. Texture features can be used separately from color or combined with color to retrieve items from image and video databases.

VI. Texture Sets

Texture indexing is a process by which the images and videos in the database are retrieved on the basis of their texture content. Although a precise definition of texture has been elusive, the notion generally refers to the presence of a spatial pattern that has some properties of homogeneity. In particular, the homogeneity cannot result from the presence of only a single color or intensity in the region but requires the interaction of various intensities. For texture to be useful for image and video retrieval, the texture characteristics of objects or regions of interest must provide an important aspect of their identity. For example, texture features extracted from a scene of a field of grass describe a visual characteristic that can be used to distinguish grass fields from trees, where color alone may not provide sufficient discrimination. On the other hand, texture may not be an appropriate factor in discriminating between other items, such as the appearance of cars in images. When texture is modeled sufficiently well it can be used to find certain items in the database, but not all objects and regions of interest can be characterized by texture processes. As such, the goal of texture extraction for images should not be texture segmentation; it is likely that most regions within an image do not have sufficiently homogeneous or prominent textural characteristics. Rather, a texture indexing system requires that the regions possessing adequately salient texture features be identified.

We propose a texture set approach that extracts spatially localized texture information and provides efficient indexing of texture regions. Texture regions are extracted by the following process: (1) conversion of color data to gray-level intensity, (2) orthogonal spatial-frequency decomposition of the gray-scale image, (3) energy thresholding within each subband and (4) operations to merge pixels of high spatial-frequency energy. After filtering, the data in the subbands are analyzed together to provide labels for the sufficiently large regions that produce characteristic patterns of energy distribution across the subbands. The pattern of energy distribution is represented using a binary texture set, whereby each element in the binary set corresponds to the presence of energy above threshold within a particular subband. The binary texture sets, which label regions within the images, are used to retrieve images from the database based on texture content.

We note that using texture for image retrieval is an ambitious undertaking for several reasons. Texture has typically been hard to model adequately. Most research on texture uses the Brodatz [5] texture collection, which provides a set of mostly homogeneous textures with little distortion from perspective and lighting. Most algorithms for texture analysis are evaluated using the Brodatz texture set or artificial textures that are even more homogeneous. However, the appearance of texture in general classes of imagery differs substantially from that represented in the Brodatz set: textures in general images are obfuscated by noise and by artifacts due to non-uniform lighting, shading and warping over 3-D space, among other distortions.

For example, a challenging research problem would be to create a 3-D model of a small world that uses only Brodatz textures mapped to surfaces. Given a 2-D projection of the world, the goal would be to segment the regions and classify the textures into the correct Brodatz classes without using any knowledge of the 3-D model. Since this could be handled as a classification problem, the evaluation of the texture analysis algorithm would be straightforward. This is a drastic simplification of what is attempted in extracting texture from general classes of images, but it reveals the challenges that are faced. In general, algorithms for 2-D texture analysis do not extend directly to images obtained from 2-D projections of the real world. In addition, the performance of texture analysis on general classes of images is not easily evaluated; it is best ascertained using subjective tests. In particular, the extent to which the texture feature sets help the user find relevant images that contain perceptually similar textures provides the most significant evaluation of the texture feature set.

We are not proposing a solution to the problem of unrestricted extraction of texture regions from general classes of 2-D images. Rather, we are investigating how spatial-frequency information may be used to characterize regions within images so that the images can be indexed and retrieved. We look for the spatial-frequency texture set to capture regional information such that the discriminability achieved is aligned with the user's perception of the regions. We will also see that the spatial-frequency approach is conceptually supported by other research that has investigated narrow-band spatial-frequency filters for analyzing texture. The spatial-frequency approach captures the presence of dominant information at different scales and orientations in the image. This information characterizes the regions within images so that they can later be retrieved on the basis of the pattern of scale and orientation energy. The texture feature set is especially relevant because it allows extraction of texture features from compressed forms of data that have been decomposed using spatial-frequency techniques, which greatly reduces the computation necessary to extract texture over a large image and video database.

VIA. Texture Features

Texture is an important element of human vision. Texture has been found to provide cues to scene depth and surface orientation, and people tend to relate texture elements of varying size to a plausible 3-D surface. Even in graphics systems, greater realism is achieved when textures are mapped onto 3-D surfaces. Texture features have been used to identify the contents of aerial imagery, such as bodies of water, crop fields and mountains. Textures may be used to describe the content of many real-world images: clouds, trees, bricks, hair and fabric, for example, all have textural characteristics. Particularly when combined with color and shape information, texture provides details important to human vision.

Recent research on texture has explored both the analysis and the synthesis of texture. The analysis of texture has been applied to the problems of texture classification, discrimination and segmentation, in order of increasing difficulty. One important difficulty in modeling texture results from the nebulous definition of texture. In general, textures are visual patterns or spatial arrangements of pixels that regional intensity or color alone does not sufficiently describe. Textures may have statistical properties, structural properties, or both [22]. They may consist of the structured and/or random placement of elements, but may also be without fundamental subunits. Moreover, due to the diversity of textures appearing in natural images, it is difficult to define texture narrowly.

The major attempts at modeling texture include, but are not limited to, the following approaches: random field modeling [27][13], fractal geometry [11], co-occurrence matrices [22][20] and spatial-frequency techniques [24][10][36][26], which include in particular Gabor filters [32][30][4]. A comparison of these four classes of texture features was made by Ohanian and Dubes in [29] using small test collections of natural and artificial images. The authors found that co-occurrence matrices performed best in their experiments, but they acknowledged that performance could be largely influenced by optimization within each class. In general, there is no one best texture model. However, the models most aligned with the mechanisms of vision are the spatial-frequency techniques. There is support that mechanisms of early human vision use receptive field units tuned to orientations and spatial-frequencies [24][21]. In particular, models of the human vision system that use Gabor filters to model the receptive fields sufficiently account for psychophysical data obtained in texture discrimination experiments. The spatial-frequency approach is especially attractive in the image and video database application because many image compression techniques also use spatial-frequency decomposition to achieve image energy compaction. For example, the JPEG standard for image compression and the MPEG standard for video compression use the discrete cosine transform over fixed blocks, which produces a spatial-frequency decomposition. Other compression techniques, such as wavelet, wavelet packet and subband approaches, likewise utilize spatial-frequency decomposition. It is therefore likely that great amounts of image and video data in large archives will already be in a form for which spatial-frequency approaches to texture are most appropriate. Given the application of image and video retrieval from databases, we concentrate primarily on texture feature sets derived from the image and video spatial-frequency representations.

The objective of texture indexing is similar to that for color indexing. In order to sufficiently retrieve images by texture, the texture feature set must adequately capture texture content. It must also allow for easy extraction of the texture features from images. The texture features must also lend themselves to efficient indexing.

VIA.1 Spatial/Spatial Frequency Features

We propose a feature set for texture derived from the pattern of spatial-frequency energy across image subbands. This differs somewhat from the predominant method of extracting texture using Gabor filters, which characterizes a texture by its single dominant spatial-frequency [4]. The Gabor approach requires high spatial-frequency resolution in scale and orientation so that the texture can be modeled using one scale and orientation; attaining such resolution at all frequencies is not practical for large image and video databases. We therefore trade off accuracy in spatial-frequency resolution and instead utilize multiple bands to characterize textures. This approach is partially aligned with the tree-structured wavelet transform (TSWT) model [10], which classifies textures not by a single narrow subband but by the overall wavelet packet basis that best represents the image data. Spatial-frequency feature sets that use multiple subbands were also investigated in [24][26][29] and by us in [36]. In these studies, texture features derived from image spatial-frequency data performed well in texture classification experiments.

Essentially, spatial-frequency methods work by capturing frequency content in localized regions of the spatial domain. These methods achieve high resolution in both space and spatial-frequency in accordance with the uncertainty principle [14][41]. This allows computation of localized spatial-frequency content from a small region around each point in the image, where the region size is a function of the filter bandwidth. When the filters are chosen to have octave band spacing there are dyadic trade-offs between spatial localization and spatial-frequency resolution: for example, doubling the bandwidth of a filter also doubles the spatial resolution of the filter output. This is important in operations such as texture segmentation, where the location of the texture pattern and the extraction of texture boundaries require that spatial information be preserved.

VIA.2 Gabor Functions

Gabor filters produce spatial-frequency decompositions that achieve the theoretical lower bound of the uncertainty principle. They attain maximum joint resolution in space and spatial-frequency bounded by the relations $\Delta x \, \Delta u \geq \frac{1}{4\pi}$ and $\Delta y \, \Delta v \geq \frac{1}{4\pi}$, where $\Delta x$ and $\Delta y$ give resolution in space and $\Delta u$ and $\Delta v$ give resolution in spatial-frequency. This is highly significant for texture extraction, in which the conflicting objectives of accuracy in texture representation and in texture spatial localization are both important. Gabor filter based spatial-frequency analysis of texture was explored by Bovick [4], Porat [31], Reed [32], Dunn [16] and du Buf [15]. In addition to good performance in texture discrimination and segmentation, the justification for Gabor filters is also supported by psychophysical experiments. It has been demonstrated [2] that human texture segregation results from information corresponding to the outputs of spatial-frequency channels. Texture analyzers implemented using 2-D Gabor functions have shown strong correlation between the outputs of banks of 2-D Gabor filters and actual human segmentation [32]. Furthermore, receptive visual field profiles are adequately modeled by 2-D Gabor filters [14].

The 2-D Gabor function is a harmonic oscillator composed of a sinusoidal plane wave of a particular frequency and orientation within a Gaussian envelope. The frequency, bandwidth and orientation are controlled by parameters. The Gabor function is defined as follows:

$g(x, y) = \exp\left(-\frac{(x - x_0)^2 + (y - y_0)^2}{2\sigma^2}\right) \cos\left(2\pi\omega (x \cos\theta + y \sin\theta) + \phi\right)$ (EQ 8)

where $x_0$ and $y_0$ specify the center of the Gaussian and $\sigma$ specifies the standard deviation along both axes. The frequency of the sinusoidal plane wave is determined by $\omega$, $\theta$ is the angle of orientation and $\phi$ is the phase of the plane wave. Gabor functions are used for texture segmentation by finding the filters that are tuned to the image's dominant spectral information. This entails selecting parameters for frequency, orientation and bandwidth for each filter, and the selection is highly dependent on the image. In [4] a simple peak-finding algorithm applied to the power spectrum was used to guide the choice of filter frequency. The image is filtered using the chosen set of Gabor functions, and each point in the image is labeled according to which filter output gives the highest energy value at that point. The result is that the image is segmented into regions according to dominant spectral information.
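A minimal numpy sketch of a sampled, real-valued Gabor kernel following EQ 8; the kernel size and parameter values are illustrative choices, not the paper's:

```python
import numpy as np

def gabor_kernel(size, sigma, omega, theta, phi=0.0):
    """Sampled real 2-D Gabor function per EQ 8, centered on the kernel."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))      # Gaussian
    carrier = np.cos(2.0 * np.pi * omega * (x * np.cos(theta)
                                            + y * np.sin(theta)) + phi)
    return envelope * carrier

# e.g. a filter tuned to vertical frequencies (horizontal stripes):
# g = gabor_kernel(size=31, sigma=5.0, omega=0.1, theta=np.pi / 2)
```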

There are drawbacks to using Gabor filters in practical applications. First, the selection of filters for an image is data dependent, and the selection process is nontrivial [4]; otherwise, the accurate implementation of a complete Gabor expansion would require an impractical number of filters. Second, the application of the filters to images is not simple: the computation of the filter coefficients is complex because the Gabor functions are not orthogonal, and discrete versions of the Gabor function must be obtained before they can be applied to images.

VIA.3 Wavelet subband features

Another way to exploit the trade-offs between space and spatial-frequency resolution is with a dyadic filter bank, which produces octave bandwidth segments in spatial-frequency. It allows simultaneously for high spatial resolution at high spatial-frequencies and high spatial-frequency resolution at low spatial-frequencies. Furthermore, this wavelet tiling is supported by evidence that frequency receptors in human vision are spaced at octave distances. A very practical filter bank for image decomposition is the quadrature mirror filter (QMF) bank, which decomposes the image into low-pass and high-pass spatial-frequency bands. The QMF filter bank produces an octave band split, or wavelet decomposition, when the filtering is recursively applied to the low-pass bands. In particular, separable QMF filters can be used, which reduces the computational complexity of the filter bank.

The QMF wavelet decomposition, which has been used for image compression, is also very attractive for texture analysis. It provides a low-cost decomposition of the image, and when iterated in wavelet fashion the filter bank exploits the trade-offs in space and spatial-frequency resolution that the Gabor filter texture studies revealed to be extremely important. Given these attractive characteristics of the QMF filter bank, we propose that the feature sets for texture indexing in the database application be derived from the QMF wavelet decomposition of images.

The QMF approach to texture classification was used by Kundu and Chen [26]. The authors identified several aspects of the QMF filter bank as relevant to texture extraction: its ability to achieve perfect reconstruction, the spatial localization of its filter outputs, and the decimation of the filter outputs, which reduces complexity. In the classification of Brodatz textures the QMF features performed better than those proposed by Haralick [22]. Furthermore, in [36] we evaluated the performance of texture feature sets derived from several spatial-frequency decompositions. In particular, we compared the classification performance of DCT, QMF wavelet and uniform subband energy feature sets on their ability to classify texture cuts made from all 112 Brodatz textures. We found that a 9-dimensional feature set composed of measures of the subband energies provides over 90% correct classification for each of the spatial-frequency decompositions.

For the purpose of indexing images by texture, we use the same approach. The texture feature sets are based on measures of energy within spatial-frequency bands. From the outputs of a wavelet filter bank with three iterations on the low-frequency band, a texture vector determined by the magnitudes of the filter outputs is computed for each pixel in the image, as illustrated in Fig. 15. By thresholding the output of each filter, we create a binary texture set that indicates, for each pixel, whether the corresponding filter outputs are above threshold. The texture sets are defined over a 9-dimensional binary space corresponding to the outputs of the filters in the wavelet filter bank with 3 iterations. From this binary space, the texture sets are formed and used to index the images. Additional elements of the system, discussed later, operate on the filter outputs to merge image points into spatial regions that have homogeneous texture content. Based on the identified texture regions, the texture sets are used for image retrieval.
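A simplified, whole-image sketch of this computation follows; it reduces each of the nine detail subbands of a 3-level Haar decomposition to a single mean-magnitude energy and thresholds it. The paper computes the features per pixel and then merges regions, which this sketch omits, and the nine thresholds are assumed parameters:

```python
import numpy as np

def haar2d(x):
    """One level of a separable Haar analysis bank: returns LL, LH, HL, HH."""
    lo = (x[:, 0::2] + x[:, 1::2]) / np.sqrt(2)   # horizontal lowpass
    hi = (x[:, 0::2] - x[:, 1::2]) / np.sqrt(2)   # horizontal highpass
    ll = (lo[0::2, :] + lo[1::2, :]) / np.sqrt(2)
    lh = (lo[0::2, :] - lo[1::2, :]) / np.sqrt(2)
    hl = (hi[0::2, :] + hi[1::2, :]) / np.sqrt(2)
    hh = (hi[0::2, :] - hi[1::2, :]) / np.sqrt(2)
    return ll, lh, hl, hh

def texture_set(gray, thresholds):
    """9-D binary texture set: 3 detail bands x 3 wavelet iterations.

    gray       -- 2-D gray-level array with sides divisible by 8
    thresholds -- nine per-band energy thresholds (assumed values)
    """
    band = gray.astype(float)
    energies = []
    for _ in range(3):                    # iterate on the lowpass band
        band, lh, hl, hh = haar2d(band)
        energies += [np.abs(d).mean() for d in (lh, hl, hh)]
    return tuple(int(e > t) for e, t in zip(energies, thresholds))
```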

VIB. Texture Set Notation

The texture set representation is formulated as follows:

Let $I[x, y]$ be the discrete 2-D representation of an image. Let $\{h_i[x, y]\}$, where $i = 0, \ldots, m - 1$, be the set of filters such that the 2-D convolution

$b_i[x, y] = I[x, y] * h_i[x, y]$ (EQ 9)

band-limits $I[x, y]$ to some narrow 2-D range of frequencies in the spatial-frequency domain. Let $\tau_i$ be the threshold in subband $i$ such that

$\hat{b}_i[x, y] = \begin{cases} 1 & \text{if } |b_i[x, y]| \geq \tau_i \\ 0 & \text{otherwise} \end{cases}$ (EQ 10)

The set of images $\{b_i[x, y]\}$ refers to the collection of subband images such that each contains a narrow range of spatial-frequencies from $I[x, y]$. The set of images $\{\hat{b}_i[x, y]\}$ refers to the bi-level images obtained from the $b_i[x, y]$ by thresholding each at level $\tau_i$, respectively.

Definition 2: Texture Set -- Let $\mathcal{B}^m$ be the $m$ dimensional binary space whereby a value of 1 on axis $i$ corresponds to the presence of energy above threshold in $\hat{b}_i[x, y]$, the thresholded energy output of filter $i$. A texture set $t$ is defined as a vector in binary space $\mathcal{B}^m$. A texture set corresponds to the signification of energy of filter outputs above threshold.

The objective of the texture extraction technique is to utilize the spatial-frequency decompositions produced by the filters to capture the characteristics of texture. The parameters involved are primarily the choice of filters $h_i[x, y]$ and the energy thresholds $\tau_i$. The set $\{\hat{b}_i[x, y]\}$ provides texture descriptions at the pixel level within the image $I[x, y]$. However, this description of texture is not immediately useful. Instead, we also devise an algorithm by which spatial points in $\{\hat{b}_i[x, y]\}$ are joined into larger spatial regions. Each spatial region is represented by a binary texture set $t$. Indexing of the image database by texture is accomplished by accessing the spatial regions within images on the basis of their texture set values $t$.

VIC. Texture Feature Extraction

The texture features are extracted from the image using the outputs of the QMF wavelet filter bank. In this study the Haar filter was used in the filter bank, chosen for its low complexity and, in particular, for its poor stop-band performance: the Haar filter provides a worst-case evaluation in terms of spatial-frequency energy leakage. Typically, filters with better stop-band characteristics should be used; however, in this study we are concerned with texture indexing performance under such simplifying conditions. If good indexing can be achieved in the Haar case, then other filters should perform quite well.

The process of texture extraction is diagrammed in Fig. 16. The first step involves the magnitude operation on the filter outputs: the value at each image point in each filter output is replaced with its absolute value, giving a measure of the band-limited spatial-frequency energy at that point. The energy images for texture extraction on the Barbara image appear in Fig. 17(b). Next, each filter output $i$ is thresholded at level $\tau_i$ to produce a binary image. Each binary image has values at points of high energy that correspond to the presence of lines, edges, noise and textures within the original image. The next operation reduces the edge, line and noise points while enhancing the texture values. This is achieved through a non-linear filtering operation on each bi-level image: the median filter is applied in three passes over each bi-level image. In addition, a sequential labeling algorithm is applied to each bi-level image to eliminate the remaining points and regions that are below a size threshold. The outputs resulting from these operations appear in Fig. 17(c).
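The per-band thresholding and cleanup might be sketched as follows, using scipy's median filter and connected-component labeling as a stand-in for the sequential labeling step; the pass count and size threshold here are assumptions, not the paper's values:

```python
import numpy as np
from scipy import ndimage

def clean_bilevel(energy, tau, passes=3, min_size=64):
    """Threshold one subband energy image, then suppress lines, edges and
    noise: repeated median-filter passes followed by removal of connected
    components smaller than `min_size` pixels (sequential labeling)."""
    b = (energy >= tau).astype(np.uint8)
    for _ in range(passes):
        b = ndimage.median_filter(b, size=3)
    labels, _ = ndimage.label(b)              # connected components
    sizes = np.bincount(labels.ravel())
    keep = sizes >= min_size
    keep[0] = False                           # label 0 is background
    return keep[labels]                       # boolean mask of kept regions
```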

From the filtered bi-level images the full image is reconstructed such that each image point is replaced by a binary vector indicating the values of the filter outputs at that point. This operation essentially overlays the bi-level images, so that the reconstructed image contains composites of overlapping regions from the bi-level images. At this stage the regions give only a preliminary indication of the texture content. Another step of median filtering is used to enhance the dominant regions, and a region size threshold is applied to eliminate insignificant regions. This produces the binary texture set image illustrated in Fig. 17(d), which contains the labeled regions of texture found in the image. Each region is represented by a binary texture set that indicates the significant spatial-frequency content of the region. The regions corresponding to the extracted textures are shown in Fig. 17(e).

VID. Texture Set Retrieval

The texture set effectiveness was evaluated using examples of image retrieval based on texture content. The image subband texture features were previously evaluated using the Brodatz texture classes [36] and were found to provide good texture classification. The new aspect here is the extraction of texture from general 2-D imagery. Since the performance of this texture extraction for generic images is difficult to measure independently, we deferred evaluation to the final determination of image texture retrieval. Our initial major performance criterion is subjective evaluation based on image retrieval trials.

We evaluated the texture feature sets by composing texture queries on the image database. The binary query texture sets can be formed in a number of ways. When the user provides a sample image, the texture set is computed for that image using the same texture extraction process and is then used to query the database for items of similar texture content. Alternatively, we assigned each subband a pattern that indicates the predominant scale and orientation characteristics of the corresponding filter; this allows individual bands to be identified as having some characteristic such as closely spaced horizontal lines, vertical frequencies, etc. The results of a query based on this breakdown are illustrated in Fig. 18. Here, the user requested images from the database of 500 images that have textures consisting of horizontal lines. The query texture set was formed by specifying that dominant energy is required only in the bands corresponding to vertical frequencies. The images retrieved from the database contain regions characterized as patterns with dominant vertical frequency information. It can be seen that the texture sets capture salient textural information within images and allow images to be retrieved on the basis of texture content.
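Because the texture sets are binary, a query such as this reduces to a direct lookup in an inverted index keyed by the 9-bit texture set, with no distance computation. A minimal sketch, with illustrative record layout:

```python
from collections import defaultdict

# Inverted index: binary texture set -> regions carrying that set.
texture_index = defaultdict(list)

def add_region(tset, image_id, mbr):
    """tset is a 9-tuple of 0/1 values produced at extraction time."""
    texture_index[tset].append((image_id, mbr))

def query(tset):
    """Exact-match retrieval; e.g. query((1, 0, 0, 1, 0, 0, 0, 0, 0))
    returns regions whose energy lies only in the bands for low and
    mid-range vertical frequencies, as in the example above."""
    return texture_index[tset]
```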

VIE. Texture Set Retrieval Summary

We presented the texture set approach for retrieving images and videos from large databases. The texture set technique is based on the extraction of regional texture information from images. We proposed an algorithm for texture extraction and a binary feature set that provides effective discrimination based on texture. The binary feature set also allows highly efficient indexing of the texture regions. As was the case for the color sets, the texture analysis extracts localized information within the images, which allows spatial information to be retained and used for queries; queries can therefore specify the spatial locations and relationships of texture regions. We demonstrated retrievals of images from the database of 500 images based on texture content, and we showed that the binary texture set approach sufficiently extracts textural information from images and provides effective matching between the query and the image textures in the database.

VII. Integration of Visual Features

The color sets and texture sets can be combined to create an even more powerful characterization of image content. In certain situations, color or texture alone does not sufficiently describe a region of interest or object within an image. For example, if the user wants to retrieve an image with a large grey sky, color alone may not narrow the query down to a small set of matches; there could be many grey regions in database images that are not of interest to the user, such as buildings, street scenes, terrain, etc. If the user further specifies that the desired regions should be smooth, or have no texture pattern, the response to the query can be filtered so that the desired content is returned.

One approach to combining the texture and color modalities is to issue separate queries and simply return the common elements. This approach would be particularly appropriate if each domain required a distance function for measuring feature similarity, since it would be hard to weight the distance in one modality appropriately against the other. There is, however, a deficiency in summing query results from the individual modalities: the color regions and texture regions will likely overlap only partially or not at all. To give correct responses to the combined query, the system would need to recompose the images from the feature maps and determine the extent to which the image regions overlap. This adds complexity to the system and requires that additional parameters be kept for all regions so that the computation based on the region extents is accurate. Moreover, this approach does not eliminate images that have zero or insignificant overlap between regions from consideration until the overlap computation stage. Furthermore, given that an image may have several matches in color and/or texture, all of the matches must be compared cross-wise across modalities to determine whether a combined match exists.
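For reference, this naive image-level combination amounts to intersecting the two result sets while ignoring region overlap entirely; a minimal sketch:

```python
def combine_by_image(color_matches, texture_matches):
    """Naive combination of separate color and texture query results:
    keep only images returned by both queries, without checking whether
    the matched regions actually overlap within each image."""
    color_imgs = {image_id for image_id, _region in color_matches}
    return [m for m in texture_matches if m[0] in color_imgs]
```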

An example of combining the texture and color modalities to characterize image regions appears in Fig. 19. As illustrated, one texture region in the image overlaps two color regions: the areas where the grass is low have the same texture as the hide of the buffalo, but the regions have different colors. Additionally, one color region overlaps two texture regions: all areas of grass have the same color, but the regions where the grass is high have a different texture from those where the grass is low. This example illustrates that texture and color regions typically do not overlap completely, so additional processing at query time must be used to reveal matches.

Since we have proposed binary feature sets for indexing, we can instead combine the texture and color modalities efficiently at index time, which avoids the query-time computation of region overlap. In contrast to the alternative of extracting regions based on combined features, the binary sets also avoid the need to develop a distance function that spans both modalities. The binary set representations therefore make it straightforward to build a combined binary index during the feature extraction process. As the color and texture content of each image are extracted, the color and texture feature maps can be overlaid. At each point in the color map, a binary color vector indicates the color set of the owning region; likewise, at each point in the texture map, a binary texture vector indicates the texture content. By concatenating the binary color vectors and the binary texture vectors, a new unified feature map is produced, see Fig. 20. A separate index is created for the unified features. The regions are indexed using a binary vector defined over both the binary color space $\mathcal{B}^n$ and the binary texture space $\mathcal{B}^m$. Since the combined feature space $\mathcal{B}^{n+m}$ is also binary, it lends itself to efficient indexing based on the combined color and texture contents of images.
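A minimal sketch of forming the unified key at extraction time, assuming the color and texture sets are represented as 0/1 tuples of lengths n and m:

```python
def unified_key(color_set, texture_set, n, m):
    """Concatenate a binary color set (n bits) and binary texture set
    (m bits) into one (n+m)-bit index key. An all-zero half denotes a
    region with no color (or no texture) feature, as in Fig. 20."""
    c = tuple(color_set) if color_set is not None else (0,) * n
    t = tuple(texture_set) if texture_set is not None else (0,) * m
    return c + t

# e.g. a region with color set c0 but no texture feature maps to
# unified_key(c0, None, n, m), i.e. (c0, 0...0).
```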

VIII. Summary

In this paper we presented algorithms for extracting, searching for and retrieving visual data from image and video databases using color and texture. We proposed techniques by which color and texture are captured automatically and represented in binary feature spaces. The binary feature set is a novel approach towards efficient feature representation and indexing. We presented the binary feature set approach for color, by which local, arbitrarily shaped image regions are represented by color sets. The color sets indicate the significant colors that contribute to the appearance of each region. Similarly, we presented a binary feature set for texture which is derived from spatial-frequency energy. The texture set captures local texture content and extracts arbitrarily shaped regions of texture from the image. The texture set indicates the spatial-frequency bands that contribute to the textural appearance of each region. For both color and texture extraction, the spatial localization afforded by the binary set extraction techniques allows database queries using texture and color to include specification of spatial locations and relationships.

The binary feature sets have other essential characteristics that make them well suited for image and video databases. The binary feature spaces lend themselves to extremely efficient indexing: retrieval using the binary feature sets requires no distance function computation and can therefore be performed through direct lookup of matching items. Another important aspect of our approach is the capability to extract features from compressed data. The color features are extracted from low resolution images, and the texture features are designed to be the product of spatial-frequency decomposition, the probable form of compressed image and video data in such databases. The binary feature spaces also provide a novel way to combine several feature modalities by concatenating feature sets. This property allows image content to be characterized by combined feature sets that describe both the color and the texture of image regions. The binary feature sets for color and texture produced excellent performance in image retrieval experiments. We demonstrated that the features perform well in both retrieval effectiveness (color) and query response time in image retrieval experiments on databases of 500 and 3000 general color images.

Image and video storage and retrieval systems are placing new demands on computers for automated analysis of digital images and videos. The automated feature extraction of spatially localized color and texture information from images provides new powerful capabilities for image retrieval. The binary set representations of color and texture presented here provide excellent visual discrimination performance and the capacity for efficient content-based image and video retrieval from large image and video databases.

Acknowledgments -- The authors wish to thank Louis Hualu Hwang for his insightful comments and W.E. Rose for help with revision.

IX. References

[1.] Agrawal R., et al., "Mining Association Rules between Sets of Items in Large Databases," ACM SIGMOD-93, Washington, DC, May, 1993.

[2.] Beck J., Sutter A., and Ivry R., "Spatial Frequency Channels and Perceptual Grouping in Texture Segregation," Computer Vision, Graphics, and Image Processing, 37, 299-325, 1987.

[3.] Berk T., L. Brownstone, A. Kaufman, "A new color naming system for computer graphics," I.E.E.E. Computer Graphics and Applications, vol. 2, 1982, pp. 37 -- 44.

[4.] Bovick A.C., and Clark M., "Multichannel Texture Analysis Using Localized Spatial Filters," I.E.E.E. Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 1, January 1990.

[5.] Brodatz P., Textures: A Photographic Album for Artists & Designers, Dover Press, 1966.

[6.] Caelli T., Reye D., "On the Classification of Image Regions by Colour, Texture and Shape," Pattern Recognition, Vol. 26, No. 4, pp. 461--470, 1993.

[7.] Chang S.-F., and Smith J.R., "Extracting Multi-Dimensional Signal Features for Content-Based Visual Query," SPIE Symposium on Visual Communications and Signal Processing, May 1995.

[8.] Chang S.F., "Compressed-Domain Techniques for Image/Video Indexing and Manipulation," Special Session on Digital Library and Video-On Demand, I.E.E.E. International Conference on Image Processing, Washington, D.C., October, 1995.

[9.] Chang S.K., Yan C.W., Dimitroff D.C., Arndt T., "An Intelligent Image Database System," I.E.E.E. Transactions on Software Engineering, Vol. 14, No. 5, May 1988.

[10.] Chang T., and Kuo C.-C., "Texture Analysis and Classification with Tree-Structured Wavelet Transform," I.E.E.E. Transactions on Image Processing, vol. 3, no. 4, October, 1993.

[11.] Chaudhuri B.B., and N. Sarkar, "Texture Segmentation Using Fractal Dimension," I.E.E.E. Transactions on Pattern Analysis and Machine Intelligence, Vol. 17, No. 1, January 1995.

[12.] Chua T.S., S-K Lim and H-K Pung, "Content-based Retrieval of Segmented Images," Proceedings of ACM Multimedia 94, San Francisco, Ca., October 1994.

[13.] Cohen F., Z. Fang and M.A. Patel, "Classification of Rotated and Scaled Textured Images Using Gaussian Random Field Models," I.E.E.E. Transactions on Pattern Analysis and Machine Intelligence, Vol. 13, No. 2, February 1991.

[14.] Daugman J.G., "Entropy Reduction and Decorrelation in Visual Coding by Oriented Neural Receptive Fields," I.E.E.E. Transactions on Biomedical Engineering, vol. 36, no. 1, January 1989.

[15.] du Buf J.M.H., "Abstract processes in texture discrimination," Spatial Vision, Vol. 6, No. 3, pp. 221--242, 1992.

[16.] Dunn D., and Higgins W.E., "Optimal Gabor Filters for Texture Segmentation," I.E.E.E. Transactions on Image Processing, vol. 4, no. 7, July 1995.

[17.] Faloutsos C., Flickner M., Niblack W., Petkovic D., Equitz W., and Barber R., "Efficient and Effective Querying by Image Content," IBM Research Journal, No. 9453 (83074), August 3, 1993.

[18.] Fogel I., and Sagi D., "Gabor Filters as Texture Discriminator," Biol. Cybern., 61, 103-113, 1989.

[19.] Gevers T., and A.W.M. Smeulders, "An Approach to Image Retrieval for Image Databases," Database and Expert System Applications (DEXA-93), 1993.

[20.] Gotlieb C.C., and H. E. Kreyszig, "Texture Descriptors Based on Co-occurrence Matrices," Computer Vision, Graphics, and Image Processing, 51, 70--86, 1990.

[21.] Griffiths E., and T. Toscianko, "Can human texture discrimination be mimicked by a computer model using local Fourier Analysis," Spatial Vision, Vol. 6, No. 2, pp. 149--157, 1992.

[22.] Haralick R. M., "Statistical and Structural Approaches to Texture," Proceedings of the IEEE, Vol. 67, No. 5, May 1979.

[23.] Hunt R.W.G., Measuring Color, John Wiley & Sons, 1989.

[24.] Jernigan M.E., and D'Astous F., "Entropy-Based Texture Analysis in the Spatial Frequency Domain," I.E.E.E. Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-6, no. 2, March 1984.

[25.] Jones K.S., Information Retrieval Experiment, Butterworth & Co., 1981.

[26.] Kundu A., and J.-L. Chen, "Texture Classification Using QMF Bank-Based Subband Decomposition," CVGIP: Graphical Models and Image Processing, Vol. 54, No. 5, September, pp. 369--384, 1992.

[27.] Liu F., and Picard R.W., "Periodicity, directionality, and randomness: Wold features for image modeling and retrieval," M.I.T. Media Laboratory Perceptual Computing Section Technical Report, No. 320, 1995.

[28.] Niblack W., Barber R., Equitz W., Flickner M., Glasman E., Petkovic D., Yanker P., Faloutsos C., and Taubin G., "The QBIC Project: Querying Images by Content Using Color, Texture, and Shape," IBM Research Journal, No. 9203, February 1, 1993.

[29.] Ohanian P.P., and R. C. Dubes, "Performance Evaluation for Four Classes of Textural Features," Pattern Recognition, Vol. 25, No. 8, pp. 819--833, 1992.

[30.] Porat M., and Zeevi Y.Y., "The Generalized Gabor Scheme of Image Representation in Biological and Machine Vision," I.E.E.E. Transactions on Pattern Analysis and Machine Intelligence, vol. 10, no. 4, July 1988.

[31.] Porat M., and Zeevi Y.Y., "Localized Texture Processing in Vision: Analysis and Synthesis in the Gaborian Space," I.E.E.E. transactions on Biomedical Engineering, vol. 36, no. 1, January 1989.

[32.] Reed T.R, and Wechsler H., "Segmentation of Textured Images and Gestalt Organization Using Spatial/Spatial-Frequency Representations," I.E.E.E. Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 1, January 1990.

[33.] Russ J.C., The Image Processing Handbook, IEEE Press, 1995.

[34.] Smith J.R., and S.-F. Chang, "Single Color Extraction and Image Query," to appear in Proceedings of the IEEE International Conference on Image Processing (ICIP-95), October, 1995.

[35.] Smith J.R., and S.-F. Chang, "Tools and Techniques for Color Image Retrieval," Columbia University CTR Technical Report, June 1995.

[36.] Smith J.R., and S.-F. Chang, "Texture Classification and Discrimination in Large Image Databases," IEEE ICIP-94, Austin, Tx., November, 1994.

[37.] Swain M., and D. Ballard, "Color Indexing," International Journal of Computer Vision, 7:1, 1991, pp. 11 -- 32.

[38.] Tominaga S., "A Computer Method for Specifying Colors by Means of Color Naming," in Cognitive Engineering in the Design of Human-Computer Interaction and Expert Systems, edited by G. Salvendy, Elsevier Science Publishers, 1987, pp. 131 -- 138.

[39.] Tominaga S., "Color Classification of Natural Color Images," COLOR research and applications, Vol. 17, No. 4, August 1992, pp. 230 -- 239.

[40.] Turner M.R., "Texture Discrimination by Gabor Functions," Biol. Cybern., 55, 71-82, 1986.

[41.] Wilson R., and Granlund G.H., "The Uncertainty Principle in Image Processing," I.E.E.E. Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-6, no. 6, November 1984.

[42.] Wyszecki G., and W. S. Stiles, Color Science: Concepts and Methods, John Wiley & Sons, 1982.



FIGURE 1. Communication between the user and the database in the search for images. The user prefers to specify the query at the semantic level. Some domains allow query expression at the object level. In content-based retrieval systems the retrieval at visual feature level is supported.



FIGURE 2. Colorization of the Car color image in the first stage of extraction of color regions: (1) conversion of the color space, (2) quantization of the color space, (3) color median filtering, (4) the processed color image has dominant color regions emphasized.



FIGURE 3. Quantized color space, 18 hues, 3 saturations and 3 values + 4 grays = 166 colors.



FIGURE 4. Color set extraction: (a) color image with 6 colors, (b) single color regions -- only color regions corresponding to {a3}, {a0} and {a2} are larger than the size threshold ta = 322. (c) 2-color regions -- only regions {a0,a1}, {a4,a5}, {a0,a3} and {a2,a3} are larger than size threshold ta, and for each region, each color in the respective sets contributes tb = 322 pixels and more than tg = 20% of region size. Notice that {a1,a3} is not extracted because a1 does not contribute more than tg % of region {a1,a3}. (d) 3-color regions -- sizes of {a0,a1,a3}, {a2,a4,a5} and {a0,a2,a3} are greater than size threshold ta, and each color contributes more than tb pixels and tg % of region size. Also notice that regions of the type {a0,a1,a2} are not possible because the color elements are not connected.





FIGURE 5. Color region extraction for five color sets, with five possible colors (a) Butterfly color image, (b) colorized image with 30 colors, five of the colors are labeled, (c) pixels belonging to each of the five color sets. The color sets are listed at the bottom. Element i in each color set refers to the selection of color i from the list in (b), (d) minimum bounding rectangles (MBRs) for extracted regions used to index the regions.





FIGURE 6. Extraction of a multicolored region (a) San Francisco color image, (b) colorized image with 73 colors, (c) pixels that are red, white or blue, (d) the extracted red, white and blue color region corresponding to the flag is added to the index.



FIGURE 7. WWW demo using color set query http://www.ctr.columbia.edu/~jrsmith/advent/color_demo.html, (a) query formulation by picking color from 166 element color space, (b) image retrieval results for single color selection.



FIGURE 8. Color query interface tools (a) Color Space Navigator -- tool that allows users continuous navigation through 3-D color spaces. (b) Color Extractor -- tool for extracting color regions from images.



FIGURE 9. Color histograms (a) three channel color flower image -- Chrysanthemum carinatum `Monarch Court Jesters', (b) histogram for each channel, (c) histograms. Dotted lines indicate average channel values.



FIGURE 10. Color set query -- retrieval effectiveness (a) linguistic color specification, (b) color swatches.



FIGURE 11. Retrieval effectiveness as a function of the number of samples per color axis, (a) color histogram Euclidean distance, (b) color histogram intersection.



FIGURE 12. Retrieval effectiveness as a function of the number of histogram bins per color axis, (a) color histogram cross distance, (b) best retrieval scores from the intersection, Euclidean and cross distances.



FIGURE 13. Retrieval effectiveness for color histogram retrieval using segments and whole images (a) 500 color images database, (b) 3000 color images database.



FIGURE 14. Color set and color histogram -- retrieval effectiveness (a) 500 color images, (b) 3000 color images.





FIGURE 15. Extraction of color and texture features from compressed image data, (a) original color image -- Buffalos image, (b) QMF wavelet transformed image, (c) subbands yielding color (ci) and texture (ti) feature information.



FIGURE 16. Extraction of textures from filter bank, (a) Hi's are 2-D narrow band filters, (b) | . | measures spatial-frequency energy, (c) ti's are energy thresholds, (d) (NL) non-linear filtering, (e) spatial-frequency region (Si) size thresholds, (f) conversion to binary features and summation of subbands, (g) final (NL) non-linear filtering, (S) size threshold and (SL) sequential labeling of textures.



FIGURE 17. Texture extraction from the Barbara image, (a) wavelet image, (b) energy distributions in subbands, black = high energy, (c) after thresholding, (d) reassembled image with labeled textures, (e) texture regions identified.



FIGURE 18. Partial results of texture query on database of 500 images. Query texture set = [1 0 0 1 0 0 0 0 0], which defines textures with low and mid-range vertical frequencies. Boxes indicate locations of textural patterns.



FIGURE 19. Combination of visual features, (a) color map with two color regions labeled C0 and C1, respectively, (b) texture map with two regions labeled T0 and T1 respectively, (c) combined color and texture map, regions have color and texture parameters (Ci, Tj).



FIGURE 20. Combination of (a) color and (b) texture maps at time of feature extraction. Each individual feature map is defined over a binary feature space. (c) The intermediate map is obtained by concatenating color and texture sets. Processing of the intermediate map using (NL) non-linear filter and (S) region size threshold to eliminate insignificant regions produces (d) the unified map. A binary feature set of the form (ci, tj) represents the color and texture content of each region. Note that the 0 set is used to denote the absence of a color or texture feature. For example, the binary set (c0, 0) indicates that a region does not have a texture feature, while (0, t0) indicates that a region does not have a color feature.

I. Introduction

II. Visual Features

IIA. Content-Based Query

IIB. Integration of Visual Features

III. Color Sets

IIIA. Color Set Notation

IIIB. Color Region Extraction

IIIC. Color Space - Transformation and Quantization

IIID. Color Processing

IIID.1 Color region labeling

IIID.2 Color image mining

IIIE. Color Query -- Spatial Locations

IIIF. Color Query -- Color Specification

IV. Color Histograms

IVA. Color Histogram Definition

IVA.1 Color uniformity

IVB. Color Histogram Discrimination

IVB.1 Histogram euclidean distance

IVB.2 Histogram intersection distance

IVB.3 Histogram cross distance

V. Color Retrieval Experiments

VA. Retrieval Effectiveness

VA.1 Measures of retrieval effectiveness

VB. Color Set Retrieval Effectiveness

VC. Color Histogram Retrieval Effectiveness

VD. Comparison of Color Retrieval Techniques

VE. Color Set Retrieval Summary

VI. Texture Sets

VIA. Texture Features

VIA.1 Spatial/Spatial Frequency Features

VIA.2 Gabor Functions

VIA.3 Wavelet subband features

VIB. Texture Set Notation

VIC. Texture Feature Extraction

VID. Texture Set Retrieval

VIE. Texture Set Retrieval Summary

VII. Integration of Visual Features

VIII. Summary

IX. References

FIGURE 1. Communication between the user and the database in the search for images. The user prefers to specify the query at the semantic level. Some domains allow query expression at the object level. In content-based retrieval systems the retrieval at visual feature level is supported. 42

FIGURE 2. Colorization of the Car color image in the first stage of extraction of color regions: (1) conversion to color space, , (2) quantization of space, , (3) color median filtering, (4) the processed color image has dominant color regions emphasized. 43

FIGURE 3. Quantized color space, 18 hues, 3 saturations and 3 values + 4 grays = 166 colors. 43

FIGURE 4. Color set extraction: (a) color image with 6 colors, (b) extracted color regions: (A.) single color regions -- only color regions corresponding to {a3}, {a0} and {a4} are larger than size threshold, ta=322. (B.) 2-color regions -- only regions {a0,a1}, {a4,a5}, {a0,a3} and {a2,a3} are larger than size threshold ta, and for each region, each color in the respective sets contributes tb=322 pixels and more than tg=20% of region size. Notice that {a1,a3} is not extracted because a1 does not contribute more than tg % of region {a1,a3}. (C.) 3-color regions -- sizes of {a0,a1,a3}, {a2,a4,a5} and {a0,a2,a3} are greater than size threshold ta, and each color contributes more thantb pixels and tg % of region size. Also notice that regions of the type {a0,a1,a2} are not possible because the color elements are not connected. 44

FIGURE 5. Color region extraction for five color sets, with five possible colors: (a) Butterfly color image, (b) colorized image with 30 colors, five of which are labeled, (c) pixels belonging to each of the five color sets. The color sets are listed at the bottom; element i in each color set refers to the selection of color i from the list in (b), (d) minimum bounding rectangles (MBRs) for the extracted regions, used to index the regions.

FIGURE 6. Extraction of a multicolored region: (a) San Francisco color image, (b) colorized image with 73 colors, (c) pixels that are red, white or blue, (d) the extracted red, white and blue color region corresponding to the flag is added to the index.

FIGURE 7. WWW demo using color set query, http://www.ctr.columbia.edu/~jrsmith/advent/color_demo.html: (a) query formulation by picking a color from the 166 element color space, (b) image retrieval results for a single color selection.

FIGURE 8. Color query interface tools: (a) Color Space Navigator -- a tool that allows users to navigate continuously through 3-D color spaces. (b) Color Extractor -- a tool for extracting color regions from images.

FIGURE 9. Color histograms: (a) three channel color flower image -- Chrysanthemum carinatum `Monarch Court Jesters', (b) histogram for each RGB channel, (c) HSV histograms. Dotted lines indicate average channel values.

FIGURE 10. Color set query -- retrieval effectiveness (a) linguistic color specification, (b) color swatches.

FIGURE 11. Retrieval effectiveness as a function of the number of samples per color axis, (a) color histogram distance -- Euclidean distance, (b) color histogram intersection.

FIGURE 12. Retrieval effectiveness as a function of the number of histogram bins per color axis, (a) color histogram cross distance, (b) best retrieval scores from intersection, Euclidean distance and cross distance.

FIGURE 13. Retrieval effectiveness for color histogram retrieval using segments and whole images: (a) database of 500 color images, (b) database of 3000 color images.

FIGURE 14. Color set and color histogram -- retrieval effectiveness (a) 500 color images, (b) 3000 color images.

FIGURE 15. Extraction of color and texture features from compressed image data, (a) original color image -- Buffalos image, (b) QMF wavelet transformed image, (c) subbands yielding color (ci) and texture (ti) feature information.

FIGURE 16. Extraction of textures from filter bank, (a) Hi's are 2-D narrow band filters, (b) | . | measures spatial-frequency energy, (c) ti's are energy thresholds, (d) (NL) non-linear filtering, (e) spatial-frequency region (Si) size thresholds, (f) conversion to binary features and summation of subbands, (g) final (NL) non-linear filtering, (S) size threshold and (SL) sequential labeling of textures.

FIGURE 17. Texture extraction from the Barbara image, (a) wavelet image, (b) energy distributions in subbands, black = high energy, (c) after thresholding, (d) reassembled image with labeled textures, (e) texture regions identified.

FIGURE 18. Partial results of texture query on a database of 500 images. Query texture set = [1 0 0 1 0 0 0 0 0], which defines textures with low and mid-range vertical frequencies. Boxes indicate locations of textural patterns. (A sketch of this style of binary texture set matching follows this list.)

FIGURE 19. Combination of visual features, (a) color map with two color regions labeled C0 and C1, respectively, (b) texture map with two regions labeled T0 and T1, respectively, (c) combined color and texture map, whose regions carry both color and texture parameters (Ci, Tj).

FIGURE 20. Combination of (a) color and (b) texture maps at the time of feature extraction. Each individual feature map is defined over a binary feature space. (c) The intermediate map is obtained by concatenating the color and texture sets. Processing the intermediate map with the (NL) non-linear filter and (S) region size threshold to eliminate insignificant regions produces (d) the unified map. A binary feature set of the form (ci, tj) represents the color and texture content of each region. Note that the 0 set denotes the absence of a color or texture feature. For example, the binary set (c0, 0) indicates that a region has no texture feature, while (0, t0) indicates that a region has no color feature.
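As an illustration of the kind of query shown in Figure 18, the short Python sketch below matches a 9-element binary texture set against a small toy index of region texture sets. The subset-matching rule (every subband required by the query must be present in the region) and the example index are our own assumptions for illustration; the paper's actual matching and indexing machinery is described in Sections VIB-VID.

    # Hypothetical sketch: match a binary texture set query against
    # indexed region texture sets; the subset rule is an assumption.

    def matches(query_bits, region_bits):
        # True if every subband set in the query is also set in the region.
        return all((not q) or r for q, r in zip(query_bits, region_bits))

    # Toy index: region id -> 9-element binary texture set (one bit per subband).
    index = {
        "imgA_r1": [1, 0, 0, 1, 0, 0, 0, 0, 0],
        "imgB_r3": [1, 1, 0, 1, 0, 0, 1, 0, 0],
        "imgC_r2": [0, 0, 0, 1, 0, 0, 0, 0, 0],
    }

    query = [1, 0, 0, 1, 0, 0, 0, 0, 0]  # low and mid-range vertical frequencies
    hits = [rid for rid, bits in index.items() if matches(query, bits)]
    print(hits)  # -> ['imgA_r1', 'imgB_r3']

Because the sets are binary, such queries reduce to fast bitwise tests, which is consistent with the paper's emphasis on query response time for binary feature sets.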