The main motivations behind the color-assisted place localization research were to determine the efficiency of color as an identifier and to carry out a scene analysis task based upon the information conveyed by the reflectance measurements of the input image regions. Our research project also aimed to investigate how powerful the surface reflectance values proposed by Land are in natural, three-dimensional environment settings. Although the investigation has not been able to address all the complexities of these problems, what follows is a brief summary of the achievements, together with some ideas for improvement and future studies.
Our research project introduces a linearization process that allows for a standard camera calibration. The procedure is applied at the very beginning of any image processing task in order to eliminate distortions in the input image that can be attributed to the input device, the color camera. A reference 3-step or 15-step gray card does not have to be present in the inputs. The process works with globally stored variables measured for a standardized gray card that has three different types of surface regions, reflecting 3, 18 and 90% of the incoming light. This card was selected instead of the 15-step version mostly because the size of its reference regions allowed for more precise measurements when it was placed into a natural image setting. According to the experiments, the linearization process functions properly. The dimmed, gray original images are transformed to be brighter and more vivid, and the resulting representation is much closer to the real scene that humans perceive than the initial one. Naturally, the functionality of the process was not tested merely by sight. The color intensity histograms were shifted back from their extreme position in the upper half of the [0, 255] interval, and the color intensity values of the reference card provided correct measures. (In all three color bands, the intensity indicators belonging to each standardized gray card region took the same value (+/- e). The closer the illumination settings were to the reference lighting conditions, the more precise the outcomes were, even without the color transforms.)
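The idea behind the linearization can be illustrated with a minimal sketch. It assumes that one raw intensity per gray card region is stored globally for each color band and that a piecewise-linear interpolation maps raw intensities to the known card reflectances of 3, 18 and 90%. The function name `linearize`, the stored raw values in `CARD_RAW` and the interpolation scheme are hypothetical and serve only as an illustration.

```python
from bisect import bisect_right

# Known reflectances of the three gray card regions (from the text).
CARD_REFLECTANCE = (0.03, 0.18, 0.90)

# Hypothetical raw intensities of the card regions in one color band,
# measured once under the reference lighting (illustrative numbers only).
CARD_RAW = (150.0, 200.0, 250.0)


def linearize(raw, card_raw=CARD_RAW, card_reflectance=CARD_REFLECTANCE):
    """Map a raw intensity in [0, 255] to a linear value in [0, 0.90].

    Piecewise-linear interpolation through (0, 0) and the three card
    points; raw values above the brightest card region are clamped.
    """
    xs = (0.0,) + tuple(card_raw)
    ys = (0.0,) + tuple(card_reflectance)
    if raw >= xs[-1]:
        return ys[-1]
    # Index of the segment [xs[i], xs[i + 1]) that contains raw.
    i = max(bisect_right(xs, raw) - 1, 0)
    fraction = (raw - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + fraction * (ys[i + 1] - ys[i])
```

With these illustrative numbers, raw intensities crowded into the upper half of the [0, 255] interval are spread back over the whole linear range, and the three card regions map to their nominal reflectances.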
After the intensity values of the input image are corrected, the RGB color triplets are transformed into reflectance values. The algorithm that carries out this procedure was inspired by Edwin Land's "Retinex Theory". The original version of the reflectance finding code had to be slightly modified in order to obtain meaningful outcomes for all types of images. In theory, the reflectance search can be started at any location of the image, as the final results are all normalized to the largest value. The practical implementation, however, is restricted by truncation errors. In certain cases the final normalization process resulted in a loss of information, as some of the indicators became too small. To prevent that situation, the algorithm was forced to begin its operation at the approximately lightest image element. That adjustment made the reflectance finding procedure reliable.
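The role of the starting point can be sketched for a single color band along a one-dimensional path of region intensities. The sketch is a simplified, Retinex-style ratio accumulation and not the reflectance finding code of this project: the function names, the threshold value and the reset rule are assumptions made for illustration. Starting at the lightest element makes that element the reference with value 1.0, so every other value is a ratio at or below 1.0 and no final division by a large maximum is needed.

```python
import math


def _walk(values, threshold):
    """Accumulate log-ratios along values, starting from values[0]."""
    log_acc = 0.0
    out = [1.0]
    for prev, cur in zip(values, values[1:]):
        step = math.log(cur / prev)
        # Small steps are treated as gradual illumination change and ignored.
        if abs(step) >= threshold:
            log_acc += step
        # Reset: nothing may become lighter than the reference element.
        log_acc = min(log_acc, 0.0)
        out.append(math.exp(log_acc))
    return out


def reflectance_from_lightest(values, threshold=0.02):
    """Relative reflectance of a list of positive intensities.

    The walk starts at the lightest element and proceeds in both
    directions, so the results already lie in (0, 1].
    """
    start = max(range(len(values)), key=lambda i: values[i])
    right = _walk(values[start:], threshold)
    left = _walk(values[start::-1], threshold)
    # left is ordered from the start outwards; reverse it and drop the start.
    return left[:0:-1] + right
```

For example, `reflectance_from_lightest([50.0, 100.0, 200.0, 100.0])` assigns 1.0 to the third element and proportionally smaller values to the others.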
In order to carry out a segmentation task based upon the new reflectance data, the original Brice-Fennema region finding algorithm was modified. Instead of grayscale intensity values, it currently carries out its calculations based upon the surface reflectance triplets.
As the global linearization measures correct for the camera distortions with respect to a particular illumination setting, the reflectance values have to be modified according to the presumed lighting effects in their own environment. The transforms that are responsible for that adjustment are called the color identification transforms. They constitute a fundamental part of the localization algorithm. Besides implicitly approximating the actual lighting settings in the input image scene, they also determine the color of the recognized surfaces from the transformed reflectance values. Their current search for the appropriate colors is restricted to the ones represented in the environment model. Considering the basic assumptions of the algorithm, the transforms can be characterized as fairly good. Although their most recent "perfect" identification rate was only 57% (8/14), the mismatches were not serious. Earth colors dominate the environment model, and it proved to be extremely difficult to differentiate between some of their delicate shades. One of the best examples is the cork brown color of the bulletin boards: the transforms hardly ever found the reflectance value associated with these surfaces. The color values of the timber doors also made for a difficult identification decision. The hue values of these surfaces lie close to each other, hence the transforms could easily mistake one for the other. The low performance rate indicated above can also be explained by the nature of the testing set. The 14 images were randomly selected and, contrary to the main assumptions of the algorithm, contained examples of shadows, interreflections and unknown objects, none of which are accounted for in the operations. Considering all these facts, the identification procedures did well.
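The principle of the color identification transforms can be sketched as a nearest-color search under a small set of candidate lighting corrections. The sketch assumes that each transform is a per-band multiplier triplet and that colors are compared by Euclidean distance; the names `MODEL_COLORS`, `TRANSFORMS` and `identify_color`, as well as all numeric values, are hypothetical and are not taken from the environment model.

```python
import math

# Hypothetical model colors as reflectance triplets (illustrative values).
MODEL_COLORS = {
    "cork brown": (0.45, 0.30, 0.18),
    "timber door": (0.50, 0.33, 0.20),
    "white wall": (0.85, 0.85, 0.80),
}

# Hypothetical per-band multipliers, each standing for one presumed lighting.
TRANSFORMS = [
    (1.0, 1.0, 1.0),
    (0.9, 1.0, 1.1),
    (1.1, 1.0, 0.9),
]


def identify_color(reflectance, model=MODEL_COLORS, transforms=TRANSFORMS):
    """Return (color name, transform index, distance) of the best match."""
    best = None
    for t_index, multiplier in enumerate(transforms):
        # Simulate one lighting condition by scaling each band.
        candidate = [value * m for value, m in zip(reflectance, multiplier)]
        for name, reference in model.items():
            distance = math.sqrt(
                sum((c - r) ** 2 for c, r in zip(candidate, reference))
            )
            if best is None or distance < best[2]:
                best = (name, t_index, distance)
    return best
```

In this illustration the two brown shades lie much closer to each other than to the wall color, so a small measurement error or an unsuitable multiplier is enough to exchange them, which mirrors the kind of mismatch described above.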
A consistently good performance is required from the identification operations, as the localization procedure assumes all their outcomes to be correct. All its searches for possible locale and scene candidates are based upon that supposition. The localization algorithm has not yet been completed. Following a bottom-up approach, it currently collects low-level information about all the regions in the environment model that have one of the examined colors from the input. These matches are then evaluated in relation to the other matched regions. In some cases, it is possible to identify an incorrect color assignment at this stage by carefully analyzing the information embedded in the environment model. An example of this is the node tree structure that contains information about the relationship of surfaces in the vertical direction. The restrictions imposed by this tree are currently very general and cannot provide sufficient additional guidance in recognizing scenes.
Although only partial results are available from the localization procedure, they suggest that a top-down approach might be more successful with the given tasks. On the bottom level, when relying on the reflectance information alone, it seems to be an extremely complex task to draw any conclusions about the location of regions in the environment. The top-down approach has not been implemented yet.
In conclusion, the research project found an answer to several of the questions that motivated its creation. Color, which plays an essential part in the human visual experience, could be effectively applied to machine vision. However, color perception is influenced by an immensely large set of factors, among them illumination, texture and environment. Artificial experiments with pure colors or controlled light settings can achieve good results, but with natural colors and natural surroundings the task becomes extremely complex. That high dimensionality of unknown variables must have been the main reason for neglecting surface color properties in earlier research projects.
It should be noted that the identification of some surfaces is particularly difficult given their three-dimensional structure. It is an extremely challenging task, for example, to recognize the mailbox shelves or the bookshelves with glass front doors in the environment. Although the color of the wood that these bodies are made of can be registered, these regions are hardly ever recognized by the recognition algorithm or the localization procedure. The problem lies in the fact that on the two-dimensional input the thin shelves are separated by dark holes or books. This "noise" disrupts the continuity of the wooden surfaces, so the region analysis is not able to group all the member pixels of the same surface together. As such details (the exact structure of the mailboxes or the content of the bookshelves) are not even represented in the model, the identification procedure cannot be guided by heuristics either. With the given environment model and region descriptions, this limitation of the algorithm cannot be eliminated.
6.2 Discussion
6.2.1 Ideas for Improvement and Future Work
The immense complexity of the vision problem should not halt future experiments. There have been many advances in understanding color perception in Psychology, Biology and Computer Science, which allow for ever newer approaches to some of the problems discussed above. In the following, some intriguing problems and challenges are described that could contribute to the color and scene identification algorithm presented in this thesis.
6.2.1.1 Hue-Saturation-Intensity Calculations
The Hue-Saturation-Intensity (HSI) color measurement system has already been referred to in Chapter 2. Each color sensation in this model is described by three terms. Hue refers to the wavelength of light that is reflected or transmitted from the object. More generally, this is the component that signifies the perceived color, such as red, yellow or green. In the HSI color space [Figure 30], hue also represents the angle about the intensity axis, where its value ranges between 0 and 2*PI. Saturation stands for the strength of the hue component. It indicates the amount of whiteness in the color and is evaluated on [0, 1.0]. The third component, intensity, is also measured between 0 and 1.0. It expresses the lightness of the perceived color.
For humans, this measurement model provides a more intuitive way to discuss visual experiences than the RGB color space. It is easier to express differences between shades of a color, or between different colors, using these three terms than using the red, green and blue indicators. That was the reason why we also started to test the color classification features of this model.
Our first task was to find a conversion between the RGB intensity measures and HSI color measures. The formulas used for these operations were introduced in [Ballard, 1982]:
These were first applied to the reflectance values that are stored in the environment model. As the new descriptor triplets in Figure 31 demonstrate, the hue components take widely different values. That feature suggests that distinguishing between the model colors should rely heavily on the hue component of the triplet. For example, a weighted sum with a "heavier" weight on the hue component could be applied.
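A conversion of this kind can be sketched as follows. The sketch assumes the widely used arccosine form of the RGB-to-HSI conversion, which may differ in detail from the formulas of [Ballard, 1982]; the function name `rgb_to_hsi` and the handling of gray and black inputs are choices made for illustration.

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert non-negative RGB values to (hue, saturation, intensity).

    Hue is an angle in [0, 2*pi); saturation lies in [0, 1]. Intensity
    is the mean of the three bands, so it lies in [0, 1] when the inputs
    are scaled to [0, 1]. Hue is set to 0.0 for gray values, where it
    is undefined.
    """
    total = r + g + b
    if total == 0:
        return 0.0, 0.0, 0.0
    intensity = total / 3.0
    saturation = 1.0 - 3.0 * min(r, g, b) / total
    numerator = 0.5 * ((r - g) + (r - b))
    denominator = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if denominator == 0:
        return 0.0, saturation, intensity
    # Clamp against rounding errors before taking the arccosine.
    ratio = max(-1.0, min(1.0, numerator / denominator))
    hue = math.acos(ratio)
    if b > g:
        hue = 2.0 * math.pi - hue
    return hue, saturation, intensity
```

Under this form, pure red maps to a hue of 0, pure green to 2*PI/3 and pure blue to 4*PI/3, each with a saturation of 1.0.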
To prepare the color transform methods, a slightly modified version of the previous algorithms was used. Only the distance calculations had to be modified. It is important that, in the case of the hue component, distance is measured on a circle. This means that its maximum value is PI. If a distance (d) exceeds that limit, it has to be replaced by (2*PI-d).
The very first experiments indicated complexities in interpreting the effect of the color transform multipliers. Their original role is to simulate different lighting conditions in the viewing environment. It is, however, not trivial how this operation alters the HSI descriptors. A transform multiplication modifies all the components of the new representation, but the rate of this change is difficult to predict. Due to lack of time, the investigation of that topic had to be postponed. It would, however, be a truly intriguing topic for future research projects to address.
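The circular hue distance, together with a hue-weighted comparison of two HSI triplets, can be sketched as below. The wrap-around rule follows the description above; the weights and the scaling of the hue distance by PI in `hsi_distance` are assumptions chosen only to illustrate a "heavier" hue weight.

```python
import math


def hue_distance(h1, h2):
    """Distance between two hue angles measured on a circle, in [0, pi]."""
    d = abs(h1 - h2) % (2.0 * math.pi)
    # A distance above pi is replaced by its complement, 2*pi - d.
    return 2.0 * math.pi - d if d > math.pi else d


def hsi_distance(a, b, weights=(2.0, 1.0, 1.0)):
    """Weighted sum of component distances between two (h, s, i) triplets.

    The hue distance is divided by pi so that all three terms lie in
    [0, 1] before weighting. The weights are hypothetical.
    """
    w_h, w_s, w_i = weights
    return (
        w_h * hue_distance(a[0], b[0]) / math.pi
        + w_s * abs(a[1] - b[1])
        + w_i * abs(a[2] - b[2])
    )
```

For instance, two hues of 0.1 and 2*PI - 0.1 are 0.2 apart on the circle, not 2*PI - 0.2, so colors on either side of the zero angle are correctly treated as close neighbors.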
6.2.1.2 Neural Networks
When deciding on the strategies to apply to the linearization, color identification, distortion and localization problems, the idea of neural networks was also considered. Neural networks have been successfully applied to a wide-ranging collection of classification and identification problems, from navigation tasks to image understanding. One such example is summarized in [Leow, *]. This paper describes a double neural network system that processes input images. The authors addressed the questions of how a finite neural network could process an infinitely large amount of information and how a neural network could represent and apply structured knowledge (schemas). The algorithm worked sequentially on small portions of the image. It maintained a partial interpretation of the image, from which a globally consistent interpretation was then formed.
The presented program only worked for simplified, restricted image scenes. Could this system be further advanced to function within a less restricted environment? In a color identification procedure, would it function better than standard mathematical computations alone? Could a neural network system be trained to recognize that objects are occluded or are in shadow? These are all fascinating questions that are open for investigation.
6.2.1.3 An Extended Locale Model
Another property of the system that could be improved is the richness of the locale descriptor values that are associated with surfaces in the model. Currently, only the Lambertian reflectance measures are utilized in the identification and search processes. They performed well; however, hardly any surface can be classified as having 100% Lambertian properties. Most objects have both matte and mirror-like surface properties. Introducing specularity would not only allow for a more precise object description but would also allow the system to treat windows, glass doors and mirror-like surfaces efficiently. Perhaps the amount of distortion due to reflections in the input images could also be reduced.
The current locale model does not include a description of light sources, and the algorithm does not attempt to make any assumption about their location. This information, however, could also provide essential details about the nature of distortions and clues about the set of frames that are tested for possible locale matches. With both Lambertian and specular reflectance values registered in the environment model, a wide variety of new issues could be addressed. Glass objects and shiny surfaces could be handled more efficiently, reflection effects could be accounted for, and the location of light sources might be estimated.
6.2.1.4 Unknown Objects?
The problem of encountering unknown objects in the environment is not addressed in the algorithm. In the case of a false image match, the agent would not be able to decide whether the mistake originated from computational errors or whether it simply encountered a yet unseen body. Having the algorithm force all regions to be paired up with a component of the environment model can also be error-prone, especially if the environment changes as rapidly as the public areas of Clapp Laboratory. One solution to that problem could be to drop the surface with the lowest identification confidence from the localization process, rather than forcing every examined area to be identified. Another approach could increase the set of color terms that are described in the system. A larger collection of colors would be represented in the general knowledge base of the agent, even if they did not all belong to a surface in the model. If the examined surface color is matched to one of the "outlier" measures (those not used in any locale description), then the agent would know that the surface could not possibly be identified as an environment surface based on the pre-determined knowledge base. If that strategy is followed, however, the implementation should reconsider some of the earlier assumptions that were formed and applied about the nature of the reduced set of color terms.
6.2.1.5 Top-Down Space Identification
Implementing a top-down strategy in the localization phase could also improve the performance of the system. As was explained in Chapter 5, guided heuristics would be able to place the emphasis on the more distinguishable features of individual sub-locales and would be able to address the problem of overdivision or underdivision by the segmentation algorithm. One significant question should also be examined more extensively: if two regions are recovered from the same face and have the same reflectance colors assigned to them, should it be assumed that they belong together, or should they be treated separately?
6.2.1.6 More Efficient Neighbor Search
Throughout the scene identification calculations, the confidence value of a face with identified regions is increased if the face has any other immediate neighbors that also contain regions of the searched colors. Currently, this test is carried out only for faces that reside in the same locale. That condition is limiting, and it would be extremely useful to eliminate it. The majority of the "movable" objects in the environment model are described in a separate locale. This practice allows their location to be changed easily with respect to the parent locales. However, they are currently not compared to their neighboring surfaces in the model. To carry out the required modification, the coordinate system of the locale has to be transformed into the parent's one. That can be accomplished by an inverse transform given the pose (position and orientation) of the sub-locale in the parent one.
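The coordinate change can be sketched under simplifying assumptions: the pose of a sub-locale is taken to be a translation plus a single rotation about the vertical axis, and it is assumed to map sub-locale coordinates into the parent frame. Under the opposite convention the two functions exchange roles, which is the sense in which an inverse transform is required. The function names and the pose layout `(tx, ty, tz, theta)` are hypothetical.

```python
import math


def to_parent(point, pose):
    """Express a sub-locale point (x, y, z) in the parent locale frame.

    pose = (tx, ty, tz, theta): position of the sub-locale origin in the
    parent frame and its rotation about the vertical (z) axis in radians.
    """
    tx, ty, tz, theta = pose
    x, y, z = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y + tx, s * x + c * y + ty, z + tz)


def to_sublocale(point, pose):
    """Inverse of to_parent: express a parent-frame point in the sub-locale."""
    tx, ty, tz, theta = pose
    x, y, z = point[0] - tx, point[1] - ty, point[2] - tz
    c, s = math.cos(theta), math.sin(theta)
    # Rotate by -theta after removing the translation.
    return (c * x + s * y, -s * x + c * y, z)
```

Once the faces of a movable object are expressed in the parent frame, the same neighbor test that is used within a locale could be applied between the object and the surrounding surfaces.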
6.2.1.7 Confidence Value Assignments
Last but not least, it would be essential to develop a more efficient confidence measuring system. The confidence calculations could be modified so that they convey more information about the nature of matches and mismatches in the different stages (color transform, locale search, etc.) of the place localization algorithm. The current partial indicators do not incorporate all the available information about the operations and, therefore, are not stable enough. They should be accumulated and continuously updated throughout the entire algorithm, from the color transformations through to the final steps of the locale matches, in order to produce more reliable predictions about the input environment.