Abstract
Optical character recognition (OCR) has been a topic of interest since possibly the late 1940s, when Jacob Rabinow began his work in the field [1]. The earliest OCR machines were primitive mechanical devices with fairly high failure rates. As the volume of new written material grew, so did the need to process it all quickly and reliably, and these machines were clearly not up to the task. They quickly gave way to computer-based OCR devices that outperformed them in both speed and reliability.
Today there are many OCR devices in use, based on a plethora of different algorithms. All of the popular algorithms boast high accuracy and most high speed, but many still suffer from a fairly simple flaw: when they do make mistakes (and they all do), the mistakes are often very unnatural from the human point of view. That is, mistaking a "5" for an "S" is not too surprising, because most people would agree that these two characters are similar, but mistaking a "5" for an "M" is counter-intuitive and unexpected. Algorithms make such mistakes because, for computational reasons, they generally operate on a different set of features than humans do. Humans observe strokes and the relations between them, while algorithms measure anything from Transformation Ring Projections [2] of a character to the Fourier Transform of its Horizontal-Vertical Projections [3]. These methods do work and are often computationally efficient, but they make the computer see letters through a decidedly non-human set of eyes.
The importance of making the same sorts of mistakes as a human may not be immediately obvious, but it is important to realize that the main purpose of OCR is to facilitate communication between humans. Mistakes typical of humans can be more readily corrected by humans (e.g. "5ave" is easier to connect with "Save" than "Mave" is).
This paper describes an algorithm that attempts to work with a subset of the features a human would typically use to identify machine-printed English characters. Its recognition rate is currently not as high as those of the older, more developed character recognition algorithms, but it is expected that expanding it to work with a larger set of features would remove this problem. Such an expansion would also make it correspondingly slower, but with the advent of faster microprocessors this is not viewed as a crippling problem.
The characters in most Western character sets can comfortably be described using only an eight-by-eight grid. Since the focus of this algorithm is currently just machine-printed English characters, the algorithm was designed to handle eight-by-eight data. It should be possible to use this algorithm on characters of higher resolution, but some method of data reduction would have to be applied to the raw data first. A block reduction method should produce acceptable results for most machine-printed data, but there is no reason that a good thinning algorithm [4] could not do just as well, provided that the input needs of this algorithm are properly understood.
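The block reduction mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a square binary bitmap whose side is a multiple of eight, and it turns each source block into one output pixel by majority vote (the paper does not specify the voting rule, so that choice is an assumption here).

```python
# Hypothetical block-reduction sketch: shrink a square binary bitmap to an
# 8x8 grid by declaring each output pixel "on" when its source block is
# at least half on. The majority-vote rule is an illustrative assumption.
def block_reduce(bitmap):
    """bitmap: list of equal-length rows of 0/1 ints; side must be a multiple of 8."""
    n = len(bitmap)
    assert n % 8 == 0 and all(len(row) == n for row in bitmap)
    k = n // 8  # side length of each source block
    out = []
    for br in range(8):
        row = []
        for bc in range(8):
            # Count the on pixels inside this k-by-k block.
            on = sum(bitmap[br * k + r][bc * k + c]
                     for r in range(k) for c in range(k))
            row.append(1 if on * 2 >= k * k else 0)  # majority vote; ties count as on
        out.append(row)
    return out
```

For example, reducing a 16×16 bitmap whose top-left 8×8 quadrant is filled yields an 8×8 grid with its top-left 4×4 quadrant on.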
The algorithm presently avoids thinning (and other preprocessing) by assuming that the input eight by eight data is not particularly aberrant -- lines in an eight by eight grid should not normally be thicker than two pixels. With this assumption, it then proceeds to look for feature points.
A feature point is a point of human interest in an image, a place where something happens. It could be an intersection between two lines, or it could be a corner, or it could be just a dot surrounded by space. Such points serve to help define the relationship between different strokes. Two strokes could fully cross each other, come together in a "Y" or a "T" intersection, form a corner, or avoid each other altogether. People tend to be sensitive to these relationships; the fact that the lines in a "Z" connect in a certain way is more important than the individual lengths of those lines. These relationships are what should be used for character identification, and the feature points can be exploited for the task.
The procedure this algorithm uses to extract feature points is fairly straightforward. Since an eight-by-eight character consists of only sixty-four pixels, it is viable to simply loop through the entire character and examine each pixel in turn. If a pixel is on, its eight neighbors are checked. Since each neighbor can likewise only be on or off, there are merely 256 possible neighborhood combinations. Of these 256, fifty-eight were found to represent significant feature points in a fairly unambiguous way. Extracting feature points thus reduces to calculating a number between 0 and 255 that describes a pixel's neighborhood and then comparing that number against a table of known feature points (see Table 1, "Enumeration of Possible Pixel Neighborhoods"). While this method does not always catch every feature point (some can only be seen in a larger context), it catches the majority, and missing feature points is certainly not a limiting factor in the algorithm's accuracy. Nor does it label too many uninteresting points as feature points; it has virtually no false positives. The feature point extractor is thus fast and reliable.
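The scan described above can be sketched as follows. The packing of the eight neighbors into a byte and the lookup against a code table follow the text; the bit ordering and the contents of FEATURE_CODES are illustrative assumptions, since the paper's actual fifty-eight codes are given in Table 1, not reproduced here.

```python
# Sketch of the feature-point scan: for each "on" pixel, pack its eight
# neighbors into one byte (0-255) and look that code up in a table of
# codes known to mark feature points. FEATURE_CODES is a stand-in for
# the paper's table of fifty-eight codes, not the real table.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1),
             ( 0, -1),          ( 0, 1),
             ( 1, -1), ( 1, 0), ( 1, 1)]

FEATURE_CODES = {0b00000000}  # e.g. an isolated dot; placeholder entry only

def neighborhood_code(grid, r, c):
    """Pack the 3x3 neighborhood of (r, c) into a byte; off-grid counts as off."""
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBORS):
        rr, cc = r + dr, c + dc
        if 0 <= rr < 8 and 0 <= cc < 8 and grid[rr][cc]:
            code |= 1 << bit
    return code

def feature_points(grid):
    """Return (row, col, code) for each on pixel whose neighborhood is in the table."""
    return [(r, c, neighborhood_code(grid, r, c))
            for r in range(8) for c in range(8)
            if grid[r][c] and neighborhood_code(grid, r, c) in FEATURE_CODES]
```

Because the loop only touches sixty-four pixels and each lookup is a constant-time set membership test, the whole scan is cheap, which matches the "fast and reliable" claim above.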
[Table 1, "Enumeration of Possible Pixel Neighborhoods": each entry is a 3x3 neighborhood pattern, with "+" marking the center pixel and "*" marking an on neighbor; the two-dimensional layout of the patterns was lost in extraction and is omitted here.]