Scaling
In general, the numerical value of a feature x depends on the units used, i.e., on the scale. If x is multiplied by a positive scale factor a, then both the mean and the standard deviation are multiplied by a. (The variance is multiplied by $a^{2}$.)
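As a quick numerical check of this scaling rule, the following sketch rescales a small sample and compares the statistics before and after. The sample values, the unit conversion (metres to centimetres), and all variable names are illustrative assumptions, not part of the original text.

```python
import math
import statistics

# Hypothetical feature values: lengths measured in metres.
lengths_m = [1.2, 1.5, 1.7, 2.0, 2.3]

# Scale factor a: converting metres to centimetres.
a = 100.0
lengths_cm = [a * v for v in lengths_m]

# Population statistics before scaling.
mean_m = statistics.mean(lengths_m)
std_m = statistics.pstdev(lengths_m)
var_m = statistics.pvariance(lengths_m)

# Population statistics after scaling.
mean_cm = statistics.mean(lengths_cm)
std_cm = statistics.pstdev(lengths_cm)
var_cm = statistics.pvariance(lengths_cm)

# The mean and standard deviation scale by a; the variance scales by a squared.
print(math.isclose(mean_cm, a * mean_m))
print(math.isclose(std_cm, a * std_m))
print(math.isclose(var_cm, a ** 2 * var_m))
```

The comparisons use `math.isclose` rather than `==` because the rescaled values are floating-point numbers and may differ in the last few bits.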
Sometimes it is desirable to scale the data so that the resulting standard deviation is unity. This is easily done: just divide x by the standard deviation s. Similarly, in measuring the distance from x to m, it often makes sense to measure it relative to the standard deviation. The so-called standardized distance from x to m is given by

$$ r = \frac{|x - m|}{s} . $$
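A minimal sketch of the standardized distance for a single feature, assuming m and s are estimated as the population mean and standard deviation of a small sample. The sample values, the query point, and the function name `standardized_distance` are hypothetical. The sketch also recomputes the distance after the same data have been rescaled and shifted, to check that the value does not change.

```python
import math
import statistics

def standardized_distance(x, m, s):
    """Distance from x to m, measured in units of the standard deviation s."""
    if s <= 0:
        raise ValueError("standard deviation must be positive")
    return abs(x - m) / s

# Hypothetical sample of one feature and a query value x.
samples = [1.2, 1.5, 1.7, 2.0, 2.3]
x = 2.0

m = statistics.mean(samples)
s = statistics.pstdev(samples)
r = standardized_distance(x, m, s)

# Apply the same change of units (factor a) and shift (offset b) to everything.
a, b = 100.0, 5.0
samples_2 = [a * v + b for v in samples]
x_2 = a * x + b

m_2 = statistics.mean(samples_2)
s_2 = statistics.pstdev(samples_2)
r_2 = standardized_distance(x_2, m_2, s_2)

# The two distances agree up to floating-point rounding.
print(r, r_2, math.isclose(r, r_2))
```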
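Note that r is invariant to translation and invariant to scale: shifting all the data by a constant leaves x - m and s unchanged, and multiplying all the data by a positive scale factor multiplies |x - m| and s by the same amount. This suggests an important generalization of a minimum-Euclidean-distance classifier. Let x(i) be the value for Feature i, let m(i,j) be the mean value of Feature i for Class j, and let s(i,j) be the standard deviation of Feature i for Class j. In measuring the distance between the feature vector $\mathbf{x}$ and the mean vector $\mathbf{m}_{j}$ for Class j, suppose that we use the standardized distance

$$ r(\mathbf{x}, \mathbf{m}_{j}) = \sqrt{ \sum_{i} \left( \frac{x(i) - m(i,j)}{s(i,j)} \right)^{2} } . $$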
This distance has the important property that it is scale invariant. That is, if we measure distance in this way, the units we use for the various features will have no effect on the resulting distances, and thus no effect on the final classification.
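The idea can be sketched as a small minimum-distance classifier that uses the standardized distance in place of the Euclidean distance. This is an illustration under stated assumptions, not a definitive implementation: the class labels, the per-class means and standard deviations, the query vector, and the helper names (`classify`, `rescale`) are all hypothetical. The sketch classifies the same point twice, once with Feature 0 in metres and once in centimetres, to check that the change of units does not affect the result.

```python
import math

def standardized_distance(x, mean, std):
    """Root of the summed squared per-feature differences, each divided by its std."""
    return math.sqrt(sum(((xi - mi) / si) ** 2 for xi, mi, si in zip(x, mean, std)))

def classify(x, class_stats):
    """Return the label of the class whose mean is nearest in standardized distance.

    class_stats maps a label to a (mean vector, std vector) pair.
    """
    return min(class_stats, key=lambda label: standardized_distance(x, *class_stats[label]))

def rescale(vector, factors):
    """Multiply each feature by its own unit-conversion factor."""
    return [f * v for f, v in zip(factors, vector)]

# Hypothetical statistics: Feature 0 is a length in metres, Feature 1 a weight in kg.
class_stats = {
    "A": ([1.0, 50.0], [0.2, 5.0]),
    "B": ([2.0, 80.0], [0.5, 10.0]),
}
x = [1.6, 62.0]

# Change the units of Feature 0 from metres to centimetres; leave Feature 1 alone.
factors = [100.0, 1.0]
class_stats_cm = {
    label: (rescale(mean, factors), rescale(std, factors))
    for label, (mean, std) in class_stats.items()
}
x_cm = rescale(x, factors)

label = classify(x, class_stats)
label_cm = classify(x_cm, class_stats_cm)

# Both the chosen class and the distances themselves are unaffected by the units.
print(label, label_cm)
for name in class_stats:
    d = standardized_distance(x, *class_stats[name])
    d_cm = standardized_distance(x_cm, *class_stats_cm[name])
    print(name, math.isclose(d, d_cm))
```

Because each class carries its own standard deviations s(i,j), the standard deviations must be rescaled together with the means and the data, as `rescale` does here.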