Nearest Neighbor (pattern Recognition) Article Index for
Nearest
Website Links For
Neighbor
 

Information About

Nearest Neighbor (pattern Recognition)




In the algorithm, each feature is assigned a dimension to form a multidimensional feature space. A training set of objects with '' A Priori '' known class are processed by feature extraction and plotted within the multi-dimensional feature space. The Offsets in each dimension are referred to as the Feature Vector . This is the training or learning stage. Because the engine can be retrained to classify various phenomena, pattern recognition is part of Machine Learning .

The testing phase begins with phenomena to be classified (the class not being known ''a priori'') and extracts the same set of features. The geometric distance is computed between the new feature vector and each apriori feature vector from the training set. The shortest distance thus computed is to the nearest neighbor. The ''a priori'' class of the nearest neighbor is now assigned to the phenomena to be classified.

Obviously, this algorithm will be more computationally intensive as the size of the training set grows. Many optimizations have been given over the years; these generally seek to reduce the number of distances actually computed. Some optimizations involve partitioning the feature space, and only computing distances within specific nearby volumes.

Other variations of the algorithm include the ''k''-nearest Neighbor Algorithm where several of the nearest feature vectors are computed, and the classification is made with the highest Confidence only if all of the nearest neighbors are of the same class.

Nearest neighbor has some strong Consistency results. As the amount of data approaches infinity, nearest neighbor is guaranteed to yield an error rate no worse than twice the Bayes Error Rate (the minimum achievable error rate given the distribution of the data). ''k''-nearest neighbor is guaranteed to approach the Bayes error rate, for some value of ''k'' (where k increases as a function of the number of data points).

Computing the nearest neighbor of an input can be challenging for large data sets. Several different types of nearest neighbor finding algorithms include:
  • Linear scan

  • Kd-trees

  • Balltrees

  • Metric trees

  • Locality-Sensitive Hashing (LSH)




SEE ALSO



REFERENCES

  • Belur V. Dasarathy, editor (1991) ''Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques'', ISBN 0-8186-8930-7

  • ''Nearest-Neighbor Methods in Learning and Vision'', edited by Shakhnarovish, Darrell, and Indyk, The MIT Press, 2005, ISBN 0-262-19547-X