| The rapid development of new technologies brings us a large amount of information every day, and pattern recognition is the most important step in obtaining useful information from such massive data. In a pattern recognition system, feature selection must avoid the "curse of dimensionality" and choose an appropriate feature subset from the original data set without degrading classification performance. Huge data volumes, small sample sizes, and high dimensionality make this a great challenge. Feature selection is a very important part of a pattern recognition system and an essential prerequisite for designing a good classifier. After surveying feature selection algorithms published domestically and internationally, the author studies in depth the evaluation measures, search directions, and search strategies of supervised feature selection and, on that basis, presents two improved feature selection algorithms. The first is a multilayer filter feature selection algorithm based on PCA (Principal Component Analysis), which applies PCA feature extraction before feature selection. This effectively removes redundancy between features and overcomes the high computational cost that feature selection incurs when detecting redundancy in data sets with strongly dependent features. Nonlinear correlation is then studied with a minimum-redundancy, maximum-relevance criterion grounded in information entropy theory. Filter feature selection is highly efficient, but it cannot guarantee the smallest feature subset. This paper therefore uses the multilayer filter feature selection algorithm to reduce the amount of computation, reduce the feature dimension step by step, and obtain a small feature subset with minimum redundancy. The second is an embedded dynamic feature selection algorithm based on information correlation, which rests on the following observation. Measures of information correlation are built on probability theory, so the probability distribution of the data set must be known first.
As feature selection proceeds, the candidate feature subset shrinks and the selected feature subset grows, so the uncertainty about the categories decreases. The information entropy, however, remains unchanged, which shows that it contains some "false information". By improving the evaluation function and embedding a k-nearest neighbor classifier in the feature selection algorithm, data samples that can already be identified from the selected feature subset are removed from the original sample set. |
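
The minimum-redundancy, maximum-relevance idea mentioned in the abstract can be illustrated with a small sketch. This is not the author's multilayer algorithm: it is a generic greedy selection over discrete features, scoring each candidate by its mutual information with the class labels minus its mean mutual information with the features already chosen. The function names (`entropy`, `mutual_information`, `mrmr_select`), the feature names, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy H(X), in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for two discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def mrmr_select(features, labels, k):
    """Greedily pick k feature names: relevance minus mean redundancy.

    `features` maps a (hypothetical) feature name to its column of
    discrete values; `labels` is the class label column.
    """
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    selected = []
    candidates = list(features)

    def score(name):
        # With nothing selected yet, only relevance matters.
        if not selected:
            return relevance[name]
        redundancy = sum(mutual_information(features[name], features[s])
                         for s in selected) / len(selected)
        return relevance[name] - redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data (assumed for illustration): f_a_copy duplicates f_a, so it is
# fully redundant once f_a has been selected.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "f_a":      [0, 0, 0, 1, 1, 1, 1, 1],
    "f_a_copy": [0, 0, 0, 1, 1, 1, 1, 1],
    "f_b":      [0, 1, 0, 1, 0, 1, 1, 1],
}

print(mrmr_select(features, labels, 2))  # ['f_a', 'f_b']
```

In this toy run the duplicate feature is as relevant as `f_a` but is skipped in the second round because its redundancy with `f_a` outweighs its relevance, which is the behavior the criterion is meant to produce.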