
Research and Application on Several Basic Problems of Pattern Classification

Posted on: 2016-04-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Li
Full Text: PDF
GTID: 1108330488457657
Subject: Computer application technology
Abstract/Summary:
Pattern classification is a fundamental research direction in pattern recognition, with an extensive research and application background. After several decades of rapid development, pattern classification has entered many disciplines and made considerable progress in various fields. These classification methods have produced fruitful results and are widely used in real-world applications. However, many fundamental issues remain worthy of further research and exploration, e.g. algorithm applicability, learning performance and recognition efficiency. This dissertation therefore focuses on several fundamental classification issues, including classification algorithm design, data reduction, incremental learning and algorithm ensembles. Several novel algorithms and techniques are proposed and applied to UCI datasets and to datasets from specific practical applications; the corresponding experimental results demonstrate the effectiveness and feasibility of this research. The main contributions of this dissertation are as follows:

1. To address the inherent disadvantages of the K-Nearest Neighbor (KNN) classification algorithm, such as its neglect of the influence of the dataset distribution, its sensitivity to interference and its intolerable running cost, new classification algorithms and similarity measure criteria are discussed and studied, and two improved nearest neighbor classification algorithms are proposed. The first adopts a class-dimensional pattern storage strategy, which breaks the integrity of the patterns and converts the storage model of the training dataset. For an unlabeled pattern, class-dimensional similarities are calculated to obtain its class similarities based on these class-dimensional neighborhoods; the unlabeled pattern is then assigned the class label with the highest class similarity.
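The improvements above start from the classical KNN rule. As a point of reference, a minimal sketch of standard majority-vote KNN (the baseline being improved here, not the proposed class-dimensional variant) might look like:

```python
from collections import Counter
import math

def knn_classify(train, labels, query, k=3):
    """Classical KNN baseline: label a query pattern by majority vote
    among its k nearest training patterns under Euclidean distance."""
    # Sort all training patterns by distance to the query.
    dists = sorted(
        (math.dist(x, query), y) for x, y in zip(train, labels)
    )
    # Majority vote among the k closest labels.
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]
```

Note that this baseline scans the whole training set for every query, which is exactly the running-cost problem the dissertation targets.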
This algorithm not only improves classification efficiency and adaptability to various pattern distributions, but can also handle continuous and categorical classification simultaneously, which broadens its application range. In addition, because traditional distance and similarity measurements ignore the influence exerted between each individual pattern and the whole pattern set, an improved similarity measurement strategy is proposed. Inherent information of the training dataset, such as the pattern distribution density and the relationships among individual patterns, is studied, and on this basis a new affinity distance function is proposed. An affinity similarity function built on this distance function serves as the classification measure criterion in the second improved KNN algorithm. Theoretical analysis and experiments show that this similarity function is an effective similarity strategy for classification, and that combining the second improved KNN algorithm with efficient indexing algorithms can reduce classification time on large-scale datasets.

2. To overcome the sequence sensitivity and noise sensitivity in the prototype selection process of the condensed nearest neighbor algorithm (CNN), two new prototype selection algorithms are proposed. The first is based on local means and class global information: in the prototype selection process, it makes full use of the local means of the k heterogeneous and homogeneous nearest neighbors of each pattern being learned, together with the class global information, and adopts a new updating strategy to dynamically update the prototype set. In addition, through a detailed analysis of the prototype learning rule, an improved prototype selection algorithm based on adaptive boundary approximation is proposed.
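For context, the classical CNN procedure being improved (Hart's condensing rule) can be sketched as follows; note how the result depends on the order in which patterns are visited, which is the sequence sensitivity the proposed algorithms address:

```python
import math

def cnn_condense(train, labels):
    """Hart's condensed nearest neighbor (CNN): keep only patterns that
    the current prototype set misclassifies under the 1-NN rule.
    The outcome depends on presentation order -- the classical
    sequence-sensitivity problem."""
    protos = [(train[0], labels[0])]  # seed with the first pattern
    changed = True
    while changed:
        changed = False
        for x, y in zip(train, labels):
            nearest = min(protos, key=lambda p: math.dist(p[0], x))
            if nearest[1] != y:       # misclassified -> absorb as prototype
                protos.append((x, y))
                changed = True
    return protos
```

On well-separated data this retains far fewer patterns than the full training set while still classifying the training set correctly with 1-NN.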
The main ideas of this algorithm are as follows: the homogeneous prototype absorption strategy of CNN is improved in the prototype selection process, so that the class boundary region can be approximated gradually by retaining each homogeneous boundary prototype that is closer than the current nearest one, thereby obtaining the boundary prototypes. Meanwhile, a prototype updating strategy is built that achieves dynamic periodic updating of the prototype set and reduces its scale. Experiments show that the two proposed algorithms obtain more representative prototype sets and better overcome the impact of the two sensitivities above.

3. A pattern's location determines its contribution to classification: interior patterns, which contribute little, can be removed, while boundary patterns, which contribute to better classification accuracy, should be retained. To this end, a novel algorithm based on a binary tree technique and several reduction operations is presented. First, several tree control rules and the k nearest neighbor rule are used to build a binary nearest neighbor tree for each randomly chosen pattern. Second, according to the node locations in each binary nearest neighbor tree and the selection and replacement strategies, different kinds of patterns are obtained as prototypes (those close to class boundary regions, those located in interior regions, and outliers), and some internal patterns are generated. Finally, experimental results show that the proposed algorithm is robust, obtains a smaller and more representative prototype set, and can further reduce redundant prototypes when embedded with other prototype algorithms.

4. The major difficulties of the nearest neighbor classification algorithm on large datasets are its intolerable running cost and its lack of incremental learning ability.
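The boundary/interior distinction underlying contribution 3 can be illustrated with a much simpler heuristic than the dissertation's binary nearest neighbor tree (this sketch is not the proposed method, only an illustration of the location principle): call a pattern a boundary pattern if any of its k nearest neighbors carries a different class label, and an interior pattern otherwise.

```python
import math

def split_by_location(patterns, labels, k=3):
    """Illustrative boundary/interior split: a pattern is 'boundary' if
    any of its k nearest neighbors has a different class label, else
    'interior'. Interior patterns are removal candidates; boundary
    patterns are retained as prototypes."""
    boundary, interior = [], []
    for i, (x, y) in enumerate(zip(patterns, labels)):
        dists = sorted(
            (math.dist(x, patterns[j]), labels[j])
            for j in range(len(patterns)) if j != i
        )
        if any(lab != y for _, lab in dists[:k]):
            boundary.append(i)
        else:
            interior.append(i)
    return boundary, interior
```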
To reduce the running cost and achieve incremental learning, historical and previously unexploited information in the prototype generation process, such as pattern density, classification error rate and neighborhood radius, is studied and utilized. On the basis of the single-layer competitive learning of the incremental learning vector quantization network, a new incremental learning vector quantization method is proposed that combines pattern density and classification error rate. By adopting a series of new competitive learning strategies, the method quickly obtains an incremental prototype set from the original training set by adaptively learning, inserting, merging, splitting and deleting representative pattern neighborhoods. It achieves higher reduction efficiency while simultaneously guaranteeing higher classification accuracy on large-scale datasets. In addition, the classical nearest neighbor classification algorithm is improved by absorbing the pattern density and classification error rate of the final prototype neighborhood set into the classification decision criteria, which better matches practical situations. Experimental results show that the proposed algorithm is not only fast and incremental but also has good generality.
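The single-layer competitive learning that the proposed method builds on is the classical LVQ1 update rule, sketched below in simplified form (the dissertation's variant additionally weighs pattern density and classification error rate, which this sketch omits): the winning prototype moves toward the pattern when the labels agree and away from it otherwise.

```python
import math

def lvq1_update(protos, x, y, lr=0.1):
    """One LVQ1 competitive-learning step (simplified; the dissertation's
    variant also uses pattern density and classification error rate).
    The nearest prototype moves toward the pattern if labels agree,
    away from it otherwise."""
    # Find the winning (nearest) prototype.
    i = min(range(len(protos)),
            key=lambda j: math.dist(protos[j][0], x))
    w, wy = protos[i]
    sign = 1.0 if wy == y else -1.0   # attract if same class, repel if not
    protos[i] = ([wj + sign * lr * (xj - wj) for wj, xj in zip(w, x)], wy)
    return protos
```

Repeating this step over a stream of patterns lets the prototype set adapt online, which is the property that the insert/merge/split/delete strategies extend.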
Keywords/Search Tags: Pattern recognition, K-nearest neighbor algorithm, Classification decision criteria, Data reduction, Incremental learning, Learning vector quantization