
Pattern Recognition Method And Application Of Research Based On Default Data Sets

Posted on: 2012-09-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y D Song
GTID: 2218330332991526
Subject: Computer application technology
Abstract/Summary:
In pattern recognition, machine learning and data mining, classification is a basic and important problem. The support vector machine (SVM) is one of the methods of pattern classification and has recently been widely studied and applied. Classification requires two things: a classifier and a data set. In practical applications, data are obtained through different channels and modeled in different ways, so much of the collected information is incomplete or unbalanced. We call data sets in which samples lack certain features default data sets. Current methods for classifying them mainly either discard the missing features entirely or replace the missing feature values with the mean value. For unbalanced data sets, the traditional approach artificially resamples the minority class or deletes some samples of the majority class to reduce the imbalance of the training set. To some extent, however, such methods reduce classification accuracy, and they do not fundamentally improve the algorithm. Moreover, none of these correction methods can avoid introducing subjective factors into the original system, and their costs are high. Classifying data sets with missing feature values (incomplete data) is a relatively new problem that has arisen with the development of data acquisition, machine learning and information retrieval. Mature research results are not yet available either in China or abroad.
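The two baseline treatments for missing features mentioned above can be pictured with a minimal Python sketch. It assumes each sample is a list of floats, with None marking a missing value. The function names are hypothetical. "Discarding the missing features" is read here as dropping every feature column that has a missing value; dropping incomplete samples is another common reading.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple

# Assumed representation: one list per sample, None marks a missing feature value.
Row = List[Optional[float]]


def drop_incomplete_features(X: Sequence[Row]) -> Tuple[List[List[float]], List[int]]:
    """Discard every feature (column) that is missing in at least one sample."""
    n_features = len(X[0])
    keep = [j for j in range(n_features) if all(row[j] is not None for row in X)]
    return [[row[j] for j in keep] for row in X], keep


def mean_impute(X: Sequence[Row]) -> List[List[float]]:
    """Replace each missing value with the mean of the observed values in its column."""
    n_features = len(X[0])
    col_means = []
    for j in range(n_features):
        observed = [row[j] for row in X if row[j] is not None]
        if not observed:  # nothing to average, so this column cannot be imputed
            raise ValueError(f"feature {j} has no observed values")
        col_means.append(mean(observed))
    # Build new rows so the caller's data is left untouched.
    return [[col_means[j] if v is None else v for j, v in enumerate(row)] for row in X]


X = [[1.0, None, 3.0], [2.0, 5.0, None], [3.0, 7.0, 6.0]]
print(drop_incomplete_features(X))  # ([[1.0], [2.0], [3.0]], [0])
print(mean_impute(X))  # [[1.0, 6.0, 3.0], [2.0, 5.0, 4.5], [3.0, 7.0, 6.0]]
```

In this toy input, the first treatment throws away two of the three features for every sample. The second fills in values that need not match the true ones. This is the loss of information the abstract attributes to such corrections.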
However, such research has real practical significance and broad application prospects. This is especially true in fields where features are often missing, such as license plate recognition, speech recognition, biometric authentication, medical diagnosis and machine fault detection. In this thesis, we review and discuss existing theories and algorithms for both feature-missing and unbalanced data sets. We then propose a new algorithm based on the support vector machine, and experiments show that it performs well. The main work of this thesis is as follows.
The first part introduces the thesis as a whole. It analyzes and summarizes the development of pattern recognition technology and the current state of classification on feature-missing data sets.
The second part briefly summarizes the theory of the support vector machine. This includes the basic problems of machine learning, statistical learning theory and the basic algorithmic principles of the SVM.
The third part starts from the characteristics of classifying feature-missing data sets. It discusses and analyzes the definition, causes and countermeasures of missing features, and reviews current research results on classifying data sets with missing features. It then proposes an SVM method for incomplete data based on the maximum margin and the minimum within-class variance. Simulation experiments are conducted with data from the UCI database.
The fourth part deals with the classification of unbalanced data sets. It briefly introduces the definition and characteristics of unbalanced data sets and the traditional methods for processing them.
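The traditional remedies for unbalanced data mentioned above are random oversampling of the minority class and random undersampling of the majority class. A minimal sketch of both follows, assuming plain Python lists. The function names are hypothetical, and the abstract does not say which resampling variant the thesis compares against.

```python
import random
from typing import Dict, List, Sequence, Tuple

Sample = List[float]


def _group_by_label(X: Sequence[Sample], y: Sequence[int]) -> Dict[int, List[Sample]]:
    groups: Dict[int, List[Sample]] = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return groups


def random_oversample(X: Sequence[Sample], y: Sequence[int], seed: int = 0) -> Tuple[List[Sample], List[int]]:
    """Duplicate randomly chosen samples of each smaller class until all classes match the largest."""
    rng = random.Random(seed)
    groups = _group_by_label(X, y)
    target = max(len(members) for members in groups.values())
    X_out: List[Sample] = []
    y_out: List[int] = []
    for label, members in groups.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        X_out.extend(members + extra)
        y_out.extend([label] * target)
    return X_out, y_out


def random_undersample(X: Sequence[Sample], y: Sequence[int], seed: int = 0) -> Tuple[List[Sample], List[int]]:
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_label(X, y)
    target = min(len(members) for members in groups.values())
    X_out: List[Sample] = []
    y_out: List[int] = []
    for label, members in groups.items():
        X_out.extend(rng.sample(members, target))
        y_out.extend([label] * target)
    return X_out, y_out


X = [[float(i)] for i in range(10)]
y = [1] * 8 + [-1] * 2  # 8 majority samples, 2 minority samples
Xo, yo = random_oversample(X, y)
Xu, yu = random_undersample(X, y)
print(yo.count(1), yo.count(-1))  # 8 8
print(yu.count(1), yu.count(-1))  # 2 2
```

Oversampling only repeats existing minority points, and undersampling discards majority data. Neither changes the classifier itself, which is the limitation the abstract points out.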
Building on existing research results, and combining the traditional one-class SVM with two-class classification algorithms, we propose a maximum-margin SVM method that uses a small amount of abnormal training data. It introduces both the margin between the hyperplane and the positive class and the margin between the hyperplane and the negative class. This achieves effective novelty detection with only a few abnormal samples. Finally, the proposed method is validated by experiments on data from medical diagnosis, fault detection and other areas.
The last part summarizes the thesis and discusses directions for future work.
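The abstract does not give the formulas of either proposed method. The following is therefore only an assumed illustration of the ingredients it names, not the thesis's algorithm. It is a linear soft-margin SVM trained by full-batch subgradient descent, with two hypothetical additions:

- A penalty mu/2 · wᵀS_w w on the pooled within-class scatter S_w. This idea is associated with minimum-class-variance SVMs and stands in for the third part's "minimum within-class variance".
- Hinge losses averaged separately over the normal (+1) and abnormal (-1) classes, so that a few abnormal samples still shape the hyperplane. This stands in for the fourth part.

The separate positive-class and negative-class margins the thesis introduces are not modeled. All names and parameters (`train`, `predict`, `within_class_scatter`, `lam`, `mu`, `neg_cost`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def within_class_scatter(X: Sequence[Vector], y: Sequence[int]) -> List[Vector]:
    """Pooled within-class scatter S_w, divided by the total number of samples."""
    d = len(X[0])
    S = [[0.0] * d for _ in range(d)]
    for label in set(y):
        members = [x for x, lab in zip(X, y) if lab == label]
        m = [sum(col) / len(members) for col in zip(*members)]
        for x in members:
            diff = [xi - mi for xi, mi in zip(x, m)]
            for i in range(d):
                for j in range(d):
                    S[i][j] += diff[i] * diff[j] / len(X)
    return S


def objective(w, b, X, y, weights, lam, mu, S) -> float:
    """Assumed objective: lam/2 ||w||^2 + mu/2 w^T S_w w + class-weighted hinge loss."""
    reg = 0.5 * lam * dot(w, w) + 0.5 * mu * dot(w, [dot(row, w) for row in S])
    hinge = sum(c * max(0.0, 1.0 - lab * (dot(w, x) + b)) for x, lab, c in zip(X, y, weights))
    return reg + hinge


def train(X, y, lam=0.01, mu=0.0, neg_cost=1.0, lr=0.005, epochs=10000) -> Tuple[float, Vector, float]:
    """Full-batch subgradient descent; returns (objective, w, b) of the best iterate seen.

    Labels: +1 = normal, -1 = abnormal. Hinge losses are averaged per class, so a
    handful of abnormal samples weighs as much as the whole normal class.
    """
    n_pos = sum(1 for lab in y if lab == 1)
    n_neg = len(y) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one normal and one abnormal sample")
    weights = [1.0 / n_pos if lab == 1 else neg_cost / n_neg for lab in y]
    S = within_class_scatter(X, y)
    w, b = [0.0] * len(X[0]), 0.0
    best = (objective(w, b, X, y, weights, lam, mu, S), list(w), b)
    for _ in range(epochs):
        # Gradient of the two quadratic penalties: lam * w + mu * S_w w.
        grad_w = [lam * wi + mu * dot(row, w) for wi, row in zip(w, S)]
        grad_b = 0.0
        for x, lab, c in zip(X, y, weights):
            if lab * (dot(w, x) + b) < 1.0:  # inside the margin: hinge term is active
                grad_w = [g - c * lab * xi for g, xi in zip(grad_w, x)]
                grad_b -= c * lab
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
        J = objective(w, b, X, y, weights, lam, mu, S)
        if J < best[0]:
            best = (J, list(w), b)
    return best


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    return 1 if dot(w, x) + b >= 0.0 else -1


# Toy data, made up for illustration: six normal points near the origin, two abnormal ones.
normal = [[0.0, 0.0], [0.5, 0.2], [-0.3, 0.4], [0.2, -0.5], [-0.4, -0.2], [0.1, 0.3]]
abnormal = [[4.0, 4.0], [5.0, 3.5]]
X = normal + abnormal
y = [1] * len(normal) + [-1] * len(abnormal)
J, w, b = train(X, y, mu=0.5)
print([predict(w, b, x) for x in X])
```

Returning the best iterate rather than the last one is a standard safeguard for subgradient descent, whose objective need not decrease at every step.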
Keywords/Search Tags: pattern recognition, support vector machine (SVM), default datasets, feature absence, unbalanced data sets, within-class variance, maximum-margin, novelty detection