
Research On Pattern Classifier Based On Large-Scale Datasets

Posted on: 2009-04-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Chang
Full Text: PDF
GTID: 2178360245486440
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
In pattern classification on large-scale datasets, redundant features and large numbers of training samples lead to low classification speed and demand a great deal of computer memory. It is therefore necessary to preprocess the data with feature selection and sample selection before classification: discard the redundant features and keep only the samples that determine the nonlinear separating surface of the classifier. Training the classifier on the simplified dataset then improves classification accuracy and reduces the memory requirement.

Orthogonal design and uniform design are two widely used experimental design methods; both can identify the optimal combination of factors with relatively few experiments. The Support Vector Machine (SVM) handles small-sample problems well, has good generalization ability, and is not restricted by the dimensionality of the data. Combining the advantages of these three theories, this thesis takes SVM as the classifier and proposes two feature selection methods: feature selection based on orthogonal design and feature selection based on uniform design. Training and testing runs are arranged according to the number of features in the dataset and the structure of the orthogonal or uniform table, and experiments are then carried out on the selected feature subsets. The results indicate that the proposed methods not only discard the redundant features but also achieve better classification accuracy than training on the full feature set.

The Reduced Support Vector Machine (RSVM) is a modified SVM algorithm. It uses a very small random subset of the dataset as support vectors, solving an unconstrained optimization problem to construct the nonlinear separating surface. Compared with the constrained nonlinear programming problem of the original SVM, it reduces the computational difficulty, the computation time, and the storage requirement, and its performance can even exceed that of the standard SVM to some extent. However, because the randomly chosen samples are not necessarily representative, its results are unstable. This thesis proposes an effective remedy: first determine the optimal number of clusters for each class using subtractive clustering, then select the samples belonging to each cluster center of each class with the Fuzzy C-Means (FCM) method, and finally apply the extracted samples to the RSVM algorithm. The resulting Modified Reduced Support Vector Machine (MRSVM) improves the classifier's stability. Simulation results show that, on the same datasets, MRSVM runs in less time and attains smaller training and testing errors than the original RSVM.
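To make the two contributions more concrete, the sketches below illustrate the general ideas in Python with NumPy and scikit-learn. They are illustrative only: the toy data (make_classification), the use of the standard L8(2^7) orthogonal table, the main-effect selection rule, the fixed cluster counts, and the replacement of RSVM by scikit-learn's standard SVC are assumptions made for the examples, not the thesis's exact procedure.

A minimal sketch of orthogonal-design-based feature selection: each column of the orthogonal table stands for one feature (level 1 = include, level 2 = exclude), each row prescribes one SVM training/testing run, and the per-feature main effect on accuracy decides which features to keep.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    # Standard L8(2^7) orthogonal table: 8 runs, 7 two-level factors.
    # Here level 1 means "include the feature" and level 2 means "exclude it".
    L8 = np.array([
        [1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 2, 2, 2, 2],
        [1, 2, 2, 1, 1, 2, 2],
        [1, 2, 2, 2, 2, 1, 1],
        [2, 1, 2, 1, 2, 1, 2],
        [2, 1, 2, 2, 1, 2, 1],
        [2, 2, 1, 1, 2, 2, 1],
        [2, 2, 1, 2, 1, 1, 2],
    ])

    # Toy dataset with 7 features, some of them redundant.
    X, y = make_classification(n_samples=300, n_features=7,
                               n_informative=4, n_redundant=3, random_state=0)

    # One SVM cross-validation run per row of the table, using only the
    # features whose factor is set to level 1 in that row.
    acc = np.array([
        cross_val_score(SVC(kernel="rbf"), X[:, row == 1], y, cv=5).mean()
        for row in L8
    ])

    # Main effect of each feature: mean accuracy of the runs that include it
    # minus mean accuracy of the runs that exclude it.
    effects = np.array([
        acc[L8[:, j] == 1].mean() - acc[L8[:, j] == 2].mean()
        for j in range(L8.shape[1])
    ])

    selected = np.where(effects > 0)[0]
    print("main effects:", np.round(effects, 4))
    print("selected feature indices:", selected)

And a minimal sketch of the clustering-based sample selection behind MRSVM: a small Fuzzy C-Means implementation finds cluster centers within each class, the training samples nearest those centers form the reduced set, and a standard SVC (standing in for RSVM) is trained on it. The number of clusters per class is passed in directly here; in the thesis it would be obtained by subtractive clustering.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
        """Minimal FCM: returns cluster centers and the membership matrix."""
        rng = np.random.default_rng(seed)
        U = rng.random((len(X), c))
        U /= U.sum(axis=1, keepdims=True)      # memberships sum to 1 per sample
        for _ in range(n_iter):
            Um = U ** m
            centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
            dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-10
            U = 1.0 / dist ** (2.0 / (m - 1.0))
            U /= U.sum(axis=1, keepdims=True)
        return centers, U

    def select_representatives(X, y, clusters_per_class):
        """For each class, keep the sample nearest to each FCM cluster center.
        clusters_per_class is assumed given; the thesis would obtain it with
        subtractive clustering."""
        keep = []
        for label, c in clusters_per_class.items():
            idx = np.where(y == label)[0]
            centers, _ = fuzzy_c_means(X[idx], c)
            for center in centers:
                nearest = idx[np.argmin(np.linalg.norm(X[idx] - center, axis=1))]
                keep.append(nearest)
        return np.unique(keep)

    # Toy data; hypothetical cluster counts per class.
    X, y = make_classification(n_samples=1000, n_features=10, random_state=1)
    keep = select_representatives(X, y, clusters_per_class={0: 15, 1: 15})

    # Standard SVC used here as a stand-in for the RSVM training step.
    clf = SVC(kernel="rbf").fit(X[keep], y[keep])
    print("reduced set size:", len(keep))
    print("accuracy on the full set:", clf.score(X, y))

The nearest-to-center selection mirrors the abstract's reasoning: the reduced set is chosen to be representative of each class rather than drawn at random as in RSVM, which is what the thesis credits for the improved stability.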
Keywords/Search Tags: feature selection, sample selection, orthogonal design, support vector machine, fuzzy clustering