Design And Implementation Of A Super Multi-class Classification

Posted on: 2014-01-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Zhang
Full Text: PDF
GTID: 2248330395496760
Subject: Computer software and theory
Abstract/Summary:
With the development of technology and the spread of the Internet, people have access to a large and growing amount of data, most of it in the form of text. The network has become an essential part of people's lives. People get news, video, and other information from web pages, which programs categorize to help them. Given the complexity of this data, classifying web pages is a serious problem.

We have developed a super multi-class classification system to solve this problem. The application is written in C++ on the VS2010 development platform. We study the concepts and principles of statistical learning theory and review popular existing algorithms, including Boost, Naive Bayes, k-nearest neighbor, and neural networks, to understand the basics of classification. Building on this overall understanding of classification algorithms, we propose a super multi-class classifier that improves the decomposition method.

First, we introduce the overall process and explain its principles in detail, using the architecture diagram and block diagram of the project. Section 2 of Chapter 3 describes the design process for the entire system; the detailed explanation is divided into training (learning) and web page classification. Section 3 introduces the principle and implementation of our proposed multi-class classification method. The method sorts the categories into a binary tree, branch by branch, which reduces the amount of computation and increases classification accuracy.

We implement each module in accordance with the technical proposal. The first module extracts information from the web page files, including the title and keywords of each page. The extracted information includes the page's content text, URL, notes, program code, labels, layout code, and access information, and is stored in an XML file. Using the information in the XML file, the system preprocesses the page source by extracting words and selecting classification features with the χ² (chi-square) statistic.
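
The feature selection step can be illustrated with a minimal sketch. It assumes that the "x2" feature in the abstract refers to the standard chi-square statistic computed from a 2x2 term/category contingency table; the abstract does not give the exact formula used in the thesis, and the function name `chiSquare` and its parameters are hypothetical.

```cpp
#include <cassert>

// Chi-square score of one term for one category (assumed formulation,
// not taken from the thesis), computed from document counts:
//   a: documents in the category that contain the term
//   b: documents outside the category that contain the term
//   c: documents in the category that lack the term
//   d: documents outside the category that lack the term
// A higher score means the term and the category are more strongly associated.
double chiSquare(double a, double b, double c, double d) {
    assert(a >= 0 && b >= 0 && c >= 0 && d >= 0);
    const double n = a + b + c + d;
    const double denom = (a + c) * (b + d) * (a + b) * (c + d);
    if (denom == 0.0) {
        return 0.0;  // degenerate table: the term or the category never varies
    }
    const double diff = a * d - b * c;
    return n * diff * diff / denom;
}
```

Under this formulation, in a corpus of 400 pages with 50 pages per category, a term that occurs in all 50 pages of one category and in no other page scores 400 for that category, while a term spread evenly across categories scores 0. Terms would then be ranked by score and only the highest-scoring ones kept as features.
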
The preprocessing step covers feature-weighted probability estimation and feature evaluation. The classifier model is created during training, with parameters set according to the characteristics of the sample files. The binary-tree-based multi-class support vector machine first divides all categories into two subclasses, then divides each subclass into two further subclasses, and so on, until the result is a binary tree structure for multi-class classification. In other words, we decompose the multi-class problem into a number of binary classification problems and train a support vector machine at each node that separates two groups of categories. The algorithm can effectively solve the problem.

The final test of the super multi-class classifier uses 400 samples in eight categories, with 50 samples per category. The test measures the accuracy of our classifier against that of an existing classifier. We also introduce a linear method for dimensionality reduction. The comparison between the improved method and the traditional method shows that our method achieves a large improvement in performance and recognition rate.
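
The binary tree decomposition described above can be sketched as follows. This is an illustration under stated assumptions, not the thesis implementation: each node uses a simple linear split (the perpendicular bisector of the two group centroids) as a stand-in for a trained support vector machine, and the categories are split into halves by position in the label list, whereas the abstract does not say how the thesis orders them. All type and function names (`Sample`, `LinearSplit`, `Node`, `build`, `predict`) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

using Vec = std::vector<double>;

struct Sample { Vec x; int label; };

// Stand-in for a binary SVM: a linear decision function w.x + b.
struct LinearSplit {
    Vec w;
    double b = 0.0;
    bool goesLeft(const Vec& x) const {
        double s = b;
        for (std::size_t i = 0; i < w.size(); ++i) s += w[i] * x[i];
        return s >= 0.0;
    }
};

struct Node {
    int label = -1;                    // meaningful only at a leaf
    LinearSplit split;                 // meaningful only at an internal node
    std::unique_ptr<Node> left, right;
};

// Mean feature vector of all samples whose label is in `labels`.
Vec centroid(const std::vector<Sample>& data, const std::vector<int>& labels) {
    Vec c(data.front().x.size(), 0.0);
    std::size_t count = 0;
    for (const Sample& s : data) {
        if (std::find(labels.begin(), labels.end(), s.label) == labels.end()) continue;
        for (std::size_t i = 0; i < c.size(); ++i) c[i] += s.x[i];
        ++count;
    }
    assert(count > 0);
    for (double& v : c) v /= static_cast<double>(count);
    return c;
}

// Recursively split the label set in two and fit one binary split per node.
std::unique_ptr<Node> build(const std::vector<Sample>& data,
                            const std::vector<int>& labels) {
    assert(!data.empty() && !labels.empty());
    std::unique_ptr<Node> node(new Node());
    if (labels.size() == 1) {
        node->label = labels[0];
        return node;
    }
    const auto mid = labels.begin() + static_cast<std::ptrdiff_t>(labels.size() / 2);
    const std::vector<int> leftLabels(labels.begin(), mid);
    const std::vector<int> rightLabels(mid, labels.end());
    const Vec cl = centroid(data, leftLabels);
    const Vec cr = centroid(data, rightLabels);
    node->split.w.resize(cl.size());
    for (std::size_t i = 0; i < cl.size(); ++i) {
        node->split.w[i] = cl[i] - cr[i];
        node->split.b -= node->split.w[i] * (cl[i] + cr[i]) / 2.0;
    }
    node->left = build(data, leftLabels);
    node->right = build(data, rightLabels);
    return node;
}

// Walk from the root to a leaf, making one binary decision per level.
int predict(const Node& root, const Vec& x) {
    const Node* n = &root;
    while (n->left) n = n->split.goesLeft(x) ? n->left.get() : n->right.get();
    return n->label;
}
```

If the tree is balanced, eight categories give seven internal nodes and three binary decisions per page, which is where the reduction in computation comes from. Replacing `LinearSplit` with a trained SVM at each node would give the structure the abstract describes.
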
Keywords/Search Tags: SVM, classifier, training, feature extraction