Analysis And Application For Web Text Classification Based On Support Vector Machine

Posted on: 2009-08-12
Degree: Master
Type: Thesis
Country: China
Candidate: C Q Zhang
Full Text: PDF
GTID: 2178360272955774
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, the amount of information available on it is growing exponentially. Its sheer volume and variety make the Web the largest information resource in the world. Efficient network applications that can classify large numbers of Web pages are therefore needed, so that information can be obtained quickly and accurately.

The Support Vector Machine (SVM) is a promising machine learning algorithm introduced by Vapnik. It treats learning as an optimization problem. Because SVM adopts the structural risk minimization principle, its risk bound is governed by the number of training samples rather than by the dimensionality of the input. It therefore avoids the "curse of dimensionality" and generalizes well, which has attracted attention from researchers worldwide.

This thesis explains the basic concepts of Web data mining and outlines the Web text classification process, including text pre-processing and feature selection. It focuses on algorithms based on the binary tree SVM (BT-SVM). Finally, using a method that pre-selects the fuzzy class membership of each sample, we design and implement a BT-SVM based prototype system for Web text classification. Results indicate that this method is relatively simple and speeds up SVM without significant loss of classification performance. Furthermore, thanks to its binary tree structure, the system runs more efficiently than traditional classification approaches.
Keywords/Search Tags:Feature Selection, Binary Tree based Multi-class SVM, Text Classification, Set Distance
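
To make the binary tree idea concrete, the following is a minimal sketch of a binary tree multi-class classifier over bag-of-words vectors. It is not the thesis's implementation. All names (`vectorize`, `train_linear_svm`, `build_tree`, `predict`) and the toy documents are hypothetical. The class split at each node simply halves the sorted class list, which is an assumption; the thesis's set-distance and fuzzy-membership steps are not reproduced. A small Pegasos-style subgradient trainer for a linear SVM without a bias term stands in for a full SVM solver.

```python
from collections import Counter

def vectorize(text, vocab):
    """Bag-of-words term counts over a fixed vocabulary (no bias feature)."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]

def train_linear_svm(X, y, lam=0.01, epochs=50):
    """Pegasos-style subgradient descent on the regularized hinge loss.
    Samples are visited in a fixed order so the result is deterministic."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [wj * (1.0 - eta * lam) for wj in w]  # shrink (regularizer)
            if margin < 1:                             # hinge loss is active
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w

def build_tree(classes, docs, vocab):
    """Recursively split the class list in half (an assumed split rule) and
    train one binary classifier per internal node. A leaf is a class label."""
    if len(classes) == 1:
        return classes[0]
    mid = len(classes) // 2
    left, right = classes[:mid], classes[mid:]
    subset = [(text, label) for text, label in docs if label in classes]
    X = [vectorize(text, vocab) for text, _ in subset]
    y = [1 if label in left else -1 for _, label in subset]
    return {"w": train_linear_svm(X, y),
            "left": build_tree(left, subset, vocab),
            "right": build_tree(right, subset, vocab)}

def predict(node, text, vocab):
    """Walk from the root to a leaf, evaluating one classifier per level."""
    x = vectorize(text, vocab)
    while isinstance(node, dict):
        score = sum(wj * xj for wj, xj in zip(node["w"], x))
        node = node["left"] if score > 0 else node["right"]
    return node

# Toy corpus (invented for illustration only).
docs = [
    ("football goal match", "sports"),
    ("team match score", "sports"),
    ("stock market bank", "finance"),
    ("bank loan interest", "finance"),
    ("software code computer", "tech"),
    ("computer chip network", "tech"),
    ("doctor hospital medicine", "health"),
    ("medicine vaccine clinic", "health"),
]
classes = sorted({label for _, label in docs})
vocab = sorted({word for text, _ in docs for word in text.split()})
tree = build_tree(classes, docs, vocab)
print(predict(tree, "goal match", vocab))
```

For k classes, a tree of this kind trains k-1 binary classifiers, and with a balanced split a prediction evaluates roughly log2(k) of them, compared with k(k-1)/2 classifiers for a one-against-one scheme. This is the general source of the efficiency gain the abstract attributes to the binary tree structure.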