
Research And Implementation Of SVM-based Web Text Information Categorization

Posted on: 2008-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: Q Liu
GTID: 2178360242470837
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet, and especially the spread of the World Wide Web, information resources on the web now cover a wide range of fields. To cope with the resulting information overload, web mining and web information retrieval techniques have advanced considerably. Classification is an important way to handle large-scale data: it makes it possible to build separate web text databases according to category, to improve the recall and precision of search engines, and to build automatically categorized information resources that offer users a classified catalogue.

This thesis first surveys related research and recent developments in text categorization and its algorithms. Building on a study of different multi-class SVM algorithms, the main work of the thesis is as follows.

First, the thesis analyzes the complete text categorization procedure, including information preprocessing, feature representation and feature extraction. It pays particular attention to feature representation and extraction techniques and to text categorization algorithms, and proposes an integrated feature extraction function.

Second, the thesis studies Statistical Learning Theory (SLT) and Support Vector Machine (SVM) theory in depth and discusses multi-class SVM classification algorithms. To make up for the deficiencies of traditional multi-class SVM classification, it proposes an SVM network multi-class classification method that combines the strong classification performance of SVM with the error-correcting capacity of error-correcting output codes (ECOC).

Finally, building on this research into multi-class SVM categorization, the thesis applies the theory in practice and analyzes a web text categorization model based on the SVM network. Based on this model and using LIBSVM as a tool, the author verifies these multi-class SVM classification methods by experiment and shows that the SVM network multi-class method has better usability.
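The error-correcting idea behind ECOC can be illustrated with a minimal Python sketch. It is not the thesis's SVM network: the class names, the 7-bit code matrix and the helper functions are all assumptions made for illustration, and the outputs of the binary SVMs are represented by a plain list of bits rather than by trained LIBSVM models. Each class gets a codeword, each column stands for one binary classifier, and a document is assigned to the class whose codeword is closest in Hamming distance to the predicted bits.

```python
from itertools import combinations
from typing import Dict, List, Sequence

# Hypothetical code matrix (an assumption, not from the thesis): one codeword
# per class, one column per binary classifier. In a real system each column
# would be a binary SVM trained to separate the classes marked 1 from those
# marked 0.
CODE_MATRIX: Dict[str, List[int]] = {
    "sports":  [0, 0, 0, 0, 0, 0, 0],
    "finance": [0, 0, 0, 1, 1, 1, 1],
    "science": [0, 1, 1, 0, 0, 1, 1],
    "travel":  [1, 0, 1, 0, 1, 0, 1],
}


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions where two equal-length bit sequences differ."""
    if len(a) != len(b):
        raise ValueError("bit sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(matrix: Dict[str, List[int]]) -> int:
    """Smallest Hamming distance between any two codewords.

    A code with minimum distance d can correct up to (d - 1) // 2
    wrong binary decisions.
    """
    return min(
        hamming_distance(u, v) for u, v in combinations(matrix.values(), 2)
    )


def decode(bits: Sequence[int], matrix: Dict[str, List[int]] = CODE_MATRIX) -> str:
    """Return the class whose codeword is nearest to the predicted bits."""
    return min(matrix, key=lambda label: hamming_distance(bits, matrix[label]))


if __name__ == "__main__":
    # Stand-in for the seven binary SVM outputs on one "finance" document,
    # with the fifth classifier making a mistake.
    noisy_bits = [0, 0, 0, 1, 0, 1, 1]
    print("minimum codeword distance:", min_pairwise_distance(CODE_MATRIX))
    print("decoded class:", decode(noisy_bits))
```

In this sketch the codewords are chosen so that any two differ in at least four positions, so a single wrong binary decision still decodes to the intended class. One-vs-rest voting does not have this redundancy.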
Keywords/Search Tags: text categorization, SVM, web text, multi-class classification