
The Technical Research Of Chinese Web Page Automatic Classification Based On SVM

Posted on: 2012-05-18, Degree: Master, Type: Thesis
Country: China, Candidate: K Sheng
GTID: 2218330338970974, Subject: Computer technology
Abstract/Summary:
With the rapid spread and development of the Internet, online information resources are growing explosively. Users can pick out what interests them, so the network brings convenience, but the abundance of information also creates new problems. How to organize and manage this information effectively, and how to find what users need quickly, accurately, and comprehensively, is a current challenge. "Information overload" is one of the main factors that reduce the efficiency of information collection, and a great deal of online information is prone to it. Techniques such as "information filtering" and "information retrieval" can alleviate the problem, but most of them cannot explicitly capture what the user needs. Faced with the mass of online information, the traditional approach is to classify, organize, and arrange Internet information manually. This manual approach consumes considerable human, material, and financial resources, and the consistency of its classification results is low. Research on automatic web page classification, which lets web pages be classified automatically and gives users fast and convenient access to information, therefore has important practical significance.

This thesis briefly introduces the background and significance of research on Chinese web page text categorization and the state of research on the SVM algorithm in China and abroad. It then improves the multi-class classification method of SVM and validates the improvement.
The specific work is as follows. First, the thesis briefly introduces the process and applications of Web mining and analyzes the overall process of Chinese web page categorization: web page preprocessing, feature selection, the classification algorithm, and the evaluation metrics for classification performance. The focus is on feature selection and on the evaluation metrics. Second, it studies the main content of statistical learning theory and the basic principles of the SVM algorithm, and introduces the categories of SVM classification methods. It improves the SVM multi-class classification algorithm and proposes a new SVM-based multi-class classification method. Finally, simulation experiments based on the improved multi-class algorithm train and test on Chinese web page samples collected from the network; the experimental results show that the improved SVM classification method performs better. In addition, an analysis of how feature weighting affects the classification results shows that the TF-IDF method is superior to plain term-frequency weighting. This provides some ideas for choosing appropriate features to represent pages and for improving classification accuracy.
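
The difference between term-frequency weighting and TF-IDF weighting can be illustrated with a minimal sketch. It assumes the pages have already been segmented into word tokens and uses the common tf * log(N / df) form. The abstract does not give the thesis's exact weighting formula, so the formula, the sample token lists, and the function names here are illustrative assumptions only.

```python
import math
from collections import Counter


def term_frequency(tokens):
    """Relative frequency of each term within one document."""
    if not tokens:
        return {}
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


def inverse_document_frequency(documents):
    """log(N / df) per term, where df is the number of documents containing it."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(documents):
    """TF-IDF weight of every term in every document."""
    idf = inverse_document_frequency(documents)
    return [
        {term: tf * idf[term] for term, tf in term_frequency(tokens).items()}
        for tokens in documents
    ]


# Hypothetical, already-segmented pages (English glosses stand in for Chinese words).
documents = [
    ["webpage", "classification", "support", "vector"],
    ["webpage", "sports", "football"],
    ["webpage", "finance", "stock", "stock"],
]

weights = tf_idf(documents)
```

In this toy corpus the term "webpage" occurs in every page, so its TF-IDF weight is zero even though its term frequency is not, while "stock" keeps a positive weight because it appears in only one page. This is the usual argument for why TF-IDF features can separate categories better than raw term frequency.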
Keywords/Search Tags: Web page classification, SVM, Multi-class classification