Hierarchical Classification For Chinese Web Page Based On Improved SVM-KNN

Posted on: 2011-03-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y K Deng
Full Text: PDF
GTID: 2178360302474589
Subject: Computer application technology
Abstract/Summary:
As the Internet enters the Web 2.0 era, new Web media such as blogs and social networking services (SNS) are becoming increasingly popular, and the amount of information on the Web is growing explosively. Faced with such large volumes of data, people find it increasingly hard to obtain the information they need from the Web effectively.

This paper studies the hierarchical classification of Chinese web pages. First, we review the relevant technical background in detail, including the basic theory of the SVM algorithm and of hierarchical text classification. We then analyze the SVM-KNN algorithm, which serves as the basic unit of the hierarchical classification model proposed in the paper, and make a small improvement to it for the case of imbalanced data sets. On this basis, we propose a hierarchical text classification model, apply it to Chinese web page classification, and design and implement a prototype system.

Using the CCT2002 corpus, we experimentally compare three classification methods: the hierarchical classification model proposed in the paper, hierarchical classification based on the SVM algorithm, and the flat SVM algorithm. The experimental results show that the hierarchical text classification model based on the improved SVM-KNN not only maintains a certain degree of classification accuracy but also effectively reduces the time consumed in classification. We therefore believe that the proposed method is suitable for the automatic classification of Chinese web pages.
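The abstract does not spell out the SVM-KNN decision rule, so the following is only a minimal sketch of how the combination is commonly described: samples far from the SVM hyperplane take the SVM label, while samples close to it are decided by a k-nearest-neighbour vote over the support vectors. The linear decision function, the threshold `eps`, the value of `k`, and all function names here are illustrative assumptions, not the thesis's improved method, and the handling of imbalanced data is not reproduced.

```python
import math
from collections import Counter


def decision_value(w, b, x):
    """Signed score g(x) = w . x + b of an (assumed) linear SVM."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def knn_vote(support_vectors, x, k):
    """Majority label among the k support vectors nearest to x.

    support_vectors is a list of (vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(support_vectors, key=lambda sv: math.dist(sv[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def svm_knn_predict(w, b, support_vectors, x, eps=0.5, k=3):
    """Hypothetical SVM-KNN rule for a binary problem with labels +1 / -1.

    If x lies clearly on one side of the hyperplane (|g(x)| > eps),
    trust the SVM sign; otherwise fall back to KNN over the support vectors.
    """
    g = decision_value(w, b, x)
    if abs(g) > eps:
        return 1 if g > 0 else -1
    return knn_vote(support_vectors, x, k)
```

With toy values `w = [1.0, 0.0]`, `b = 0.0`, a point such as `[2.0, 0.0]` is labelled by the SVM sign alone, whereas a point near the boundary such as `[0.1, 1.0]` is labelled by its nearest support vectors. In a hierarchical model, one such classifier would sit at each internal node of the category tree, which is an assumption about the architecture based on the abstract's wording.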
Keywords/Search Tags: Automatic Chinese Web Page, Hierarchical Classification, Support Vector Machine, Imbalanced Data Set