The Research Of Automatic Chinese Web Page Categorization Based On Support Vector Machine

Posted on: 2008-01-24 | Degree: Master | Type: Thesis
Country: China | Candidate: X X Wang | Full Text: PDF
GTID: 2178360215969478 | Subject: Computer applications
Abstract/Summary:
With the rapid development and growing popularity of the World Wide Web, the amount of online electronic information is increasing exponentially, and people have moved from an age in which information was scarce into one in which it is extremely abundant and digitized. Faced with this vast amount of online information, it is hard to find the genuinely useful information quickly and effectively. How to handle and organize it has therefore gradually become an important research subject. Traditionally, web documents have been classified manually, which is time-consuming and labor-intensive. For this reason, automatic text categorization has been proposed and studied as a way to bring order to online information. Combined with information retrieval, search engine, and information filtering technologies, it has become one of the important tools for acquiring information on the Internet.

This thesis first introduces Chinese word segmentation. It then analyzes which structural components of a web page contribute to the classification process and adjusts term weights according to the characteristics of web pages and the differing functions of web page tags, making full use of the link information in the page so that the page representation is better suited to automatic categorization. It outlines the main content of statistical learning theory, derives the mathematical formulas by which the support vector machine method performs classification in the linearly separable and linearly non-separable cases, and on this basis extends the method to a special vector machine.
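The tag-dependent term weighting mentioned above can be illustrated with a minimal sketch. The abstract does not state which tags the thesis weights or by how much, so the `TAG_WEIGHTS` values, the `weighted_terms` helper, and the rule of taking the largest weight among enclosing tags are all assumptions made for illustration. The sketch also assumes the page text has already been segmented into space-separated words, which for Chinese would be the job of the word segmentation step.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights: the abstract does not give the thesis's actual values.
TAG_WEIGHTS = {"title": 4.0, "h1": 3.0, "a": 2.0}
DEFAULT_WEIGHT = 1.0
SKIPPED_TAGS = {"script", "style"}                       # text here is not page content
VOID_TAGS = {"br", "img", "meta", "link", "hr", "input"}  # tags that never close


class WeightedTermCounter(HTMLParser):
    """Accumulate term frequencies scaled by the weight of the enclosing tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []       # stack of currently open tags
        self.weights = Counter()  # term -> weighted frequency

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, tolerating sloppy HTML.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        if any(t in SKIPPED_TAGS for t in self.open_tags):
            return
        # Assumed rule: use the largest weight among the enclosing tags.
        weight = max(
            [TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.open_tags],
            default=DEFAULT_WEIGHT,
        )
        # Assumption: the text is already segmented into space-separated words.
        for term in data.split():
            self.weights[term] += weight


def weighted_terms(html):
    """Return a dict mapping each term to its tag-weighted frequency."""
    parser = WeightedTermCounter()
    parser.feed(html)
    parser.close()
    return dict(parser.weights)
```

With these illustrative weights, a word in the page title counts four times as much as the same word in body text, and anchor text counts twice as much, which is one simple way to let link information influence the feature vector.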
The thesis also adopts a multiclass text categorization method based on an improved support vector machine that combines a binary tree with pre-extraction of support vectors and a cyclic iterative algorithm. In addition, it proposes a new noise-tolerant support vector machine method, the KNN-SVM algorithm, and applies it in Chinese web page classification experiments, where it obtains good results.
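As a rough illustration of the binary-tree idea, the sketch below reduces a K-class problem to K-1 binary decisions arranged in a tree. It is not the thesis's algorithm: the abstract gives no detail on how the tree is built, and the pre-extraction of support vectors and the iterative step are not shown. The function names are hypothetical, the class list is simply split in half at each node, and a nearest-centroid rule stands in where a binary SVM would be trained, so the example runs with the standard library alone.

```python
def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_centroid_classifier(X, y):
    """Stand-in binary learner (labels +1 / -1); a binary SVM would go here."""
    pos = centroid([x for x, label in zip(X, y) if label == 1])
    neg = centroid([x for x, label in zip(X, y) if label == -1])
    return lambda x: 1 if sq_dist(x, pos) <= sq_dist(x, neg) else -1


def build_tree(X, y, classes, train_binary=train_centroid_classifier):
    """Split the class list in half and train one binary classifier per node.

    Leaves are class labels (assumed to be strings); internal nodes are
    (decide, left_subtree, right_subtree) tuples.
    """
    if len(classes) == 1:
        return classes[0]
    mid = len(classes) // 2
    left, right = classes[:mid], classes[mid:]
    # Keep only the samples whose class is handled at this node.
    subset = [(x, label) for x, label in zip(X, y) if label in classes]
    node_X = [x for x, _ in subset]
    node_y = [1 if label in left else -1 for _, label in subset]
    decide = train_binary(node_X, node_y)
    return (
        decide,
        build_tree(X, y, left, train_binary),
        build_tree(X, y, right, train_binary),
    )


def predict(tree, x):
    """Descend from the root, following each node's binary decision, to a leaf."""
    while isinstance(tree, tuple):
        decide, left, right = tree
        tree = left if decide(x) == 1 else right
    return tree
```

Compared with one-vs-one voting, a tree of this kind needs fewer classifiers and evaluates at most one per level when predicting, but an error near the root cannot be corrected further down, so how classes are grouped at each node matters.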
Keywords/Search Tags:Automatic Chinese Web Page Categorization, Chinese Word Segmentation, Support Vector Machine, Web Page Link