Design And Implementation Of The Chinese Webpage Classifier Based On The Maximum Entropy Model

Posted on: 2011-09-15
Degree: Master
Type: Thesis
Country: China
Candidate: Q Yang
GTID: 2178330332966008
Subject: Software engineering
Abstract/Summary:
With the rapid development of the internet, the volume of information is growing exponentially, and finding effective methods for handling this complex web information has become a worthwhile research subject. After analyzing approaches to webpage pre-processing and algorithms for webpage categorization, this thesis designs and implements a Chinese webpage classifier based on the Maximum Entropy (ME) model. First, the background of Chinese webpage categorization is presented and the common categorization algorithms are analyzed. Second, exploiting the semi-structured nature of webpages, the thesis extracts structural and textual features from each webpage and represents them as feature vectors; experiments on several different feature sets are used to identify an appropriate one. Last, the Maximum Entropy model is applied to webpage categorization, and the basic framework of the ME-based Chinese webpage classifier is presented. Compared with other webpage categorization approaches, the experimental results show that this approach achieves higher recall, precision, and F1.
Keywords/Search Tags: webpage, classifier, Maximum Entropy model, feature
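
To make the approach in the abstract concrete, the following is a minimal sketch of a maximum entropy (log-linear) classifier over webpage feature vectors. It is not the thesis implementation: the feature names with `title:` and `body:` prefixes, the toy English samples, the function names, and the training settings (plain gradient ascent with a small L2 penalty) are all assumptions chosen for illustration. The model assigns p(y | x) proportional to exp(sum_i w[y, i] * f_i(x)).

```python
import math
from collections import defaultdict


def predict_proba(weights, labels, features):
    """Return p(label | features) under the log-linear (maximum entropy) form."""
    scores = {
        y: sum(weights[(y, name)] * value for name, value in features.items())
        for y in labels
    }
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())  # normalizer Z(x)
    return {y: e / z for y, e in exps.items()}


def train_maxent(samples, epochs=200, lr=0.5, l2=0.01):
    """Fit weights by batch gradient ascent on the regularized log-likelihood.

    samples: list of (feature_dict, label) pairs.
    epochs, lr, l2: illustrative settings, not values taken from the thesis.
    """
    labels = sorted({label for _, label in samples})
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for features, gold in samples:
            probs = predict_proba(weights, labels, features)
            for y in labels:
                observed = 1.0 if y == gold else 0.0
                for name, value in features.items():
                    # observed feature count minus expected count under the model
                    grad[(y, name)] += (observed - probs[y]) * value
        for key, g in grad.items():
            weights[key] += lr * (g / len(samples) - l2 * weights[key])
    return weights, labels


def classify(weights, labels, features):
    """Pick the label with the highest conditional probability."""
    probs = predict_proba(weights, labels, features)
    return max(probs, key=probs.get)


# Hypothetical toy data: "title:" marks a structural feature, "body:" a text feature.
TOY_SAMPLES = [
    ({"title:football": 1, "body:match": 1, "body:team": 1}, "sports"),
    ({"title:league": 1, "body:team": 1, "body:goal": 1}, "sports"),
    ({"title:stock": 1, "body:market": 1, "body:bank": 1}, "finance"),
    ({"title:bank": 1, "body:market": 1, "body:profit": 1}, "finance"),
]
```

Calling `train_maxent(TOY_SAMPLES)` returns the weights and the label list, which `classify` can then apply to a new feature dictionary. For Chinese pages, a word segmentation step would be needed before building the feature dictionaries; that step is outside this sketch.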