
Research and Application of Chinese Web Page Classification Technology Based on Naive Bayes

Posted on: 2013-01-09 | Degree: Master | Type: Thesis
Country: China | Candidate: H C Liu
GTID: 2248330374475858 | Subject: Computing applications technology
Abstract/Summary:
Chinese web page classification is a technology of high practical value, with applications such as user behavior analysis, personalized recommendation services, and precision marketing. However, the precision and recall of current web page classification are not high and leave considerable room for improvement, so the key technologies involved need to be improved to raise classification performance. This thesis studies the technologies related to Chinese web page classification in depth and improves three key processes: web page preprocessing, feature selection, and the classification algorithm. The specific contributions are as follows: (1) Based on an analysis of the structural and content features of Chinese web pages, a text extraction method based on the text-to-link ratio of DIV blocks is proposed, a web page preprocessing pipeline is designed, and both are verified experimentally. (2) The traditional chi-square statistic feature selection algorithm ignores the frequency of feature words and is biased toward feature words with low document frequency; to address this, an improved chi-square statistic algorithm named ICHI is proposed, which takes feature word frequency into account and introduces a penalty function. (3) In the classic tree augmented Naive Bayes (TAN) algorithm, dependencies between attributes are symmetric, dependency relations have no direction, and constructing the model structure is computationally complex; to address this, an improved algorithm named ITAN is proposed, which applies ideas from association rule mining to TAN structure learning. (4) The feature selection and classification algorithms are combined in experiments to verify the superiority of the improved algorithms. The results show that the proposed improvements are effective and improve web page classification performance to a certain extent. Finally, drawing on the work of this thesis, the Chinese web page classification technology is applied to the design and implementation of an Internet user behavior analysis system for a telecom operator, with good results.
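For readers unfamiliar with the baseline that ICHI modifies, the following is a minimal sketch of the traditional chi-square statistic for feature selection, computed from document counts only. It is not the thesis's ICHI algorithm: the abstract does not specify the term frequency weighting or the penalty function, so neither is shown. The function names `chi_square` and `score_terms` and the input layout (a list of token-list and label pairs) are assumptions made for illustration.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Classic chi-square score from a 2x2 term/category contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (e.g. the term occurs in every document).
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def score_terms(docs, category):
    """Score every term against one category.

    docs: list of (tokens, label) pairs (hypothetical input layout).
    Returns a dict mapping each term to its chi-square score.
    """
    n_docs = len(docs)
    n_in_cat = sum(1 for _, label in docs if label == category)
    df_total = Counter()   # document frequency over all documents
    df_in_cat = Counter()  # document frequency inside the category
    for tokens, label in docs:
        for term in set(tokens):  # count each term once per document
            df_total[term] += 1
            if label == category:
                df_in_cat[term] += 1
    scores = {}
    for term, df in df_total.items():
        a = df_in_cat[term]
        b = df - a
        c = n_in_cat - a
        d = n_docs - n_in_cat - b
        scores[term] = chi_square(a, b, c, d)
    return scores
```

Because the score depends only on whether a term appears in a document, a word that occurs once and a word that occurs fifty times in the same document contribute identically, which is the limitation in term frequency that the abstract attributes to the traditional algorithm.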
Keywords/Search Tags:classification, chi-square statistic, naive bayes, association rule, feature selection