Naïve Bayesian-based Automatic Webpage Classification Technology Research

Posted on: 2009-06-12
Degree: Master
Type: Thesis
Country: China
Candidate: J S Li
Full Text: PDF
GTID: 2178360245974743
Subject: Control theory and control engineering
Abstract/Summary:
Text and Webpage classification is an important technology built on text mining and Web mining, and one of the focal points of data mining research. With the rapid development of data analysis tools, new database technology, and Internet technology, large volumes of data in diverse and complex forms continue to emerge, such as semi-structured and structured data, hypertext, and multimedia data. Mining complex data types has therefore become a very important problem in data mining; these types include complex objects, spatial data, multimedia data, time-series data, text data, and Web data. Our research tries to build a model for Text and Webpage classification based on a particular classification algorithm, and to combine text content, URL links, and user usage information so that together they reflect the categories of Web pages. Finally, we also try to build a filtering system for Web pages.

This paper describes a method for Chinese Webpage classification that uses user usage information and the website's hierarchy, rather than relying on content-based or link-based analysis, with the aim of improving the performance and features of the classifier. The method is tested and the results are analyzed.

In addition, as an extension of this research, the paper studies a filtering technique based on Web classification and explores how user information can be used to improve the accuracy of the filter.
Keywords/Search Tags: Data Mining, Web Classification, Naïve Bayesian, Filtration
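
The abstract does not give the thesis's actual model or feature design, so the following is only a minimal sketch of the general idea it describes: a naïve Bayesian classifier that pools evidence from page content, URL structure, and user usage. The feature scheme (`page_features`), the class names, the example URLs, and the toy training pages are all assumptions made up for illustration, and the classifier is a generic multinomial naïve Bayes with add-one smoothing, not necessarily the variant used in the thesis.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Generic multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()             # training pages per class
        self.token_counts = defaultdict(Counter)  # token frequencies per class
        self.vocabulary = set()

    def fit(self, pages, labels):
        for tokens, label in zip(pages, labels):
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def log_scores(self, tokens):
        """Return log P(class) + sum of log P(token | class) for each class."""
        total_pages = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, n_pages in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + vocab_size
            score = math.log(n_pages / total_pages)
            for tok in tokens:
                if tok in self.vocabulary:  # skip tokens never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


def page_features(content_words, url, visitor_groups):
    """Merge three evidence sources into one token list (hypothetical scheme).

    Prefixes keep a word in the text distinct from the same word in the URL.
    """
    url_parts = [p for p in url.replace("http://", "").split("/") if p]
    return (
        ["content=" + w for w in content_words]
        + ["url=" + p for p in url_parts]
        + ["usage=" + g for g in visitor_groups]
    )


# Toy, made-up training pages; not data from the thesis.
training = [
    (page_features(["match", "score", "team"],
                   "http://example.com/sports/football", ["fans"]), "sports"),
    (page_features(["league", "team", "goal"],
                   "http://example.com/sports/news", ["fans"]), "sports"),
    (page_features(["stock", "market", "price"],
                   "http://example.com/finance/stocks", ["investors"]), "finance"),
    (page_features(["bank", "price", "fund"],
                   "http://example.com/finance/funds", ["investors"]), "finance"),
]
model = NaiveBayes().fit([t for t, _ in training], [c for _, c in training])

new_page = page_features(["team", "price"],
                         "http://example.com/sports/live", ["fans"])
print(model.predict(new_page))
```

In this sketch the content words alone are ambiguous ("team" and "price" point to different classes), so the URL path and the usage group decide the prediction, which is the kind of benefit the abstract attributes to combining information sources.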