Web Mining Research And Implementation Of Hypertext Classification

Posted on: 2007-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: F Wang
GTID: 2208360182478710
Subject: Computer application technology

Abstract/Summary:
With the rapid development of the Internet, the web has become a huge, dynamic, and heterogeneous base of information and resources. One consequence is the "information explosion"; another is people's urgent need to obtain valid knowledge and information from the web quickly. Web mining is a recently emerging research field that addresses this problem. This thesis first introduces the concept of web mining on the basis of data mining theory and discusses the process and categories of web mining. It then describes text mining and text and hypertext classification in detail. Finally, it focuses on Naive Bayes text classification based on the Lee model and on rule-based hypertext classification.

In text classification, David Lee proposed a model that approaches the problem from a psychological perspective. Sanban used this model to define the influence of a word, but that definition is distorted by skew in the training set. Drawing on Lee's model and Bayesian probability, this thesis redefines word influence so that the skew is eliminated, and presents two methods for reading test documents. Experiments show that the heuristic method can greatly improve Naive Bayes while requiring much less time.

Compared with plain text, hypertext is richer in information. On this basis, Yiming Yang proposed five kinds of hypertext rules. Using three of these rules, this thesis presents four hypertext representations and applies them to hypertext classification. It also compares precision and run time between representations that apply Yiming Yang's hypertext rules and a representation that does not. Experimental results show that the rule-based representations achieve higher precision at a lower time cost, with better overall performance.
Keywords/Search Tags: Web Mining, Text Classification, Hypertext Classification, Lee Model, Naive Bayes, Influence, Hypertext Rules, Hypertext Representation
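The abstract does not give the Lee-model definition of word influence, so the sketch below shows only the plain multinomial Naive Bayes baseline that such work starts from. Class priors and per-class word probabilities are estimated from counts with add-one (Laplace) smoothing, and a document is assigned to the class with the largest log-posterior. This is a generic textbook formulation, not the thesis's improved method; the class name `SimpleMultinomialNB` (not scikit-learn's class) and the toy data are hypothetical.

```python
import math
from collections import Counter


class SimpleMultinomialNB:
    """Generic multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, docs, labels):
        # docs: list of token lists; labels: parallel list of class names
        self.classes = sorted(set(labels))
        self.vocab = {t for doc in docs for t in doc}
        self.n_docs = len(docs)
        self.doc_counts = Counter(labels)  # documents per class, for the prior
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.word_counts[c].update(doc)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, doc):
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            # log P(c): fraction of training documents in class c
            score = math.log(self.doc_counts[c] / self.n_docs)
            for t in doc:
                if t not in self.vocab:
                    continue  # assumption: words unseen in training are ignored
                # log P(t | c) with add-one (Laplace) smoothing
                score += math.log((self.word_counts[c][t] + 1) / (self.total_words[c] + v))
            scores[c] = score
        return scores

    def predict(self, doc):
        scores = self.log_posterior(doc)
        return max(scores, key=scores.get)


docs = [["ball", "goal", "team"], ["goal", "match", "team"],
        ["vote", "election", "party"], ["party", "policy", "vote"]]
labels = ["sport", "sport", "politics", "politics"]
clf = SimpleMultinomialNB().fit(docs, labels)
print(clf.predict(["team", "goal"]))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied; skipping out-of-vocabulary words is one of several possible conventions.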
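The abstract does not specify the four hypertext representations, so the following is only a hypothetical illustration of the general idea that a page's hypertext context can enrich its plain-text features. It assumes, for illustration, that the page title and the text of linked pages are informative, and keeps those words separate from the page's own words by prefixing them (`title:`, `nbr:`). The `Page` type, the prefixes, and the `represent` function are all assumptions, not the thesis's design.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    # Hypothetical minimal page model: URL, title, body text, and outgoing links.
    url: str
    title: str
    body: str
    links: list = field(default_factory=list)


def tokenize(text):
    # Deliberately naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def represent(page, pages_by_url, use_title=True, use_neighbours=True):
    """Build a token list for `page` (an illustrative representation).

    Prefixes keep title words and linked-page words as features distinct
    from the page's own body words.
    """
    tokens = tokenize(page.body)
    if use_title:
        tokens += ["title:" + t for t in tokenize(page.title)]
    if use_neighbours:
        for url in page.links:
            nbr = pages_by_url.get(url)
            if nbr is not None:  # skip links to pages outside the collection
                tokens += ["nbr:" + t for t in tokenize(nbr.body)]
    return tokens


a = Page("a", "Goal Scorers", "team won", ["b", "zzz"])
b = Page("b", "Other", "match report")
index = {"a": a, "b": b}
print(represent(a, index))
print(represent(a, index, use_title=False, use_neighbours=False))
```

Toggling `use_title` and `use_neighbours` yields representations with and without the extra hypertext context, which is the kind of with/without comparison the abstract describes; the resulting token lists can be fed to any bag-of-words classifier.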