
Research on Filtering Technology for Chinese Information

Posted on: 2007-09-29
Degree: Master
Type: Thesis
Country: China
Candidate: S H Li
Full Text: PDF
GTID: 2178360185962624
Subject: Computer application technology
Abstract/Summary:
In recent years, the Internet has grown at a remarkable pace. While it gives us access to useful information, it also brings growing problems such as information overload and information loss. To address these problems, research on Web information filtering has drawn much attention. Chinese text filtering is a branch of Chinese information processing research. It retrieves useful information from a dynamic data stream and discards useless information according to users' requirements.

Web text extraction is the basis of information filtering. We extract the text content by analyzing the HTML source code, including the structure of the HTML syntax and its control markup. Keywords are then extracted from this text to form a keyword dictionary. Each text is thus represented by the keyword dictionary, which allows the information to be processed quickly.

The decision tree method based on ID3 (Iterative Dichotomiser 3) is widely used in information filtering. We implement a decision tree program based on the ID3 algorithm, and this program can classify Chinese information effectively. Furthermore, the program can extract classification rules from data, and these rules can be added, deleted, or modified. Experimental results show that the decision tree classifier is an effective classification method.

Classification based on the Bayes model is a research hotspot in data mining. This thesis studies the Bayesian classification model and implements two classifiers: a Naive Bayesian classifier and an Attribute-associated Bayesian classifier. The Naive Bayes classifier is a simple and effective classification method, but its attribute independence assumption makes it unable to...
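The attribute-selection step at the heart of ID3 can be illustrated with a minimal sketch. This is not the thesis's program: it assumes each document is represented as the set of dictionary keywords it contains, and the function names (`entropy`, `information_gain`, `best_keyword`), the toy documents, and the labels are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(docs, labels, keyword):
    """Entropy reduction from splitting on whether `keyword` occurs in a document."""
    with_kw = [lab for doc, lab in zip(docs, labels) if keyword in doc]
    without_kw = [lab for doc, lab in zip(docs, labels) if keyword not in doc]
    # Weighted entropy of the two branches; empty branches contribute nothing.
    remainder = sum(
        len(part) / len(labels) * entropy(part)
        for part in (with_kw, without_kw)
        if part
    )
    return entropy(labels) - remainder


def best_keyword(docs, labels, dictionary):
    """ID3 picks the attribute (here, a keyword) with the highest information gain."""
    return max(dictionary, key=lambda kw: information_gain(docs, labels, kw))


# Hypothetical toy data: each document is the set of dictionary keywords it contains.
docs = [{"lottery", "prize"}, {"prize", "meeting"}, {"meeting", "report"}, {"meeting"}]
labels = ["reject", "reject", "accept", "accept"]
dictionary = ["lottery", "prize", "meeting", "report"]

root = best_keyword(docs, labels, dictionary)
```

In this toy set the keyword "prize" separates the two classes perfectly, so it would become the root test of the tree. A full ID3 implementation would repeat the same selection recursively on each branch, and each root-to-leaf path could then be read off as one classification rule.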
Keywords/Search Tags: Information Filtering, Decision Trees, Naive Bayes, Attribute-associated Bayes