Web Texts Classifier Based On FOIL

Posted on:2006-12-31
Degree:Master
Type:Thesis
Country:China
Candidate:Y L Wang
GTID:2168360152466622
Subject:Computer application technology
Abstract/Summary:
With the development of the Internet, the volume of online information is growing rapidly. To make information services more efficient and accurate, the information on the Internet should be organized and classified sensibly. This thesis focuses on processing text information on the Web and studies text classification in depth, in both theory and application.

First, the thesis describes a model of an automatic text classification system with five parts: information preprocessing, feature representation, feature extraction, using text mining techniques to build the classifier model (which involves rule extraction and classification), and evaluating model quality. Second, the thesis introduces the theory and key techniques of information preprocessing, feature representation, feature extraction, rule extraction, and text classification, with particular attention to the classifier based on FOIL (First Order Inductive Learner). Finally, we construct a classifier for Chinese texts and implement it in Delphi 6.0.

The distinguishing feature of the thesis is classification guided by first-order rules. Unlike traditional classifiers, this classifier uses the information in the texts to extract all the rules associated with each class, and then classifies texts using these rules. For the dataset, the classifier makes the following improvements:

(1) Using the semi-structured nature of Web texts, it estimates how much a text's title helps classification, and then gives the title a suitable weight during feature extraction.
(2) It post-processes the results of feature extraction, deleting the useless words listed in a system-defined word set.
(3) Because the rules produced by FOIL all match positive examples, the classifier is designed to assign each text to exactly one class. A text that might belong to more than one class is assigned to a single class, determined by the highest-weight rule that matches the text. In addition, a default rule assigns a special class to texts that do not clearly belong to any class.
(4) It filters out rules whose weight is below a threshold. This ensures that every literal appearing in a rule has definite importance.

The thesis analyses the advantages of classifiers based on first-order rules. It also investigates the algorithm designed for classification and tests its performance on examples. Finally, the thesis presents the construction and implementation of the Web text classifier based on FOIL, and gives the measures used to evaluate the classifier's performance (precision, recall, F1).

Institution:Fuzhou University (master's thesis)
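The decision step the abstract describes (filter out low-weight rules, assign each text to the class of the highest-weight matching rule, fall back to a default class) and the evaluation measures can be sketched roughly as follows. This is a minimal illustration in Python, not the thesis's Delphi 6.0 implementation. It rests on several assumptions: each rule literal is simplified to "this word occurs in the text", whereas real FOIL rules are first-order clauses; the weights are given as inputs, because the abstract does not say how they are computed; and the names `Rule`, `filter_rules`, `classify`, `precision_recall_f1`, and the `"unclassified"` label are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    label: str            # class the rule predicts
    literals: frozenset   # simplification: words that must all occur in the text
    weight: float         # rule weight, supplied as input (its formula is an assumption)


DEFAULT_CLASS = "unclassified"  # hypothetical name for the default rule's class


def filter_rules(rules, threshold):
    """Keep only rules whose weight reaches the threshold."""
    return [r for r in rules if r.weight >= threshold]


def classify(words, rules, default=DEFAULT_CLASS):
    """Return the class of the highest-weight matching rule, or the default class."""
    present = set(words)
    matching = [r for r in rules if r.literals <= present]
    if not matching:
        return default
    return max(matching, key=lambda r: r.weight).label


def precision_recall_f1(gold, predicted, label):
    """Per-class precision, recall and F1 from parallel lists of labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy rules and weights, invented purely for illustration.
rules = [
    Rule("sports", frozenset({"match", "team"}), 0.9),
    Rule("finance", frozenset({"market"}), 0.7),
    Rule("finance", frozenset({"team"}), 0.2),
]
kept = filter_rules(rules, threshold=0.5)
label_a = classify(["team", "match", "market"], kept)  # two rules match; heavier one wins
label_b = classify(["weather"], kept)                  # no rule matches; default class
```

In this toy run the third rule is discarded by the threshold, so the word "team" alone no longer pushes a text toward "finance". A text that matches both remaining rules is assigned to the class of the heavier one.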
Keywords/Search Tags:Web texts mining, Texts classification, Classifier, First order rules