Web Texts Classifier Based On FOIL

Posted on:2006-12-31

Degree:Master

Type:Thesis

Country:China

Candidate:Y L Wang

GTID:2168360152466622

Subject:Computer application technology

Abstract/Summary:

With the development of the Internet, the volume of networked information is growing rapidly. To make information services more efficient and accurate, the information on the Internet should be organized and classified in a reasonable way. This thesis focuses on processing text information on the network and studies text classification in depth, from both theory and application.

First, the thesis describes a model of an automatic text classification system, which comprises five parts: information preprocessing, feature representation, feature extraction, use of text mining techniques to build the classifier model (involving rule extraction and classification), and evaluation of model quality. Second, the thesis introduces the theory and key techniques of information preprocessing, feature representation, feature extraction, rule extraction, and text classification, especially the classifier based on FOIL (First Order Inductive Learner). Finally, we construct a classifier for Chinese texts and implement it in Delphi 6.0.

The distinguishing feature of the thesis is classification guided by first-order rules. Unlike traditional classifiers, this classifier uses the information in the texts to extract all the rules associated with each class, and then classifies texts using these rules. For the dataset, the classifier makes the following improvements:

(1) Using the semi-structured nature of Web texts, it estimates how much the title helps classification and gives the title a suitable weight during feature extraction.
(2) It processes the results of feature extraction once more, this time deleting the useless word sets defined by the system.
(3) Because the rules produced by FOIL only match positive examples, the classifier is designed for single-class classification. A text that may belong to more than one class is assigned to exactly one class, determined by the heaviest-weighted rule that matches the text. In addition, a default rule assigns a special class to texts that do not belong to any unambiguous class.
(4) It filters out rules whose weight is lower than a threshold. This ensures that all literals appearing in the conditions of the rules have definite importance.

This thesis analyzes the advantages of classifiers based on first-order rules. It also investigates the algorithm designed for classification and tests its performance on examples. Finally, the thesis presents the construction and implementation of the Web text classifier based on FOIL, and gives the measures used to evaluate the classifier's performance (precision, recall, F1).
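
Improvements (3) and (4) and the evaluation measures can be illustrated with a minimal Python sketch. It is not the thesis's Delphi 6.0 implementation, and everything specific in it is an assumption made for illustration: a rule is modelled as a set of words that must all occur in the text, the weight is an arbitrary number (the abstract does not say how weights are computed), and the names Rule, filter_rules, classify, precision_recall_f1 and the default class "other" are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    label: str         # class the rule predicts
    words: frozenset   # literals: words that must all occur in the text
    weight: float      # hypothetical rule weight (how it is computed is not given)


DEFAULT_CLASS = "other"  # hypothetical name for the default-rule class


def filter_rules(rules, threshold):
    """Keep only rules whose weight reaches the threshold (improvement 4)."""
    return [r for r in rules if r.weight >= threshold]


def classify(text_words, rules, default=DEFAULT_CLASS):
    """Assign exactly one class: the label of the heaviest matching rule,
    or the default class when no rule matches (improvement 3)."""
    matching = [r for r in rules if r.words <= text_words]
    if not matching:
        return default
    return max(matching, key=lambda r: r.weight).label


def precision_recall_f1(true_labels, predicted_labels, label):
    """Per-class precision, recall and F1 from parallel label lists."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy rules and texts, invented purely for this example.
rules = [
    Rule("sports", frozenset({"match", "team"}), 0.9),
    Rule("finance", frozenset({"stock"}), 0.7),
    Rule("finance", frozenset({"team"}), 0.2),
]
kept = filter_rules(rules, threshold=0.5)

print(classify({"team", "match", "stock"}, kept))  # two rules match; heaviest wins
print(classify({"weather"}, kept))                 # no rule matches; default class
```

Taking the single heaviest matching rule keeps every text in exactly one class, which matches the single-class design described above. A scheme that sums the weights of all matching rules per class would be an alternative, but the abstract does not describe one.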

Keywords/Search Tags:

Web texts mining, Texts classification, Classifier, First order rules

Related items

1	The Application Of RS Theory In Categorization Algorithms Of Texts Mining
2	Research Of Topic Model-based Approaches For Sentiment And Topic Modeling On Texts
3	Research On Short Texts Classification Methods Based On Features Fusion And BiLSTM
4	Research And Implementation On The Identification Of Authorship For Chinese Texts
5	Research On Fine-grained Opinion Mining Technologies Of Web Review Texts
6	Coreference Resolution In Biomedical Texts
7	DWG File Management Software With 2-D And 3-D Texts Editing Functions
8	Research On Multi-aspect Opinion Mining From Review Texts
9	Research On Chinese Texts Clustering
10	News Web Texts Classification Based On Contents