Research And Application Of Web Text Classification

Posted on:2007-01-06

Degree:Master

Type:Thesis

Country:China

Candidate:H Y Ke

GTID:2178360182980901

Subject:Computer application technology

Abstract/Summary:

With the rapid development of the Internet, the Web now holds abundant, heterogeneous, semi-structured, and dynamic information resources, more than 80 percent of which exist as Web text. How to find and extract valuable information and knowledge models from these vast resources has become an urgent problem in information processing. Web text classification, which originates from automatic text classification (ATC) and is a key component of Web text mining, addresses this problem effectively. It can classify search results, which makes searching more efficient for Web users, improves their ability to locate target knowledge, and helps extract valuable knowledge.

Building on an analysis of the current state of research and the open problems in Web mining and Web text mining, this thesis studies the essential technologies of Web text classification, the common text classification methods, and a hybrid Web text classification method based on rough sets and KNN. The main research work is as follows.

(1) Introduces the basic theory and relevant background of Web mining and Web text mining, and analyzes the research background, current state, and open problems of Web text mining and Web text classification.

(2) Analyzes in detail the essential technologies in the Web text classification process, such as preprocessing, word segmentation, text representation, weight computation, feature selection and extraction, and dimensionality reduction. Five factors that influence classification performance and several commonly used evaluation methods for classifiers are also discussed.

(3) Discusses several general text classification methods: KNN, the vector distance method based on the vector space model (VSM), Bayesian classification, support vector machines, decision trees, and so on, and analyzes and compares their advantages and disadvantages.

(4) Proposes a hybrid classification model for Web text based on rough sets and KNN. Rough set attribute reduction is used to reduce the dimension of the feature vectors during text classification, by means of a simplified attribute reduction algorithm based on the discernibility matrix. Mutual information is used for feature selection. A series of experiments shows that the hybrid algorithm is feasible compared with the traditional KNN method.
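
The idea in item (4) can be illustrated with a minimal sketch. It is not the thesis's algorithm: the abstract does not specify the simplified reduction procedure, so a common greedy heuristic over the discernibility matrix stands in for it, and the mutual information step is omitted. The toy term-presence table and the names `ROWS`, `LABELS`, `greedy_reduct`, and `knn_predict` are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical toy decision table: each row marks the presence (1) or
# absence (0) of the terms [goal, match, stock, market, the] in a document.
ROWS = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1],
]
LABELS = ["sports", "sports", "sports", "finance", "finance", "finance"]


def discernibility_entries(rows, labels):
    """For every pair of objects with different labels, collect the set of
    attribute indices on which the two objects differ."""
    entries = []
    for i, j in combinations(range(len(rows)), 2):
        if labels[i] != labels[j]:
            diff = frozenset(
                a for a in range(len(rows[i])) if rows[i][a] != rows[j][a]
            )
            if diff:
                entries.append(diff)
    return entries


def greedy_reduct(rows, labels):
    """Greedy heuristic (an assumption, not the thesis's method): repeatedly
    pick the attribute that appears in the most uncovered entries, until
    every entry contains at least one chosen attribute."""
    remaining = discernibility_entries(rows, labels)
    reduct = []
    while remaining:
        counts = Counter(a for entry in remaining for a in entry)
        # Ties are broken in favor of the lowest attribute index.
        best = max(sorted(counts), key=lambda a: counts[a])
        reduct.append(best)
        remaining = [e for e in remaining if best not in e]
    return sorted(reduct)


def knn_predict(train_rows, train_labels, query, attrs, k=3):
    """Majority vote among the k nearest training rows, with Euclidean
    distance computed only over the attributes in `attrs`."""
    def dist(u, v):
        return sqrt(sum((u[a] - v[a]) ** 2 for a in attrs))

    nearest = sorted(
        range(len(train_rows)), key=lambda i: dist(train_rows[i], query)
    )[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    reduct = greedy_reduct(ROWS, LABELS)
    print("attributes kept:", reduct)
    print(knn_predict(ROWS, LABELS, [1, 1, 0, 0, 1], reduct))
```

In this toy table the term "the" occurs in every document, so it never appears in a discernibility entry and is dropped; KNN then measures distance only over the attributes that remain.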

Keywords/Search Tags:

Web Text Mining, Web Text Classification, Rough Set, K Nearest Neighbor, Attribute Reduction

Related items

1	Study Of Web Text Mining Based On Rough Set Theory
2	Research On Integrated Classification Algorithm Based On Rough Set Attribute Reduction
3	The Analysis On The Basic Techniques For Preprocess Of Text Mining And The Study On The Application Of Text Mining
4	Analysis Of Text Information Based On Deep Learning
5	Mining Research, Based On The Integration Algorithm Of The K-nearest Neighbor Classification
6	Research On Text Emotion Classification Based On Rough Set
7	Based On Rough Set Text Automatic Classification Study
8	Research Of Text Mining Based On Rough Set Theory
9	Application Of Natural Neighbor In Text Classification
10	Text Emotional Classification Based On Text Mining