The Research On Text Classification Based On Clique Model

Posted on:2009-08-25

Degree:Master

Type:Thesis

Country:China

Candidate:X H Hu

Full Text:PDF

GTID:2178360272480743

Subject:Computer software and theory

Abstract/Summary:

With the rapid growth of online electronic documents, automated text categorization (or text classification, TC) has become increasingly important over the last decade in information retrieval (IR), information filtering, and content management, and it has become a leading research area in IR and machine learning (ML). As one of the most effective methods for managing text information, TC helps people organize and manage electronic text more quickly and easily. Text categorization is the process of automatically assigning predefined categories to free-text documents, and learning-based TC has become the mainstream approach.

Researchers have proposed many mature text classification algorithms, most of which originate in pattern classification. Existing algorithms such as KNN and SVM are mostly based on the vector space model and do not consider the semantic features of documents. Starting from this shortcoming of traditional classification, this thesis studies text classification and its related technologies and presents several methods and techniques.

The main contributions of this thesis are as follows:

1) A clique-based text classification method is proposed. A context similarity matrix is computed from the training text and used to construct a context similarity graph; context cliques (complete subgraphs) are then extracted from this graph. A classifier is built from the clique information of each category and combined with an SVM or KNN classifier. Experiments on the 20NewsGroups corpus and the Fudan University corpus show that the method improves classification performance.

2) With the rapid growth of website information, especially online information, it is unrealistic to rely on humans to process it. Automatic classification has therefore become a critical technology of great practical value and a powerful tool for managing and organizing data. For organizing the extremely rich information resources on the Internet effectively, automatic Web page categorization has become an increasingly important area of study, and because of the particular characteristics of Web documents, their classification has attracted attention from many scholars in recent years. Building on a traditional classifier, we make use of the rich link information of Web documents. Experiments on the SEWM corpus show that combining the method proposed in this thesis with the link information of Web documents improves classification performance.
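
The first contribution (similarity matrix, then similarity graph, then cliques) can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the threshold, or the clique extraction algorithm, so everything concrete here is an assumption made for illustration: cosine similarity over toy term-by-document count vectors, a hypothetical threshold of 0.85, and Bron-Kerbosch enumeration of maximal cliques. The function names and the data are invented, and this is not the thesis's actual implementation.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_graph(vectors, threshold):
    """Link two terms when their similarity reaches the threshold.

    vectors: dict mapping a term to its count vector (assumed representation).
    Returns an adjacency dict mapping each term to a set of neighbours.
    """
    graph = {term: set() for term in vectors}
    for a, b in combinations(vectors, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def maximal_cliques(graph, min_size=2):
    """Enumerate maximal cliques with Bron-Kerbosch (with pivoting)."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            if len(r) >= min_size:
                found.append(frozenset(r))
            return
        # Pivot on the node with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda n: len(graph[n] & p))
        for node in list(p - graph[pivot]):
            expand(r | {node}, p & graph[node], x & graph[node])
            p = p - {node}
            x = x | {node}

    expand(set(), set(graph), set())
    return found


# Toy term vectors: counts of each term in four training documents
# (documents 1-2 are about sports, documents 3-4 about hardware).
term_vectors = {
    "ball": [3, 2, 0, 0],
    "team": [2, 3, 0, 0],
    "goal": [2, 2, 0, 0],
    "cpu":  [0, 0, 3, 1],
    "disk": [0, 0, 2, 2],
    "ram":  [0, 0, 3, 2],
}

graph = similarity_graph(term_vectors, threshold=0.85)
cliques = maximal_cliques(graph)
for clique in cliques:
    print(sorted(clique))
```

On this toy data the terms fall into two cliques, one per topic. How the cliques of each category are scored and combined with an SVM or KNN classifier is not described in the abstract and is left out of the sketch.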

Keywords/Search Tags:

Text classification, Text clique, Graph model, Link, Web page classification

Related items

1	Research On Text Classification And Its Related Technologies
2	Research On Graph Model-based Short Text Classification Algorithm
3	Short Text Classification Based On The Model Of Knowledge Graph And Word Combination
4	Research On Text Classification For Proposals And Construction Of Domain Knowledge Graph
5	Research On Short Text Classification Method Based On Text Graph Structure
6	Automatic Chinese Webpages Classification Based On Projection Pursuit
7	Gender Classification Based On Micro-blog Text And Social Information
8	Semi-supervised Text Classification Based On Graph Attention Neural Networks
9	Research And Implementation Of Text Classification Based On ERNIE And TextGCN
10	Text Classification Based On Natural Dimension Of Webpage