Algorithm Of Text Classification And Its Application

Posted on:2005-12-01

Degree:Master

Type:Thesis

Country:China

Candidate:Y Peng

GTID:2168360125958750

Subject:Computer application technology

Abstract/Summary:

With the development of Internet technology, the amount of information on the Internet is increasing exponentially. An important research question is how to handle this large volume of online documents. Information classification, a crucial part of information processing, is the task of assigning information extracted from the Internet to categories so that it can be retrieved conveniently. This thesis studies algorithms for text classification and hypertext classification.

The thesis first introduces the general development and techniques of information classification. It then analyzes and compares the performance of several typical classification algorithms, which provides the theoretical basis for the work on text classification and hypertext classification.

In text classification, the thesis focuses on improving semi-supervised classification algorithms. High classification accuracy requires a large labeled training set, yet labeled documents are scarce; the thesis addresses this conflict in two ways. First, to enlarge the training set, it proposes an EM_SVM classification algorithm based on an analysis of the traditional SVM algorithm and the EM_NB algorithm. Experimental results show that, with the same number of labeled documents, EM_SVM, which uses unlabeled documents during training, performs better than SVM, and that EM_SVM achieves higher classification accuracy than EM_NB on small data sets. Second, to improve the training method of the classifier, the thesis presents a new cooperative training classification algorithm in which TFIDF and NB classifiers cooperate to combine labeled and unlabeled documents. Experimental results show that the new algorithm has higher classification accuracy and lower average error than the algorithms it is compared with.

In hypertext classification, the thesis concentrates on combining and synthesizing the rules for using hypertext information. Hypertext is varied, and using any single rule gives unstable performance. After analyzing the different rules, the thesis presents a new hypertext classification algorithm based on co-weighting multiple kinds of information. Experimental results show that the new algorithm performs better than using any single kind of hypertext information alone.
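
The abstract names EM_NB as the semi-supervised baseline but gives no implementation details. As a rough illustration of the general idea (expectation-maximization wrapped around a multinomial Naive Bayes classifier, so that unlabeled documents contribute soft labels to training), a minimal Python sketch might look like the following. This is not the thesis's code: the function names (`train_nb`, `posterior`, `em_nb`, `predict`), the smoothing constant, the iteration count, and the toy documents are all assumptions made for this example.

```python
import math
from collections import Counter


def train_nb(docs, weights, classes, vocab, alpha=1.0):
    """Fit multinomial Naive Bayes from per-document class weights (hard or soft)."""
    prior = {c: alpha for c in classes}
    counts = {c: Counter() for c in classes}
    for tokens, w in zip(docs, weights):
        for c in classes:
            wc = w.get(c, 0.0)
            prior[c] += wc
            for t in tokens:
                counts[c][t] += wc
    total = sum(prior.values())
    log_prior = {c: math.log(prior[c] / total) for c in classes}
    log_like = {}
    for c in classes:
        denom = sum(counts[c].values()) + alpha * len(vocab)
        log_like[c] = {t: math.log((counts[c][t] + alpha) / denom) for t in vocab}
    return log_prior, log_like


def posterior(tokens, model, classes):
    """Return P(class | tokens) as a dict; tokens outside the vocabulary are ignored."""
    log_prior, log_like = model
    scores = {
        c: log_prior[c] + sum(log_like[c][t] for t in tokens if t in log_like[c])
        for c in classes
    }
    top = max(scores.values())
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}


def em_nb(labeled, unlabeled, iterations=5):
    """Train on labeled data, then alternate E-steps and M-steps over unlabeled data."""
    classes = sorted({y for _, y in labeled})
    vocab = {t for d, _ in labeled for t in d} | {t for d in unlabeled for t in d}
    lab_docs = [d for d, _ in labeled]
    lab_w = [{y: 1.0} for _, y in labeled]
    model = train_nb(lab_docs, lab_w, classes, vocab)
    for _ in range(iterations):
        # E-step: estimate soft class memberships for the unlabeled documents.
        soft = [posterior(d, model, classes) for d in unlabeled]
        # M-step: refit on labeled (hard) plus unlabeled (soft) documents.
        model = train_nb(lab_docs + unlabeled, lab_w + soft, classes, vocab)
    return model, classes


def predict(tokens, model, classes):
    post = posterior(tokens, model, classes)
    return max(classes, key=lambda c: post[c])


# Toy data, invented for this example only.
labeled = [
    (["goal", "match", "team"], "sports"),
    (["stock", "market", "bank"], "finance"),
]
unlabeled = [
    ["team", "coach", "goal"],
    ["bank", "loan", "market"],
    ["coach", "league", "match"],
    ["loan", "interest", "stock"],
]
model, classes = em_nb(labeled, unlabeled)
print(predict(["coach", "league"], model, classes))
```

In this toy run the words "coach" and "league" never appear in a labeled document, so any evidence about them comes only from the unlabeled documents. The EM_SVM and cooperative training algorithms proposed in the thesis are different methods and are not reproduced here.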

Keywords/Search Tags:

Information classification, Text classification, Hypertext classification, Cooperative training, Co-weighting hypertext information

Related items

1	Web Mining Research And Implementation Of Super Text Classification
2	Research On The Ensemble Classification Algorithm Of Web Text
3	Markov Logic Networks With Its Application In Hypertext Classification And Link Prediction
4	An intelligent hypertext system
5	Decision support for information indexing and retrieval: Implications for hypertext systems
6	Gender Classification Based On Micro-blog Text And Social Information
7	Research On Text Classification And Its Related Technologies
8	The Research And Implementation Of Text Classification Based On Meta-information And Optimization
9	The Research And Implementation Of Text Classification Based On Meta-Information And Optimization
10	Research Of Chinese Text Classification Algorithms Based On VSM