Text Categorization On Machine Learning Algorithm

Posted on:2006-09-07

Degree:Master

Type:Thesis

Country:China

Candidate:X B Jin

Full Text:PDF

GTID:2168360152982514

Subject:Computer software and theory

Abstract/Summary:

The paper first introduces text categorization as an application of machine learning, pattern discrimination, and data mining, and explores how it relates to each of these fields. Next, we present the text categorization procedure, including feature selection and text classification algorithms. Finally, we propose three methods: ε-KLD, Bayes with the Lee model, and TFIDF with the Lee model.

ε-KLD simplifies the computation of the vectors and reduces the number of parameters and restrictions. It is a new and effective method well suited to the text categorization task, and it works even with a large number of documents and a high-dimensional feature space. The results show that ε-KLD computes the class and document vectors more simply than KLD while achieving comparable precision. On the whole, the performance of ε-KLD is preferable to that of KLD.

Based on Lee's model and Bayesian probability, we redefine the influence of a word and eliminate the skewness. We then compare two vector representations, Influence and TFIDF, which affect classification precision, and analyze two factors that affect the algorithm differently within the model. Finally, experiments show that the heuristic method and the Influence representation greatly improve Naive Bayes at a much lower time cost.
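The abstract does not define ε-KLD itself, so the following is only a minimal sketch of the plain Kullback-Leibler distance (KLD) classification that it is compared against: a document is assigned to the class whose word distribution is closest to its own. The function names, the toy data, and the use of a small `epsilon` probability for unseen words are assumptions made for illustration and are not taken from the thesis.

```python
import math
from collections import Counter


def word_distribution(tokens, vocabulary, epsilon=1e-3):
    """Return a probability distribution over `vocabulary`.

    Words that do not occur in `tokens` receive a small back-off value
    `epsilon` (an illustrative smoothing choice, not the thesis's
    definition), and the result is renormalized to sum to 1.
    """
    counts = Counter(t for t in tokens if t in vocabulary)
    total = sum(counts.values())
    raw = {
        w: (counts[w] / total) if counts[w] else epsilon
        for w in vocabulary
    }
    norm = sum(raw.values())
    return {w: p / norm for w, p in raw.items()}


def kl_divergence(p, q):
    """Kullback-Leibler distance KL(p || q) over a shared vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def classify(doc_tokens, class_tokens, epsilon=1e-3):
    """Pick the class whose word distribution is nearest to the document.

    `class_tokens` maps each class label to the tokens of its training
    documents (a hypothetical input format).
    """
    vocabulary = set(doc_tokens)
    for tokens in class_tokens.values():
        vocabulary.update(tokens)
    vocabulary = sorted(vocabulary)

    doc_dist = word_distribution(doc_tokens, vocabulary, epsilon)
    distances = {
        label: kl_divergence(
            doc_dist, word_distribution(tokens, vocabulary, epsilon)
        )
        for label, tokens in class_tokens.items()
    }
    return min(distances, key=distances.get), distances


# Toy training data, invented for this example.
training = {
    "sports": "ball team goal match team".split(),
    "finance": "stock market bank price stock".split(),
}
label, distances = classify("team goal ball".split(), training)
print(label, distances)
```

The smoothing step is needed because KL(p || q) is undefined when q assigns zero probability to a word that p contains. How the thesis handles this case, and what role ε plays in ε-KLD, cannot be recovered from the abstract.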

Keywords/Search Tags:

Machine Learning, Pattern Discrimination, Text Categorization, TFIDF, Kullback-Leibler Distance, ε-KLD, Bayes, Lee Model, Influence

Related items

1	A Study On Text Categorization Based On Machine Learning
2	The Study Of Chinese Text Categorization Based On Naïve Bayes
3	Text Categorization Research Based On TAN Model
4	Chinese WEB Document Automatic Categorization
5	High Resolution Remote Sensing Image Segmentation Method
6	Tfidf-based Text Classification Algorithm Research
7	An Automatic Chinese Text Categorization System Based On Statistical Language Model
8	Correlation Algorithm Research And Realization Chinese Text SVM-based Classification
9	Application For Web Text Categorization Based On Support Vector Machine
10	Text Categorization Algorithm Based On Machine Learning