Text Categorization On Machine Learning Algorithm | Posted on: 2006-09-07 | Degree: Master | Type: Thesis | Country: China | Candidate: X B Jin | Full Text: PDF | GTID: 2168360152982514 | Subject: Computer software and theory | Abstract/Summary: |
First, the paper introduces text categorization as an application of machine learning, pattern discrimination and data mining, and explores the connections between text categorization and these fields. Next, we present the text categorization procedure, including feature selection and the classification algorithm. Finally, we propose three methods: ε-KLD, Bayes with Lee's model, and TFIDF with Lee's model. ε-KLD simplifies the computation of the vectors and reduces the number of parameters and constraints. It is a new, effective method well suited to text categorization, and it performs well even with a large number of documents and a high-dimensional feature space. The results show that ε-KLD computes the class and document vectors more simply than KLD while achieving comparable precision; overall, ε-KLD performs better than KLD. Based on Lee's model and Bayesian probability, we redefine the influence of a word and eliminate skewness. We then compare how two vector representations, Influence and TFIDF, affect classification precision, and analyze two factors that affect the algorithm differently within the model. Experiments show that the heuristic method and the Influence representation greatly improve Naive Bayes at a much lower time cost.
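The abstract does not define ε-KLD, Lee's model or the Influence weighting, so the sketch below shows only the standard baselines it compares against. These are TF-IDF weighting, a KL-distance classifier that assigns a document to the class whose word distribution is closest, and multinomial Naive Bayes. The toy corpus and every function and class name are hypothetical. The `eps` floor that keeps probabilities nonzero is a generic smoothing choice assumed here, not the thesis's ε-KLD.

```python
import math
from collections import Counter

# Hypothetical toy corpus, for illustration only.
TRAIN = [
    ("sports", "the team won the match"),
    ("sports", "the player scored a goal in the match"),
    ("finance", "the bank raised the interest rate"),
    ("finance", "stock prices fell as the rate rose"),
]

def tokenize(text):
    return text.lower().split()

def tfidf(docs):
    """Standard TF-IDF per tokenized doc: tf(t, d) * log(N / df(t))."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(d).items()} for d in docs]

def distribution(tokens, vocab, eps):
    """Word distribution over vocab; eps (assumed) floors every term above zero."""
    counts = Counter(tokens)
    total = sum(counts[t] + eps for t in vocab)
    return {t: (counts[t] + eps) / total for t in vocab}

def kl(p, q):
    """KL divergence D(p || q) = sum_t p(t) * log(p(t) / q(t))."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)

class KLDClassifier:
    """Assigns a document to the class minimizing D(doc || class)."""
    def __init__(self, eps=0.01):
        self.eps = eps

    def fit(self, data):
        self.vocab = sorted({t for _, text in data for t in tokenize(text)})
        by_class = {}
        for label, text in data:
            by_class.setdefault(label, []).extend(tokenize(text))
        self.class_dist = {c: distribution(toks, self.vocab, self.eps)
                           for c, toks in by_class.items()}
        return self

    def predict(self, text):
        # Words outside the training vocabulary are ignored.
        doc = distribution(tokenize(text), self.vocab, self.eps)
        return min(self.class_dist, key=lambda c: kl(doc, self.class_dist[c]))

class MultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""
    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, data):
        self.vocab = {t for _, text in data for t in tokenize(text)}
        self.doc_counts = Counter(label for label, _ in data)
        self.word_counts = {c: Counter() for c in self.doc_counts}
        for label, text in data:
            self.word_counts[label].update(tokenize(text))
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)

        def log_posterior(c):
            wc = self.word_counts[c]
            total = sum(wc.values())
            score = math.log(self.doc_counts[c] / n_docs)  # log prior
            for t in tokenize(text):
                if t in self.vocab:  # skip unseen words
                    score += math.log((wc[t] + self.alpha) / (total + self.alpha * v))
            return score

        return max(self.doc_counts, key=log_posterior)
```

For example, `KLDClassifier().fit(TRAIN).predict("the match goal")` and `MultinomialNB().fit(TRAIN).predict("the match goal")` both return `"sports"`. Note that TF-IDF gives "the" a weight of zero because it occurs in every training document. The abstract's Influence representation would take the place of these raw counts and TF-IDF weights in such a pipeline.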
| Keywords/Search Tags: | Machine Learning, Pattern Discrimination, Text Categorization, TFIDF, Kullback-Leibler Distance, ε-KLD, Bayes, Lee Model, Influence |