Text Categorization Based on Machine Learning Algorithms

Posted on: 2006-09-07
Degree: Master
Type: Thesis
Country: China
Candidate: X B Jin
GTID: 2168360152982514
Subject: Computer software and theory
Abstract/Summary:
The paper first introduces text categorization as an application of machine learning, pattern discrimination, and data mining, and explores how it connects to each of these fields. Next, it presents the text categorization procedure, including feature selection and text classification algorithms. Finally, it puts forward three methods: ε-KLD, Bayes with Lee Model, and TFIDF with Lee Model.

ε-KLD simplifies the computation of the vectors and reduces the number of parameters and restrictions. It is a new, effective method well suited to the text categorization task, and it works even with a large number of documents and a high-dimensional feature space. The results show that ε-KLD computes the class and document vectors more simply than KLD while achieving comparable precision. On the whole, the performance of the ε-KLD method is preferable to that of the KLD method.

Following Lee's model and Bayes probability, we redefine the influence of a word and eliminate the skewness. We then compare two vector representations, Influence and TFIDF, which affect classification precision, and analyze two factors that affect the algorithm differently in the model. Finally, experiments show that the heuristic method and the Influence representation can greatly improve Naive Bayes at a much lower time cost.
Keywords/Search Tags: Machine Learning, Pattern Discrimination, Text Categorization, TFIDF, Kullback-Leibler Distance, ε-KLD, Bayes, Lee Model, Influence