
SMS User Interest Hierarchy Algorithm Based On Text Classification Algorithm

Posted on: 2012-03-19  Degree: Master  Type: Thesis
Country: China  Candidate: H Zhu
GTID: 2218330362453134  Subject: Software engineering
Abstract/Summary:
The steady development and maturation of computer networks, the Internet and database technology provide an effective platform for information sharing in many areas. On the one hand, this mass of information makes people's lives easier than ever; on the other, it raises two questions: how to extract the needed information effectively from a wide variety of sources, and how to make that extraction faster and more efficient. To help users extract the information they need quickly and efficiently, this thesis designs an SMS user interest hierarchy algorithm based on text classification. The algorithm splits the Chinese SMS messages under test into several levels, each of which represents a user interest. Splitting messages this way lets users reach the information they need quickly, and it also allows information retrieval to be sped up according to the user's interest level. Feature selection strongly affects classification accuracy, and the dimension of the feature space directly affects text processing. The purpose of feature selection is to reduce redundancy in the feature space so that the selected features reflect the content of the text.
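The abstract does not say how a message is assigned to a level. As a minimal sketch only, assume the interest hierarchy is a tree of categories in which each node carries a term-weight profile (hypothetical, e.g. learned from labelled messages), and that each message has already been word-segmented. The message then descends the tree, following at each level the child whose profile is most cosine-similar. The names `InterestNode`, `classify_path` and `profile`, and the example categories, are illustrative assumptions, not the thesis's own method.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class InterestNode:
    """A node in a hypothetical interest hierarchy (illustrative only)."""
    name: str
    profile: Counter  # term -> weight; assumed learned from labelled SMS
    children: list = field(default_factory=list)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b[t] for t, w in a.items())  # Counter returns 0 for missing terms
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify_path(tokens, root):
    """Descend from the root, following the most similar child at each level.

    Returns the list of interest levels the message was assigned to.
    """
    vec = Counter(tokens)
    path, node = [], root
    while node.children:
        best = max(node.children, key=lambda child: cosine(vec, child.profile))
        if cosine(vec, best.profile) == 0.0:
            break  # no shared terms with any child: stop at this level
        path.append(best.name)
        node = best
    return path


# Illustrative two-level hierarchy; categories and weights are made up.
root = InterestNode("root", Counter(), [
    InterestNode("finance", Counter({"股票": 2, "基金": 1, "银行": 1}), [
        InterestNode("stocks", Counter({"股票": 3, "涨停": 1})),
        InterestNode("banking", Counter({"银行": 3, "贷款": 1})),
    ]),
    InterestNode("sports", Counter({"比赛": 2, "足球": 1}), [
        InterestNode("football", Counter({"足球": 3, "进球": 1})),
    ]),
])

print(classify_path(["今天", "股票", "涨停"], root))  # ['finance', 'stocks']
print(classify_path(["天气"], root))                  # []
```

Here a message about a stock hitting its daily limit is routed to finance and then to stocks, while a message sharing no terms with any category stays at the root. Stopping as soon as no child matches is one simple design choice; a real system might instead require a minimum similarity threshold at each level.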
To process Chinese SMS messages and meet the needs of the user interest hierarchy, this thesis uses term frequency-inverse document frequency (TF-IDF) as the feature selection criterion and combines it with several other filtering methods to reduce redundancy in the SMS feature space. This improves both the efficiency of the user interest hierarchy algorithm and the accuracy of the resulting interest levels. To verify the accuracy of the interest levels produced by the algorithm, and the ratio by which feature selection reduces feature-space redundancy, this thesis uses SMS data as the test set and designs experiments to verify and quantitatively analyze the final results.
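A minimal sketch of TF-IDF feature selection, assuming messages are already word-segmented (Chinese text needs a segmenter before this step), with tf(t, d) = count of t in d / length of d and idf(t) = log(N / df(t)). The thesis only says TF-IDF is combined with "some other filtering methods"; the stop-word list, the `min_df` threshold and keeping the top `k` terms per message are assumptions chosen for illustration. `reduction_ratio` measures the share of the original vocabulary that is removed, which is one plausible reading of the "redundancy reduction ratio".

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {term: tf-idf} dict per message.

    docs is a list of token lists; Chinese SMS text is assumed to have been
    word-segmented already.
    """
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency
    scores = []
    for doc in docs:
        tf = Counter(doc)
        length = len(doc)
        scores.append({t: (c / length) * math.log(n / df[t]) for t, c in tf.items()})
    return scores


def select_features(docs, k, stopwords=frozenset(), min_df=1):
    """Keep up to k top-scoring terms per message after simple filters.

    The stop-word and min_df filters are illustrative assumptions.
    """
    df = Counter(t for doc in docs for t in set(doc))
    selected = set()
    for doc_scores in tfidf_scores(docs):
        candidates = [
            (score, term)
            for term, score in doc_scores.items()
            if term not in stopwords and df[term] >= min_df and score > 0
        ]
        # Highest score first; ties broken by code point for determinism.
        candidates.sort(key=lambda pair: (-pair[0], pair[1]))
        selected.update(term for _, term in candidates[:k])
    return selected


def reduction_ratio(docs, selected):
    """Fraction of the original vocabulary removed by feature selection."""
    vocab = {t for doc in docs for t in doc}
    return 1 - len(selected) / len(vocab) if vocab else 0.0


sms = [
    ["我", "的", "股票", "涨停", "了"],
    ["我", "的", "银行", "贷款", "到期"],
    ["今晚", "足球", "比赛", "的", "进球"],
]
features = select_features(sms, k=2, stopwords={"了"})
print(sorted(features))
print(reduction_ratio(sms, features))  # 12 terms, 6 kept -> 0.5
```

The function word "的" occurs in every message, so its IDF (and hence its score) is zero and it is dropped automatically; "了" is removed only by the assumed stop-word list. Selecting per message rather than globally keeps at least some features for every message, so short SMS texts are not left empty.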
Keywords/Search Tags: Text Classification, Feature Selection, User Interest Hierarchy, TF-IDF