Short Text Classification Research

Posted on: 2009-04-01 | Degree: Master | Type: Thesis
Country: China | Candidate: J Chang
GTID: 2208360272989032 | Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet and of the means of transmitting information, many kinds of short text data continue to emerge, such as article abstracts, emails, and instant messages. To make better use of such data, a number of statistical classification and machine learning methods have been applied to text classification with very good results, including the vector space model, K-Nearest Neighbor (K-NN), decision trees, Naive Bayes (NB), Support Vector Machine (SVM), and neural networks. These methods are highly automated, stable, and adaptable, and they are more efficient than manual classification. Building on current results, this thesis studies text classification technology in depth and focuses on proposing an effective classification algorithm for short text. The main work and contributions are:

1. Review and summarize current research on text classification in China and abroad, and describe and analyze the related technologies.
2. Study and analyze the common classification algorithms, compare their performance experimentally, and show that SVM performs best on short text data.
3. Study SVM in detail. To better support multi-value classification, we propose a multi-value SVM classification algorithm based on a hierarchical category structure, and we give experimental data demonstrating its good performance in a search engine.
4. Introduce association rules into text classification to overcome the shortcomings of the vector space model, and propose a short text classification algorithm based on term association, CRTA (Categorization by Rules of Term Association), whose good performance and high efficiency are demonstrated through experiments.
Keywords/Search Tags: text classification, Support Vector Machine, association rules
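
The abstract does not spell out how CRTA works, so the following is only a minimal sketch of the general idea behind item 4: classifying short texts with rules built from co-occurring terms instead of a vector space model. It is not the thesis's algorithm. The function names (`mine_rules`, `classify`), the use of term pairs, the support and confidence thresholds, and the toy training data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def mine_rules(docs, min_support=2, min_confidence=0.6):
    """Mine rules (term_a, term_b) -> category from labeled short texts.

    docs is a list of (text, category) pairs. A term pair becomes a rule
    when it co-occurs in at least min_support documents and at least
    min_confidence of those documents share one category.
    (Hypothetical helper; thresholds are illustrative assumptions.)
    """
    pair_counts = Counter()
    pair_cat_counts = defaultdict(Counter)
    for text, category in docs:
        # Whitespace tokenization and sorting give each pair a stable order.
        terms = sorted(set(text.lower().split()))
        for pair in combinations(terms, 2):
            pair_counts[pair] += 1
            pair_cat_counts[pair][category] += 1

    rules = {}
    for pair, support in pair_counts.items():
        if support < min_support:
            continue
        category, hits = pair_cat_counts[pair].most_common(1)[0]
        confidence = hits / support
        if confidence >= min_confidence:
            rules[pair] = (category, confidence)
    return rules


def classify(text, rules, default=None):
    """Sum the confidence of every matching rule and return the top category."""
    terms = sorted(set(text.lower().split()))
    scores = Counter()
    for pair in combinations(terms, 2):
        if pair in rules:
            category, confidence = rules[pair]
            scores[category] += confidence
    if not scores:
        return default  # no rule fired
    return scores.most_common(1)[0][0]


# Toy training data, invented for this sketch.
TRAIN = [
    ("cheap flight tickets", "travel"),
    ("cheap flight deals", "travel"),
    ("hotel flight booking", "travel"),
    ("python code error", "tech"),
    ("python code review", "tech"),
    ("java code error", "tech"),
]

RULES = mine_rules(TRAIN)

if __name__ == "__main__":
    print(classify("cheap flight to paris", RULES))
    print(classify("python code crashes", RULES))
```

Term pairs are used here because short texts contain few words, so co-occurrence of two terms can carry more category signal than either term's frequency alone. A real system would likely need proper tokenization, larger itemsets, and rule pruning.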