Short Text Classification Based On Apriori Algorithm

Posted on: 2016-06-05
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Wang
Full Text: PDF
GTID: 2208330470955310
Subject: Computer technology
Abstract/Summary:
As is well known, text is the most important carrier of information, and it now appears on the Internet in large quantities and many forms. In recent years, some forum modules have limited the number of words a user may write. With the rise of new applications such as Weibo and Weixin, the volume of such length-limited texts, called short texts, is growing explosively. Short text differs from ordinary text: it is short and its features are sparse, so it cannot be processed in the same way as in traditional natural language processing. Existing classification algorithms can hardly meet the accuracy and efficiency requirements of real projects and are therefore not well suited to short text.

The Apriori algorithm is the classic algorithm for mining association rules. It discovers relationships among transactions by finding all frequent item sets. However, applying the Apriori algorithm directly to short text classification is not appropriate. Feature extension is a technique that extends the feature vector derived from the original text; a well-designed feature extension method can compensate for the lack of features. The Vector Space Model reduces text processing to vector computation, which makes it easy to measure the similarity between texts. The extensional knowledge database for short text changes in real time, so incremental learning of this database is necessary.

Against this background, we take the post title as the short text and the post body as the long text.
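To make the frequent item set step concrete, the following is a minimal sketch of the textbook Apriori procedure, not the modified version used in this thesis, whose details the abstract does not give. The function name `frequent_itemsets`, the absolute `min_support` count, and the toy post bodies are assumptions made for illustration only.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {frozenset: support count} for every item set whose
    support count is at least min_support (textbook Apriori)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        candidates = set()
        # Join step: merge pairs of frequent (k-1)-sets into k-set candidates.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count how many transactions contain each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        result.update(current)
        k += 1
    return result


# Hypothetical post bodies, already split into words.
posts = [
    ["phone", "battery", "charger"],
    ["phone", "battery"],
    ["phone", "screen"],
    ["battery", "charger"],
]
itemsets = frequent_itemsets(posts, min_support=2)
```

Here each post body plays the role of a transaction and each word the role of an item, so a frequent item set is a group of words that often appear together in the same long text.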
The main contributions of this paper can be summarized as follows. First, we mine frequent item sets from the post body with a modified Apriori algorithm, and then extract the extensional knowledge database from these frequent item sets to compensate for the weak expressiveness of short text; this also extends current methods of short text classification. Second, based on the Vector Space Model, we propose and analyze an incremental learning algorithm for updating the feature extension database, which keeps the database up to date and general. Finally, we perform experiments with the proposed approach to evaluate the precision, recall, F-measure, and efficiency of our algorithm. This work achieves the goal of extending the information carried by short text. In addition, with incremental maintenance of the feature extension database, we can improve the precision of short text classification while its efficiency changes little.
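As a rough illustration of how feature extension and the Vector Space Model can fit together, the sketch below extends a short title with co-occurring words and compares it to a category vector by cosine similarity. It is an assumption-laden toy, not the thesis's algorithm: the helper names `build_extension_table`, `extend`, and `cosine`, the extension weight of 0.5, and the sample word pairs are all hypothetical.

```python
import math
from collections import Counter


def build_extension_table(frequent_pairs):
    """Map each word to the words it frequently co-occurs with.
    frequent_pairs is assumed to come from frequent 2-item sets."""
    table = {}
    for a, b in frequent_pairs:
        table.setdefault(a, set()).add(b)
        table.setdefault(b, set()).add(a)
    return table


def extend(tokens, table, weight=0.5):
    """Build a term vector: original words count 1.0 per occurrence,
    extension words add `weight` (an assumed value) per source word."""
    vec = Counter()
    for tok in tokens:
        vec[tok] += 1.0
    for tok in tokens:
        for extra in table.get(tok, ()):
            if extra not in tokens:
                vec[extra] += weight
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


# Hypothetical frequent word pairs and a hypothetical category vector.
table = build_extension_table([("phone", "battery"), ("battery", "charger")])
category = Counter({"battery": 1.0, "charger": 1.0})

title = ["phone"]
plain = cosine(Counter(title), category)           # no shared terms
extended = cosine(extend(title, table), category)  # shares "battery"
```

In this toy case the unextended title shares no term with the category vector, while the extended title picks up "battery" and so obtains a nonzero similarity, which is the sparsity problem that feature extension is meant to address.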
Keywords/Search Tags: Short text classification, Association rule, Vector Space Model (VSM), Incremental learning, Long Text Database