
Research On Short Text Classification

Posted on: 2017-09-22 | Degree: Master | Type: Thesis
Country: China | Candidate: Y T Liu | Full Text: PDF
GTID: 2348330488965910 | Subject: Computer application technology
Abstract/Summary:
With the rapid development of the mobile Internet, short texts such as microblog posts, WeChat messages and SMS messages have become part of people's daily life. Extracting information from short texts plays an increasingly important role in recommendation, public opinion analysis and other applications, and text classification is an effective way to extract that information. However, because short texts are brief and their feature matrices are sparse, traditional text classification algorithms are not well suited to them, so effective short text classification algorithms are urgently needed. Short text classification mainly involves preprocessing, word segmentation, feature word extraction, similarity calculation and semantic expansion. To address the existing problems and improve classification performance, this thesis studies two key technologies: the short text classification algorithm and short text similarity calculation.

First, regarding short text classification, adding semantic information during short text information extraction makes classification inefficient. To address this problem, a KNN short text classification algorithm based on category feature words is proposed. The training set is partitioned again according to the similarity between the category feature words and the training samples. Then, using the semantic information of the test text obtained from HowNet, the training set is reconstructed so that fewer training samples have to be compared with each test text, which improves the efficiency of KNN short text classification. Experimental results show that, for the same number of test texts, the average running time of the KNN short text classification algorithm based on category feature words is about 50% lower than that of the KNN short text classification algorithm based on semantic information.

Second, regarding short text similarity, the HowNet-based short text similarity algorithm depends heavily on the dictionary. It only calculates the similarity between keywords and cannot effectively distinguish how important each keyword is within the short text. A short text similarity algorithm based on category feature words is therefore proposed. It assigns different weight coefficients to different keywords to improve the accuracy of the similarity calculation. Keywords fall into four groups: category feature words; nouns and verbs that are not category feature words; adjectives and adverbs that are not category feature words; and all other keywords. The experiments build on the KNN short text classification algorithm based on category feature words. They show that this similarity algorithm effectively improves the classification accuracy of that algorithm and also further improves its efficiency.
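
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation and rests on several assumptions: the weight values in `WEIGHTS`, the part-of-speech tags (`n`, `v`, `a`, `d`), and all function names are hypothetical, and a simple weighted word overlap stands in for the HowNet-based semantic similarity. Texts are assumed to be already segmented and tagged as lists of `(word, pos)` pairs.

```python
from collections import Counter

# Hypothetical weights for the four keyword groups; the abstract gives no values.
WEIGHTS = {"category": 1.0, "noun_verb": 0.6, "adj_adv": 0.3, "other": 0.1}

def keyword_weight(word, pos, feature_words):
    """Weight a keyword by its group: category feature word, noun/verb, adjective/adverb, other."""
    if word in feature_words:
        return WEIGHTS["category"]
    if pos in ("n", "v"):
        return WEIGHTS["noun_verb"]
    if pos in ("a", "d"):
        return WEIGHTS["adj_adv"]
    return WEIGHTS["other"]

def weighted_similarity(text_a, text_b, feature_words):
    """Weighted word overlap in [0, 1]; a stand-in for HowNet-based similarity."""
    weights = {}
    for word, pos in list(text_a) + list(text_b):
        w = keyword_weight(word, pos, feature_words)
        weights[word] = max(w, weights.get(word, 0.0))
    words_a = {word for word, _ in text_a}
    words_b = {word for word, _ in text_b}
    shared = sum(weights[w] for w in words_a & words_b)
    total = sum(weights[w] for w in words_a | words_b)
    return shared / total if total else 0.0

def reduce_training_set(test_text, training_set, category_features):
    """Keep only samples from categories whose feature words occur in the test text."""
    test_words = {word for word, _ in test_text}
    candidates = {c for c, feats in category_features.items() if feats & test_words}
    reduced = [(text, label) for text, label in training_set if label in candidates]
    return reduced or list(training_set)  # fall back to the full set if nothing matches

def knn_classify(test_text, training_set, category_features, k=3):
    """Majority vote over the k most similar samples of the reduced training set."""
    feature_words = set().union(*category_features.values())
    reduced = reduce_training_set(test_text, training_set, category_features)
    ranked = sorted(
        reduced,
        key=lambda sample: weighted_similarity(test_text, sample[0], feature_words),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy data, invented for illustration only.
category_features = {"sports": {"match", "team"}, "finance": {"stock", "market"}}
training = [
    ([("team", "n"), ("win", "v"), ("match", "n")], "sports"),
    ([("team", "n"), ("coach", "n")], "sports"),
    ([("stock", "n"), ("rise", "v")], "finance"),
    ([("market", "n"), ("fall", "v"), ("stock", "n")], "finance"),
]
test = [("stock", "n"), ("market", "n"), ("rise", "v")]
predicted = knn_classify(test, training, category_features)
```

In this toy run the test text shares feature words only with the finance category, so the neighbour search compares it with two training samples instead of four. That reduction is the kind of saving the abstract attributes to training set reconstruction, although the real algorithm selects samples by similarity to category feature words rather than by exact word matches.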
Keywords/Search Tags: short text classification, category feature words, training set reconstruction, short text similarity value, part of speech of the keyword