
English Short Text Measurement Method Based On Part Of Speech And Keyword

Posted on: 2019-11-28
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Zhao
Full Text: PDF
GTID: 2428330548963456
Subject: Computer application technology
Abstract/Summary:
With the development of information technology and increasingly intelligent mobile terminals, social media has grown rapidly. A large number of users now use social media every day, and the amount of information transmitted through it is growing quickly. Capturing the information that spreads through social networks and understanding how it propagates and evolves has important research value for hot topic mining, commercial marketing, and public opinion control. For such data mining, the key step is computing the similarity between documents, and text similarity has attracted growing attention from researchers. Early work on text similarity focused mainly on long texts. In recent years, because social media platforms limit the number of characters, people tend to express their opinions in short texts, which makes short text similarity measurement even more important. However, short texts contain much less information than long texts, so traditional similarity measures designed for long texts are not very effective on short texts. This thesis therefore proposes a short text measurement method based on part of speech and keywords and applies it to popularity prediction. The main work is as follows:

1. Improve the Word Mover's Distance (WMD) algorithm for short text measurement: The WMD algorithm first uses word2vec to represent the words of a text in a vector space, and then uses the distances between word pairs to compute the distance between two short texts. WMD has achieved good results on a variety of data sets. However, this method gives equal weight to all words in a sentence, without considering the differences between parts of speech or the importance of keywords. This thesis therefore takes part of speech and keywords into account, assigns different weights to different words when computing text similarity, and proposes an algorithm for optimizing these weights. Experiments on Weibo sentiment classification indicate that the improved WMD algorithm achieves better performance.

2. Apply the improved WMD algorithm to Weibo popularity prediction: Both the improved algorithm and the original WMD algorithm are used to extract similarity features, and SVM and logistic regression models are used to predict the popularity of Weibo posts. Comparative experiments show that the improved WMD algorithm yields higher accuracy in Weibo popularity prediction.
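
The idea of weighting words by part of speech and keyword status before computing a word-mover-style distance can be illustrated with a small sketch. This is not the thesis's algorithm: the abstract does not give the weight values or the optimization procedure, so `POS_WEIGHT`, `KEYWORD_BOOST`, the toy 2-D vectors in `TOY_VECTORS`, and all function names below are hypothetical. To stay dependency-free, the sketch also uses the relaxed WMD lower bound (each word sends all of its mass to its nearest word in the other text) instead of solving the full transport problem.

```python
import math

# Hypothetical weights: the abstract does not state the actual values.
POS_WEIGHT = {"NOUN": 1.5, "VERB": 1.2, "ADJ": 1.0}
DEFAULT_POS_WEIGHT = 0.5   # assumed weight for all other parts of speech
KEYWORD_BOOST = 2.0        # assumed multiplier applied to keywords

# Toy 2-D embeddings standing in for word2vec vectors.
TOY_VECTORS = {
    "phone": (1.0, 0.0), "mobile": (0.9, 0.1),
    "great": (0.0, 1.0), "good": (0.1, 0.9),
    "the": (0.5, 0.5),
}


def word_masses(tokens, keywords):
    """Map (word, pos) tokens to normalized masses that sum to 1."""
    raw = {}
    for word, pos in tokens:
        weight = POS_WEIGHT.get(pos, DEFAULT_POS_WEIGHT)
        if word in keywords:
            weight *= KEYWORD_BOOST
        raw[word] = raw.get(word, 0.0) + weight
    total = sum(raw.values())
    return {word: weight / total for word, weight in raw.items()}


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def one_sided_cost(src, dst, vectors):
    """Cost when every source word moves all its mass to its nearest target word."""
    return sum(
        mass * min(euclidean(vectors[word], vectors[other]) for other in dst)
        for word, mass in src.items()
    )


def relaxed_weighted_wmd(tokens_a, tokens_b, vectors, keywords=frozenset()):
    """Relaxed (lower-bound) WMD with POS- and keyword-weighted word masses."""
    masses_a = word_masses(tokens_a, keywords)
    masses_b = word_masses(tokens_b, keywords)
    # The larger of the two one-sided relaxations is the tighter lower bound.
    return max(
        one_sided_cost(masses_a, masses_b, vectors),
        one_sided_cost(masses_b, masses_a, vectors),
    )


doc_a = [("the", "DET"), ("phone", "NOUN"), ("great", "ADJ")]
doc_b = [("mobile", "NOUN"), ("good", "ADJ")]
print(relaxed_weighted_wmd(doc_a, doc_b, TOY_VECTORS, keywords={"phone", "mobile"}))
```

In a pipeline like the one the abstract describes, distances of this kind would presumably serve as similarity features for a classifier such as an SVM or logistic regression; an exact WMD would replace the relaxed bound with an optimal transport solve over the same weighted masses.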
Keywords/Search Tags: WMD, popularity prediction, logistic regression, text similarity