Chinese Keyword Extraction And Analysis Based On Tourism Weibo

Posted on: 2019-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: H Guo
Full Text: PDF
GTID: 2428330593451568
Subject: Control Science and Engineering
Abstract/Summary:
With the rapid development of natural language processing, keyword extraction and sentiment analysis have become active research fields. Research on both helps to capture the themes of texts and to support the corresponding decision-making, so it has considerable theoretical significance and social value. As a Chinese social platform, Sina Weibo has an important impact on people's daily life. Tourism weibo posts published on Sina Weibo are concise, topically focused, immediate, interactive, and dynamic. This thesis takes tourism weibo as its research object and studies algorithms for keyword extraction and sentiment analysis. The main contents are as follows:

(1) Because traditional TF-IDF relies heavily on high-frequency words and cannot fully account for multi-category words, part-of-speech scores are introduced to improve the traditional TF-IDF formula. Because classic TextRank suffers from equal-probability jumping and ignores word meaning, a deep learning method is adopted to train a language model on the texts. The handling of weibo content is thereby transformed into vector calculations in a vector space, and the semantic similarity between words is obtained by computing the similarity between their vectors. The semantic similarity scores of words within the sliding window, together with word frequency, are then added to the TextRank iterative formula. By varying the number of extracted keywords and the size of the sliding window, the optimal performance of the TextRank algorithm is obtained.

(2) Because a constructed sentiment dictionary cannot be updated with new Internet words in real time, the mutual information between unidentified words and positive sentiment words, and between unidentified words and negative sentiment words, is introduced. The sentiment of an unidentified word is determined from the difference between its positive and negative mutual information. To avoid collecting many sentiment dictionaries, SVM, which performs well, is introduced for the sentiment classification of tourism weibo. Because traditional word-frequency features cannot fully reflect the semantics of weibo, Word2vec is introduced to train a language model on the texts, and Word2vec is then combined with TF-IDF to improve the word vectors. The average of all improved word vectors in a weibo sentence is used as the classifier input for sentiment analysis. The optimal performance of the classifier is obtained by adjusting the kernel function and penalty factor of the SVM.

The experimental results show that the edge-weight-optimized TextRank algorithm addresses equal-probability jumping and the lack of semantics, which helps to extract keywords that have low frequency but highlight weibo topics, and yields better keyword extraction results. The SVM algorithm optimized with the improved feature fusion addresses the problem that word-frequency features cannot express text semantics well, and it effectively improves classification performance.
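The edge-weighting idea in (1) can be illustrated with a minimal sketch. The abstract does not give the exact iterative formula, so the code below is an assumption: it weights each co-occurrence edge by the cosine similarity of the two word vectors and omits the separate word-frequency term. The function names, the toy tokens, and the hand-written three-dimensional vectors are all hypothetical; in practice the vectors would come from a trained language model.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def weighted_textrank(tokens, vectors, window=3, damping=0.85, iterations=50):
    """Rank words with TextRank, using semantic similarity as edge weight.

    Assumption: every token has an entry in `vectors`. Each co-occurrence
    inside the sliding window adds the pair's (non-negative) cosine
    similarity to the edge, instead of the uniform weight of classic TextRank.
    """
    weights = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            v = tokens[j]
            if v == w:
                continue
            sim = max(cosine(vectors[w], vectors[v]), 0.0)
            weights[w][v] += sim
            weights[v][w] += sim

    nodes = sorted(set(tokens))
    out_sum = {w: sum(weights[w].values()) for w in nodes}
    scores = {w: 1.0 for w in nodes}
    for _ in range(iterations):
        updated = {}
        for w in nodes:
            # Each neighbour passes on a share of its score proportional
            # to the weight of the edge it shares with w.
            rank = sum(
                weights[v][w] / out_sum[v] * scores[v]
                for v in weights[w]
                if out_sum[v] > 0
            )
            updated[w] = (1 - damping) + damping * rank
        scores = updated
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (hypothetical): "queue" is orthogonal to the other words.
toy_tokens = ["beach", "sunset", "hotel", "beach", "queue", "sunset"]
toy_vectors = {
    "beach": [1.0, 0.2, 0.0],
    "sunset": [0.9, 0.4, 0.0],
    "hotel": [0.5, 1.0, 0.0],
    "queue": [0.0, 0.0, 1.0],
}
ranking = weighted_textrank(toy_tokens, toy_vectors)
```

With uniform weights a word's score depends only on how often it co-occurs with others; here a word that co-occurs with semantically unrelated neighbours receives almost no score from them, which is one way to avoid equal-probability jumping.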
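The mutual-information step in (2) can likewise be sketched. This is not the thesis's implementation: it assumes pointwise mutual information estimated from post-level co-occurrence with additive smoothing, and the function name `so_pmi`, the seed words, and the toy posts are hypothetical placeholders.

```python
import math


def so_pmi(docs, candidate, pos_seeds, neg_seeds, smoothing=1.0):
    """Return PMI(candidate, positive seeds) minus PMI(candidate, negative seeds).

    `docs` is a list of tokenized posts. Two words co-occur if they appear
    in the same post. A positive result suggests positive sentiment and a
    negative result suggests negative sentiment.
    """
    doc_sets = [set(d) for d in docs]
    n = len(doc_sets)

    def prob(*words):
        # Smoothed fraction of posts containing all the given words;
        # the smoothing keeps the logarithm defined for zero counts.
        count = sum(1 for s in doc_sets if all(w in s for w in words))
        return (count + smoothing) / (n + smoothing)

    def pmi(w1, w2):
        return math.log2(prob(w1, w2) / (prob(w1) * prob(w2)))

    positive = sum(pmi(candidate, seed) for seed in pos_seeds)
    negative = sum(pmi(candidate, seed) for seed in neg_seeds)
    return positive - negative


# Toy posts (hypothetical): "geili" and "kengdie" stand in for new
# Internet words that a fixed sentiment dictionary does not contain.
toy_posts = [
    ["scenery", "geili", "happy"],
    ["hotel", "geili", "happy"],
    ["queue", "kengdie", "disappointed"],
    ["ticket", "kengdie", "disappointed"],
]
score_geili = so_pmi(toy_posts, "geili", ["happy"], ["disappointed"])
score_kengdie = so_pmi(toy_posts, "kengdie", ["happy"], ["disappointed"])
```

In this toy corpus `score_geili` is positive and `score_kengdie` is negative, so the two unidentified words would be assigned to the positive and negative sides of the dictionary respectively.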
Keywords/Search Tags: Tourism Weibo, Keyword Extraction, Sentiment Analysis, Word Vectors, Edge Weight Optimizing, Feature Fusions