
Research On Automatic Opinion Summarization Of Chinese Microblog

Posted on: 2019-01-03
Degree: Master
Type: Thesis
Country: China
Candidate: C Zhang
GTID: 2428330566984196
Subject: Computer Science and Technology
Abstract/Summary:
With the rise of microblog social media platforms, obtaining brief key information from huge volumes of microblog text has become one of the important tasks in social media text computing, and has made automatic text summarization a hot problem in natural language processing once again. This paper focuses on the automatic extraction of opinion summaries from Chinese microblogs. First, a sentiment dictionary is used to extract the opinion-bearing microblogs from the original microblog set; then the TF-IDF algorithm is applied to extract keywords and topic words from the microblogs, and an importance score is calculated for each microblog from them. In addition, microblogs are represented as vectors based on word embeddings, and the similarity between these vectors is calculated to judge the degree of redundancy between microblogs.

Finally, two different methods for automatic opinion summarization of Chinese microblogs are proposed. The first, a method based on importance and redundancy, takes microblogs from the original set one at a time in descending order of importance score and calculates each candidate's average similarity to the current opinion summary set to decide whether it should be added, continuing until the final summary set is generated. The second, a method based on a semantic graph optimization algorithm, models the semantic relationships in the original microblog set as a complete undirected graph and calculates a comprehensive weight for each node; it then applies a graph optimization algorithm that removes nodes from the graph one by one to obtain the optimal subgraph, from which the opinion summary set is finally selected.

Experimental results on the COAE2016 dataset show that both methods can effectively extract microblog opinion summary sets. The best results of the method based on importance and redundancy reach 33.78%, 16.33% and 9.52% in ROUGE-1, ROUGE-2 and ROUGE-SU4, respectively. The best results of the method based on the semantic graph optimization algorithm reach 33.89%, 14.63% and 13.71% in ROUGE-1, ROUGE-2 and ROUGE-SU4, respectively, which outperform the best result in COAE2016. According to these results, the semantic graph optimization method outperforms the importance-and-redundancy method on ROUGE-1 and ROUGE-SU4, though not on ROUGE-2, while the latter is better in terms of time efficiency.
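The preprocessing steps described in the abstract (sentiment-dictionary filtering, then TF-IDF keyword extraction and importance scoring) can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation. It assumes microblogs are already word-segmented into token lists and uses a tiny hypothetical lexicon `SENTIMENT_WORDS` in place of a real Chinese sentiment dictionary. It also assumes that a post's importance is the sum of the TF-IDF weights of the top-`k` corpus keywords it contains. The abstract does not give the exact scoring formula, and the separate handling of topic words is omitted here.

```python
import math
from collections import Counter

# Hypothetical toy lexicon standing in for a real Chinese sentiment dictionary.
SENTIMENT_WORDS = {"喜欢", "讨厌", "支持", "反对", "失望"}


def filter_opinionated(posts, lexicon=SENTIMENT_WORDS):
    """Keep only posts (lists of tokens) that contain at least one sentiment word."""
    return [p for p in posts if any(tok in lexicon for tok in p)]


def tfidf(posts):
    """Return one {term: tf-idf} dict per post; tf is normalized by post length."""
    n = len(posts)
    df = Counter(term for p in posts for term in set(p))  # document frequency
    weights = []
    for p in posts:
        tf = Counter(p)
        weights.append({t: (c / len(p)) * math.log(n / df[t]) for t, c in tf.items()})
    return weights


def top_keywords(weights, k):
    """Assumed aggregation: rank terms by their tf-idf summed over all posts."""
    total = Counter()
    for w in weights:
        total.update(w)  # adds each term's weight to its running total
    return {t for t, _ in total.most_common(k)}


def importance_scores(posts, k=10):
    """Assumed scoring: sum of the tf-idf weights of the top-k keywords in each post."""
    weights = tfidf(posts)
    keys = top_keywords(weights, k)
    return [sum(v for t, v in w.items() if t in keys) for w in weights]
```

A typical call is `importance_scores(filter_opinionated(posts))`. With this call, document frequencies are computed over the opinion-bearing subset only. Normalizing term frequency by post length reduces the advantage long posts would otherwise get simply from containing more tokens.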
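The first method, based on importance and redundancy, amounts to a greedy loop over posts ranked by importance. The sketch below makes several assumptions. A microblog vector is taken to be the average of its word embeddings, and redundancy is measured with cosine similarity. `embeddings`, `max_size` and `threshold` are hypothetical inputs, because the abstract does not specify the embedding model, how word vectors are combined, the summary length, or the similarity cutoff.

```python
import math


def sentence_vector(tokens, embeddings, dim):
    """Assumed composition: average the vectors of in-vocabulary tokens."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim  # no known words: fall back to a zero vector
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def greedy_summary(vectors, scores, max_size, threshold):
    """Visit posts in descending importance; add a post only if its average
    similarity to the posts already selected is below `threshold`.
    `max_size` and `threshold` are hypothetical tuning parameters."""
    order = sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)
    summary = []
    for i in order:
        if len(summary) >= max_size:
            break
        if summary:
            avg = sum(cosine(vectors[i], vectors[j]) for j in summary) / len(summary)
            if avg >= threshold:
                continue  # too redundant with the current summary set
        summary.append(i)
    return summary
```

Each candidate is compared only with the posts already selected. As a result, the loop needs at most about `len(vectors) * max_size` similarity computations, which keeps this style of greedy selection cheap.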
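The second method, graph optimization by node removal, can only be sketched under explicit assumptions. The abstract states just three things: posts form a complete undirected graph, each node receives a comprehensive weight, and nodes are removed one by one to reach an optimal subgraph. In the sketch, the node weights and the pairwise similarity matrix are taken as given inputs. The objective being optimized (total node weight minus `lam` times total similarity inside the subgraph) and the stopping rule `target_size` are hypothetical choices for illustration, not the thesis's formulation.

```python
import itertools


def objective(nodes, weight, sim, lam):
    """Hypothetical objective: reward node importance, penalize redundancy
    measured as the summed similarity over all edges inside the subgraph."""
    node_term = sum(weight[i] for i in nodes)
    edge_term = sum(sim[i][j] for i, j in itertools.combinations(sorted(nodes), 2))
    return node_term - lam * edge_term


def prune_graph(weight, sim, target_size, lam=1.0):
    """Start from the complete graph over all posts and repeatedly delete the
    node whose removal leaves the subgraph with the highest objective,
    until `target_size` nodes remain (hypothetical stopping rule)."""
    nodes = set(range(len(weight)))
    while len(nodes) > target_size:
        best = max(nodes, key=lambda v: objective(nodes - {v}, weight, sim, lam))
        nodes.remove(best)
    return sorted(nodes)
```

This naive version recomputes the full objective for every candidate node, so each removal step costs O(n³) operations. Tracking each node's summed similarity to the remaining nodes would make the per-candidate change cheap to compute. The simple form is kept here so the logic stays visible.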
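For reference, the ROUGE-N scores quoted above measure n-gram overlap between a system summary and a human reference summary. Below is a minimal single-reference ROUGE-N recall computation, meant only to show what the metric counts. It is not necessarily how the COAE2016 scores were computed. It does not implement ROUGE-SU4, which additionally counts skip-bigrams with a maximum gap of four words together with unigrams. Tokenizing Chinese text by characters, as in the usage example, is an assumption.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n):
    """ROUGE-N recall: n-gram overlap (each n-gram counted at most as often as
    it appears in the reference) divided by the reference's n-gram count."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[g]) for g, count in ref.items())
    return overlap / total
```

For example, `rouge_n_recall(list("我喜欢这部电影"), list("我很喜欢这部电影"), 1)` returns 0.875, because 7 of the 8 reference characters are matched.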
Keywords/Search Tags: microblogs, opinion summarization, TF-IDF, sentence similarity