
Research On The Key Technologies Of Opinion Summarization Facing The Micro-blog

Posted on: 2016-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: B Wang
Full Text: PDF
GTID: 2308330482950897
Subject: Systems Engineering
Abstract/Summary:
As a product of the Internet's development, the micro-blog attracted a large number of users in a short period. At any moment, the public can comment on topics of interest. Individuals and groups, including businesses and governments, therefore want to grasp trends in public opinion from this mass of comments, and opinion summarization for micro-blog text has emerged in response. Its two key problems, sentiment analysis and text summarization, have become a focus of research. Sentiment analysis extracts the sentiment tendency of a text to help understand users' preferences, while text summarization compresses and generalizes information to give an overview. Research on sentiment analysis and text summarization for micro-blogs is therefore of great significance. Working with comments on multiple micro-blog topics, this thesis studies these two key technologies of opinion summarization. The main achievements are as follows:

(1) For micro-blog sentiment analysis, the thesis proposes a sentiment analysis algorithm based on a three-word-combination model. The method first consolidates and organizes existing sentiment dictionaries and updates some of the resources to obtain a more complete and more targeted dictionary. A study of micro-blog characteristics shows that, in most cases, authors express their opinions explicitly through their choice of vocabulary, and that a collocation of three words can determine the sentiment tendency of a sentence. The thesis therefore uses three-word collocations to label the corpus automatically. It then tests the automatically labeled corpus and analyzes several parameters that affect the results. The experimental results show that, with no manual annotation, the automatically labeled training corpus reaches a test accuracy of up to 72.39%.

(2) For micro-blog text summarization, the thesis proposes a summarization method based on entropy combination. The method first builds an LDA (Latent Dirichlet Allocation) model on the sample set to mine the underlying topics. It then estimates the similarity between texts under each topic to remove redundancy. For calculating the importance of a micro-blog, the thesis observes that entropy can measure information and that, besides its text, a micro-blog carries exogenous information such as the number of reposts and the number of likes. It therefore proposes an importance calculation that combines entropy with this exogenous information. Micro-blogs are then sorted by importance, and the final summary is extracted at a given compression ratio. The experimental results show that the proposed method scores 7% higher on the evaluation indicators than the average of the comparison methods, which demonstrates its effectiveness.
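The importance ranking described in (2) can be illustrated with a minimal sketch. The abstract does not give the actual formula, so everything here is an assumption made for illustration: the Shannon entropy of a post's token distribution stands in for the text signal, log-scaled repost and like counts stand in for the exogenous information, and the weight `alpha`, the function names, and the post fields are hypothetical. The LDA topic modeling and redundancy removal steps are omitted.

```python
import math
from collections import Counter


def word_entropy(tokens):
    """Shannon entropy (in bits) of the token frequency distribution."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def importance(tokens, reposts, likes, alpha=0.7):
    """Hypothetical score: weighted mix of text entropy and engagement.

    `alpha` and the log scaling are illustrative choices, not the
    thesis's formula.
    """
    engagement = math.log1p(reposts) + math.log1p(likes)
    return alpha * word_entropy(tokens) + (1 - alpha) * engagement


def summarize(posts, ratio=0.5):
    """Sort posts by importance and keep the top `ratio` share (at least one)."""
    ranked = sorted(
        posts,
        key=lambda p: importance(p["tokens"], p["reposts"], p["likes"]),
        reverse=True,
    )
    keep = max(1, int(len(posts) * ratio))
    return ranked[:keep]
```

Here `ratio` plays the role of the compression ratio: calling `summarize(posts, ratio=0.5)` on a list of dictionaries with `tokens`, `reposts`, and `likes` keys returns the higher-scoring half of the posts, in descending order of score.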
Keywords/Search Tags: Micro-blog, Opinion Summarization, Sentiment Analysis, Text Summarization