Research And Application Of Automatic Summary Technology Of Network Reviews

Posted on: 2020-05-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S B Zhang
GTID: 1368330575457042
Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the intelligent Internet era, people increasingly use mobile devices to express opinions about activities such as shopping and tourism. The volume of these reviews is growing exponentially, so many websites now host tens of millions of reviews or more. Because users must spend ever more time selecting and screening reviews while browsing, effective summarization of massive review collections is urgently needed to reduce the reading burden. Review summarization differs from general text summarization: users pay particular attention to the aspects of the item being reviewed and to the opinions expressed about them, so this information must first be extracted and mined, and the review summary is then built on that basis. Therefore, after analyzing related work, this dissertation studies two problems: extracting review-related information and summarizing that information. The major contributions are as follows.

First, a review extraction model based on multi-association bootstrapping is proposed. It defines and quantifies three categories of correlation between the aspect words and opinion words in review sentences, and builds a semi-supervised bootstrapping algorithm on them. The algorithm first extracts a group of candidate aspect words and a group of candidate opinion words from the given review corpus as the initial seed set, then iteratively extracts the words that are strongly correlated with the seed set according to the three defined correlations. Experiments on a corpus of mobile-device reviews give an F-measure of 78.8%, which is 9.6% higher than the DP model. This shows that the algorithm extracts aspect words and their corresponding opinion words more effectively; it also places lower requirements on seed-set tagging, which reduces annotation cost.

Second, a SentenceTagLDA model based on LDA is proposed. The model has three components: the topics, the sentiment, and the distribution of the words used in model construction. HMM state transitions are used to model the generation of aspect (attribute) words and opinion words during the generation of topic words. Experiments on the TripAdvisor dataset show that its accuracy is 1.3% higher than that of the benchmark model and its recall is improved by 28.07%. The model therefore performs well on accuracy, recall and other metrics, which is conducive to obtaining the distribution of topic words and sentiment in review information.

Third, a summary sentence extraction model based on a hierarchical attention network is proposed. It has an encoder-decoder structure with two layers of attention. The sentence encoder uses attention, guided by the aspect words, to obtain a vector representation of each sentence, while the review document encoder uses attention to capture the relevance between preceding and following sentences. During decoding, a sentence output unit built from an LSTM network first labels each sentence as a candidate summary sentence or not, and a greedy algorithm then removes redundancy from the labeled results. The remaining sentences are ordered by importance to produce the final summary. Evaluation with the ROUGE method shows that the model reaches a ROUGE-2 value of 7.95% on the TripAdvisor dataset, higher than the benchmark model. The two attention layers are also added separately in experiments to verify the effectiveness of each layer. With the attention mechanism added, ROUGE-2 is 6.79% higher than without attention; with the document-level attention mechanism added, ROUGE-2 improves by 5.91%. These results show that the attention mechanism is effective for extracting summary sentences. In addition, a color-coded visualization is used to verify that sentence ordering has a positive effect on summary quality.

Finally, an automatic summarization prototype system for online reviews is designed and implemented. The system combines the key algorithms above (aspect extraction, topic computation and summary sentence selection) and presents the summarization results visually.
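
To make the bootstrapping loop in the review extraction model concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the dissertation's implementation. The abstract does not define the three correlations, so the sketch assumes they are aspect-aspect, aspect-opinion and opinion-opinion associations and scores all of them with a sentence-level Dice coefficient. The function names, the candidate pools (which in practice might come from part-of-speech filtering) and the threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(sentences):
    """Count how many sentences contain each word and each unordered word pair."""
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        unique = set(tokens)
        word_counts.update(unique)
        pair_counts.update(frozenset(p) for p in combinations(sorted(unique), 2))
    return word_counts, pair_counts


def association(a, b, word_counts, pair_counts):
    """Dice coefficient: an assumed stand-in for the thesis's correlation measures."""
    both = pair_counts[frozenset((a, b))]
    return 2.0 * both / (word_counts[a] + word_counts[b]) if both else 0.0


def bootstrap(sentences, aspect_seeds, opinion_seeds,
              aspect_candidates, opinion_candidates,
              threshold=0.45, max_rounds=10):
    """Grow the aspect and opinion sets from seeds until no new word qualifies."""
    word_counts, pair_counts = cooccurrence(sentences)
    aspects, opinions = set(aspect_seeds), set(opinion_seeds)

    def best_score(word, accepted):
        # Strongest association between a candidate and any accepted word.
        return max(
            (association(word, s, word_counts, pair_counts) for s in accepted),
            default=0.0,
        )

    for _ in range(max_rounds):
        accepted = aspects | opinions
        # Aspect candidates are scored against accepted aspects (aspect-aspect)
        # and accepted opinions (aspect-opinion); opinion candidates likewise.
        new_aspects = {w for w in set(aspect_candidates) - aspects
                       if best_score(w, accepted) >= threshold}
        new_opinions = {w for w in set(opinion_candidates) - opinions
                        if best_score(w, accepted) >= threshold}
        if not new_aspects and not new_opinions:
            break  # converged: nothing strongly associated remains
        aspects |= new_aspects
        opinions |= new_opinions
    return aspects, opinions


# Toy corpus (invented for illustration), one token list per review sentence.
corpus = [
    ["screen", "great"],
    ["screen", "bright"],
    ["screen", "bright"],
    ["battery", "bright"],
    ["battery", "great"],
    ["price", "awful"],
]
found_aspects, found_opinions = bootstrap(
    corpus,
    aspect_seeds={"screen"},
    opinion_seeds={"great"},
    aspect_candidates={"screen", "battery", "price"},
    opinion_candidates={"great", "bright", "awful"},
)
print(sorted(found_aspects))   # ['battery', 'screen']
print(sorted(found_opinions))  # ['bright', 'great']
```

In this toy run, "price" and "awful" are never added because they co-occur only with each other and not with any seed, which reflects the idea that the extracted set grows only through words strongly correlated with the current seed set.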
Keywords: topic model, review analysis, LDA model, hierarchical attention, auto summarization