Research On Automatic Text Summarization Based On User Comments

Posted on: 2022-09-19 | Degree: Master | Type: Thesis
Country: China | Candidate: M Yuan | Full Text: PDF
GTID: 2518306350489844 | Subject: Management Science and Engineering
Abstract/Summary:
With the development of the Internet and e-commerce, more and more people book hotels through online platforms, and a large number of reviews have been generated as a result. Effectively discovering valuable information in this mass of data has become a challenge. Researchers have proposed many techniques to address it, and automatic text summarization is one of them. This thesis uses automatic text summarization to mine the crucial information in reviews, relieve hotel review information overload, give consumers a reference, and put forward suggestions for hotel managers. Because the reviews include fake ones, fake reviews must be identified and their interference excluded before the reviews are summarized. At present, the relevant public data sets are mainly in English, which makes domestic real-world application more difficult. The research therefore builds its data set from real-world Chinese hotel reviews, taking the reviews of eight Piao HOME chain hotels on Ctrip.com as the research object.

The work has two parts: fake review identification and automatic text summarization. Fake review identification is the preliminary work that ensures the authenticity of the summary, and it is carried out with supervised classification. First, the data are obtained, cleaned, and organized, and irrelevant reviews are deleted. Next, identification features are selected as the basis for manual annotation, and fake reviews are labeled manually to build the labeled data set. Then the BERT language model is used to represent the text and to train, validate, and test the classification model; the model is evaluated and used to predict the unlabeled data. Finally, the genuine reviews are kept and the fake ones are eliminated.

Automatic text summarization proceeds as follows. First, because Chinese reviews are short and syntactically irregular, the genuine reviews are split into fine-grained clauses and given a text representation. Then the K-Means clustering algorithm is used to extract summary sentences, which form the review summary. Finally, to check and supplement the review summary, the TextRank algorithm is used to extract keywords. Before keyword extraction, on top of the original data processing, the reviews are segmented into words, stop words are removed, and parts of speech are tagged.

Combining the keywords and the review summaries, the study draws the following conclusions. First, irrelevant reviews account for a considerable proportion of the data. Second, the fake review identification features ensure the consistency and objectivity of the manual annotation of the data set. Third, the evaluation metrics of the fake review identification model are at a high level, which indicates that the method is effective for fake review identification in this research and that the model can be applied to newly added reviews. Fourth, the keywords and the review summaries are basically the same in content, which verifies the summaries. Finally, the advantages and problems of each chain hotel are identified, and they are highly similar across hotels. Several suggestions are put forward in response to the problems found in the hotels. The research achieves good results and can relieve the problem of information overload. The whole process is also applicable to the automatic summarization of reviews of other hotels, and it is an applied case of a Chinese multi-document short text summarization method.
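
The clustering step can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes clauses are already word-segmented and space-separated, it uses simple bag-of-words counts as a stand-in for the BERT-based representation, and the function names (`vectorize`, `kmeans`, `summarize`) are hypothetical. Each clause is assigned to a cluster, and the clause nearest each cluster centroid is taken as a summary sentence.

```python
import math
from collections import Counter


def vectorize(clauses):
    """Bag-of-words count vectors over a shared vocabulary.

    Assumption: a stand-in for a learned text representation such as BERT.
    """
    vocab = sorted({tok for c in clauses for tok in c.split()})
    vectors = []
    for c in clauses:
        counts = Counter(c.split())
        vectors.append([float(counts[t]) for t in vocab])
    return vectors


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(vectors, centroids):
    """Label each vector with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: distance(v, centroids[j]))
        for v in vectors
    ]


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's algorithm, seeded with the first k vectors (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        labels = assign(vectors, centroids)
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign(vectors, centroids), centroids


def summarize(clauses, k):
    """Return, for each cluster, the clause closest to that cluster's centroid."""
    vectors = vectorize(clauses)
    labels, centroids = kmeans(vectors, k)
    summary = []
    for j in range(k):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            best = min(members, key=lambda i: distance(vectors[i], centroids[j]))
            summary.append(clauses[best])
    return summary


if __name__ == "__main__":
    demo = [
        "room clean",
        "breakfast poor",
        "room clean quiet",
        "breakfast poor cold",
        "room very clean",
        "breakfast poor choice",
    ]
    print(summarize(demo, 2))
```

Seeding the centroids with the first k clauses keeps the sketch deterministic; a practical system would more likely use a library implementation with a better initialization, and would choose k according to how many aspects the summary should cover.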
Keywords/Search Tags: hotel review, deceptive review, text summarization, BERT, K-Means