Novel Comment Information Mining Based On BERT And Extended LDA

Posted on: 2023-11-04
Degree: Master
Type: Thesis
Country: China
Candidate: S C Sun
GTID: 2569306770961759
Subject: Applied statistics
Abstract/Summary:
Since the 1980s and 1990s, online novels have developed rapidly, but their quality is uneven. Practitioners in the film and television industry often acquire the IP copyrights of novels blindly, ignoring the quality of the works and pursuing only the traffic and popularity of the IP. The resulting adaptations are extremely poor, which in turn leads to IP bubbles. This not only affects the reputation of film and television companies but also damages the reputation of the source novels. It is therefore crucial to identify whether a novel IP has adaptation value. From the perspective of readers, this paper uses machine learning algorithms to mine novel reviews for value information about the novels and for the popular topics that readers pay attention to, so as to evaluate the IP value of novels. Because the concentration of comment topics differs across sentiment orientations, topic detection must be conducted separately on novel comment texts with different sentiment orientations. In the sentiment analysis, because the ratings attached to the review texts are not accurate enough to serve as labels, this paper proposes a method that combines the HowNet sentiment dictionary with a fine-tuned BERT model to correct the sentiment labels automatically. The method achieved 84% accuracy on the test set. In topic detection, the Word2Vec word vectors and the text-topic matrix of the Latent Dirichlet Allocation (LDA) topic model are concatenated, so that both the topic features of LDA and the context information of Word2Vec are preserved, and the result is used as the input of the SKM algorithm for clustering. A comparison across the experimental groups shows that this method has better clustering performance than Word2Vec word vectors alone or the LDA topic model alone. Finally, this paper crawls the user comment text data of 83 novels on Douban Dushu and applies the above methods in an empirical study. The novels are divided into high-quality novels and general novels according to the derivative value of the novel IP, and topic detection is conducted under different sentiment polarities. Among the positive comments on IPs with high derivative value, topics about stories and plots have the highest concentration, accounting for 67%. Therefore, buyers of novel IP copyrights should pay attention to whether the novel tells a good story, and film and television adaptations should tell that story well.
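
The feature concatenation and clustering step described in the abstract can be illustrated with a minimal sketch. Several things here are assumptions rather than details from the thesis: "SKM" is taken to mean spherical k-means (the abstract does not expand the abbreviation), each review is assumed to be represented by one Word2Vec vector (for example, the average of its word vectors), each block is L2-normalized before concatenation, and the arrays `w2v` and `lda` are random stand-ins for real model outputs. The function names `splice_features` and `spherical_kmeans` are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def splice_features(w2v_doc_vectors, lda_doc_topic):
    """Concatenate per-document Word2Vec vectors with LDA document-topic rows.

    Normalizing each block first is an assumption of this sketch; it keeps
    either block from dominating the combined vector.
    """
    return np.hstack([l2_normalize(w2v_doc_vectors), l2_normalize(lda_doc_topic)])


def spherical_kmeans(x, k, n_iter=50, seed=0):
    """Cluster rows of x by cosine similarity (spherical k-means, assumed 'SKM')."""
    rng = np.random.default_rng(seed)
    x = l2_normalize(x)
    # Initialize centers from k distinct documents.
    centers = x[rng.choice(len(x), size=k, replace=False)]
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Assign each document to the center with the highest cosine similarity.
        labels = np.argmax(x @ centers.T, axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = x[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                new_centers[j] = members.sum(axis=0)
        # Project updated centers back onto the unit sphere.
        new_centers = l2_normalize(new_centers)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Toy stand-ins: 6 reviews, 8-dimensional Word2Vec vectors, 3 LDA topics.
rng = np.random.default_rng(42)
w2v = rng.normal(size=(6, 8))
lda = rng.dirichlet(np.ones(3), 6)

features = splice_features(w2v, lda)
labels, centers = spherical_kmeans(features, k=2)
print(features.shape, labels)
```

In a real pipeline the two input matrices would come from trained Word2Vec and LDA models, and the number of clusters would be chosen by comparing clustering quality across candidate values.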
Keywords/Search Tags: Sentiment analysis, topic detection, LDA, BERT, novel review text