Research On Parallel Algorithms For Hot Spot Discovery In Internet Reviews

Posted on: 2021-03-21
Degree: Master
Type: Thesis
Country: China
Candidate: H Su
Full Text: PDF
GTID: 2428330623474903
Subject: Engineering
Abstract/Summary:
Online reviews are subjective descriptions of an entity (or class of entities) or a non-entity posted on an online platform. Hot spot discovery in online reviews has become an important application of artificial intelligence for mining the value of review text. As the volume of online review data has grown exponentially in recent years, readers are often overwhelmed when trying to process the content of these reviews. The design of the review hot spot discovery algorithm therefore plays a pivotal role in the whole research process. Beyond matching the accuracy of traditional algorithms, it must also meet the higher scalability requirements that large-scale data imposes. This thesis studies hot spot discovery algorithms for online reviews. Its main task is to find, quickly and accurately, the hot spot information in large-scale online reviews that is valuable to review readers. The specific work is as follows.

First, because current online reviews are noisy, mixed in content, and unstructured in subject, we introduce the hot spot discovery ideas used in Internet public opinion monitoring into review mining and propose a clustering-based, aspect-level review mining algorithm (DM-CK). The algorithm combines local density, the maximum-minimum distance algorithm, Canopy clustering, and K-means clustering. Local density and the maximum-minimum distance algorithm are used to filter online reviews and to calculate and optimize the threshold parameters for Canopy and K-means. Text clustering then automatically obtains the different aspects of the optimal review data, without relying on a manually specified number of aspects. Experiments show that the DM-CK algorithm can effectively find hot spot information in online review data.

Second, so that the algorithm can process massive review data, the original DM-CK algorithm is redesigned as a parallel algorithm on the Hadoop platform, and a MapReduce-based parallel algorithm for online review hot spot discovery (MDM-CK) is proposed. This algorithm stores the massive review data in the HDFS distributed file system and uses the MapReduce computing framework to parallelize the serial algorithm, which then runs in a Hadoop cluster environment. The algorithm executes on multiple task nodes at the same time, which parallelizes hot spot discovery for online reviews. Experiments show that the MDM-CK algorithm not only discovers hot spots in online review data but can also process massive data.

Finally, because the MDM-CK algorithm requires multiple iterations under the MapReduce computing framework, the Spark platform is used to optimize and rewrite it, and a Spark-based parallel review hot spot discovery algorithm (SDM-CK) is proposed. This algorithm relies on Spark's in-memory computing framework for optimization and uses RDDs to read and write efficiently in memory. Experiments show that the SDM-CK algorithm further improves parallel efficiency while still achieving hot spot discovery in online reviews.

Taken together, this work makes it possible to extract hot spot information from massive online review data in parallel. Analyzing this hot spot information can effectively inform the decisions of review readers and provide relevant guidance for the objects being reviewed, so the work has high research value.
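The way Canopy can supply K-means with its starting centers, so that the number of clusters does not have to be fixed by hand, can be illustrated with a minimal serial sketch. This is not the thesis's DM-CK implementation: the function names, the single tight threshold `t2`, and the toy 2-D points are all assumptions made for illustration, and the threshold here is supplied manually rather than derived from local density and the maximum-minimum distance algorithm. Real review data would first be turned into text vectors.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, t2):
    """Simplified Canopy pass (assumption: only the tight threshold t2 is used).

    Take the first remaining point as a center, drop every point within
    distance t2 of it, and repeat until no points remain.
    """
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining[0]
        centers.append(center)
        remaining = [p for p in remaining if euclidean(p, center) > t2]
    return centers


def nearest(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)), key=lambda i: euclidean(point, centers[i]))


def kmeans(points, initial_centers, max_iter=100):
    """Plain K-means seeded with the given centers; returns (centers, labels)."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Update step: each center moves to the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    return centers, labels


# Toy data (hypothetical): two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
seeds = canopy_centers(points, t2=3.0)   # number of seeds decides k
centers, labels = kmeans(points, seeds)
```

In this sketch the number of clusters falls out of the Canopy pass instead of being chosen in advance. The assignment step is independent for every point, which is the kind of work that map-style parallel frameworks can distribute across nodes.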
Keywords/Search Tags: Hot spot discovery in reviews, Text clustering, MapReduce model, Spark framework, RDD (resilient distributed dataset)