
Microblog Social Network Public Opinion Analysis And Research Based On Big Data

Posted on: 2017-12-29
Degree: Master
Type: Thesis
Country: China
Candidate: H C Gao
GTID: 2428330596957449
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of technology, network media has penetrated all aspects of people's lives and has become an important platform for publishing and exchanging information. Microblog, as a new social networking medium, has become one of the important channels through which people follow public opinion, thanks to its brevity and convenient communication. However, because microblog sources vary greatly and users tend to follow others blindly, inappropriately guided public opinion may harm society. In the era of big data in particular, microblog data keeps proliferating, and processing it at high speed has become an enormous challenge. How to mine important information quickly and track the dynamics of public opinion in a timely manner from a large volume of microblog information is therefore of great practical significance. To address the challenges that massive microblog data poses for public opinion analysis, this paper studies the microblog social network by introducing Hadoop into public opinion analysis and combining big data processing technology with public opinion analysis technology. The main research work is as follows.

First, this paper introduces the origin, development, and related processing technologies of big data and public opinion analysis; analyzes the three core components of Hadoop 2.0 (the distributed file system HDFS, the distributed computing model MapReduce, and the resource management system YARN); and studies in depth the key techniques and implementation methods of microblog information acquisition, data preprocessing, text clustering, and public opinion analysis, which are the key technologies of microblog social network public opinion analysis.

Second, combining big data processing technology with public opinion analysis technology, this paper analyzes and implements the parallelization of each stage of public opinion analysis on the Hadoop platform. Using the MapReduce programming model, it proposes an optimization mechanism for the parallel K-means algorithm and, on this basis, a new K-means clustering algorithm based on cosine distance. By judging and adjusting the cosine distance according to different size ranges, the new algorithm improves the clustering results and the clustering quality, and is better suited to microblog data.

Finally, for the comparative experiments, a Hadoop cluster is built on a workstation, and preprocessing of the microblog data is implemented on the Hadoop/Mahout platform. The experiments compare the traditional K-means algorithm with the improved K-means algorithm under the MapReduce programming model. The results show that the improved clustering algorithm raises accuracy and recall, with better clustering quality and good scalability. The experiments conclude with hot topic discovery and sentiment tendency analysis of microblog posts.
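As a rough illustration of the idea the abstract describes, the following is a minimal single-process sketch of K-means with cosine distance, organized as a map step (assign each vector to its nearest centroid) and a reduce step (average the vectors of each cluster). It is not the thesis's algorithm: the abstract does not specify the proposed cosine-distance adjustment or the optimization mechanism, so neither is reproduced here. The function names (`cosine_distance`, `map_assign`, `reduce_mean`, `kmeans_cosine`) are hypothetical, and the code uses plain Python rather than any Hadoop or Mahout API.

```python
import math
from collections import defaultdict


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def map_assign(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(range(len(centroids)),
                  key=lambda i: cosine_distance(point, centroids[i]))
    return nearest, point


def reduce_mean(points):
    """Reduce step: component-wise mean of all points in one cluster."""
    count = len(points)
    return [sum(column) / count for column in zip(*points)]


def kmeans_cosine(points, centroids, max_iter=20):
    """Iterate map and reduce until the centroids stop changing."""
    for _ in range(max_iter):
        # Shuffle phase stand-in: group mapped points by cluster index.
        groups = defaultdict(list)
        for point in points:
            key, value = map_assign(point, centroids)
            groups[key].append(value)
        # Keep the old centroid when a cluster received no points.
        new_centroids = [
            reduce_mean(groups[i]) if i in groups else centroids[i]
            for i in range(len(centroids))
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = [map_assign(point, centroids)[0] for point in points]
    return centroids, labels
```

Calling `kmeans_cosine(vectors, initial_centroids)` on term-weight vectors returns the final centroids and one cluster label per vector. In an actual MapReduce job the grouping dictionary would be replaced by the framework's shuffle, and each iteration would typically run as a separate job; those details are assumptions of this sketch, not statements about the thesis's implementation.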
Keywords/Search Tags: Big Data, public opinion analysis, K-means, Hadoop, Mahout