Design And Implementation Of Network Public Opinion Analysis System Based On Microblogging

Posted on: 2018-08-22
Degree: Master
Type: Thesis
Country: China
Candidate: X P Li
Full Text: PDF
GTID: 2348330512488251
Subject: Engineering
Abstract/Summary:
In the Internet+ era, the variety of social media has made exchanges between people more and more convenient, and the cost of communication keeps falling. Whether the subject is national events, social conditions and public opinion, or celebrity gossip, this information is actively discussed and exchanged among Internet users and then spreads rapidly. The huge social media user community generates a huge amount of network data. How to discover valuable hot topics and capture the development trend of network public opinion in this unstructured, dynamic and massive information in a timely manner is one of the hot spots in today's natural language processing. In recent years, as the number of microblogging users has continued to grow, the influence of microblogging can no longer be overlooked. Microblogging was therefore selected as the object of this thesis.

To analyze microblogging data, a web crawler program was used in this thesis to obtain the data. Although microblogging text contains a wealth of social topics, the data has its own characteristics, and the results of microblogging topic detection using traditional methods are often unsatisfactory. The text representation model and the text clustering algorithm used in the topic detection process were the key research contents of this thesis.

Considering the particularity of microblogging text, the method by which word2vec updates its word vectors was improved in this thesis, and the improved word2vec was then combined with TF-IDF (Term Frequency-Inverse Document Frequency) to design and implement a text representation model named Improved-word2vec & TF-IDF. With this text representation method, microblogging data is mapped to a text vector of fixed dimension, which effectively addresses two problems of the traditional text representation model: the vectors it produces are high-dimensional and sparse, and it ignores semantic similarity. Experiments showed that clustering accuracy with this text representation method was 19.62% higher than with the VSM (vector space model).

Two defects of the classical Single-pass algorithm were addressed in this thesis, and the improved Single-pass clustering algorithm was then combined with Hierarchical Agglomerative Clustering (HAC) to design and implement a microblogging topic detection clustering algorithm named Improved-SP&HAC, which is applied to the topic detection task. The Improved-SP&HAC algorithm has two steps. First, the improved Single-pass algorithm is used to cluster the microblogging data quickly, which improves the time efficiency of topic detection. Second, the HAC algorithm is used to re-cluster the initial results to improve the accuracy of topic detection. A few comparison experiments showed that the Improved-SP&HAC algorithm balances efficiency and quality, so it has an advantage over the traditional clustering algorithm in public opinion analysis.

An architecture for a microblogging public opinion analysis system was also designed in this thesis and implemented with the Python Django framework. Testing showed that the system's performance was stable and that it could help users analyze microblogging public opinion.
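The two-step clustering idea described in the abstract (a fast single pass followed by agglomerative re-clustering of the initial results) can be illustrated with a minimal sketch. This is not the thesis's Improved-SP&HAC algorithm: the abstract does not say which two defects of Single-pass were fixed, so the code below uses the classical Single-pass rule. Cosine similarity, centroid linkage, both threshold values, and all function names are assumptions made for illustration, and the input vectors stand in for whatever fixed-dimension text vectors the representation model produces.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def single_pass(vectors, threshold):
    """Step 1 (classical Single-pass): put each vector into the most similar
    existing cluster, or open a new cluster if no centroid reaches the threshold."""
    clusters = []   # each cluster is a list of indices into `vectors`
    centroids = []  # centroids[k] is the current centroid of clusters[k]
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for k, cen in enumerate(centroids):
            sim = cosine(vec, cen)
            if sim >= best_sim:
                best, best_sim = k, sim
        if best is None:
            clusters.append([i])
            centroids.append(list(vec))
        else:
            clusters[best].append(i)
            centroids[best] = centroid([vectors[j] for j in clusters[best]])
    return clusters


def agglomerate(vectors, clusters, threshold):
    """Step 2 (centroid-linkage HAC, an assumed linkage): repeatedly merge the
    two most similar clusters until no pair reaches the threshold."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        cents = [centroid([vectors[j] for j in c]) for c in clusters]
        best_pair, best_sim = None, threshold
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = cosine(cents[a], cents[b])
                if sim >= best_sim:
                    best_pair, best_sim = (a, b), sim
        if best_pair is None:
            break
        a, b = best_pair
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters


def detect_topics(vectors, sp_threshold=0.9, hac_threshold=0.7):
    """Run both steps; the default thresholds are illustrative, not from the thesis."""
    return agglomerate(vectors, single_pass(vectors, sp_threshold), hac_threshold)
```

In this sketch the first step uses a strict threshold so that it only groups near-duplicates in one cheap pass, and the second step works on the much smaller set of cluster centroids with a looser threshold. That division of labor is one plausible reading of how a two-step design could trade off efficiency and quality.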
Keywords/Search Tags: analysis of network public opinion, word2vec, Single-pass&HAC, text clustering, topic detection