Analysis Of Sina Microblog Public Opinion Based On K-means Clustering And TF-IDF

Posted on: 2017-04-05
Degree: Master
Type: Thesis
Country: China
Candidate: J C Xu
GTID: 2308330485972262
Subject: Software engineering
Abstract/Summary:
We are living in an era of information explosion, in which people are both promoters and producers of information. In the past, information was transmitted mainly through newspapers, television and similar media; today, people are surrounded by new media. These new media are not only means of communication; more importantly, they have changed the position of the general public in shaping social opinion. People can now put forward their views and opinions on post bars, forums and microblogs.
The traditional channels for obtaining information are newspapers and other mainstream media. These media are highly objective, but they cannot deliver information to the public as soon as events happen. Post bars, forums and microblogs, by contrast, offer reasonable objectivity together with good real-time performance. Public opinion systems have emerged in response. Such a system takes Web information as input, and the results of processing that information reflect the public's attitude toward various events, which makes it an important way for governments and enterprises to understand public opinion.
A microblog public opinion system is a relatively popular and practical kind of public opinion system, built on data from microblogs. After preliminary processing with big data technology, the data are processed with natural language processing and machine learning techniques to uncover the information hidden in them. Finally, the information obtained in the previous steps is analyzed to extract public opinion.
The main processing steps of the system are as follows:
(1) Real-time data acquisition with a self-written web crawler.
(2) Removal of duplicate raw texts with the SimHash algorithm.
(3) Preliminary text processing through word segmentation and stop-word removal.
(4) Clustering of the text set with an improved k-means clustering algorithm.
(5) Keyword extraction through feature extraction (TF-IDF).
(6) Public opinion analysis using a sentiment lexicon.
The main innovations of the system are:
(1) The k-means clustering algorithm is improved by introducing a penalty value.
(2) Clustering is used to divide microblogs into topics, and feature extraction is then applied to obtain the keywords of each topic.
The experimental results are good, indicating that the method is effective.
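Step (2), near-duplicate removal with SimHash, can be illustrated with a minimal sketch. It assumes whitespace-separated tokens with equal weights, a 64-bit fingerprint built from MD5, and a Hamming-distance threshold of 3 bits; these choices and the helper names `simhash`, `hamming` and `dedupe` are illustrative assumptions, not details taken from the thesis.

```python
import hashlib

def simhash(tokens):
    """64-bit SimHash fingerprint of a token list (every token weighted 1)."""
    counts = [0] * 64
    for tok in tokens:
        # Stable 64-bit token hash: first 8 bytes of the MD5 digest
        h = int.from_bytes(hashlib.md5(tok.encode("utf-8")).digest()[:8], "big")
        for i in range(64):
            # Each bit votes +1 if set in the token hash, -1 if clear
            counts[i] += 1 if (h >> i) & 1 else -1
    # Fingerprint bit i is 1 when the summed vote for that bit is positive
    return sum(1 << i for i in range(64) if counts[i] > 0)

def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")

def dedupe(texts, threshold=3):
    """Keep a text only if its fingerprint is more than `threshold` bits
    away from every fingerprint kept so far (hypothetical helper)."""
    kept, prints = [], []
    for text in texts:
        fp = simhash(text.split())
        if all(hamming(fp, p) > threshold for p in prints):
            kept.append(text)
            prints.append(fp)
    return kept
```

The pairwise comparison in `dedupe` is quadratic and only suits small batches; a system processing a live microblog stream would more likely index fingerprints by bit blocks so that candidate duplicates can be looked up directly.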
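For step (3), Chinese microblog text has no spaces between words, so it must be segmented before stop words can be removed. The abstract does not say which segmenter the system uses. The sketch below substitutes forward maximum matching, a classic dictionary-based method, and runs it over a toy vocabulary; `fmm_segment`, `remove_stopwords` and the word lists are hypothetical.

```python
def fmm_segment(text, vocab, max_len=4):
    """Forward maximum matching: at each position take the longest
    dictionary word (up to max_len characters), else a single character."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted, so the loop always advances
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words

def remove_stopwords(words, stopwords):
    """Drop stop words and whitespace-only tokens."""
    return [w for w in words if w not in stopwords and w.strip()]

# Usage with a toy dictionary ("microblog", "public opinion", "analysis")
vocab = {"微博", "舆情", "分析"}
tokens = remove_stopwords(fmm_segment("微博舆情的分析", vocab), {"的"})
```

Here `fmm_segment` yields `["微博", "舆情", "的", "分析"]`, and removing the particle "的" leaves `["微博", "舆情", "分析"]`.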
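Step (4) clusters the texts. The abstract does not specify the form of the penalty value used to improve k-means, so the sketch below shows only plain Lloyd's k-means, without any penalty term. It assumes each text has already been turned into a numeric vector (for example a TF-IDF vector) and uses random initial centroids drawn with a fixed seed. `kmeans` and `sq_dist` are hypothetical names.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd's k-means (no penalty term); returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

On two well-separated groups, such as `[(0, 0), (0, 1), (10, 10), (10, 11)]` with `k=2`, the sketch places the first two points and the last two points in separate clusters. Plain k-means is sensitive to its initial centroids and to the choice of `k`, and weaknesses of this kind are what a modified variant would typically aim to address.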
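Step (5) extracts topic keywords, and the title names TF-IDF as the technique. In the sketch below, each "document" could be the concatenated tokens of one cluster, so that the highest-scoring terms act as that topic's keywords. That grouping, the formula tf = count / length with idf = ln(N / df), and the name `tfidf_keywords` are all assumptions made for illustration.

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_n=3):
    """For each tokenized document, return its top_n terms by TF-IDF,
    using tf = count / doc length and idf = ln(N / document frequency)."""
    n = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter(term for doc in docs for term in set(doc))
    result = []
    for doc in docs:
        tf = Counter(doc)
        scores = {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        # Highest score first; ties broken alphabetically for stable output
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:top_n])
    return result

# Usage: two hypothetical topic clusters that share the term "news"
topics = [["quake", "rescue", "quake", "news"], ["movie", "box", "office", "news"]]
keywords = tfidf_keywords(topics, top_n=2)
```

Because "news" appears in every topic, its IDF is zero and it never ranks as a keyword. The result is `[["quake", "rescue"], ["box", "movie"]]`.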
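Step (6) judges public opinion with a sentiment lexicon. The abstract does not describe the lexicon or the scoring rule. The sketch below uses one common lexicon-based scheme: it counts positive and negative words and flips the polarity of a word that directly follows a negator. The word lists, the negation rule, `lexicon_sentiment` and `polarity_label` are hypothetical.

```python
def lexicon_sentiment(tokens, positive, negative, negators=frozenset()):
    """Score = (#positive words - #negative words); a word directly
    preceded by a negator has its polarity flipped."""
    score = 0
    for i, tok in enumerate(tokens):
        polarity = 1 if tok in positive else -1 if tok in negative else 0
        if polarity and i > 0 and tokens[i - 1] in negators:
            polarity = -polarity
        score += polarity
    return score

def polarity_label(score):
    """Map a numeric score to a coarse opinion label."""
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

# Usage with toy (English) word lists
pos, neg = {"good", "support"}, {"bad", "angry"}
score = lexicon_sentiment(["not", "good", "angry"], pos, neg, {"not"})
```

Here "not good" counts as negative and "angry" is negative, so `score` is -2 and `polarity_label(score)` returns `"negative"`. Aggregating such labels over the posts in each topic cluster gives a rough picture of the public's attitude toward that topic.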
Keywords/Search Tags: Web crawler, Public opinion, Clustering, NLP