Statistics And Analysis Of Online Public Opinion Based On Information Extraction From Webpage

Posted on: 2017-07-12
Degree: Master
Type: Thesis
Country: China
Candidate: K Li
Full Text: PDF
GTID: 2348330491950957
Subject: Statistics
Abstract/Summary:
Since China was officially connected to the Internet in 1994, the number of Chinese Internet users has grown rapidly and steadily year after year. According to statistics from December 2015, China had 688 million Internet users, meaning that more than half of the Chinese population was online. The Internet has replaced paper as the main medium for spreading information and has become the principal channel through which the public accesses and exchanges ideas, culture, and information. Because information on it travels fast, reaches a wide audience, and involves many participants, it has also become the main gathering place for public opinion. From major events such as the North Korean nuclear issue, which shook the world, and the Tianjin explosions, which shocked the whole country, to minor ones such as a famous actor falling at the Oscars ceremony or a university introducing a fried corn and raisin dish, the spread of online public opinion increasingly affects society as a whole. Extracting useful information from the large volume of online public opinion, so as to grasp the current state of opinion quickly, predict its direction, and guide it correctly and in time, is therefore of great significance for the healthy and stable development of society. This thesis was conceived against that background. Its specific research contents are as follows:
(1) First, it introduces the theoretical background used in the research, including Web information extraction, text representation, dimensionality reduction, and clustering methods.
(2) Network information (with Sina Weibo microblog data as the example) is obtained in bulk mainly through a web crawler. This thesis uses a news crawling system that was developed and put into production by an enterprise. The system supports configurable DOM parsing templates for different page structures, which makes rapid data crawling easier.
(3) Based on the characteristics of short texts, a standardized and targeted preprocessing approach is proposed, covering emoticons, repost links, punctuation marks, and images. With the help of a web corpus and manual annotation, the preprocessed data are then segmented into words; the Chinese word segmentation tool used is the Rwordseg package for R.
(4) To achieve good clustering results on the experimental data, an FCM (fuzzy c-means) clustering algorithm based on word association is proposed, and its practical operability is verified. For public opinion analysis, a sentiment orientation method based on support vector machines (SVM) is used, and an example is given.
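As a rough illustration of the clustering step in (4), the sketch below implements plain fuzzy c-means (FCM) in Python. It is not the word-association variant proposed in the thesis, whose details the abstract does not give. The function name `fuzzy_c_means`, its parameters (`m`, `max_iter`, `tol`, `seed`), and the toy two-dimensional points are all assumptions made for illustration; in practice the inputs would be document vectors produced by the text representation and dimension reduction steps.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means (hypothetical helper, not the thesis's variant).

    points     : list of equal-length numeric tuples
    n_clusters : number of clusters c
    m          : fuzzifier, must be > 1 (m = 2 is a common default)
    Returns (centers, memberships); memberships[i][j] is the degree to
    which point i belongs to cluster j, and each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Start from random memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([value / total for value in row])

    centers = []
    exponent = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(n_clusters):
            weights = [u[i][j] ** m for i in range(n)]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])

        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        new_u = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if min(dists) == 0.0:
                # The point sits exactly on a center: give it full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(hits)
                new_u.append([h / total for h in hits])
            else:
                new_u.append([
                    1.0 / sum((dj / dk) ** exponent for dk in dists)
                    for dj in dists
                ])

        # Stop once no membership changes by more than tol.
        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break

    return centers, u


if __name__ == "__main__":
    # Toy data: two well-separated groups of points.
    toy = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    toy_centers, toy_u = fuzzy_c_means(toy, n_clusters=2)
    for point, row in zip(toy, toy_u):
        print(point, [round(value, 3) for value in row])
```

Unlike hard clustering, each point keeps a membership degree for every cluster, which may suit short microblog texts that touch on more than one topic; a hard label can still be read off as the cluster with the largest membership.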
Keywords/Search Tags: Information Extraction, Clustering Analysis, Term Correlation Relationship, Analysis of Public Opinion