
Research On Webpage Structured Information Extraction Algorithms In Micro-Blog Public Opinion Analysis Systems

Posted on: 2015-08-04
Degree: Master
Type: Thesis
Country: China
Candidate: T Y Zhao
GTID: 2298330467462062
Subject: Information security
Abstract/Summary:
Micro-blogs are platforms for sharing, spreading, and acquiring information based on user relationships. As one of the most popular social tools today, micro-blogging has brought great convenience, but it has also become a hotbed in which false information breeds and spreads. It is therefore important for the government to monitor public opinion on micro-blogs. Micro-blogs have become an important source of public opinion, and analyzing them globally and effectively requires collecting micro-blogs from different websites simultaneously and obtaining structured information for each one, such as its author, text, comment count, and forward count.

To achieve this goal, this thesis proposes a unified method, based on hierarchical clustering, for extracting structured information from micro-blog web pages. The method can acquire structured information from pages that web crawlers have gathered from any micro-blog website, without using the APIs offered by service providers. It lays a foundation for global analysis of public opinion across different websites.

The work accomplished in this thesis is as follows.

1) This thesis first studies the indicators to be collected and the system architecture of micro-blog public opinion analysis systems. It then identifies the problems that the micro-blog webpage extraction module must solve.

2) Building on that work, this thesis proposes a unified method, based on hierarchical clustering, for extracting structured information from micro-blog web pages. The method takes the distinctive features of the DOM trees of micro-blog web pages into account in order to overcome the heavy computation and low extraction accuracy of widely used Web extraction methods. As a result, it can extract structured information from micro-blog web pages efficiently and accurately.

3) This thesis reports an experiment in which the proposed method was used to extract micro-blog web pages from several popular service providers, and the method was also tried in an experimental micro-blog public opinion analysis system. The results of these experiments indicate that the method is highly accurate and that it meets the requirements of the micro-blog webpage extraction module in such a system.
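
The abstract does not give the details of the algorithm, so the following is only a minimal sketch of the general idea of clustering DOM subtrees to find repeated micro-blog records; it is not the method of the thesis. The names `Node`, `TreeBuilder`, `cluster`, and `extract_records`, the tag-sequence distance, the average-linkage rule, the 0.3 threshold, and the sample page are all assumptions made for illustration. The sketch treats the element with the most children as the post container, clusters those children agglomeratively by structural similarity, and returns the text fields of the largest cluster.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "input", "hr", "meta", "link"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag, self.parent = tag, parent
        self.children, self.text = [], ""


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM tree (tags and text only)."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the matching open tag; stray end tags are ignored.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def iter_nodes(node):
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def tag_sequence(node):
    # Preorder list of tag names: a crude structural signature.
    return [n.tag for n in iter_nodes(node)]


def distance(a, b):
    matcher = SequenceMatcher(None, tag_sequence(a), tag_sequence(b), autojunk=False)
    return 1.0 - matcher.ratio()


def cluster(nodes, threshold=0.3):
    """Agglomerative clustering with average linkage (assumed threshold)."""
    clusters = [[n] for n in nodes]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(distance(a, b) for a, b in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > threshold:
            break  # remaining clusters are too dissimilar to merge
        _, i, j = best
        clusters[i].extend(clusters.pop(j))
    return clusters


def all_text(node):
    return " ".join(n.text for n in iter_nodes(node) if n.text)


def extract_records(html, threshold=0.3):
    builder = TreeBuilder()
    builder.feed(html)
    # Assumption: the element with the most children holds the post list.
    container = max(iter_nodes(builder.root), key=lambda n: len(n.children))
    records = max(cluster(container.children, threshold), key=len)
    return [[all_text(field) for field in rec.children] for rec in records]


# Hypothetical page: a menu, three posts (author, text, comment count), a footer.
SAMPLE = """
<div id="page">
  <ul><li>Home</li><li>About</li></ul>
  <div><span>alice</span><p>first post</p><i>3</i></div>
  <div><span>bob</span><p>second post</p><i>0</i></div>
  <div><span>carol</span><p>third post</p><i>12</i></div>
  <p>footer</p>
</div>
"""

if __name__ == "__main__":
    for record in extract_records(SAMPLE):
        print(record)
```

In this sketch the three post blocks share an identical tag structure and merge into one cluster, while the menu and footer stay separate because their structure differs. A real micro-blog page would need a more careful choice of container, distance measure, and threshold.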
Keywords/Search Tags: micro-blog, public opinion, extraction, structured information, hierarchical clustering