
Research On Data Collection And Extraction Of Emergency Online Public Opinion

Posted on: 2013-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: H Chen
GTID: 2248330395480645
Subject: Military Intelligence
Abstract/Summary:
With the rapid development of the Internet, cyberspace has become the primary platform for people to exchange opinions and express emotions, owing to its interactivity, real-time nature, and openness. Given the exponential growth of information on the Web, finding and acquiring online public opinion data quickly and precisely from this voluminous information has become a difficult problem. The massive volume of information on the Internet, together with the distinctive features of emergencies, such as unpredictability, rapid spread, and wide influence, poses a new challenge for traditional techniques for acquiring related data. Techniques for collecting and extracting emergency online public opinion can gather abundant Web pages about such opinion and extract the public opinion data automatically. They are also important for the monitoring and early warning of emergency online public opinion.

This dissertation studies data collection and extraction of emergency online public opinion, including emergency online public opinion data collection, online public opinion carrier type recognition, and Web forum review extraction. The major contributions are as follows:

(1) To acquire the emergency online public opinion data of monitored Web sites precisely and in time, a sitemap generation method based on the emergency hot degree is proposed. The method first generates a sitemap containing the emergency hot degree of every emergency-related board in the monitored Web sites; then, guided by the sitemaps, it optimizes and updates the task queue appropriately; finally, it collects the needed Web pages precisely and without much delay.
Experiments show that a Web crawler guided by the sitemap adjusts its update frequency automatically, collects the needed Web pages promptly, and adapts well to dynamic changes in the monitored Web sites.

(2) To solve the problem that current Web page classification cannot automatically recognize the type of online public opinion carrier, an automatic Web page genre recognition method based on integral features is presented. The method applies relative frequency difference and recursive feature elimination to perform feature selection on content features and structure features separately; it then extracts the web features, content features, and structure features of the Web pages to represent the online public opinion carrier and constructs a feature set; finally, it uses an SVM classifier to classify the data. The experimental results show that the method improves the accuracy of Web page genre recognition and outperforms the traditional method.

(3) To overcome the difficulty that data in Web information extraction has variable length and is heavily noisy, a method for extracting emergency online public opinion data from Web forum pages is presented. The method generates a structure-tree template in three steps: automatic data region discovery, noise removal, and review boundary partition. The Web forum reviews are then extracted using the generated template. On this basis, statistical information and rules are used to extract the public opinion data from the reviews. The experimental results indicate that the proposed method achieves better recall, accuracy, and efficiency than existing solutions.
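
As an illustration of contribution (1), the idea of letting a per-board hot degree drive the crawler's task queue can be sketched as below. This is only a minimal sketch under stated assumptions, not the method of the dissertation: the abstract does not give the hot-degree formula or the queue update rule, so the names `CrawlTask`, `revisit_interval`, `build_queue`, and `pop_and_reschedule`, the inverse-proportional interval rule, and the board URLs are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class CrawlTask:
    # Tasks are ordered only by their next scheduled fetch time.
    next_fetch: float
    url: str = field(compare=False)
    hot_degree: float = field(compare=False)


def revisit_interval(hot_degree, base=3600.0, floor=60.0):
    """Assumed rule: the hotter a board, the shorter its revisit interval (seconds)."""
    return max(floor, base / (1.0 + hot_degree))


def build_queue(sitemap, now=0.0):
    """Build a min-heap task queue from a {board_url: hot_degree} sitemap."""
    queue = [CrawlTask(now + revisit_interval(h), url, h) for url, h in sitemap.items()]
    heapq.heapify(queue)
    return queue


def pop_and_reschedule(queue, new_hot_degree=None):
    """Take the most urgent board, then re-queue it using its (possibly updated) hot degree."""
    task = heapq.heappop(queue)
    hot = task.hot_degree if new_hot_degree is None else new_hot_degree
    heapq.heappush(queue, CrawlTask(task.next_fetch + revisit_interval(hot), task.url, hot))
    return task.url


# Hypothetical sitemap: one board tied to a hot emergency, one quiet board.
sitemap = {"forum.example/board/quake": 9.0, "forum.example/board/misc": 0.0}
queue = build_queue(sitemap)
first = pop_and_reschedule(queue, new_hot_degree=0.0)  # the hot board cools down
second = pop_and_reschedule(queue)
```

In this sketch the hot board is fetched first; once its hot degree drops, its revisit interval lengthens and the quiet board is served next, which mirrors the automatic adjustment of update frequency described above.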
Keywords/Search Tags: Emergency, Online Public Opinion, Sitemap, Web Crawler, Web Genre, Automatic Recognition, Web Information Extraction