
Public Opinion Mining On The Internet

Posted on: 2008-06-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: A N Du
Full Text: PDF
GTID: 1118360245497365
Subject: Computer system architecture
Abstract/Summary:
Leading and guiding public opinion is one of the important means of maintaining social stability and the security of Party rule. With the rapid expansion of information technology, the Internet has become the main platform for releasing, exchanging and acquiring information, with a huge number of users. As an alternative to public opinion surveys, public opinion mining on the Internet is becoming more and more important. Public opinion is the aggregate of individual attitudes or beliefs held by the adult population of some area during some period. As a method of collecting public opinion, Internet public opinion mining has become a research focus. However, existing public opinion mining techniques have problems with huge-volume processing, high-speed mining and high-accuracy pre-alarm, which call for improvements in both the mining architecture and the mining algorithms.

This thesis focuses on Internet public opinion mining techniques. After clarifying the notion of public opinion and related concepts, it studies the architecture of Internet public opinion mining and the mining algorithms for the different phases in which public opinion information forms. The main contents are as follows:

Research on the architecture of Internet public opinion information mining is important. This thesis proposes a four-level architecture consisting of the Attribute Level, the Information Collecting Level, the Mining Level and the Disposing Level. The Attribute Level includes the basic rules for collecting, catching, tracking and leading public opinion. The Information Collecting Level covers what is collected, where to collect it and how to collect it. The Mining Level includes a three-phase model of public opinion forming (Releasing, Acquiring and Citation) and the mining algorithms for each phase. The Disposing Level includes evaluating, analyzing and proposing methods.
The four-level architecture is the basis of Internet public opinion mining.

During the Releasing phase, content-suspicious pages are monitored in order to filter harmful information and monitor suspicious information. This thesis proposes the notion of User Interest Focusing Degree (UIFD), which measures user interest by how the set of interests is constituted. User interest is thus regarded as an informal continuum, with different UIFD values around the objects the user is interested in. This thesis implements a UIFD-based filtering approach for Chinese web pages on public opinion, which includes a page structure analyzer for URL, title and body, and a machine learning algorithm with UIFD incorporated into the training procedure. The UIFD-based filtering algorithm achieves high efficiency in filtering Chinese content-suspicious web pages.

During the Acquiring phase, the list of frequently accessed news topics on the Internet is maintained in a timely manner, so that hot topics are detected in time and prevented from developing into unexpected incidents. This thesis puts forward a frequent-items maintaining algorithm, Frequent Sketch (FS), which keeps the deficient synopsis by maintaining a sorted doubly-linked list of groups that store the frequency deltas between them, and by pruning the counters periodically. Compared with existing algorithms, FS performs better in accuracy, processing speed and memory usage. A mining approach for frequently accessed news topics, based on the FS-Win algorithm (FS extended to windowed streams) and a topic similarity algorithm, can acquire frequently accessed news topics in time.

During the Citation phase, the spreading degree of news topics is measured, to help users comprehend how public opinion is currently spreading and to find out what the hot topics and people's attitudes are. This thesis introduces a measurement model of Internet public opinion: the NISAC indexes.
Similar to the way economic indexes and natural indexes are compiled, NISAC indexes are compiled from the number of web pages that contain a given keyword. NISAC indexes help describe the public opinion situation quantitatively and show the spreading degree of hot topics. Unexpected incidents with an abnormal spreading degree can be detected by monitoring the indexes of the keywords contained in pages related to those incidents. In short, NISAC indexes are used to monitor, evaluate and give early warning of the social security situation as reflected on the Internet.
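
The periodic counter pruning described for the Acquiring phase can be illustrated with a minimal sketch. The abstract does not specify FS in enough detail to reproduce it, so the Python code below uses the well-known Lossy Counting algorithm (Manku and Motwani) as a stand-in, not the thesis's own method. The class name `LossyCounter`, its parameters and the sample topic stream are assumptions made for illustration, and a plain dictionary replaces the sorted doubly-linked list of groups that FS is said to use.

```python
from math import ceil


class LossyCounter:
    """Lossy Counting: approximate frequent items over a stream.

    Stand-in for illustration only; this is NOT the FS algorithm of the thesis.
    """

    def __init__(self, epsilon):
        if not 0 < epsilon < 1:
            raise ValueError("epsilon must be in (0, 1)")
        self.epsilon = epsilon
        self.width = ceil(1 / epsilon)  # number of items per bucket
        self.n = 0                      # items seen so far
        self.counts = {}                # item -> (count, maximum undercount)

    def add(self, item):
        self.n += 1
        # Index of the current bucket, starting at 1.
        bucket = (self.n + self.width - 1) // self.width
        count, delta = self.counts.get(item, (0, bucket - 1))
        self.counts[item] = (count + 1, delta)
        # Periodic pruning: at each bucket boundary, drop low-count items.
        if self.n % self.width == 0:
            self.counts = {
                key: (c, d)
                for key, (c, d) in self.counts.items()
                if c + d > bucket
            }

    def frequent(self, support):
        """Return items whose stored count is at least (support - epsilon) * n."""
        threshold = (support - self.epsilon) * self.n
        hits = [key for key, (c, _) in self.counts.items() if c >= threshold]
        return sorted(hits, key=lambda key: -self.counts[key][0])


# Hypothetical stream of accessed news topics: two hot topics plus rare ones.
stream = []
for i in range(10):
    stream += ["olympics"] * 5 + ["earthquake"] * 3 + [f"rare-{i}-a", f"rare-{i}-b"]

counter = LossyCounter(epsilon=0.05)
for topic in stream:
    counter.add(topic)

print(counter.frequent(support=0.2))
```

In this sketch the rare topics are discarded at every bucket boundary, so memory stays bounded while the two hot topics keep their counts. A windowed variant such as FS-Win would additionally need to expire old items, which is not shown here.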
Keywords/Search Tags: web mining, public opinion mining, web filtering, frequent items maintaining, public opinion measurement