
Design and Implementation of a Public Opinion System for Massive Short Texts

Posted on: 2013-04-08
Degree: Master
Type: Thesis
Country: China
Candidate: H Q Jia
Full Text: PDF
GTID: 2248330371466306
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of Internet and communications technology, network and information security problems have become increasingly prominent. Monitoring public opinion over the network has become correspondingly important and is now a major concern at all levels of government. Many public opinion monitoring systems have been introduced, most of them based on Internet news and BBS posts. This work instead develops a public opinion monitoring system for massive volumes of short text, built on SMS data in accordance with the project requirements. The main work and innovations are as follows.

First, this paper proposes a series of improved algorithms designed specifically for short text analysis. Unlike traditional public opinion monitoring systems, which are based on Internet news or BBS posts, it focuses on processing massive numbers of short texts. Because a short text contains only a few words, traditional algorithms do not work well on it, according to our experiments. This paper therefore adapts the traditional public opinion monitoring algorithms, including simplified Bayesian classification and hot topic detection, to the project's functional requirements. It also introduces a Chinese variant detection algorithm into the public opinion analysis system, which achieved good results.

Second, this paper proposes a parallel processing framework for the system. The system must analyze massive numbers of messages in real time, which demands high performance. This paper presents a balanced distribution strategy for parallel processing that makes system performance scalable. Parallel processing is possible because the public opinion analysis algorithms the system relies on can be split up and computed in parallel.
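As a rough illustration of what a balanced distribution strategy can look like, the Python sketch below assigns message batches to whichever worker currently has the lightest load. The abstract does not describe the actual balancing rule, so the greedy largest-first heuristic, the names `distribute_batches` and `worker_loads`, and the use of batch size as the load measure are all assumptions made for illustration.

```python
import heapq
from typing import Dict, List, Sequence


def distribute_batches(batch_sizes: Sequence[int], num_workers: int) -> Dict[int, List[int]]:
    """Greedily assign each batch (by index) to the least-loaded worker.

    Hypothetical helper: the thesis abstract does not state its balancing
    rule; this is a generic largest-first greedy heuristic.
    """
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    # Min-heap of (current_load, worker_id); the lightest worker pops first.
    heap = [(0, w) for w in range(num_workers)]
    heapq.heapify(heap)
    assignment: Dict[int, List[int]] = {w: [] for w in range(num_workers)}
    # Placing the largest batches first tends to give a more even final load.
    order = sorted(range(len(batch_sizes)), key=lambda i: batch_sizes[i], reverse=True)
    for i in order:
        load, worker = heapq.heappop(heap)
        assignment[worker].append(i)
        heapq.heappush(heap, (load + batch_sizes[i], worker))
    return assignment


def worker_loads(batch_sizes: Sequence[int], assignment: Dict[int, List[int]]) -> Dict[int, int]:
    """Total assigned batch size per worker."""
    return {w: sum(batch_sizes[i] for i in idxs) for w, idxs in assignment.items()}
```

Because each batch is processed independently of the others, adding workers only changes `num_workers`; that is the sense in which a scheme like this scales.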
Experiments show that the system's performance meets the requirements.

Third, this paper proposes a distributed, parallel storage solution. To store the massive volume of short texts and intermediate results, it presents a solution based on MongoDB, a distributed document database. MongoDB's key-value-style access mechanism gives it an access-performance advantage over traditional relational databases. This paper also designs a table-splitting scheme that makes data access more convenient and uses less storage space, which is the foundation of the whole system's high performance.
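The abstract does not say how the tables are split. As one possible reading, the Python sketch below routes each message to a per-day collection, so that queries over a date range touch only the relevant collections. The naming scheme `sms_YYYYMMDD`, the field names `received_at` and `text`, and both helper functions are hypothetical and not taken from the thesis.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def collection_for(received_at: datetime, prefix: str = "sms") -> str:
    """Return a per-day collection name such as 'sms_20130408' (assumed scheme)."""
    return f"{prefix}_{received_at.strftime('%Y%m%d')}"


def group_by_collection(messages: Iterable[Tuple[datetime, str]]) -> Dict[str, List[dict]]:
    """Group (timestamp, text) pairs into documents keyed by target collection."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for received_at, text in messages:
        doc = {"received_at": received_at, "text": text}  # assumed field names
        grouped[collection_for(received_at)].append(doc)
    return dict(grouped)
```

The sketch only builds collection names and documents and does not talk to a database. With a driver such as PyMongo, each group could then be written in bulk, for example with `db[name].insert_many(docs)`.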
Keywords/Search Tags: public opinion, short text, mass data, parallel processing