Research And Development Of Internet Public Opinion Monitoring Model Based On Web Crawler And Lucene Index

Posted on: 2014-02-25  Degree: Master  Type: Thesis
Country: China  Candidate: X L Zhou  Full Text: PDF
GTID: 2248330395997147  Subject: Software engineering
Abstract/Summary:
With the continuous development of computer technology, governments and enterprises are paying more and more attention to using IT to monitor public opinion in the virtual network. Network security management is a core issue of public security, and emergency management is closely related to Internet public opinion. Over the past ten years, with the spread of information technology and the explosion of information content, discovering and handling emergency-related information within vast amounts of network data has become both more important and more difficult. Emergency response also demands a high degree of timeliness and immediate action, while traditional methods of collection and analysis can hardly meet these real-time needs; it is therefore necessary to build a command system for emergency management in the virtual society. Such a system can not only detect events, but also analyze the complex relationships between events and describe and predict how they will develop.
By 2012, according to surveys by authoritative organizations, China's Internet population had reached 500 million and the domestic Internet penetration rate had reached 38.3%, including 350 million mobile Internet users. The number of people taking part in Internet activity has increased significantly. Today, following television, radio, and newspapers, the Internet is called "the fourth media". With a steady stream of users joining in, the Internet has become a barometer of public opinion, mainly on platforms such as news websites, well-known blogs, BBS forums, and post bars (tieba); this type of media is also referred to as the virtual society. Because network regulation is lax or even flawed, there is essentially no threshold for participation and the cost of taking part is nearly zero, yet its influence is broad and penetrates deeply, so its social impact cannot be ignored.
If this development is not guided, the virtual community will be filled with negative public opinion, which will inevitably affect social stability and security and bury hidden dangers in society. For government agencies, strengthening the supervision of public opinion in the virtual society and resolving crises therefore has great practical significance for maintaining social stability, realizing the modernization of our country, and driving economic development forward.
The Internet is a treasure trove. Especially in the era of big data, using IT to monitor public opinion in the virtual network in a timely and comprehensive way has become an urgent task. This paper mainly introduces the structure of an Internet public opinion monitoring system and how a web crawler and a Lucene index are applied within it.
In this paper, the Internet public opinion monitoring system consists of an information acquisition module, an information retrieval module, a data analysis module, and a data display module. The core of the information acquisition module is the crawler, which can crawl data from news websites, BBS forums, blogs, microblogs, and video websites. The information retrieval module provides fast and accurate retrieval over big data; here, retrieval with the Lucene index takes up to 5 seconds. Finally, we also introduce the data analysis module and the data display module, which respectively analyze the semantics of the text and present the final data.
A web crawler is also known as a spider, a network robot, or a bot; the name does not matter. What matters is that crawlers are what give search engines their wealth of resources.
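The crawling step of the acquisition module can be illustrated with a minimal breadth-first sketch. The thesis does not specify its crawler's algorithm, so this is only an assumed, generic design: the in-memory `PAGES` dictionary and the `fetch` helper are hypothetical stand-ins for downloading a page over HTTP and extracting its links, which lets the example run without network access.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would download pages over HTTP and parse links instead.
PAGES = {
    "news/1": ("Flood warning issued downtown", ["bbs/1", "blog/1"]),
    "bbs/1": ("Residents discuss flood warning", ["news/1", "blog/2"]),
    "blog/1": ("City budget report", ["blog/2"]),
    "blog/2": ("Flood relief volunteers needed", []),
}


def fetch(url):
    """Hypothetical fetch: returns (text, links), or None if the page is missing."""
    return PAGES.get(url)


def crawl(seeds, max_pages=100):
    """Breadth-first crawl from the seed URLs; returns {url: text} in visit order."""
    frontier = deque(seeds)  # URLs waiting to be fetched, first in first out
    seen = set(seeds)        # URLs already queued, so each page is fetched once
    collected = {}
    while frontier and len(collected) < max_pages:
        url = frontier.popleft()
        page = fetch(url)
        if page is None:     # unreachable or missing page: skip it
            continue
        text, links = page
        collected[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return collected
```

Starting from `crawl(["news/1"])`, the sketch visits `news/1`, `bbs/1`, `blog/1`, and `blog/2` in that order. The `seen` set keeps pages that link back to each other from being fetched twice, and `max_pages` bounds how much of the network a single run collects.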
Search engines have raised our ability to retrieve information to an unprecedented level and have effectively reduced its cost; they can be regarded as a successful model that combines computer and Internet technology with traditional indexing theory. As networks have become widespread and their influence has grown, the amount of information has grown rapidly, and the network has undoubtedly become the largest carrier of information today. Search engines give us an effective way to obtain information from the vast Internet. However, the network world is complex and diverse, while users always access data with a specific purpose, so general-purpose search engines oriented toward the whole virtual society increasingly reveal their limitations. How to support rapid, accurate, and in-depth queries based on the topic a user is interested in is a difficult problem in front of us. As a core component of the search engine, the web crawler has naturally become a main direction of research: behind every powerful search engine there is a highly efficient web crawler serving it. This paper also introduces another key technology, the Lucene index, an efficient data retrieval tool that plays an indispensable role in the public opinion monitoring system.
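Lucene is built around an inverted index, which maps each term to the documents that contain it, so a query looks up short posting lists instead of scanning every crawled page. The sketch below illustrates only that idea in plain Python; it is not Lucene's API, and the `tokenize` helper and `InvertedIndex` class are hypothetical. The tokenizer is an assumed, simplistic stand-in for a Lucene analyzer (Chinese text, for instance, would need proper word segmentation).

```python
import re
from collections import defaultdict


def tokenize(text):
    """Assumed lowercase word tokenizer; a stand-in for a real Lucene analyzer."""
    return re.findall(r"\w+", text.lower())


class InvertedIndex:
    """Hypothetical toy inverted index: term -> set of IDs of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.docs = {}

    def add(self, doc_id, text):
        """Store the document and record its ID under every term it contains."""
        self.docs[doc_id] = text
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return sorted IDs of documents containing every query term (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return []
        # .get() avoids inserting empty entries into the defaultdict for unknown terms
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return sorted(result)
```

After adding a few crawled pages with `add`, a query such as `search("flood warning")` intersects the posting lists for `flood` and `warning` and returns only the documents containing both terms. A real Lucene query would also rank the matches by relevance, which this sketch omits.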
Keywords/Search Tags: Internet public opinion monitoring, Web crawler, Lucene