
Design And Implementation Of Web Sensitivity Analysis System For Public Opinion

Posted on: 2019-11-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Chen
GTID: 2428330566467156
Subject: Electronic and communication engineering
Abstract/Summary:
With the development of networks and the popularization of the Internet, the number of Internet users has grown steadily, and websites in many fields have appeared one after another: large portals, government, news, trading, and college websites, among others, covering fields such as the economy, politics, culture, and education. From the country to the government, from businesses to individuals, from cities to villages, obtaining and publishing information has shifted to the Internet, because information spreads rapidly there and is rarely constrained by time or distance. For individuals in particular, the popularity of mobile phones and computers has brought everyone online and given them access to a wide variety of information anytime and anywhere, and users enjoy relative freedom of speech on the Internet. In this environment, once bad information enters the Internet, it spreads extremely fast. When the number of users who have read it reaches a certain level, it inevitably provokes heated discussion in society and, in more serious cases, causes social instability and endangers national security. Such incidents occur frequently, for example the large-scale anti-Japanese demonstrations across the country in 2012 and the various rumors after the 2008 Wenchuan earthquake. It is therefore necessary to control bad information on the Internet effectively.

This thesis aims to prevent bad information from entering the Internet by determining the sensitivity of web page text. At present, sensitive word libraries and sensitive word level libraries are not well developed, and there is very little research on web page sensitivity. This thesis collects the sensitive word libraries available on the Internet and, after consolidating them, marks each sensitive word with a sensitivity level according to certain criteria, thus building a sensitive word level library. The system design relies mainly on three algorithms: an improved matching algorithm based on the AC algorithm, a web page text extraction algorithm, and a web page text sensitivity analysis algorithm based on sensitive density. The system contains six modules: database management, crawler design, web page body extraction, sensitive word detection, web page sensitivity analysis, and data page display. It implements web page crawling, web page text extraction, sensitive word detection, and web page sensitivity calculation. Through this system, bad information on the Internet can be presented effectively.
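The abstract does not give the details of the matching or scoring algorithms, so the following is only a minimal sketch of how the two might fit together. It assumes that "AC algorithm" refers to Aho-Corasick multi-pattern matching, and it implements the standard algorithm, not the thesis's improved variant. The density formula (sum of the levels of all matched words divided by the text length in characters) is a hypothetical stand-in for the thesis's sensitivity measure, and the names `build_automaton`, `find_matches`, `sensitivity_density`, and `word_levels` are illustrative only.

```python
from collections import deque


def build_automaton(word_levels):
    """Build a standard Aho-Corasick automaton from the dictionary keys."""
    # goto[i]: char -> child node; fail[i]: failure link; out[i]: words ending at node i
    goto, fail, out = [{}], [0], [[]]
    for word in word_levels:
        if not word:
            continue  # ignore empty entries
        node = 0
        for ch in word:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(word)

    # Breadth-first pass: depth-1 nodes fail to the root, deeper nodes
    # follow their parent's failure chain.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            queue.append(child)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure target (proper suffixes).
            out[child] = out[child] + out[fail[child]]
    return goto, fail, out


def find_matches(text, automaton):
    """Return (start_index, word) for every dictionary word found in text."""
    goto, fail, out = automaton
    node, hits = 0, []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for word in out[node]:
            hits.append((i - len(word) + 1, word))
    return hits


def sensitivity_density(text, word_levels):
    """Hypothetical score: total level of matched words per character of text."""
    if not text:
        return 0.0
    hits = find_matches(text, build_automaton(word_levels))
    return sum(word_levels[word] for _, word in hits) / len(text)


# Toy dictionary: word -> sensitivity level (values are made up for illustration).
word_levels = {"he": 1, "she": 2, "hers": 3}
print(find_matches("ushers", build_automaton(word_levels)))
print(sensitivity_density("ushers", word_levels))
```

In a pipeline like the one the abstract describes, the text passed to `sensitivity_density` would be the body extracted from a crawled page, and the automaton would be built once from the sensitive word level library and reused across pages instead of being rebuilt per call.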
Keywords/Search Tags:Web crawler, Text extraction, Sensitive words, Web sensitivity