
The Grid Design For Processing Objectionable Web Contents

Posted on: 2014-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Yu
Full Text: PDF
GTID: 2248330395993989
Subject: Information Science
Abstract/Summary:
The Internet is the largest repository of human information, and also the largest dumping ground for information garbage. Because of its openness, anyone can publish all kinds of information on the Internet, and a large amount of objectionable Web content exists. Objectionable content not only seriously affects the physical and mental health of the majority of Internet users but also has an extremely negative impact on social stability and unity. Therefore, how to effectively discover, identify, and manage objectionable content so as to build a clean Internet environment is an urgent problem.

In this paper, the classification of objectionable Web content is studied and its propagation characteristics are described. Based on an analysis of known techniques for objectionable content processing at home and abroad, and focusing on the propagation characteristics of objectionable Web content, an objectionable content processing system is built on Internet user terminals using related grid technology. The system is composed of Internet user terminals and Internet objectionable content servers. It implements objectionable content reporting, objectionable content identification, early warning, reporter rewarding, and system management, providing an accurate and efficient means of processing objectionable content on the Internet.

For objectionable content reporting, a method of obtaining objectionable content from Internet users is presented. The method relies on the active participation of a large number of Internet users to obtain valuable objectionable content. Objectionable content report software is developed for this purpose.
With this software, Internet users can easily report suspicious objectionable content found on the Internet. Each report includes the domain name, IP address, an abstract of the objectionable content, a webpage snapshot, and the reporting date.

For objectionable content identification, an identification method based on a hash algorithm is put forward. A fast hash algorithm is used to obtain the signature of objectionable Web content. This signature serves as the unique identifier of the objectionable content and is used to determine whether the content is already in the objectionable content database. When new objectionable content is found, the system gives a prompt and adds it to the database in a timely manner. As a result, the drop in processing speed caused by matching against a large volume of data is effectively avoided. In addition, to protect the reporter, the fast hash algorithm is also used to sign the reporter's IP address.

For objectionable content early warning, a method of calculating the hazard level and the threshold based on the type of objectionable content is given. An alert is raised for objectionable content whose frequency exceeds the specified threshold, and staff then carry out further handling of that content.

In terms of reward, a rewarding mechanism for reporters is established by raising the Internet user's star level.
A reporter whose star level, derived from both the number of objectionable content items reported and the number confirmed, exceeds the specified number will be rewarded, so as to encourage Internet users to actively participate in reporting objectionable content.

For system management, reporting information management, user management, user rights management, and log management are provided so that the operator can conveniently modify, query, and perform other operations on this information. This also facilitates the maintenance and use of the system.
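
The hash-based identification and the threshold-based early warning described above can be illustrated with a minimal sketch. The abstract does not name the fast hash algorithm or give threshold values, so SHA-256 is used here only as a stand-in, and the names `signature`, `ReportStore`, `thresholds`, and `default_threshold` are hypothetical. It is not the thesis's implementation.

```python
import hashlib
from collections import Counter


def signature(text: str) -> str:
    """Return a hex signature for a piece of text.

    Assumption: SHA-256 stands in for the unnamed "fast hash algorithm".
    Whitespace and case are normalized first so trivial variations of the
    same content map to the same signature (also an assumption).
    """
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class ReportStore:
    """In-memory stand-in for the objectionable content database."""

    def __init__(self, thresholds, default_threshold=5):
        self.known = {}          # signature -> content type
        self.counts = Counter()  # signature -> number of reports received
        self.thresholds = thresholds  # content type -> alert threshold
        self.default_threshold = default_threshold

    def report(self, content, content_type, reporter_ip):
        sig = signature(content)
        # Lookup by signature is a dictionary access, so its cost does not
        # grow with the amount of stored content.
        is_new = sig not in self.known
        if is_new:
            self.known[sig] = content_type
        self.counts[sig] += 1
        limit = self.thresholds.get(self.known[sig], self.default_threshold)
        return {
            "signature": sig,
            # Only a hash of the reporter's IP is kept. An unsalted hash of
            # an IPv4 address can be brute-forced, so this is illustrative.
            "reporter": signature(reporter_ip),
            "is_new": is_new,
            # Alert when the report frequency exceeds the type's threshold.
            "alert": self.counts[sig] > limit,
        }
```

With `ReportStore({"fraud": 2})`, the first report of a given text is flagged as new, and the third report of the same text raises the alert because its count of 3 exceeds the threshold of 2.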
Keywords/Search Tags: Objectionable Web content, Identification of objectionable content, Grid technology, Hash algorithm