
Security Filtering Objected To Illegal Text

Posted on: 2010-01-26; Degree: Master; Type: Thesis
Country: China; Candidate: M Yang; Full Text: PDF
GTID: 2178360275454924; Subject: Computer application technology
Abstract/Summary:
With the development of computer and communication technology, people have entered the information society. The Internet plays an increasingly important role in daily information exchange, but web pages may contain a large amount of unhealthy content, including reactionary, violent, and superstitious material and other negative information. Users therefore want tools that effectively block such content while they browse the Internet, and network content monitoring and content filtering have gradually become active research topics. This thesis studies web page filtering for a specific theme. It analyzes theme-oriented content filtering and the characteristics of web page content, and it designs a web content filtering scheme based on CLSI (Classified Latent Semantic Indexing). The main work is as follows:
1) For web page pre-processing, the thesis studies the features of terms in theme information. Only words tagged as nouns, verbs, or adjectives are processed, which replaces the traditional stop-word removal step. A feature table preserves the extracted main features as a dictionary of the theme. In subsequent filtering, a page can be processed directly against this table, which improves the operating efficiency and filtering accuracy of the system.
2) The thesis examines the relationship between content and its location in the web page, as indicated by text tags. A number of web page tags are studied in detail and each is assigned a weight, which is taken into account during word extraction.
3) The latent semantic indexing (LSI) model addresses the limitation that traditional text filtering models rely only on word statistics, and it reflects the semantic structure of the entire document set well. However, it does not distinguish positive from negative information about the theme for "anti-learning". The thesis therefore designs a classified latent semantic indexing (CLSI) model, which uses both the positive and the negative information of the theme when extracting the main features.
Finally, the filtering system was evaluated on the Windows platform for both filtering effectiveness and performance, and the experimental data were satisfactory.
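The tag-weighting idea in point 2) can be illustrated with a minimal sketch. It is not the thesis's implementation: the weight values, the `TAG_WEIGHTS` table, and the `weighted_terms` helper are all hypothetical, and the part-of-speech filtering from point 1) is omitted (a simple ASCII tokenizer stands in for it). Each term's count is scaled by the highest weight among the tags that enclose it.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical tag weights; the thesis does not list its actual values.
TAG_WEIGHTS = {"title": 5.0, "h1": 3.0, "h2": 2.0, "b": 1.5, "strong": 1.5}
DEFAULT_WEIGHT = 1.0
SKIPPED_TAGS = {"script", "style"}  # text inside these is never page content


class WeightedTermCounter(HTMLParser):
    """Accumulate term counts, scaled by the weight of the enclosing tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []      # stack of currently open tags
        self.scores = Counter()  # term -> weighted frequency

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed tags.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        if any(t in SKIPPED_TAGS for t in self.open_tags):
            return
        # Use the strongest weight among all enclosing tags.
        weight = max(
            (TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.open_tags),
            default=DEFAULT_WEIGHT,
        )
        for term in re.findall(r"[a-z]+", data.lower()):
            self.scores[term] += weight


def weighted_terms(html):
    """Return a Counter mapping each term to its tag-weighted frequency."""
    parser = WeightedTermCounter()
    parser.feed(html)
    parser.close()
    return parser.scores


page = (
    "<html><head><title>Filter spam</title></head>"
    "<body><h1>Spam</h1><p>filter the spam page</p></body></html>"
)
print(weighted_terms(page).most_common(2))
# prints [('spam', 9.0), ('filter', 6.0)]
```

In a fuller pipeline these weighted frequencies would populate the term-document matrix that an LSI-style model then decomposes. For Chinese pages, the tokenizer would have to be replaced by a word segmenter and part-of-speech tagger.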
Keywords/Search Tags: Web Content Filtering, Classified Latent Semantic Indexing Model, Theme Filtering, Feature Extraction, Web Page Feature