Research On The Inappropriate Web Filtering Technology

Posted on: 2013-09-18; Degree: Master; Type: Thesis
Country: China; Candidate: K Sun; Full Text: PDF
GTID: 2248330392452272; Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet technology, the amount of information on the network is growing explosively, both in volume and in variety. The Internet has become an important carrier of information and channel of communication, and it is increasingly integrated into people's daily study, work, and life. This large body of information contains not only content that benefits the user but also a wide variety of useless information, and even some harmful information. For researchers, filtering harmful information from web pages both accurately and quickly is crucial.

Text is the most important part of a web page. It appears on almost every page and carries most of the information, so reading the text is enough to understand a page's main content. The study of text filtering is therefore very important, and text filtering is the focus of this study.

To address the disadvantages of the single-layer text filtering in use today, this paper introduces a double-layer text filtering system that removes harmful information from web content. Healthy information is recommended to the user and harmful information is filtered out. The first layer uses keyword matching, which greatly reduces the number of pages passed to the second layer and thereby increases filtering speed. The second layer uses text classification methods to judge the pages passed on by the first layer, in order to achieve acceptable accuracy and precision.

Through research and analysis of the main web filtering methods, the paper designs a practical web page text filtering system that can meet the actual needs of current web page text filtering. The paper also proposes a keyword-based text feature extraction method.
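
The double-layer idea can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes that only pages containing at least one suspicious keyword are forwarded to the second layer, which the abstract does not state explicitly. The names `keyword_hits`, `two_layer_filter`, `min_hits`, and `toy_classifier` are hypothetical, and the toy classifier merely stands in for a trained text classifier.

```python
from typing import Callable, Iterable


def keyword_hits(text: str, keywords: Iterable[str]) -> int:
    """Layer 1: count how many of the given keywords occur in the text."""
    lowered = text.lower()
    return sum(1 for kw in keywords if kw.lower() in lowered)


def two_layer_filter(
    text: str,
    keywords: Iterable[str],
    classify: Callable[[str], bool],
    min_hits: int = 1,
) -> bool:
    """Return True if the page text should be filtered out.

    Assumption: pages with fewer than `min_hits` keyword matches are
    accepted immediately, so the slower classifier only sees suspects.
    """
    if keyword_hits(text, keywords) < min_hits:
        return False  # cheap path: no suspicious keyword found
    return classify(text)  # Layer 2: classifier makes the final decision


def toy_classifier(text: str) -> bool:
    """Hypothetical stand-in for a trained classifier."""
    words = text.lower().split()
    return words.count("casino") / max(len(words), 1) > 0.2
```

With this structure, a page that merely mentions a keyword in a harmless context reaches the second layer and can still be accepted, while most pages never incur the cost of classification.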
At each feature selection step, we select the features that are most important to text categorization, namely those that appear more often than others within each category. We propose a method based on the χ² statistic that overcomes two problems of the original χ² method, and it gives good results in our experiments.

The paper also proposes a new text extraction method based on the DOM tree structure. Each web page is represented as a DOM tree, and the text density of each node is computed to find the main content block of the page and discard useless information blocks.

Finally, building on the BM algorithm, the paper proposes a pattern matching method called the BM2C algorithm. By combining the advantages of BMH and BMHS, it reduces the number of rightward shifts and the time wasted during matching.
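
The abstract does not describe BM2C in detail, so the following sketch does not attempt to reproduce it. It shows only the standard Boyer-Moore-Horspool (BMH) search, one of the two variants the abstract names, to illustrate the bad-character shift that this family of algorithms relies on. The function name `horspool_search` is an illustrative choice.

```python
def horspool_search(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1

    # Shift table: for each character of the pattern except the last one,
    # the distance from its rightmost occurrence to the pattern's end.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}

    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            return pos
        # Horspool rule: shift by the table entry for the text character
        # aligned with the last pattern position (default: full length m).
        pos += shift.get(text[pos + m - 1], m)
    return -1
```

BMHS (the Sunday variant) differs in that it looks up the text character immediately to the right of the current window, which allows a shift of up to `m + 1` positions.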
Keywords/Search Tags: Information Filtering, Text Classification, Feature Extraction, Text Content Extraction, Pattern Matching