Research on Content-Based Web Text Information Filtering Technology

Posted on: 2016-03-05
Degree: Master
Type: Thesis
Country: China
Candidate: P Y Jin
GTID: 2208330470452881
Subject: Software engineering
Abstract/Summary:
As networks grow ever larger and more open, the volume of information they carry keeps increasing. People can exchange information freely and conveniently, but this openness also has negative effects: superstitious, pornographic, violent, reactionary, and other illegal content spreads, and internal confidential information can be leaked. These problems have become a focus of concern. To block such illegal and harmful information, a variety of automatic extraction and filtering technologies have been proposed, such as IP address filtering, keyword filtering, and intelligent content understanding, and they have worked well in practical applications. In this study, we use a new edit distance algorithm to analyze the text of webpages, with the aim of making network information security filtering faster and more accurate.
Recently, statistics-based and knowledge-based methods have been commonly used to analyze and mine text content. In this paper, we study those methods and put forward a new way of analyzing the similarity between strings and patterns in text, which lets us find the required information. First, according to the filtering requirements, we extract patterns from sample sentences and build a pattern database. Then we find the sentences that are similar to those patterns. Finally, based on keyword weights determined by the user, we decide whether a sentence should be filtered. The algorithm takes both the features and the statistical properties of the text into account, and uses a special extended edit distance to match text against suitable patterns in order to analyze, mine, and filter webpage content. Preliminary experiments show that the proposed method using the extended edit distance can identify harmful information and achieves good results in matching and filtering webpage text content.
Understanding the meaning of sentences and making text filtering intelligent is a complicated task. In this paper we only propose a new way of mining and filtering text information, and many aspects remain to be improved, such as the accuracy of word segmentation and the precision of matching between sentences and patterns. More semantic analysis could also be introduced to improve the accuracy of the filter.
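The abstract does not give the definition of the extended edit distance, so the following is only a minimal sketch of the general idea it describes: sentences are compared with stored patterns at the word level, and user-assigned keyword weights influence the filtering decision. The function names (`weighted_edit_distance`, `similarity`, `should_filter`), the cost scheme (an edit costs the weight of the word involved), the default weight of 1.0, and the threshold of 0.6 are all assumptions made for illustration, not the thesis's actual method.

```python
def weighted_edit_distance(sentence, pattern, weights, default_weight=1.0):
    """Word-level edit distance in which each edit costs the weight of the word.

    `sentence` and `pattern` are lists of already segmented words.
    `weights` maps keywords to user-assigned weights (hypothetical scheme).
    """
    def w(token):
        return weights.get(token, default_weight)

    m, n = len(sentence), len(pattern)
    # dp[i][j] = cost of turning sentence[:i] into pattern[:j]
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + w(sentence[i - 1])  # delete a word
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + w(pattern[j - 1])   # insert a word
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s, p = sentence[i - 1], pattern[j - 1]
            # Matching words cost nothing; a substitution costs the heavier word.
            substitute = dp[i - 1][j - 1] + (0.0 if s == p else max(w(s), w(p)))
            delete = dp[i - 1][j] + w(s)
            insert = dp[i][j - 1] + w(p)
            dp[i][j] = min(substitute, delete, insert)
    return dp[m][n]


def similarity(sentence, pattern, weights, default_weight=1.0):
    """Normalize the distance to [0, 1]; 1.0 means identical word sequences."""
    total = sum(weights.get(t, default_weight) for t in sentence)
    total += sum(weights.get(t, default_weight) for t in pattern)
    if total == 0:
        return 1.0
    distance = weighted_edit_distance(sentence, pattern, weights, default_weight)
    return 1.0 - distance / total


def should_filter(sentence, patterns, weights, threshold=0.6):
    """Flag the sentence if it is similar enough to any stored pattern."""
    return any(similarity(sentence, p, weights) >= threshold for p in patterns)


# Toy pattern database and keyword weights (illustrative values only).
patterns = [["leak", "secret", "file"]]
weights = {"secret": 3.0}

print(should_filter(["leak", "the", "secret", "file"], patterns, weights))
print(should_filter(["nice", "weather", "today"], patterns, weights))
```

Because heavily weighted keywords are expensive to insert, delete, or replace, a sentence that lacks them stays far from the pattern, while extra low-weight words barely change the score. A real system would also need the word segmentation step mentioned in the abstract, which is omitted here.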
Keywords/Search Tags: Text information filtering, Natural language processing, Extended edit distance, Filtering decision