Research On Content-Based Web Text Information Filtering Technology

Posted on:2016-03-05

Degree:Master

Type:Thesis

Country:China

Candidate:P Y Jin

Full Text:PDF

GTID:2208330470452881

Subject:Software engineering

Abstract/Summary:

As the network grows ever larger and more open, people can exchange information freely and conveniently. This openness also has negative effects: superstitious, pornographic, violent, reactionary, and other illegal content spreads online, and internal confidential information can be leaked. These problems have become a focus of attention. To block such illegal and harmful information, researchers have proposed various automatic extraction and filtering technologies, such as IP address filtering, keyword filtering, and intelligent content understanding, which have worked well in practical applications. In this study, we use a new edit distance algorithm to analyze the text of web pages, with the aim of making network information security filtering faster and more accurate.
At present, statistics-based and knowledge-based methods are commonly used to analyze and mine text content. In this thesis, we study those methods and put forward a new way of analyzing the similarity between strings and patterns in text, through which the required information can be found. First, according to the filtering requirement, we extract patterns from sample sentences and build a pattern database. Then we find the sentences that are similar to those patterns. Finally, based on keyword weights set by users, we decide whether the content should be filtered. The algorithm takes both the features and the statistics of the text into account, and uses a special extended edit distance to match text against suitable patterns in order to analyze, mine, and filter web page content. Preliminary experiments show that the proposed method can identify harmful information and achieves good results in matching web page text and filtering out harmful content.
Understanding the meaning of sentences and making text filtering intelligent is a complicated task. In this thesis we only propose a new way of mining and filtering text information, and many aspects remain to be improved, such as the accuracy of word segmentation and the precision of matching between sentences and patterns. More semantic analysis could also be introduced to improve the accuracy of the filter.
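The three steps described above (pattern database, similarity matching, keyword-weight decision) can be illustrated with a minimal sketch. The abstract does not specify the extended edit distance, so the code below substitutes a plain word-level Levenshtein distance; the function names, the whitespace tokenizer, the thresholds, and the sample patterns and weights are all assumptions made for illustration, not the thesis's implementation.

```python
from typing import Dict, List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Plain Levenshtein distance over tokens.

    Assumption: this is a stand-in for the thesis's extended edit
    distance, whose definition is not given in the abstract.
    """
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Normalize the distance to a score in [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def should_filter(sentence: str,
                  patterns: List[List[str]],
                  keyword_weights: Dict[str, float],
                  sim_threshold: float = 0.6,     # hypothetical value
                  weight_threshold: float = 1.0   # hypothetical value
                  ) -> bool:
    """Filter a sentence if it resembles a stored pattern and the
    user-assigned weights of its keywords add up to the threshold."""
    # Assumption: whitespace tokenization; real text (for example
    # Chinese) would need a proper word segmenter.
    tokens = sentence.lower().split()
    if not any(similarity(tokens, p) >= sim_threshold for p in patterns):
        return False
    score = sum(keyword_weights.get(tok, 0.0) for tok in tokens)
    return score >= weight_threshold


# Hypothetical pattern database and user-defined keyword weights.
PATTERNS = [["buy", "cheap", "pills", "now"]]
WEIGHTS = {"pills": 0.8, "cheap": 0.4}

print(should_filter("Buy cheap pills today", PATTERNS, WEIGHTS))
print(should_filter("The weather is nice today", PATTERNS, WEIGHTS))
```

Separating the similarity check from the keyword-weight check mirrors the two-stage decision in the abstract: a sentence is scored only after it has matched a pattern, so the user-supplied weights tune the final decision without changing which patterns are recognized.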

Keywords/Search Tags:

Text information filtering, Natural language processing, Extended edit distance, Filter determination

Related items

1	Research On Text Information Filtering Technologies Based On Semantic Orientation Analysis
2	Text Filtering Key Technologies
3	In View Of The Short Carrier Natural Language Text Information Hiding Technology Research And Implementation
4	Research On Natural Language Watermarking Based On Syntactic Transformations
5	Research And Application On Method Of Generating SQL Through Natural Language Based On Interactive Information Editing
6	Research On Code Retrieval Technology Based On Extended Query And Natural Language Processing
7	Research And Implementation Of Natural Language Information Hiding Algorithm Based On Abstract Embedding Unit
8	Research On Text Classification Based On Natural Language Processing And Machine Learning
9	Research On Text Representation Model And Application In Text Classification And Natural Language Inference
10	Research And Application Of Text Classification Based On Natural Language Processing