The Research And Application Of Harmful Text Filtering Technology Based On Naïve Bayes Algorithm

Posted on:2019-03-26

Degree:Master

Type:Thesis

Country:China

Candidate:W Zhao

Full Text:PDF

GTID:2428330563995866

Subject:Control engineering

Abstract/Summary:


With the development of Internet technology, the network has become the main information source for individuals and enterprises. These rich and varied information resources bring convenience to people, but they are also filled with a large amount of harmful information, such as reactionary content, pornography, drugs, gambling, and advertising for illegally marketed products. This harmful information not only hinders the construction of a green and healthy Internet environment but also hampers the process of obtaining information. Because text accounts for a large proportion of network information, research on harmful text filtering technology has significant practical value for cleaning up network information as a whole, so that useful text information can be obtained quickly and effectively. Based on the Naïve Bayes algorithm with the Vector Space Model (VSM), this dissertation proposes a harmful text filtering technology for large volumes of mobile network information, investigates and improves the methods and models it contains, and finally implements harmful text filtering for a specified system. The main research work and contributions of this dissertation are as follows:

(1) Taking VSM as the text representation method, a set of category central vectors is defined by improving the feature selection method. By optimizing the Naïve Bayes model, the classification algorithm used for text filtering is trained, which lays the foundation for the subsequent technology.

(2) Based on the Naïve Bayes algorithm, a harmful text filtering technology that introduces the idea of hypothesis testing is proposed. First, the Ansj Chinese word segmentation method is used to segment Chinese text. Then, the VSM-based Naïve Bayes classification algorithm is combined with harmful text filtering. Finally, a classification threshold is set and applied to filter out harmful text.

(3) A web crawler is written in Java. The Jsoup open-source HTML parser is used to analyze the page structure of each designated website, and corpus information is collected from them. The information of the application system is then analyzed to screen the collected corpus, forming the final corpus.

(4) Eclipse is used to develop a test platform for the Naïve Bayes-based harmful text filtering technology. A set of basic tests verifies the feasibility of the filtering technology, and three sets of comparative tests further demonstrate the effectiveness of the other improvements in this technology.
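The abstract gives no formulas, so the following is only a minimal sketch of a two-class multinomial Naïve Bayes filter over term-frequency (VSM bag-of-words) features. It assumes documents have already been segmented into tokens (the thesis uses Ansj for that step, which is not shown). The class name `NaiveBayesFilter`, the add-one (Laplace) smoothing, and the decision rule are illustrative assumptions: a document is flagged when the estimated log ratio log P(harmful|d) − log P(normal|d) exceeds a threshold. This is one simple way to realize the classification threshold of contribution (2), not the thesis's own hypothesis-testing formulation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical two-class multinomial Naive Bayes filter (illustrative sketch).
 * Input documents are assumed to be pre-segmented token lists.
 */
class NaiveBayesFilter {
    private final Map<String, Integer> harmfulCounts = new HashMap<>();
    private final Map<String, Integer> normalCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int harmfulTokens = 0, normalTokens = 0;
    private int harmfulDocs = 0, normalDocs = 0;
    private final double threshold; // assumed decision threshold on the log-posterior ratio

    NaiveBayesFilter(double threshold) {
        this.threshold = threshold;
    }

    /** Adds one labelled training document; each token occurrence counts once (term frequency). */
    void train(List<String> tokens, boolean harmful) {
        Map<String, Integer> counts = harmful ? harmfulCounts : normalCounts;
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
            vocabulary.add(t);
        }
        if (harmful) { harmfulTokens += tokens.size(); harmfulDocs++; }
        else { normalTokens += tokens.size(); normalDocs++; }
    }

    /** Estimated log P(harmful|d) - log P(normal|d); the shared evidence term cancels. */
    double logRatio(List<String> tokens) {
        int v = vocabulary.size();
        int totalDocs = harmfulDocs + normalDocs;
        // Add-one smoothed class priors, so an empty class never yields log(0).
        double score = Math.log((harmfulDocs + 1.0) / (totalDocs + 2.0))
                     - Math.log((normalDocs + 1.0) / (totalDocs + 2.0));
        for (String t : tokens) {
            if (!vocabulary.contains(t)) continue; // terms never seen in training are ignored
            // Laplace-smoothed per-class term probabilities P(t|class).
            double pH = (harmfulCounts.getOrDefault(t, 0) + 1.0) / (harmfulTokens + v);
            double pN = (normalCounts.getOrDefault(t, 0) + 1.0) / (normalTokens + v);
            score += Math.log(pH) - Math.log(pN);
        }
        return score;
    }

    /** Flags a document as harmful only when its log ratio exceeds the threshold. */
    boolean isHarmful(List<String> tokens) {
        return logRatio(tokens) > threshold;
    }

    public static void main(String[] args) {
        NaiveBayesFilter filter = new NaiveBayesFilter(0.0);
        filter.train(List.of("gambling", "win", "cash", "casino"), true);
        filter.train(List.of("cheap", "drugs", "cash"), true);
        filter.train(List.of("weather", "report", "rain"), false);
        filter.train(List.of("football", "match", "report"), false);
        System.out.println(filter.isHarmful(List.of("casino", "cash", "win")));      // true
        System.out.println(filter.isHarmful(List.of("rain", "football", "report"))); // false
    }
}
```

Raising the threshold above 0 makes the filter more conservative: fewer normal texts are wrongly blocked, at the cost of letting more borderline harmful texts through.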
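Contribution (1) builds on an improved feature selection method, but the abstract does not say which one. For orientation only, the sketch below implements the standard chi-square (χ²) term-selection statistic, a common baseline for this step and not the thesis's improved method. For a term t and class c, with A = in-class documents containing t, B = out-of-class documents containing t, C = in-class documents lacking t, D = out-of-class documents lacking t, and N = A + B + C + D, the score is N(AD − CB)² / ((A + C)(B + D)(A + B)(C + D)). The class name `ChiSquareSelector` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/** Hypothetical chi-square feature selector for one target class (illustrative sketch). */
class ChiSquareSelector {

    /** chi2 = N (AD - CB)^2 / ((A + C)(B + D)(A + B)(C + D)); returns 0 if any margin is empty. */
    static double chiSquare(long a, long b, long c, long d) {
        double n = a + b + c + d;
        double denom = (double) (a + c) * (b + d) * (a + b) * (c + d);
        if (denom == 0) return 0.0;
        double diff = (double) a * d - (double) c * b;
        return n * diff * diff / denom;
    }

    /** Returns the k highest-scoring terms; ties are broken alphabetically. */
    static List<String> selectTopK(List<List<String>> docs, List<Boolean> inClass, int k) {
        // df[0] = in-class documents containing the term, df[1] = out-of-class documents containing it.
        Map<String, long[]> df = new HashMap<>();
        long classDocs = 0;
        for (int i = 0; i < docs.size(); i++) {
            boolean pos = inClass.get(i);
            if (pos) classDocs++;
            for (String t : new HashSet<>(docs.get(i))) { // count each term once per document
                df.computeIfAbsent(t, x -> new long[2])[pos ? 0 : 1]++;
            }
        }
        long otherDocs = docs.size() - classDocs;
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, long[]> e : df.entrySet()) {
            long a = e.getValue()[0];  // in class, contains term
            long b = e.getValue()[1];  // not in class, contains term
            long c = classDocs - a;    // in class, lacks term
            long d = otherDocs - b;    // not in class, lacks term
            scores.put(e.getKey(), chiSquare(a, b, c, d));
        }
        List<String> terms = new ArrayList<>(scores.keySet());
        terms.sort(Comparator.comparingDouble((String t) -> scores.get(t)).reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        return new ArrayList<>(terms.subList(0, Math.min(k, terms.size())));
    }
}
```

Because the statistic is squared, it rewards terms that are strongly associated with either class: on a toy corpus where "casino" appears only in harmful documents and "report" only in normal ones, both receive the top score, while a term spread evenly across classes such as "cash" scores 0.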

Keywords/Search Tags:

Harmful text filtering, Vector Space Model (VSM), Naïve Bayes, Feature selection, Web crawler


Related items

1	Chinese Text Data Classification
2	Research Of Adaptive Text Filtering System Based On Vector Space Model
3	Research And Implementation Of Text Classification Technology Based On Bayesian Theory
4	Research For Information Filtering Technology Based On Text Content
5	Information Filtering Systems Based On Web Text Content And Design,
6	On Research For Chinese Automatic Text Categorization Technology Based On VSM Model And Feature Selection
7	The Researching Of Content Security Forbid Systems
8	Text Classification Method Based On Unsupervised Clustering And Naive Bayesian Classifier
9	Text Filtering Key Technologies
10	Research On Feature Selection Of Text Classification