Research On The Filtering Method Of Uyghur Adverse Text Information

Posted on:2015-06-08

Degree:Master

Type:Thesis

Country:China

Candidate:Y Chen

GTID:2298330431492003

Subject:Electrical theory and new technology

Abstract/Summary:

With the rapid development of networks, obtaining information through the internet has become more and more convenient. Because network information is open, interactive, and global, information of all kinds, good and bad, spreads freely. In particular, the spread of large amounts of harmful information, such as violent and reactionary content, badly undermines social ethos and causes serious trouble for users. How to block harmful information and keep network information healthy and safe has therefore become an important research question.

Xinjiang is a multi-ethnic region with a predominantly Uyghur population. Uyghur, the native language of the Uyghur people, is used more and more widely in internet multimedia data, and the number of Uyghur websites and videos is increasing. Filtering technology for English and Chinese is mature, but filtering technology for Uyghur is still in its infancy.

This thesis focuses on processing Uyghur text on the internet. It studies the main ideas of information filtering technology and combines them with the characteristics of Uyghur words to investigate network filtering of adverse Uyghur text. Starting from text information filtering technology, and on the basis of thorough theoretical preparation, it designs and implements an adverse text filtering system for Uyghur.

The system consists mainly of two parts: filtering based on webpage text keywords and filtering based on text classification. For keyword filtering, the main work includes extracting webpage text, analyzing how Uyghur text is stored, and matching the keywords against the extracted text with the Aho-Corasick algorithm. Keyword search and replacement complete in a short time, which meets the timeliness requirement.

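The keyword-matching step can be illustrated with a minimal Aho-Corasick sketch. It is not the thesis's implementation: the class name `AhoCorasick`, the helper `mask_keywords`, and the masking character are hypothetical, and the text is assumed to be already decoded into a Python string, so matching runs over Unicode code points and works the same way for Uyghur in Arabic script as for the Latin placeholder words used here.

```python
from collections import deque


class AhoCorasick:
    """Multi-pattern matcher: finds all keywords in one pass over the text."""

    def __init__(self, keywords):
        # Per-node data: outgoing edges, failure link, lengths of keywords ending here.
        self.goto = [{}]
        self.fail = [0]
        self.out = [[]]
        for word in keywords:
            if word:  # skip empty keywords
                self._add(word)
        self._build_failure_links()

    def _add(self, word):
        node = 0
        for ch in word:
            if ch not in self.goto[node]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[node][ch] = len(self.goto) - 1
            node = self.goto[node][ch]
        self.out[node].append(len(word))

    def _build_failure_links(self):
        # Breadth-first traversal; depth-1 nodes keep the root as failure link.
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                # Inherit keywords that end at the failure target (proper suffixes).
                self.out[child] = self.out[child] + self.out[self.fail[child]]
                queue.append(child)

    def find(self, text):
        """Return (start, end) spans of every keyword occurrence in text."""
        spans = []
        node = 0
        for i, ch in enumerate(text):
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            for length in self.out[node]:
                spans.append((i - length + 1, i + 1))
        return spans


def mask_keywords(text, keywords, mask="*"):
    """Replace every character covered by a keyword match with the mask character."""
    chars = list(text)
    for start, end in AhoCorasick(keywords).find(text):
        for i in range(start, end):
            chars[i] = mask
    return "".join(chars)
```

Once the automaton is built, a scan takes time proportional to the text length plus the number of matches, regardless of how many keywords are loaded. That property fits the timeliness requirement described above. In a real filter the automaton would be built once and reused across pages rather than rebuilt on every call, as `mask_keywords` does here for brevity.
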
For filtering based on text classification, the Bayesian method is chosen to build the classifier because of its efficiency and stability. The main work in this part includes Uyghur text preprocessing, feature selection, and classifier construction. Because the independence assumption underlying naive Bayes does not hold in practice, an improved naive Bayes method based on feature weighting is proposed. It constructs three weight adjustment factors to adjust TF-IDF, so that the weights reflect the different contributions of feature items to classification and classification accuracy improves.
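The general idea of feature-weighted naive Bayes can be sketched as follows. The abstract does not define the three weight adjustment factors, so this sketch does not attempt to reproduce them: the `adjust` hook is a hypothetical placeholder that defaults to leaving the TF-IDF weight unchanged. The class name, the smoothed IDF formula, the labels "adverse" and "normal", and the assumption that documents arrive already tokenized are likewise illustrative. Each term's log-likelihood is scaled by its weight, which is one common way to weight features in naive Bayes.

```python
import math
from collections import Counter, defaultdict


def idf_weights(docs):
    """Smoothed inverse document frequency for each term in tokenized docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


class WeightedNaiveBayes:
    """Multinomial naive Bayes whose per-term log-likelihoods are scaled by a weight."""

    def __init__(self, adjust=lambda term, label, weight: weight):
        # Hypothetical hook standing in for the thesis's unspecified adjustment
        # factors; by default the plain TF-IDF weight is used as-is.
        self.adjust = adjust

    def fit(self, docs, labels):
        self.idf = idf_weights(docs)
        self.vocab = set(self.idf)
        self.class_docs = Counter(labels)
        self.total_docs = len(docs)
        self.term_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(doc)
        self.class_totals = {c: sum(tc.values()) for c, tc in self.term_counts.items()}
        return self

    def _log_likelihood(self, term, label):
        # Laplace-smoothed estimate of log P(term | label).
        count = self.term_counts[label][term]
        return math.log((count + 1) / (self.class_totals[label] + len(self.vocab)))

    def scores(self, doc):
        """Return the weighted log-score of each class for a tokenized document."""
        tf = Counter(t for t in doc if t in self.vocab)  # ignore unseen terms
        result = {}
        for label, n_docs in self.class_docs.items():
            score = math.log(n_docs / self.total_docs)  # log prior
            for term, freq in tf.items():
                weight = self.adjust(term, label, freq * self.idf[term])
                score += weight * self._log_likelihood(term, label)
            result[label] = score
        return result

    def predict(self, doc):
        s = self.scores(doc)
        return max(s, key=s.get)
```

With the default hook, a term that is frequent in the document and rare in the corpus pulls the score harder than a common term. Passing a different `adjust` function is where class-dependent or position-dependent factors would plug in.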

Keywords/Search Tags:

Text Filtering, Uyghur Text, Naive Bayes, Feature Weighting, Keyword Filtering

Related items

1	The Research And Application Of Text Categorization Arithmetic In Spam Filtering
2	Research On Key Techniques Of Uyghur Text Filtering
3	Text Classification Algorithm Research Based On Naive Bayes
4	Research On Text Classification Algorithm Based On Naive Bayes Method
5	Research On Chinese Spam SMS Filtering Method Based On Rough Set And Naive Bayes
6	Improvement Of Naive Bayes Text Classification Algorithm Based On Unbalanced Dataset
7	Research On Content-Based Spam Filtering Technology
8	The Research And Application Of Harmful Text Filtering Technology Based On Naive Bayes Algorithm
9	Text Categorization Based On Naive Bayes Method
10	For The Application Of Bayesian Algorithm In Spam Filtering