
Research On The Filtering Method Of Uyghur Adverse Text Information

Posted on: 2015-06-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y Chen
GTID: 2298330431492003
Subject: Electrical theory and new technology
Abstract/Summary:
With the rapid development of networks, obtaining information through the internet has become more and more convenient. Because network information is open, interactive, and global, information of every kind, good and bad, spreads freely. In particular, the spread of large amounts of adverse information, such as violent and reactionary content, badly undermines social ethos and causes serious trouble for users. How to block adverse information and keep network information healthy and safe has therefore become an important research question.

Xinjiang is a multi-ethnic region with a predominantly Uyghur population. As the native language of the Uyghur people, Uyghur is increasingly used in multimedia data on the internet, and the number of Uyghur websites and videos is growing. At present, filtering technology for English and Chinese is mature, but filtering technology for Uyghur is still in its infancy.

This paper focuses on processing Uyghur text information on the internet. Drawing on the main ideas of information filtering technology and on the characteristics of Uyghur words, it studies network filtering of adverse Uyghur text. Starting from text information filtering technology and building on thorough theoretical preparation, it designs and implements a filtering system for adverse Uyghur text.

The system has two main parts: keyword filtering of webpage text, and filtering based on text classification. For keyword filtering, the main work includes extracting webpage text, analyzing how Uyghur text is stored, and matching keywords against the extracted text with the Aho-Corasick algorithm. Keyword search and replacement complete in a short time, which meets the timeliness requirement.
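
The keyword-matching step can be illustrated with a minimal Aho-Corasick sketch. This is not the thesis's implementation: the class name `AhoCorasick`, the helper `mask_keywords`, and the choice to replace matches with `*` are assumptions made for illustration. The sketch works on Python strings, which are sequences of Unicode code points, so Arabic-script Uyghur keywords can be passed in the same way as the Latin placeholders used here.

```python
from collections import deque


class AhoCorasick:
    """Multi-keyword matcher built from a trie plus failure links."""

    def __init__(self, keywords):
        self.goto = [{}]   # per-node transitions: character -> child node
        self.fail = [0]    # per-node failure link (root is node 0)
        self.out = [[]]    # per-node lengths of keywords ending at that node
        for word in keywords:
            self._insert(word)
        self._build_failure_links()

    def _insert(self, word):
        node = 0
        for ch in word:
            if ch not in self.goto[node]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[node][ch] = len(self.goto) - 1
            node = self.goto[node][ch]
        self.out[node].append(len(word))

    def _build_failure_links(self):
        # Breadth-first traversal: depth-1 nodes keep the root as failure link.
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                # Inherit keywords that end at the failure target.
                self.out[child] = self.out[child] + self.out[self.fail[child]]
                queue.append(child)

    def find(self, text):
        """Return (start, end) spans of every keyword occurrence in text."""
        node = 0
        matches = []
        for i, ch in enumerate(text):
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            for length in self.out[node]:
                matches.append((i - length + 1, i + 1))
        return matches


def mask_keywords(text, keywords, mask="*"):
    """Replace every character covered by a keyword match with `mask`."""
    automaton = AhoCorasick(keywords)
    chars = list(text)
    for start, end in automaton.find(text):
        for i in range(start, end):
            chars[i] = mask
    return "".join(chars)
```

Because the automaton scans the text once regardless of how many keywords are loaded, the search cost grows with the text length and the number of matches rather than with the size of the keyword list, which is the property that makes the algorithm suitable for time-sensitive filtering.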
For filtering based on text classification, the Bayesian method is chosen to build the classifier because of its efficiency and stability. The main work in this part includes Uyghur text preprocessing, feature selection, and classifier construction. Because the independence assumption underlying naive Bayes does not hold in practice, the paper proposes an improved naive Bayes method based on feature weighting: three weight adjustment factors are constructed to adjust TFIDF, so that feature items are weighted according to their different contributions to classification, which improves classification accuracy.
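
The general idea of feature-weighted naive Bayes can be sketched as follows. The abstract does not define the three weight adjustment factors, so this sketch does not attempt to reproduce them: it substitutes plain TF-IDF weights for raw term counts in a multinomial naive Bayes model. The function names `train_weighted_nb` and `classify`, the smoothed IDF formula, and the add-one smoothing are all assumptions, and the input is taken to be already tokenized text (Uyghur preprocessing is not shown).

```python
import math
from collections import Counter, defaultdict


def train_weighted_nb(docs, labels):
    """Train naive Bayes on TF-IDF weights instead of raw counts.

    docs: list of token lists; labels: parallel list of class names.
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    # Smoothed IDF (an assumed variant; the thesis's adjusted weights differ).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in df}

    class_docs = Counter(labels)
    weight = defaultdict(Counter)  # class -> term -> accumulated TF-IDF weight
    for tokens, label in zip(docs, labels):
        for term, tf in Counter(tokens).items():
            weight[label][term] += tf * idf[term]

    vocab_size = len(df)
    model = {"prior": {}, "loglik": {}}
    for label, count in class_docs.items():
        denom = sum(weight[label].values()) + vocab_size
        model["prior"][label] = math.log(count / n_docs)
        # Add-one smoothing over the weighted "counts".
        model["loglik"][label] = {
            t: math.log((weight[label][t] + 1.0) / denom) for t in df
        }
    return model


def classify(model, tokens):
    """Return the class with the highest log posterior; unseen terms are skipped."""
    scores = {}
    for label, prior in model["prior"].items():
        loglik = model["loglik"][label]
        scores[label] = prior + sum(loglik[t] for t in tokens if t in loglik)
    return max(scores, key=scores.get)
```

A small usage example with placeholder tokens: training on `[["bad", "harmful", "bad"], ["harmful", "attack"], ["news", "weather"], ["weather", "sports", "news"]]` with labels `["adverse", "adverse", "normal", "normal"]` lets `classify` separate documents dominated by the first group's terms from those dominated by the second's.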
Keywords/Search Tags: Text Filtering, Uyghur Text, Naive Bayes, Feature Weighting, Keyword Filtering