Research On Key Techniques Of Uyghur Text Filtering

Posted on: 2016-12-20
Degree: Master
Type: Thesis
Country: China
Candidate: L Q A L M S Ya
GTID: 2308330476450371
Subject: Control Science and Engineering
Abstract/Summary:
With the growing power of Internet technology, the network has become the primary channel for obtaining information. As participants in and initiators of information exchange, people are exposed to thousands of information resources in their daily lives through different media. How to constrain information sources on the Internet effectively, so as to ensure the validity and safety of online information, has become a research hotspot in recent years. Xinjiang is a multi-ethnic region in which Uyghur occupies an important position. Given the rapid development of text retrieval technology for other languages, Uyghur text filtering techniques must keep pace with the Information Age.
This thesis draws on a broad range of knowledge, combining natural language understanding, machine learning, statistics, and related fields. It first introduces the technical components of text filtering, focusing on Uyghur text preprocessing in light of the distinctive writing and syntax of Uyghur. Several feature selection methods used in the preprocessing step are introduced and compared. For Uyghur stemming, a stem matching algorithm based on mechanical extraction is implemented. To make up for its shortcomings, the classic Porter algorithm is studied in depth, and, by establishing a set of Uyghur affix combination rules of a certain size, a Uyghur noun stemming algorithm based on the Porter stemming algorithm is implemented. For the Uyghur text filtering problem, improvements are proposed to address the shortcomings of the classical KNN machine learning algorithm, and a Uyghur text filtering model based on the vector space model (VSM) is constructed around the core steps of the KNN algorithm. For the Bayesian algorithm, to address its weak representation of document semantics, the concept of text generation is introduced and an XML tree of the text is built to preserve the original semantics of the document and thereby safeguard the performance of the Bayes algorithm.
Finally, combining theory with the related technologies, a Uyghur text filtering system is designed and implemented, and tests on a number of sample sets confirm the effectiveness and feasibility of the proposed method.
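To make the KNN-on-VSM idea mentioned in the abstract more concrete, the following is a minimal sketch of a plain (unimproved) KNN text filter over TF-IDF vectors with cosine similarity. It is not the thesis's model: the improvements the thesis proposes are not described in the abstract, so none are reproduced here. The function names, the smoothed IDF formula, the `pass`/`block` labels, and the toy English tokens (standing in for stemmed Uyghur tokens) are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Turn tokenized documents into sparse TF-IDF dicts (a basic VSM)."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumed, commonly used variant).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_filter(query_tokens, train_vectors, labels, idf, k=3):
    """Label a document by majority vote among its k most similar neighbours."""
    # Terms unseen in training carry no weight in this sketch.
    q = {t: tf * idf[t] for t, tf in Counter(query_tokens).items() if t in idf}
    ranked = sorted(
        zip(train_vectors, labels),
        key=lambda pair: cosine(q, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set: placeholder tokens, hypothetical labels.
train_docs = [
    ["football", "match", "goal"],
    ["team", "match", "score"],
    ["spam", "offer", "click"],
    ["click", "offer", "prize"],
]
train_labels = ["pass", "pass", "block", "block"]
train_vectors, idf = tf_idf_vectors(train_docs)

result_a = knn_filter(["offer", "prize", "click"], train_vectors, train_labels, idf)
result_b = knn_filter(["match", "goal"], train_vectors, train_labels, idf)
```

In a real pipeline the token lists would come from the preprocessing and stemming steps described above, and `k` would be tuned on held-out data.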
Keywords/Search Tags: Text filtering, Uyghur text, KNN, Bayes, XML