Design And Implementation Of Text Classification System Based On K-neighborhood And Naive Bayesian

Posted on: 2016-05-01
Degree: Master
Type: Thesis
Country: China
Candidate: Almuqati abdulmohsen naif a
GTID: 2348330476955783
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technology, the Internet has become one of the most widely used sources of information. Quickly finding the information one needs among so many resources has become a serious problem. Most information on the Internet is text, and automatic text classification can organize and manage text data effectively, so it has significant application value.

The paper analyzes the main stages of Chinese text classification: text preprocessing, text representation, feature selection, classification algorithms, and performance evaluation. For text representation, it focuses on the vector space model. For feature selection, it introduces mutual information, information gain, chi-square, and other commonly used methods. The classification algorithm is the core of a classification system, and the paper introduces the decision tree, K-Nearest Neighbor, Naive Bayes, and Support Vector Machine algorithms.

The paper concentrates on the K-Nearest Neighbor and Naive Bayes models and uses C++ on VS2010 to implement a Chinese text categorization system based on K-Nearest Neighbor, the multivariate Bernoulli model, and the multinomial model. To improve the efficiency and accuracy of the system, a feature selection method based on document frequency (DF) is also adopted. Finally, a complete Chinese text classification system is built by integrating the techniques and sub-modules above. The experimental results show that the multinomial model performs best of the three classification models on the training corpus.
Keywords/Search Tags: Text Categorization, Feature Selection, KNN, Naive Bayes Classification, Multivariate Bernoulli Model, Multinomial Model
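
The pipeline described in the abstract (DF-based feature selection followed by a multinomial Naive Bayes or K-Nearest Neighbor classifier) can be illustrated with a minimal sketch. This is not the thesis implementation, which was written in C++ and is not reproduced here. The sketch is in Python for brevity, and every function name, the `min_df` threshold, the Laplace smoothing, the cosine similarity used for KNN, and the toy corpus are assumptions made for illustration. Documents are assumed to be already segmented into tokens, which for Chinese text would require a separate word-segmentation step.

```python
import math
from collections import Counter, defaultdict


def select_by_df(docs, min_df=2):
    """DF-based selection: keep terms found in at least min_df documents."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return {term for term, n in df.items() if n >= min_df}


def train_multinomial_nb(docs, labels, vocab):
    """Estimate log priors and Laplace-smoothed log likelihoods per class."""
    class_docs = Counter(labels)
    term_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        term_counts[label].update(t for t in tokens if t in vocab)
    log_prior = {c: math.log(n / len(docs)) for c, n in class_docs.items()}
    log_like = {}
    for c in class_docs:
        total = sum(term_counts[c].values()) + len(vocab)  # add-one smoothing
        log_like[c] = {t: math.log((term_counts[c][t] + 1) / total) for t in vocab}
    return log_prior, log_like


def predict_multinomial_nb(tokens, log_prior, log_like):
    """Return the class with the highest posterior log score."""
    scores = {
        c: log_prior[c] + sum(log_like[c][t] for t in tokens if t in log_like[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_knn(tokens, docs, labels, vocab, k=3):
    """Majority vote among the k training documents most similar to the query."""
    query = Counter(t for t in tokens if t in vocab)
    sims = [
        (cosine(query, Counter(t for t in d if t in vocab)), label)
        for d, label in zip(docs, labels)
    ]
    sims.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in sims[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus of pre-segmented documents (not data from the thesis).
docs = [
    ["stock", "market", "rise", "stock"],
    ["market", "bank", "stock"],
    ["bank", "rate", "market"],
    ["team", "match", "goal"],
    ["goal", "team", "win", "match"],
    ["match", "team", "coach"],
]
labels = ["finance"] * 3 + ["sports"] * 3

vocab = select_by_df(docs, min_df=2)
log_prior, log_like = train_multinomial_nb(docs, labels, vocab)
nb_label = predict_multinomial_nb(["stock", "bank", "market"], log_prior, log_like)
knn_label = predict_knn(["team", "goal", "match"], docs, labels, vocab, k=3)
print(sorted(vocab), nb_label, knn_label)
```

The multivariate Bernoulli model mentioned in the abstract would differ from this sketch by modeling only the presence or absence of each vocabulary term in a document, rather than term counts.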