The Research Of Tibetan Text Categorization Base On N-gram Information

Posted on:2011-01-17

Degree:Master

Type:Thesis

Country:China

Candidate:D Zhou

GTID:2198330332469982

Subject:Computer application technology

Abstract/Summary:

With the arrival of the information age, information resources on the Internet are growing explosively. To avoid being overwhelmed by this mass of information, effective measures are needed to categorize and manage it. Text categorization plays an important role in making use of this information: it organizes and manages documents effectively and thereby improves the efficiency of information retrieval.

This paper studies text classification and its related technologies and proposes an approach to Tibetan text categorization that requires no word segmentation. Compared with traditional text classification models, the approach, based on character-level N-gram language modeling, avoids word segmentation and therefore saves considerable computing resources during preprocessing.

The paper first reviews the state of text categorization research worldwide. Second, it analyzes the commonly used text representation models, studies the N-gram model for Tibetan text in more depth, and discusses the choice of the parameter N and each function of the Tibetan text categorization system. Third, Chapter Four presents the system's core component, the classifier, in detail. The paper also proposes a corpus-based Naive Bayes Multinomial classifier, which integrates closely with the N-gram model and achieves good classification results. Finally, because the bigram feature set contains a large number of overlapping high-degree bigrams and biased high-degree bigrams, we propose a novel feature reduction method, δ-OR, which raises δ-degree overlapping bigrams to the corresponding trigrams. Experiments show that δ-OR not only reduces the feature set and removes redundant information but also improves the descriptive and discriminative power of the features. At suitable levels of reduction, it can improve categorization performance to some extent.
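
The abstract does not give the classifier's implementation, so the following is only a minimal sketch of the general technique it names: character-level N-gram features fed to a Multinomial Naive Bayes classifier. It assumes that "character" means a Unicode code point and that the classifier uses add-alpha (Laplace) smoothing; the names `char_ngrams` and `MultinomialNB` are hypothetical and this is not the author's code. It illustrates why no word segmentation is needed: the features are simply overlapping substrings of length N.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int) -> list[str]:
    """Return overlapping character n-grams; no word segmentation is needed."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class MultinomialNB:
    """Hypothetical Multinomial Naive Bayes over character n-gram counts,
    with add-alpha (Laplace) smoothing."""

    def __init__(self, n: int = 2, alpha: float = 1.0):
        self.n = n          # the N in N-gram
        self.alpha = alpha  # smoothing constant

    def fit(self, docs: list[str], labels: list[str]) -> "MultinomialNB":
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # documents per class, for the prior
        self.ngram_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.ngram_counts[label].update(char_ngrams(doc, self.n))
        # Vocabulary = every n-gram seen in training, across all classes
        self.vocab = set()
        for counts in self.ngram_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.ngram_counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, doc: str, c: str) -> float:
        """Unnormalized log P(c | doc) = log P(c) + sum_g k_g * log P(g | c)."""
        n_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[c] / n_docs)
        denom = self.totals[c] + self.alpha * len(self.vocab)
        for g, k in Counter(char_ngrams(doc, self.n)).items():
            if g not in self.vocab:
                continue  # ignore n-grams never seen in training
            score += k * math.log((self.ngram_counts[c][g] + self.alpha) / denom)
        return score

    def predict(self, doc: str) -> str:
        return max(self.classes, key=lambda c: self.log_posterior(doc, c))


# Toy usage: Latin placeholder strings stand in for Tibetan documents.
docs = ["abab abab", "ababab", "xyxy xyxy", "xyxyxy"]
labels = ["A", "A", "B", "B"]
clf = MultinomialNB(n=2).fit(docs, labels)
print(clf.predict("abab"))  # → A
```

Changing `n` is how one would compare different choices of the parameter N. Note that a Tibetan syllable usually spans several Unicode code points (stacked consonants and vowel signs are separate code points), so splitting on the intersyllabic tsheg (U+0F0B) and building syllable-level N-grams is another plausible reading of "character-level".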

Keywords/Search Tags:

text classification, N-gram model, corpus, Naive Bayes Multinomial

Related items

1	The Research Of Tibetan Text Categorization Base On N-gram Information
2	Research On Improved Multinomial Naive Bayes Text Classification Algorithms
3	Design And Implementation Of Text Classification System Based On K-neighborhood And Naive Bayesian
4	The Study Of Chinese Text Categorization Based On Naïve Bayes
5	Tibetan Text Calssification Technology Research On Native Bayes
6	Research On Text Classification Algorithm Based On Naive Bayes Method
7	Text Categorization Based On Naive Bayes Method
8	A Text Classifier About High Blood Pressure Based On Naive Bayes
9	Research Of Chinese Text Classification Based On Naive Bayesian Method And Application Of Microblogging Data Classification
10	The Study Of Naive Bayes Text Classification System Based On Artificial Intelligence