
An Automatic Chinese Text Categorization System Based On Statistical Language Model

Posted on: 2007-05-22
Degree: Master
Type: Thesis
Country: China
Candidate: W Mao
Full Text: PDF
GTID: 2178360185968206
Subject: Signal and Information Processing
Abstract/Summary:
With the advent of the information age, the volume of information on the Internet has grown explosively. To take advantage of this abundant information rather than be lost and bewildered by it, measures must be taken to classify and manage it more effectively. As manual classification by specialists becomes increasingly infeasible, automatic text categorization technology has emerged to meet the need. As informatization proceeds in China, it is also necessary to develop applications tailored to Chinese. This dissertation presents research on an automatic Chinese text categorization system (CATCS) and focuses on the following issues:

1. Chinese text representation based on the N-gram model. The paper compares the major text representation models and discusses parameter selection for the N-gram model, smoothing algorithms, feature extraction, and related topics.

2. The architecture of CATCS. The functions of CATCS are presented, with a detailed description of the core function, the classifier. The paper also proposes a chain naive...
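The ingredients named in the abstract (N-gram text representation, smoothing, and a naive Bayes classifier) can be illustrated with a minimal sketch. This is not the CATCS implementation: the abstract is truncated and does not specify the N-gram order, the smoothing algorithm, or the classifier design. The sketch assumes character bigrams and add-one (Laplace) smoothing, and the names `char_ngrams` and `NgramNaiveBayes` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of text (no word segmentation)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NgramNaiveBayes:
    """Toy multinomial naive Bayes over character n-grams (illustrative only)."""

    def __init__(self, n=2):
        self.n = n                          # assumed n-gram order
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.docs = Counter()               # label -> number of training texts
        self.vocab = set()                  # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts[label].update(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        grams = char_ngrams(text, self.n)
        total_docs = sum(self.docs.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in self.docs:
            total = sum(self.counts[label].values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.docs[label] / total_docs)
            for gram in grams:
                # Add-one smoothing (an assumption): unseen n-grams
                # receive a small non-zero probability instead of zero.
                count = self.counts[label][gram]
                score += math.log((count + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Working on character n-grams avoids a separate word segmentation step, which is one common reason for using N-gram features with Chinese text; whether the thesis makes the same choice cannot be confirmed from the abstract.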
Keywords/Search Tags: N-gram model, naïve Bayes, auto-text categorization, corpus