
Research And Implementation Of Text Categorization System Based On VSM

Posted on: 2006-02-08 | Degree: Master | Type: Thesis
Country: China | Candidate: W Li | Full Text: PDF
GTID: 2168360152991873 | Subject: Computer application technology
Abstract/Summary:
Text categorization draws on techniques from natural language understanding, so the thesis first introduces the basic concepts of natural language understanding and the background of text categorization, and systematically discusses how text categorization relates to information retrieval and information extraction. It then studies in depth the theory and technology needed to realize Chinese text categorization, examining several key techniques: the vector space model (VSM), feature extraction, and machine learning.

Taking the system's recall, precision, and feasibility into account, the thesis proposes a working scheme for automatic Chinese text categorization and describes a categorization model based on the VSM. Word segmentation and syntactic analysis are performed with the Chinese lexical analysis system ICTCLAS and the PCFG-based syntactic parser (PCFG-PROP) of the Chinese Academy of Sciences, so that the index terms extracted from a text are more likely to be its key words. This improves the system's precision and recall.

Parts of the system were implemented in Java. During implementation, the categorization algorithm was improved by introducing the concept of a threshold, which strengthens the system's categorization ability. The implemented system is then evaluated and the results are reported. Finally, because the categorization precision is not yet satisfactory, the thesis systematically summarizes the next steps for this research and offers some suggestions.
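The abstract names the stages of the pipeline (segmented Chinese text, VSM index terms, a classifier, and a threshold) but gives no formulas or code. The Python sketch below illustrates one common way such a pipeline can work, under stated assumptions: documents arrive already segmented into word lists (the thesis uses ICTCLAS for this step; here the toy data is segmented by hand), terms are weighted with a smoothed TF-IDF scheme, each category is represented by the normalized centroid of its training vectors (a Rocchio-style classifier chosen only for illustration, since the abstract does not say which learner the thesis uses), and a hypothetical `threshold` parameter leaves a document unclassified when its best cosine similarity falls below it. The function names, the threshold value of 0.2, and the training data are illustrative and not taken from the thesis.

```python
import math
from collections import Counter


def idf_table(docs):
    """Smoothed IDF per term: ln((1 + N) / (1 + df)) + 1 (an assumed weighting)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def normalize(vec):
    """Scale a sparse vector (dict term -> weight) to unit length; empty stays empty."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def vectorize(tokens, idf):
    """Unit-length TF-IDF vector; terms unseen in training are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    return normalize({t: c * idf[t] for t, c in tf.items()})


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (their dot product)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def train(labeled_docs):
    """labeled_docs: list of (tokens, category). Returns IDF table and unit centroids."""
    idf = idf_table([tokens for tokens, _ in labeled_docs])
    sums = {}
    for tokens, cat in labeled_docs:
        acc = sums.setdefault(cat, Counter())
        for t, w in vectorize(tokens, idf).items():
            acc[t] += w
    centroids = {cat: normalize(dict(acc)) for cat, acc in sums.items()}
    return idf, centroids


def classify(tokens, idf, centroids, threshold=0.2):
    """Return (category, score); category is None if the best score < threshold."""
    vec = vectorize(tokens, idf)
    best_cat, best_score = None, 0.0
    for cat, centroid in centroids.items():
        score = cosine(vec, centroid)
        if score > best_score:
            best_cat, best_score = cat, score
    if best_score < threshold:
        return None, best_score  # hypothetical rejection rule: leave unclassified
    return best_cat, best_score


# Toy, hand-segmented training data (illustrative only).
training = [
    (["股票", "市场", "上涨"], "finance"),
    (["银行", "利率", "市场"], "finance"),
    (["比赛", "球队", "进球"], "sports"),
    (["球员", "比赛", "冠军"], "sports"),
]
idf, centroids = train(training)
print(classify(["球队", "冠军"], idf, centroids))  # assigned to "sports"
print(classify(["天气", "晴朗"], idf, centroids))  # no known terms, rejected
```

Rejecting low-similarity documents trades recall for precision: raising the threshold leaves more documents unclassified but makes the remaining assignments more reliable. The abstract does not describe how the thesis sets or applies its own threshold.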
Keywords/Search Tags:Information Extraction, Information Retrieval, Syntactic Parsing, Chinese Word Segmentation, Threshold, Text Categorization, Chinese Information Processing, Vector Space Model