A Method Of Chinese Text Classification Based On The Expansion Of VSM

Posted on:2011-08-09

Degree:Master

Type:Thesis

Country:China

Candidate:Z Q Jing

GTID:2178330332460435

Subject:Signal and Information Processing

Abstract/Summary:

As the internet grows rapidly, text, its main resource, is increasing just as quickly. How to organize and manage this information effectively, and how to find useful information quickly, accurately, and comprehensively, are important problems in information science and technology. Text classification, a key technology for organizing and processing text data, can solve these problems to a large extent by helping people locate information accurately and route it efficiently. It therefore has broad application prospects.

The most widely used model for automatic text categorization is the vector space model (VSM). Feature words are usually used as the features from which the vector space model is built. Early studies relied on knowledge engineering, with feature items determined by hand-written rules. With the development of statistical machine learning theory and statistical natural language processing, machine learning methods have been applied to determine feature items and have achieved good results. However, machine learning is limited by the available training corpus and by training time: many feature items that contribute to determining the topic cannot be obtained by conventional machine learning methods. A vector space model built from such feature vectors does not give satisfactory classification results, so the vector space model needs to be reconstructed.

This thesis proposes a method of Chinese text classification based on the expansion of the VSM. The features of each class of texts are analyzed, and, with the help of HowNet, the sememes most closely related to the theme are extracted. These sememes are used to expand the feature items. Combined with a synonym table, they yield the feature expansion set, and each expansion term is given a weight that reflects its descriptive power. Finally, the expansion set is used to classify texts.

The thesis focuses on how to extract features, how to assign appropriate weights to expansion terms, and how to construct the new VSM. Experimental results show that the method increases the number of effective features, so that both classification accuracy and stability improve. The thesis closes with a summary and an outlook that points out what needs further research and improvement.
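
The idea of expanding a VSM with weighted sememes and synonyms can be illustrated with a minimal sketch. It is not the thesis's implementation: the dictionaries `SEMEMES` and `SYNONYMS` are tiny hypothetical stand-ins for HowNet and the synonym table, the weights `SEMEME_WEIGHT` and `SYNONYM_WEIGHT` are arbitrary assumptions, English tokens replace segmented Chinese words for readability, and nearest-class cosine similarity is assumed as the classifier because the abstract does not name one.

```python
import math
from collections import Counter

# Hypothetical toy resources standing in for HowNet and a synonym table.
SEMEMES = {
    "football": ["sport"],
    "basketball": ["sport"],
    "stock": ["finance"],
    "bank": ["finance"],
}
SYNONYMS = {
    "football": ["soccer"],
    "stock": ["share"],
}

# Assumed weights: expansion terms count less than terms seen in the text.
SEMEME_WEIGHT = 0.5
SYNONYM_WEIGHT = 0.8


def expand(tokens):
    """Build a sparse term-weight vector and add weighted expansion terms."""
    vec = Counter()
    for tok in tokens:
        vec[tok] += 1.0  # original feature: raw term frequency
        for sememe in SEMEMES.get(tok, []):
            vec["sememe:" + sememe] += SEMEME_WEIGHT
        for syn in SYNONYMS.get(tok, []):
            vec[syn] += SYNONYM_WEIGHT
    return dict(vec)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(tokens, class_vectors):
    """Return the class whose expanded vector is most similar to the text."""
    doc = expand(tokens)
    return max(class_vectors, key=lambda c: cosine(doc, class_vectors[c]))


# One expanded prototype vector per class, built from toy training tokens.
class_vectors = {
    "sports": expand(["football", "match"]),
    "finance": expand(["stock", "market"]),
}
print(classify(["basketball", "game"], class_vectors))
```

In this toy setup the test text shares no surface word with either class, so a plain term vector would score zero against both. The shared `sememe:sport` dimension is what links "basketball" to the sports class, which is the effect the expansion is meant to produce.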

Keywords/Search Tags:

text classification, VSM, Hownet, sememe

Related items

1	Research And Implementation Of Text Similarity Computing Based On HowNet Sememe Space
2	The Text Similarity Study Base On Hownet
3	Research On Ontology-Based Semantic Text Categorization
4	Automatic Classification Based On The Concept Of The Text
5	The Research Of HowNet Based Word Similarity Computation And Its Application
6	Research Of Hownet Based Word Semantic Computation And Application
7	Research On Knowledge-guided Sememe Prediction
8	The Research On Conducting Chemical Domain Text Classifier Based On Hownet
9	Research On Deep Learning Text Classification Method Based On HowNet
10	Social Media Text Classification Based On Syntax Knowledge And Sememe Knowledge