
Research On Text Feature Extraction Based On A Method Named CM-RS

Posted on: 2013-08-23  Degree: Master  Type: Thesis
Country: China  Candidate: A L Tu  Full Text: PDF
GTID: 2248330371993165  Subject: Computer software and theory
Abstract/Summary:
Text feature extraction refers to extracting from a text the information that can represent the text itself or the category it belongs to. Its purpose is to filter out noisy features, select the optimal feature subset to optimize the text representation, and reduce the dimensionality of the text data, thereby improving the class separability of the text representation. Feature extraction is a basic and important problem in text mining and information retrieval.

Feature extraction technology can be divided into two classes: feature selection and feature extraction. Feature selection picks a subset of feature words from the original feature set according to certain criteria and uses that subset as the new feature set. Its advantages are that it is easy to understand and requires little computation. Its main shortcoming is that it assumes all features are mutually independent, so it cannot effectively handle the loss of classification accuracy caused by near-synonym confusion and polysemy. Feature extraction, by contrast, applies a specific mapping function to transform the original space (for example by rotation, stretching, or shrinking) and constructs new features from the result. It avoids the independence assumption made by feature selection: it considers the relationships among features and emphasizes understanding the content of the text. However, it requires a variety of mappings and transformations in the high-dimensional original feature space, which leads to high time complexity and reduces the efficiency of the algorithm.

This paper proposes a feature extraction method named CM-RS. The method first uses the cloud model, a model for converting between qualitative and quantitative representations proposed by Academician Li Deyi, to perform a preliminary screening of the original feature space. It then uses the RS semantic analysis model to extract features from the screened feature space. Screening with the cloud model improves the processing efficiency of the RS semantic analysis model. The RS semantic model builds the correlation and similarity between feature words, so that feature extraction avoids the problems that synonyms and polysemy cause in text features.

Before features can be extracted, the text must first be abstracted in a principled way: the original unstructured text is converted into a structured form that a computer can process, and a mathematical model of the text is established. The computer then performs calculations and operations on this model to identify the text. This paper uses a feature distribution matrix based on mutual information to convert the text into structured form. On this basis, the cloud model is used for feature selection and the RS semantic analysis model for feature extraction. The CM-RS feature extraction method is applied in text classification experiments. The results show that combining cloud model feature selection with RS feature extraction significantly improves text classification accuracy and reduces time complexity.
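
The abstract does not give the exact construction of the mutual-information feature distribution matrix, so the following is only a minimal sketch of one common reading: a term-by-category matrix of pointwise mutual information estimated from document counts. The function name mutual_information_matrix, the toy documents, and the convention of scoring never co-occurring pairs as 0.0 are all assumptions made for illustration, not details taken from the thesis.

```python
import math
from collections import Counter, defaultdict


def mutual_information_matrix(docs, labels):
    """Return {term: {category: PMI}} for tokenized docs and their labels.

    PMI(t, c) = log(P(t, c) / (P(t) * P(c))), estimated from document counts.
    Pairs that never co-occur are scored 0.0 (an illustrative convention).
    """
    n_docs = len(docs)
    cat_count = Counter(labels)          # documents per category
    term_count = Counter()               # documents containing each term
    joint_count = defaultdict(Counter)   # documents containing term, per category

    for tokens, label in zip(docs, labels):
        for term in set(tokens):         # count each term once per document
            term_count[term] += 1
            joint_count[term][label] += 1

    matrix = {}
    for term, n_t in term_count.items():
        row = {}
        for cat, n_c in cat_count.items():
            n_tc = joint_count[term][cat]
            row[cat] = math.log(n_tc * n_docs / (n_t * n_c)) if n_tc else 0.0
        matrix[term] = row
    return matrix


# Hypothetical toy corpus: two sports documents and two finance documents.
docs = [
    ["goal", "match", "report"],
    ["goal", "coach", "team"],
    ["stock", "market", "report"],
    ["market", "bank", "price"],
]
labels = ["sports", "sports", "finance", "finance"]

matrix = mutual_information_matrix(docs, labels)
for term in ("goal", "report"):
    print(term, matrix[term])
```

In this toy corpus a term that appears only in one category ("goal") gets a positive score for that category, while a term spread evenly across both ("report") scores zero for each. A screening step could use such rows to discard uninformative terms before any further transformation.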
Keywords/Search Tags: Text Feature Extraction, Cloud Model Feature Selection, RS Feature Extraction, Genetic Algorithm