Research On Text Feature Reduction Based On Dependency Relation

Posted on: 2013-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: T Hu
Full Text: PDF
GTID: 2268330398998795
Subject: Information Science
Abstract/Summary:
As the Internet continues to grow, more and more information is available on the network, and faced with this volume it is often difficult to choose what is relevant. Text classification makes it possible to organize and manage text information effectively, which can improve the efficiency of study and work. Feature dimension reduction is a very important step in text classification. Conventional feature dimension reduction methods are mainly statistical. The items they select contain a lot of noise and still have a high feature dimension, because the methods do not consider the semantic relationships between words. This thesis approaches feature dimension reduction from a semantic point of view: it analyzes each sentence by its dependency relations and selects the key words of each sentence as feature items. Feature items selected in this way describe a text better.
First, the thesis reviews the current state of text classification and analyzes the defects of existing feature dimension reduction methods. It then summarizes the basic concepts and background knowledge of text classification, describes its characteristics and applications, and analyzes the text classification process, including text pre-processing, item weight calculation, text representation, and classification algorithms. Feature dimension reduction is studied in detail, covering both feature selection and feature extraction, together with an analysis of the common methods. The feature items themselves are then studied, with a focus on word co-occurrence analysis. Dependency relations are examined, and the dependency characteristics of co-occurring words are analyzed. Words in a dependency relation, used as feature items, carry more semantic information, are more independent of one another, and express the text better.
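The idea of taking the key words of each sentence from its dependency analysis can be illustrated with a minimal sketch. It is not the thesis's actual procedure, which the abstract does not spell out. It assumes that each sentence has already been parsed into (word, head_index, relation) triples, and that the "key words" are the root word plus its direct subject and object. The relation labels in CORE_RELATIONS and the function names are hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical set of relations treated as "core"; the abstract does not name them.
CORE_RELATIONS = {"nsubj", "obj"}


def key_words(sentence):
    """Return the root word plus words attached directly to the root by a core relation.

    `sentence` is a list of (word, head_index, relation) triples, where
    head_index is the position of the governing word, or -1 for the root.
    """
    root_positions = {i for i, (_, head, _) in enumerate(sentence) if head == -1}
    selected = []
    for word, head, relation in sentence:
        if head == -1 or (head in root_positions and relation in CORE_RELATIONS):
            selected.append(word)
    return selected


def document_features(sentences):
    """Count how often each selected key word occurs across a document's sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(key_words(sentence))
    return counts


# Hand-written parse of "The parser reduces features quickly" (an assumed example).
example = [
    ("The", 1, "det"),
    ("parser", 2, "nsubj"),
    ("reduces", -1, "root"),
    ("features", 2, "obj"),
    ("quickly", 2, "advmod"),
]

print(key_words(example))
```

Under these assumptions, function words and modifiers such as "The" and "quickly" are dropped before any statistical weighting is applied, which is one way a dependency-based filter could lower the feature dimension.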
On this basis, a feature dimension reduction method based on dependency relations is proposed. To verify its effect, text classification experiments compare it with the document frequency, mutual information, and information gain methods. The experiments show that the method is feasible to a certain degree but still has some defects. The dependency words are then studied and compared with the items selected by traditional feature dimension reduction methods. Based on an analysis of the semantics and the properties of the words, the method is revised to address these defects, and an improved method based on dependency relations is proposed. The improved method reflects the semantic information of the feature items and reduces the effect of item sparsity on text classification. Experiments show that the improved method achieves better classification performance than the original one.
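For readers unfamiliar with the statistical baselines named above, the following minimal sketch shows how document frequency and information gain can score a term. It uses the standard textbook definitions, not code from the thesis. The toy corpus, the labels, and the function names are assumptions made for illustration.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(term, docs, labels):
    """Reduction in class entropy from knowing whether `term` occurs in a document."""
    present = [y for tokens, y in zip(docs, labels) if term in tokens]
    absent = [y for tokens, y in zip(docs, labels) if term not in tokens]
    total = len(labels)
    conditional = (len(present) / total) * entropy(present) \
        + (len(absent) / total) * entropy(absent)
    return entropy(labels) - conditional


# Toy corpus (assumed data): tokenized documents and their class labels.
docs = [
    ["goal", "match", "team"],
    ["team", "score", "match"],
    ["stock", "market", "price"],
    ["market", "team", "price"],
]
labels = ["sports", "sports", "finance", "finance"]

df = document_frequency(docs)
for term in ("team", "match"):
    print(term, df[term], round(information_gain(term, docs, labels), 3))
```

In this toy corpus "team" has the higher document frequency, while "match" separates the two classes better and so has the higher information gain. Both scores depend only on occurrence counts, which is the limitation the abstract attributes to statistical methods.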
Keywords/Search Tags: text classification, dimension reduction, dependency relation, semantic