
Studies On Feature Selection Method Based On Heuristic Attribute Reduction Of Rough Set

Posted on: 2012-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: Z F Wang
GTID: 2218330338456957
Subject: Computer software and theory
Abstract/Summary:
As the Internet has matured and its applications have expanded in recent years, the amount of network resources in text form has grown sharply. Faced with such massive volumes of information, people easily become "lost", so there is an urgent need to classify information by its content. Since the American scholar H. P. Luhn first studied automatic categorization, text categorization has attracted growing attention from researchers. Many research results have been obtained, and text categorization has been applied successfully in search engines, information filtering, digital libraries, e-mail classification, and other areas. Feature selection is a key part of text categorization and strongly affects its results, so finding efficient feature selection algorithms that reduce the dimension of the feature set is a pressing problem and has become one of the important research topics in text categorization. Building on rough set theory, this paper first shows that rough sets have advantages for feature reduction and analyzes the feasibility of applying them to feature selection. Second, targeting the weaknesses of existing methods in handling inconsistent decision tables and in time complexity, it proposes a heuristic attribute reduction feature selection algorithm based on rough sets. Applying this algorithm to feature selection not only improves the efficiency of text categorization but also contributes new content to feature selection research. Finally, it compares the improved algorithm with other feature selection algorithms in extensive experiments.
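The abstract does not spell out the algorithm. As a point of reference, the following is a minimal sketch of the classic positive-region heuristic that rough-set feature selection usually starts from: greedy forward selection by the dependency degree gamma_B(D) = |POS_B(D)| / |U|, on a tiny decision table whose rows are documents and whose condition attributes are binary term-presence features. The toy data and the names `DOCS`, `partition`, `dependency`, and `positive_region_reduct` are hypothetical illustrations; this is not the thesis's improved algorithm, whose modifications the abstract does not describe.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# A row is (condition attribute values, decision label). The attributes here are
# hypothetical binary term-presence features; the labels are document classes.
Row = Tuple[Tuple[int, ...], str]

# Hypothetical toy corpus: 6 documents, 4 candidate terms.
DOCS: List[Row] = [
    ((1, 0, 0, 1), "sports"),
    ((1, 0, 1, 0), "sports"),
    ((1, 1, 0, 0), "finance"),
    ((1, 1, 1, 1), "finance"),
    ((0, 0, 1, 1), "finance"),
    ((0, 1, 0, 1), "finance"),
]


def partition(rows: Sequence[Row], attrs: Sequence[int]) -> List[List[int]]:
    """Group row indices into the equivalence classes of IND(attrs)."""
    classes: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for i, (values, _label) in enumerate(rows):
        classes[tuple(values[a] for a in attrs)].append(i)
    return list(classes.values())


def dependency(rows: Sequence[Row], attrs: Sequence[int]) -> float:
    """gamma_B(D) = |POS_B(D)| / |U|: share of rows whose class has a single label."""
    pos = sum(
        len(cls)
        for cls in partition(rows, attrs)
        if len({rows[i][1] for i in cls}) == 1
    )
    return pos / len(rows)


def positive_region_reduct(rows: Sequence[Row]) -> List[int]:
    """Greedily add the attribute that raises gamma most until it matches the full set."""
    n_attrs = len(rows[0][0])
    target = dependency(rows, list(range(n_attrs)))
    selected: List[int] = []
    while dependency(rows, selected) < target:
        best = max(
            (a for a in range(n_attrs) if a not in selected),
            key=lambda a: dependency(rows, selected + [a]),
        )
        selected.append(best)
    return selected


if __name__ == "__main__":
    print(positive_region_reduct(DOCS))  # → [1, 0]
```

In this table, term 1 alone puts half the documents into pure classes (gamma = 0.5), and adding term 0 makes every class pure, so terms 2 and 3 are discarded. Practical versions usually add a backward pass that removes attributes made redundant by later choices; it is left out here to keep the sketch short.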
The experimental results show that the proposed algorithm greatly reduces the dimension of the feature set and achieves better categorization results.

Based on rough sets, this paper discusses the problems in feature selection for text categorization and studies the heuristic attribute reduction feature selection algorithm in depth. The main work of this paper is as follows:
1. It explains the purpose of the study, introduces the basic concepts of rough sets, examines the important factors that influence text categorization, analyzes the characteristics of different feature selection methods, and describes common rough-set-based feature selection methods.
2. To find a more efficient way to reduce the dimension of the feature set, it introduces rough-set-based text categorization and then applies heuristic attribute reduction to feature selection. For consistent decision tables, it proposes an improved positive-region heuristic attribute reduction feature selection algorithm to reduce the dimension of the feature set. For inconsistent decision tables, it introduces the concept of a granularity function, which measures the differences between attribute sets, and gives a heuristic attribute reduction feature selection algorithm based on that function. This work offers new research directions for feature selection in text categorization.
3. Through experiments on a laboratory corpus, it demonstrates the effectiveness of the resulting categorization decision rules. The experimental results show that the algorithm both reduces the dimension of the feature set more effectively and greatly improves categorization efficiency.
These results show that applying rough-set-based heuristic attribute reduction to feature selection is practical. Finally, the paper summarizes its research on feature selection for text categorization and, for the problems that remain to be resolved, outlines directions for further work...
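For inconsistent decision tables the abstract names a granularity function but does not define it. The sketch below assumes one common form from the rough set literature, knowledge granularity GK(P) = sum(|X|^2) / |U|^2 over the classes X of U/IND(P), and uses the conditional granularity GK(D|B) = GK(B) − GK(B ∪ D) as the heuristic. GK(D|B) equals the number of ordered object pairs that B cannot distinguish but that carry different decisions, divided by |U|^2, so it still ranks attributes when identical feature vectors carry different labels. The toy data and the names `granularity`, `conditional_granularity`, and `granularity_reduct` are hypothetical, and the thesis's own granularity function may differ.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Tuple[Tuple[int, ...], str]

# Hypothetical inconsistent toy table: the last two documents share the
# feature vector (0, 1, 0, 1) but carry different labels.
DOCS: List[Row] = [
    ((1, 0, 0, 1), "sports"),
    ((1, 0, 1, 0), "sports"),
    ((1, 1, 0, 0), "finance"),
    ((1, 1, 1, 1), "finance"),
    ((0, 0, 1, 1), "finance"),
    ((0, 1, 0, 1), "finance"),
    ((0, 1, 0, 1), "sports"),
]


def granularity(rows: Sequence[Row], attrs: Sequence[int], with_decision: bool = False) -> Fraction:
    """GK(P) = sum(|X|^2) / |U|^2 over the equivalence classes X of U/IND(P)."""
    sizes: Dict[Hashable, int] = defaultdict(int)
    for values, label in rows:
        key: Tuple[Hashable, ...] = tuple(values[a] for a in attrs)
        if with_decision:
            key += (label,)  # P = attrs plus the decision attribute
        sizes[key] += 1
    return Fraction(sum(s * s for s in sizes.values()), len(rows) ** 2)


def conditional_granularity(rows: Sequence[Row], attrs: Sequence[int]) -> Fraction:
    """GK(D | B) = GK(B) - GK(B ∪ D); Fractions keep the comparisons exact."""
    return granularity(rows, attrs) - granularity(rows, attrs, with_decision=True)


def granularity_reduct(rows: Sequence[Row]) -> List[int]:
    """Forward greedy search on GK(D | B), then a backward pass over the chosen attributes."""
    n_attrs = len(rows[0][0])
    target = conditional_granularity(rows, list(range(n_attrs)))
    selected: List[int] = []
    # GK(D | B) never increases as B grows, so this loop reaches the target.
    while conditional_granularity(rows, selected) > target:
        best = min(
            (a for a in range(n_attrs) if a not in selected),
            key=lambda a: conditional_granularity(rows, selected + [a]),
        )
        selected.append(best)
    # Drop any attribute whose removal keeps GK(D | B) at the full-set value.
    for a in list(selected):
        rest = [b for b in selected if b != a]
        if conditional_granularity(rows, rest) == target:
            selected = rest
    return selected


if __name__ == "__main__":
    print(conditional_granularity(DOCS, [0, 1, 2, 3]))  # → 2/49
    print(granularity_reduct(DOCS))  # → [1, 0]
```

Because the last two documents cannot be told apart by any term, even the full attribute set leaves GK(D|C) = 2/49. The forward search stops once terms 1 and 0 reach that floor, and the backward pass keeps both because removing either raises GK(D|B).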
Keywords/Search Tags: text categorization, feature selection, rough set, decision table, heuristic attribute reduction