Font Size: a A A

Study On Attribute Reductions Algorithms Based On The Compression Tree Technology

Posted on:2011-09-16Degree:MasterType:Thesis
Country:ChinaCandidate:L Y HuangFull Text:PDF
GTID:2178360305977860Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Rough set theory(RST) was proposed by Z.Pawlak, which is a valid mathemati- cal theory to deal with imprecise, uncertain, and vague information. The most significant difference between it and other theory, which deal with imprecise, uncertain question, is it doesn't need to provide any prior information except the data set for dealding with the question, it can analyze and ratiocinate the data directly, found the hidden knowledge and reveal the potential rule from the data set. In recent years, rough ser theory has acquired good achievement, and has been widely applied in many fields such as the pattern recognition, the machine learning, and so on. As a newer tool which analyze and deal with the data, it has more and more attention in academia, the research hotspot is study and application of the efficient algorithms. At present, there are attribute reduction algorithm, decision rule extraction algorithm, neural network algorithm and genetic algorithm relate with rough set, and so on. Attribute reduction is one of the most important study conrents of the rough set theory and application.Up to now, there has been many scholars proposed attribute reduction algorithm based on positive region and Skowron by designing heuristic information. It is a good method, because this idea is intuitive and succinct. However, these attribute reduction algorithms have high cost of storage for a discernibility matrix. So this design method is not of benefit to dealing with the large data ser. What is more, since there are lots of repeat and unnecessary elements in the discernibility matrix, which not only cost a mass of memory space, but also waste plenty of computer time in feature reduction. Therefore, it can enhance the efficiency of the attribute reduction algorithm grately, if it can not generate the repeat and unnecessary elements.In this thesis, we first discuss and analyze the basic conception of the rough set theory, all kind of the existing attribute reduction algorithms. Based on our analyses, we point out the advantages and shortages of these algorithms. To overcome these shortages, firstly, we propose an efficient attribute reduction algorithm based on the Skowron discernibility matrix, using frequency of attribute appearing in the simplified discernibility matrix as heuristic information. Then, by considering the idea of FP (frequent pattern) tree, we propose a novel data structure C_Tree(compressed tree) , and design corresponding algorithm using this novel data structure, illustrate the validity and feasibility of the new algorithm by examples and experiment results. For details, the mainly work are done in the thesis as follows:(1) Summarizing the rough set theory and its development status, introducing the development status of the attribute reduction algorithm and the complete attribute reduction algorithm, analyzing their advantages and shortages. And also intruducing the basic conception of the rough set theory.(2) Based on the existing attribute reduction algorithms based on discernibility matrix, proposing an efficient attribute reduction algorithm based on the Skowron discernibility matrix, using frequency of attribute appearing in the simplified discernibility matrix as heuristic information. The time and space complexties are cut down to O( | C || U |) + O( | C |2 | U / C|) and O( | U |) respectively. And using an example to illostate the effectiveness of the new algorithm.(3) By considering the idea of FP (frequent pattern) tree, proposing a novel data structure C_Tree (compressed tree) , which can not only get rid of the repeat elements, but also can get rid of the unnecessary elements in the discernibility matrix completely. In this way, it can not only reduce the space complexity efficiently, but also enhance the efficiency of attribute reduction algorithm greatly.(4) Designing quick attribute reduction algorithm based on the improved frequent pattern tree. This algorithm can enhance the efficiency grately, without computing lots of the repeat and unnecessary elements in the disscernibility matrix. Analyzing, in the worst condition, the time and space complexties of this algorithm are max{O ( | C || U |), O (| C || U / C || POS C( D ) / C |)} and max{O (| U |, O (| N1 |)} respectively, for a decision table with |U| objects and |C| condition attributes, and for the compressed tree (C_Tree) with |N1| nodes. An example is used to illustrate the validity of the new algorithm. And experiment results proves the feasibility of the new algorithm.(5) Designing a quick complete attribute reduction algorithm based on the compressed tree. This algorithm also can enhance the efficiency grately, without computing lots of the repeat and unnecessary elements in the disscernibility matrix. Analyzing, in the worst condition, the time and space complexties of this algorithm are max{O (| C || U |), O (| C |2 | U / C | | POS C( D )/ C |)} and max{O (| U |, O (| N1 |)} respectively, for a decision table with |U| objects and |C| condition attributes, and for the compressed tree (C_Tree) with |N1| nodes. Illustrating the validity and feasibility of the new algorithm by an example. There are mainly innovative points in this thesis, as follow:(1) Using the frequency of attribute appearing in the simplified discernibility matrix as heuristic information, an efficient attribute reduction algorithm based on the Skowron discernibility matrix is proposed. The time and space complexities are cut down to O( | C || U |) + O( | C |2 | U / C|) and O (| U |) respectively.(2) Point out that the current research of attribute reduction algorithms based on discernibility matrix. They neither delete the repeat different elements produced by different equivalence class, nor the useless different elements. So it will affect the efficiency of the algorithm of attribute reduction(3) According to the questions we propose in this thesis, combining with the idea of FP tree(frequent pattern tree), we have designed a new data structure—compressed tree (C_Tree). Using the compressed tree to store the elements of the discernibility matrix, it can not only delete the repeat elements ganerated by different equivalence class completely, but also delete the unnecessary elements in the discernibility matrix, completely.(4) Using the compressed tree (C_Tree), we design a quick attribute reduction algorithm and an efficient complete attribute reduction algorithm.The new data structure proposed in this thesis is based on FP tree(frequent pattern tree), the new attribute reduction algorithm and new complete attribute reduction algorithm,which are based on the basis of previous work, can solve the existing problems of the current algorithms, and deal with the large databases more effectively. We also show that the proposed method is effective and feasible by the examples and the experimental results.
Keywords/Search Tags:Rough Set, Attribute Reduction, C_Tree, Algorithm Complexity
PDF Full Text Request
Related items