
Design And Implementation Of System For Attribute Selection Based On Rough Set

Posted on: 2016-07-21 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Cheng | Full Text: PDF
GTID: 2308330482951143 | Subject: Computer technology
Abstract/Summary:
In many fields of information processing, it is important to remove the redundant and irrelevant attributes of a data set, because such attributes reduce both the accuracy of data analysis and the efficiency of processing. As information acquisition equipment is continually upgraded and acquisition methods become more diverse, the data sets we collect have become very large. Although many effective attribute selection methods currently exist, they perform poorly on such large-scale data because they consume large amounts of computing resources and require unacceptably long computing times. To perform attribute selection on large-scale data more efficiently and effectively, this paper proposes an algorithm that meets these requirements. By combining sampling with the principle of positive-region preservation from rough set theory, the algorithm processes large-scale data quickly and produces a high-quality attribute reduction. Using a proposed measure of result quality based on an index of discriminatory ability, a detailed analysis was carried out on data with different structures; the results show that the proposed attribute selection algorithm is independent of the number of instances. To verify the reliability of the proposed algorithm, its computation time on large-scale data was compared experimentally with that of a traditional algorithm. The results show that, with this algorithm, high-quality attribute selection results for large-scale data can be obtained within a few minutes or even a few seconds.
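The positive-region principle mentioned above can be illustrated with a minimal Python sketch. It assumes a small in-memory decision table (`rows` of attribute-value tuples and a parallel list `labels` of decision values) and uses plain greedy forward selection on a random subsample. The function names and the sampling wrapper are hypothetical illustrations, not the thesis's actual algorithm, and the discriminatory-ability quality measure is not reproduced here.

```python
import random
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes of IND(attrs):
    rows with identical values on every attribute in attrs share a block."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return blocks.values()


def positive_region_size(rows, labels, attrs):
    """|POS_attrs(D)|: count rows whose equivalence class has a single decision."""
    size = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            size += len(block)
    return size


def greedy_reduct(rows, labels, n_attrs):
    """Forward selection: repeatedly add the attribute that enlarges the
    positive region most, until it equals that of the full attribute set."""
    full = positive_region_size(rows, labels, range(n_attrs))
    chosen = []
    current = positive_region_size(rows, labels, chosen)
    while current < full:
        # max() keeps the first (lowest-index) attribute among ties.
        best = max(
            (a for a in range(n_attrs) if a not in chosen),
            key=lambda a: positive_region_size(rows, labels, chosen + [a]),
        )
        chosen.append(best)
        current = positive_region_size(rows, labels, chosen)
    return chosen


def sampled_reduct(rows, labels, n_attrs, sample_size, seed=0):
    """Hypothetical sampling wrapper: run greedy_reduct on a random subsample
    so the cost depends on sample_size rather than on len(rows)."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(rows)), min(sample_size, len(rows)))
    return greedy_reduct([rows[i] for i in idx], [labels[i] for i in idx], n_attrs)
```

For rows `[(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]` with labels `[0, 0, 1, 1]`, `greedy_reduct(rows, labels, 3)` returns `[0]` (attribute 2 would serve equally well; ties go to the lower index). Forward selection does not guarantee a minimal reduct; a backward pass that drops any attribute whose removal leaves the positive region unchanged is a common refinement.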
In addition, as an extension of the proposed algorithm, this paper describes a discretization algorithm based on preserving the generalized decision. To make the proposed data processing methods more convenient in practical applications, all of them have been embedded into the system developed in this work as part of its implementation. To make the system more comprehensive, it distinguishes between sparse and non-sparse data and implements both absolute and relative reduction for each of the two data types. An efficient feature selection algorithm proposed by other researchers is also implemented to broaden the range of methods the system offers. The system further includes the discretization algorithm designed in this paper, which is sometimes an indispensable step in attribute selection. The comparison and validation experiments for the algorithms were mainly carried out in C++ on Linux, while the system interface and the integration of the algorithms were implemented with MFC in the Visual Studio 2012 integrated development environment on Windows 8. The whole system is easy to use, fully functional, and scalable, has a friendly interface, and is especially suitable for attribute selection on large-scale data.
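Discretization that preserves the generalized decision can be sketched as follows. This is an illustration under stated assumptions, not the thesis's method: it starts from every midpoint cut between consecutive distinct values of each numeric attribute, then greedily drops cuts as long as each object's generalized decision (the set of decision values among objects indiscernible from it) remains the same as on the original data. The function names and the greedy elimination order are hypothetical, and the table is assumed to be non-empty.

```python
from bisect import bisect_right
from collections import defaultdict


def generalized_decision(rows, labels):
    """For each row x, return the set of decisions of all rows identical to x."""
    groups = defaultdict(set)
    for row, d in zip(rows, labels):
        groups[tuple(row)].add(d)
    return [frozenset(groups[tuple(row)]) for row in rows]


def apply_cuts(rows, cuts):
    """Replace each value by the index of its interval; cuts[a] must be sorted."""
    return [tuple(bisect_right(cuts[a], v) for a, v in enumerate(row)) for row in rows]


def candidate_cuts(rows, n_attrs):
    """All midpoints between consecutive distinct values of each attribute."""
    cuts = []
    for a in range(n_attrs):
        vals = sorted({row[a] for row in rows})
        cuts.append([(lo + hi) / 2 for lo, hi in zip(vals, vals[1:])])
    return cuts


def discretize(rows, labels):
    """Greedily remove cuts while the generalized decision of every row is kept."""
    n_attrs = len(rows[0])
    target = generalized_decision(rows, labels)
    cuts = candidate_cuts(rows, n_attrs)  # with all cuts, the decision is preserved
    for a in range(n_attrs):
        for c in list(cuts[a]):
            trial = [list(cs) for cs in cuts]
            trial[a].remove(c)  # removing an element keeps the list sorted
            if generalized_decision(apply_cuts(rows, trial), labels) == target:
                cuts = trial
    return cuts
```

For a single attribute with values 1.0, 2.0, 3.0, 4.0 and labels `[0, 0, 1, 1]`, `discretize` keeps only the cut 2.5. Because the criterion compares sets of decisions rather than requiring each class to be consistent, the same check also applies to inconsistent decision tables.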
Keywords/Search Tags: Large-scale Data, Attribute Selection, Rough Set