
Research On Some Key Technologies Of Data Mining On Rough Set In Grid

Posted on: 2012-10-02  Degree: Master  Type: Thesis
Country: China  Candidate: Q S Xia  Full Text: PDF
GTID: 2218330338963067  Subject: Computer software and theory
Abstract/Summary:
Data mining, which has attracted wide attention from both academia and industry, is a method of discovering useful knowledge in massive data and a hot topic in international research on databases and information-based decision making. However, as data volumes grow and data become distributed across locations, the traditional computing model can no longer meet practical requirements. The grid offers resource sharing and cooperative processing, and provides an excellent analysis and computing platform for massive, distributed data. This thesis studies the key technologies of grid data mining based on grid services, including mass data partitioning and function mining. The main contributions are as follows:
(1) Mass Data Partition for Rough Set on Attribute Reduction (OAR-RSBSA), which builds on existing algorithms, is proposed. The efficiency of the partitioning algorithm is improved by about 70%, and the algorithm is well suited to related applications in the data grid.
(2) To find the attribute reduction of sample data quickly, Optimum Attribution Reduction on Rough Set and Binary Search Algorithm (OAR-RSBSA) is proposed. In addition, Distributed Function Mining on Rough Set, GEP and Binary Search in Grid (DFMRSGBS) is presented, which combines grid services and attribute reduction to merge local data models using the idea of function consistency. Simulation experiments show that OAR-RSBSA finds the optimum attribute reduction faster than traditional algorithms, and that the average time consumed by DFMRSGBS is lower than that of GEP and parallel GEPSA. As the number of grid nodes increases, the global fitting error of DFMRSGBS decreases markedly.
(3) A service-oriented grid data mining architecture is presented. Based on this architecture, the content of each functional module and the relationships among the modules are discussed in detail.
(4) A grid data mining system (GDMS) is designed and implemented on Eclipse. The design of all major functions in GDMS and the implementation of the grid portal are described in detail. The grid portal of GDMS is implemented with Eclipse/SWT so that users can conveniently run distributed data mining remotely.
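The abstract does not spell out how OAR-RSBSA works, so the following is only a generic illustration of rough-set attribute reduction, not the thesis's algorithm. It is a minimal Python sketch that computes the standard dependency degree (the fraction of rows whose decision value is fully determined by a set of condition attributes) and then picks attributes greedily. The function names `dependency` and `greedy_reduct`, the toy table `rows`, and the greedy strategy itself are assumptions made for illustration; the binary search step named in the abstract is not reproduced here.

```python
from collections import defaultdict


def dependency(rows, cond_attrs, decision):
    """Rough-set dependency degree: share of rows in the positive region.

    A row is in the positive region when every row with the same values
    on cond_attrs has the same decision value.
    """
    decisions = defaultdict(set)   # equivalence class -> decision values seen
    sizes = defaultdict(int)       # equivalence class -> number of rows
    for row in rows:
        key = tuple(row[a] for a in cond_attrs)
        decisions[key].add(row[decision])
        sizes[key] += 1
    consistent = sum(sizes[k] for k, ds in decisions.items() if len(ds) == 1)
    return consistent / len(rows)


def greedy_reduct(rows, cond_attrs, decision):
    """Greedy forward selection (an assumed strategy, not OAR-RSBSA)."""
    target = dependency(rows, cond_attrs, decision)
    chosen = []
    # Add the attribute that raises the dependency degree the most
    # until the subset is as informative as the full attribute set.
    while dependency(rows, chosen, decision) < target:
        remaining = [a for a in cond_attrs if a not in chosen]
        best = max(remaining,
                   key=lambda a: dependency(rows, chosen + [a], decision))
        chosen.append(best)
    # Drop any attribute that turned out to be redundant.
    for a in list(chosen):
        rest = [b for b in chosen if b != a]
        if dependency(rows, rest, decision) == target:
            chosen = rest
    return chosen


# Hypothetical toy decision table: the decision "d" depends only on "b".
rows = [
    {"a": 0, "b": 0, "c": 0, "d": "no"},
    {"a": 0, "b": 1, "c": 0, "d": "yes"},
    {"a": 1, "b": 0, "c": 1, "d": "no"},
    {"a": 1, "b": 1, "c": 1, "d": "yes"},
]
reduct = greedy_reduct(rows, ["a", "b", "c"], "d")
```

On this toy table, attribute `b` alone determines the decision, so the sketch keeps only `b`. In a grid setting one could imagine each node computing such a reduct on its own data partition before local models are merged, but that arrangement is likewise an assumption rather than something stated in the abstract.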
Keywords/Search Tags:Grid, Distributed data mining, Gene expression programming, Function mining, Rough set, Attribution reduction