
Research On Key Technologies Of Parallel Heuristic Reduction Method

Posted on: 2016-09-20
Degree: Master
Type: Thesis
Country: China
Candidate: X Yu
Full Text: PDF
GTID: 2308330470472049
Subject: Software engineering
Abstract/Summary:
With the arrival of the information explosion, processing and analyzing massive data, which is growing explosively and becoming increasingly varied in type, poses severe challenges for many large enterprises. A data mining platform built on a single node can no longer cope with the storage of such massive data. The emergence of cloud computing provides a way to handle large data. Hadoop is currently the most widely used open-source cloud computing platform; its core technologies are the distributed file system HDFS and the MapReduce programming model.

Traditional data mining methods cannot meet the requirements of data with imprecise or missing information. Rough set theory, however, is a soft computing method for dealing with uncertain, imprecise, or inconsistent data, and it has been widely applied in data mining, pattern recognition, and decision support and analysis. The reason is that rough set theory requires no prior knowledge and can give an accurate and objective description of uncertainty. Combining rough-set knowledge reduction with a distributed platform to improve the efficiency of data mining has therefore become an active research direction.

Most existing parallel knowledge reduction methods target attribute reduction; few address value reduction. In addition, current parallel attribute reduction methods may produce inaccurate reducts. To ensure correct reducts and to achieve comprehensive knowledge reduction, this thesis proposes a parallel heuristic reduction method on the Hadoop distributed platform. It comprises a parallel heuristic attribute reduction algorithm and a parallel heuristic value reduction algorithm, both parallelized with MapReduce.
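As a rough illustration of why attribute reduction fits the MapReduce model, the sketch below computes the standard rough-set dependency degree (the fraction of objects in the positive region) with a map step that keys each object by its condition-attribute values and a reduce step that counts objects in consistent equivalence classes. It then runs a simple greedy forward selection. This is a single-process Python sketch of the general technique, not the algorithm proposed in the thesis; the function names and the toy decision table are hypothetical.

```python
from collections import defaultdict

def map_phase(rows, cond_attrs, decision_attr):
    """Map: emit (equivalence-class key, decision value) for each object."""
    for row in rows:
        yield tuple(row[a] for a in cond_attrs), row[decision_attr]

def reduce_phase(pairs):
    """Reduce: group by key and count objects whose class has one decision value."""
    groups = defaultdict(list)
    for key, decision in pairs:
        groups[key].append(decision)
    # A class belongs to the positive region only if it is consistent.
    return sum(len(ds) for ds in groups.values() if len(set(ds)) == 1)

def dependency(rows, cond_attrs, decision_attr):
    """Dependency degree: |positive region| / |universe|."""
    if not rows:
        return 0.0
    return reduce_phase(map_phase(rows, cond_attrs, decision_attr)) / len(rows)

def greedy_reduct(rows, all_attrs, decision_attr):
    """Heuristic forward selection: add the attribute that raises dependency most."""
    target = dependency(rows, all_attrs, decision_attr)
    reduct = []
    while dependency(rows, reduct, decision_attr) < target:
        candidates = [a for a in all_attrs if a not in reduct]
        best = max(candidates,
                   key=lambda a: dependency(rows, reduct + [a], decision_attr))
        reduct.append(best)
    return reduct

# Hypothetical toy decision table: condition attributes a, b, c; decision d.
table = [
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 0, "b": 1, "c": 1, "d": "yes"},
    {"a": 1, "b": 0, "c": 1, "d": "no"},
    {"a": 1, "b": 1, "c": 1, "d": "yes"},
]
print(greedy_reduct(table, ["a", "b", "c"], "d"))
```

On a real cluster the map and reduce functions would run as distributed tasks over HDFS blocks; here they are plain generators so that the data flow is easy to follow.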
The method was applied to fault diagnosis of a distribution network, which verified its feasibility in practical application. In addition, the method was evaluated on data sets of different sizes and on clusters with different numbers of PCs, using three indicators of parallel algorithms: SpeedUp, ScaleUp, and SizeUp. The results show that the method achieves higher efficiency in processing big data.
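The three indicators can be computed from measured running times. The sketch below uses the definitions commonly found in the parallel data mining literature, which may differ in detail from those used in the thesis; the function names and the timing values are hypothetical and serve only to show the arithmetic.

```python
def speedup(t_one_node, t_m_nodes):
    """Same data set: time on 1 node divided by time on m nodes (ideal: m)."""
    return t_one_node / t_m_nodes

def scaleup(t_base, t_scaled):
    """Time for data D on 1 node divided by time for m*D on m nodes (ideal: 1)."""
    return t_base / t_scaled

def sizeup(t_base, t_enlarged):
    """Same cluster: time for n*D divided by time for D (ideal: about n)."""
    return t_enlarged / t_base

# Hypothetical timings in seconds, for illustration only.
print(speedup(120.0, 40.0))   # data fixed, 1 node vs. several nodes
print(scaleup(100.0, 125.0))  # data and nodes grown by the same factor
print(sizeup(50.0, 180.0))    # nodes fixed, data enlarged
```

A speedup close to the number of nodes, a scaleup close to 1, and a sizeup that grows roughly linearly with the data size all indicate that the parallel algorithm scales well.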
Keywords/Search Tags: Rough set theory, Big data, Parallel reduction algorithm, MapReduce