The Research Of Decision Tree Mining Based On Hadoop

Posted on:2016-02-05

Degree:Master

Type:Thesis

Country:China

Candidate:Z C Zhao

Full Text:PDF

GTID:2308330461967254

Subject:Software engineering

Abstract/Summary:

An era of large-scale data production, sharing, and application is beginning. The development and application of cloud computing technology bring new opportunities for data mining. The real value of big data is, for the most part, hidden beneath the surface, which in turn promotes the development of cloud computing. Data mining algorithms process data and extract its hidden value, which helps companies make decisions based on that value and keep pace with social development. However, existing data mining algorithms take a long time to process massive data. An effective way to process massive data and extract its value is to combine traditional mining algorithms with today's mature cloud computing technology.

Hadoop is an open-source distributed system framework from Apache, developed in Java; its core consists of HDFS and MapReduce. HDFS provides file storage, reading, and writing with high fault tolerance and high throughput. MapReduce provides a parallel programming framework that lets users develop parallel applications without knowing the details of distributed parallel programming. Hadoop thus provides both a massive data storage platform and a parallel computing platform, which form the basis for applying traditional data mining algorithms to massive data.

Building on a detailed analysis of current mature cloud computing platforms, this thesis analyzes and studies the key technologies of the Hadoop platform, namely the HDFS file system and the MapReduce programming model. It then studies current data mining algorithms in depth, especially the mature decision tree classification algorithms. On this basis, taking the typical decision tree classification algorithm SPRINT as a starting point, it proposes IRSPRINT, an optimized algorithm based on the RainForest framework, and then HIRSPRINT, a parallel algorithm for the Hadoop platform.

The experimental results show that the HIRSPRINT algorithm achieves a high speedup ratio on the Hadoop platform when processing massive data, and effectively reduces the time the SPRINT algorithm needs to construct the decision tree and process the data. Overall, it effectively improves the ability of the decision tree algorithm to handle massive data.
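The abstract does not describe how HIRSPRINT works internally, so the following is only a rough sketch of the general idea it names: using a map step and a reduce step to collect the (attribute, value, class) counts that RainForest-style methods aggregate, then scoring candidate splits with the Gini index that SPRINT uses. It runs in plain Python rather than on Hadoop, handles only categorical attributes with a multiway split (a simplification), and every function name and the sample data are hypothetical.

```python
from collections import defaultdict


def map_record(features, label):
    """Map step: emit one ((attribute index, value, class label), 1) pair per attribute."""
    for index, value in enumerate(features):
        yield (index, value, label), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts that share a key."""
    counts = defaultdict(int)
    for key, count in pairs:
        counts[key] += count
    return dict(counts)


def gini(class_counts):
    """Gini impurity of one partition, computed from its per-class counts."""
    total = sum(class_counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in class_counts.values())


def split_gini(counts, attribute):
    """Weighted Gini index of a multiway split on one categorical attribute."""
    per_value = defaultdict(lambda: defaultdict(int))
    for (index, value, label), n in counts.items():
        if index == attribute:
            per_value[value][label] += n
    total = sum(sum(c.values()) for c in per_value.values())
    return sum(sum(c.values()) / total * gini(c) for c in per_value.values())


def best_attribute(records):
    """Run map and reduce over all records, then pick the attribute with the lowest Gini index."""
    pairs = (pair for features, label in records for pair in map_record(features, label))
    counts = reduce_counts(pairs)
    attributes = sorted({index for index, _, _ in counts})
    return min(attributes, key=lambda a: split_gini(counts, a))


# Hypothetical toy data: (features, class label)
sample = [
    (("sunny", "high"), "no"),
    (("sunny", "low"), "no"),
    (("rain", "high"), "yes"),
    (("rain", "low"), "yes"),
]

if __name__ == "__main__":
    print(best_attribute(sample))
```

Because the counts are plain sums, partial counts computed on separate data blocks can be merged in any order, which is what makes this kind of aggregation a natural fit for MapReduce.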

Keywords/Search Tags:

big data, cloud computing, Hadoop, parallel decision tree algorithm

Related items

1	The Research On Decision Tree Algorithm's Parallelization Based On Hadoop Platform
2	The Parallel Research On Decision Tree Classification Algorithm Based On Hadoop
3	Research On Parallel Shared Decision Tree Algorithm Based On Hadoop
4	Research On Parallel Decision Tree Algorithm Based On Hadoop Platform
5	Research On Decision Tree Classification Algorithm Based On Hadoop
6	Research On Decision Tree Mining Algorithm Based On Cloud Computing
7	Research And Implementation Of Big Data Analysis And Mining Technology Based On Hadoop In Telecommunications Industry
8	Parallel Research And Application Of Machine Learning Algorithm Based On Cloud Platform
9	Research And Implementation Of Data Classification Algorithm Based On Decision Tree
10	Decision Tree Classification Algorithm Parallelization And Its Application