Research And Implementation Of Data Classification Algorithm Based On Decision Tree

Posted on:2017-12-28

Degree:Master

Type:Thesis

Country:China

Candidate:C Y Liu

Full Text:PDF

GTID:2348330518495806

Subject:Computer Science and Technology

Abstract/Summary:

With the spread and maturing of the mobile Internet, cloud computing, and the Internet of Things, vast amounts of data are generated every day, and the data sources are complex and diverse. How to extract information from this big data that is useful to businesses and individuals is an important problem today. Classification is an important data mining task that provides a solid foundation for subsequent tasks such as clustering and correlation analysis, so data classification has significant research value. This thesis introduces the concepts, processes, and development of data mining technology and analyzes the data classification task in detail.

SPRINT is one of the classic classification algorithms and is widely used today. After presenting the SPRINT algorithm and its problems, the thesis improves the method of searching for the best split point. For discrete and continuous attributes, it proposes two new data structures, the classification table and the merge partition table, which reduce unnecessary operations and the number of candidate nodes. These improvements shorten the time needed to construct the decision tree and improve the overall performance of the algorithm.

However, when traditional data classification algorithms face large data sets, their computing and storage capacity falls short. The rise of cloud computing offers an opportunity to solve this problem: the high flexibility, high scalability, low resistance, and high reliability of cluster resources provide convenient underlying services for data mining. The thesis therefore combines data classification with cloud computing. Based on an analysis of the processing demands of large-scale data classification, it proposes a data classification model based on the Hadoop platform, and it integrates the Hadoop framework with data classification technology by describing the model's requirements, basic structure, and functional modules. It also improves the algorithm tier of the system, optimizing the SPRINT algorithm through sort parallelism, node parallelism, and attribute parallelism. These improvements allow the SPRINT algorithm to be ported cleanly to the Hadoop platform.

Finally, the thesis tests the efficiency of the improved SPRINT algorithm on a platform set up for that purpose. The results show that the improved algorithm effectively reduces data processing time and improves overall system performance, so that the system can complete data classification tasks with high concurrency, low cost, and high reliability.
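As a rough illustration of the split-point search that the abstract refers to, the following Python sketch scans one sorted attribute list and keeps two class histograms, in the manner of the original SPRINT algorithm. It assumes the Gini index as the split criterion, which is what SPRINT uses for continuous attributes; the function names `gini` and `best_split` and the sample records are hypothetical, and the sketch does not reproduce the thesis's classification table or merge partition table.

```python
from collections import Counter


def gini(counts, total):
    """Gini impurity of a class histogram holding `total` records."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_split(attribute_list):
    """Find the best binary split 'value <= threshold' for one attribute.

    attribute_list: (value, class_label) pairs for a continuous attribute.
    Returns (threshold, weighted_gini); threshold is None (and the score
    is infinity) when all values are equal and no split is possible.
    """
    # SPRINT pre-sorts each continuous attribute list once.
    records = sorted(attribute_list, key=lambda r: r[0])
    n = len(records)
    below = Counter()                                # classes at or before the cursor
    above = Counter(label for _, label in records)   # classes after the cursor
    best_threshold, best_score = None, float("inf")

    for i in range(n - 1):
        value, label = records[i]
        below[label] += 1
        above[label] -= 1
        next_value = records[i + 1][0]
        if value == next_value:
            continue  # a threshold cannot separate equal values
        n_below, n_above = i + 1, n - (i + 1)
        score = (n_below / n) * gini(below, n_below) \
            + (n_above / n) * gini(above, n_above)
        if score < best_score:
            # Candidate threshold: midpoint between adjacent distinct values.
            best_threshold, best_score = (value + next_value) / 2, score
    return best_threshold, best_score


# Hypothetical sample: (age, risk class)
sample = [(23, "high"), (17, "high"), (43, "high"),
          (68, "low"), (32, "low"), (20, "high")]
threshold, score = best_split(sample)
print(threshold, round(score, 4))
```

On this sample the lowest weighted Gini value is found between ages 23 and 32, so the sketch reports a threshold of 27.5. Because each attribute list is scanned independently, a scan of this kind is the sort of unit of work that attribute parallelism could distribute across cluster nodes, although how the thesis maps it onto Hadoop is not described in the abstract.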

Keywords/Search Tags:

data mining, cloud computing, Hadoop, data classification, SPRINT algorithm

Related items

1	The Research Of Data Mining Based On HADOOP
2	The Process And Research Of Massive Data Mining Based On Cloud Computing
3	Research On Decision Tree Mining Algorithm Based On Cloud Computing
4	The Parallel Research On Decision Tree Classification Algorithm Based On Hadoop
5	Data Mining Association Algorithm Research And Realization Based On Cloud Computing
6	The Study Of Decision Tree Algorithm Based On Hadoop Platform
7	Research On Data Mining Algorithm In Cloud Computing Environment
8	Parallel Data Mining Algorithm Research In Cloud
9	Search Of Classification Algorithms For Data Mining
10	Research On Decision Tree Classification Algorithm Based On Hadoop