The Research On Classification Algorithm And Concept Drift Based On Stream Data In Big Data

Posted on:2017-09-15

Degree:Master

Type:Thesis

Country:China

Candidate:H Y Tan

Full Text:PDF

GTID:2348330491950330

Subject:Information networks

Abstract/Summary:

With the development of technologies such as cloud computing and the Internet of Things, streaming data has become a widespread new form of big data in telecommunications, the Internet, finance, and other fields. Because streaming data is real-time, unordered, and unbounded, traditional data query and mining methods no longer apply to it, and classification of big data streams under concept drift has become an urgent problem. Existing studies of classification algorithms and concept drift rely mainly on data structure and algorithm optimization, and the computation and drift detection are carried out on a single computer with limited resources. With deeper research on big data and the emergence of distributed computing frameworks, running traditional data mining algorithms on distributed computing platforms has become a hot topic.

This thesis therefore proposes two mining algorithms, one for sudden (mutant) concept drift and one for gradual concept drift, and designs the corresponding systems on Storm. The S-CVFDT algorithm predicts gradual concept drift using a parallel window. Once the S-CVFDT system discovers gradual concept drift in the data stream, it adaptively changes the parallel window size and updates the model. The experimental results show that the S-CVFDT algorithm can effectively detect gradual concept drift and reduce wasted resources, and that the S-CVFDT system outperforms the CVFDT system in both efficiency and model accuracy.

With the development of streaming media, the volume of video data is growing rapidly and can hardly be obtained all at once. Given that video data is continuous, changeable, and unbounded, the thesis presents the MCVFDT algorithm to cope with sudden concept drift in streaming media. Specifically, MCVFDT is mainly used to predict hot spots in the server cache, which benefits the QoS of the media service, helps with data migration between servers, and supports dynamic balancing of server load.

Finally, traditional computing platforms run in stand-alone mode, which limits algorithm performance. The system presented in the thesis is implemented on Storm to achieve parallel computing. The experimental results show that this improves the capacity for attribute computation and thereby the efficiency of classification when coping with big data.
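
The idea of adapting a window size when drift appears can be illustrated with a minimal sketch. This is not the S-CVFDT algorithm from the thesis, whose details are not given in this abstract; it is a generic error-rate detector written under stated assumptions. The class name AdaptiveWindowDetector, the helper find_drifts, and the parameters min_size, max_size, and threshold are all hypothetical. The detector keeps a window of recent classification outcomes, grows the window while the stream looks stable, and shrinks it to the most recent data when the newer half of the window has a clearly higher error rate than the older half.

```python
from collections import deque


class AdaptiveWindowDetector:
    """Hypothetical drift detector; NOT the thesis's S-CVFDT method."""

    def __init__(self, min_size=20, max_size=200, threshold=0.3):
        self.min_size = min_size      # smallest window that is tested for drift
        self.max_size = max_size      # upper bound on window growth
        self.threshold = threshold    # error-rate gap that counts as drift
        self.size = min_size          # current target window size
        self.window = deque()         # recent outcomes: 1 = error, 0 = correct

    def update(self, error):
        """Record one outcome and return True if drift is signalled."""
        self.window.append(error)
        while len(self.window) > self.size:
            self.window.popleft()
        if len(self.window) < self.min_size:
            return False

        items = list(self.window)
        half = len(items) // 2
        old_rate = sum(items[:half]) / half
        new_rate = sum(items[half:]) / (len(items) - half)

        if new_rate - old_rate > self.threshold:
            # Drift: discard the older half and shrink back to the minimum size.
            self.window = deque(items[half:])
            self.size = self.min_size
            return True

        # Stable: let the window grow by one, up to the maximum.
        self.size = min(self.size + 1, self.max_size)
        return False


def find_drifts(stream):
    """Return the stream positions at which the detector signals drift."""
    detector = AdaptiveWindowDetector()
    return [i for i, err in enumerate(stream) if detector.update(err)]


if __name__ == "__main__":
    # Synthetic outcomes: the classifier is correct for 100 samples, then fails.
    print(find_drifts([0] * 100 + [1] * 50))
```

In a real system the outcomes would come from comparing the model's predictions with the true labels as they arrive, and a drift signal would trigger a model update; both steps are outside this sketch.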

Keywords/Search Tags:

big data, data mining, classification algorithm, concept drift

Related items

1	Research On Classification Algorithm For Conceptual Drift Data Flow
2	Research On Ensemble Classification Algorithms Of Data Stream Based On Concept Drift
3	Classification Algorithm For Data Streams With Concept Drift And Its Applications
4	The Research On Classification Algorithm And Concept Drift Based On Stream Data In Big Data
5	Detecting Concept Drift And Classifying Data Streams
6	Study On Data Streams Online Classification Algorithm Of Adapting To The Concept-Drift
7	Research On Classification Algorithms For Imbalanced Data Stream With Concept Drift
8	Research On Parallel Classification Algorithm Of Streaming Data
9	The Research On Data Streaming Classification Hidden Concept Drift
10	Research On Concept Drift And Noisy In Data Streams Classification