
The Research On Classification Algorithm And Concept Drift Based On Stream Data In Big Data

Posted on: 2017-09-15
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Tan
Full Text: PDF
GTID: 2348330491950330
Subject: Information networks
Abstract/Summary:
With the development of technologies such as cloud computing and the Internet of Things, streaming data has become a new form of big data that is widespread in telecommunications, the Internet, finance and other fields. Because streaming data is real-time, unordered and unbounded, traditional data query and mining methods no longer apply to it, and classification over big data streams, together with the concept drift that accompanies it, has become an urgent problem.

Existing studies of classification algorithms and concept drift rely mainly on optimizing data structures and algorithms. The computation and drift detection are carried out on a single computer with limited resources. With deeper research into big data and the emergence of distributed computing frameworks, running traditional data mining algorithms on distributed computing platforms has become a hot topic.

This thesis therefore proposes two mining algorithms, one for sudden (mutant) concept drift and one for gradual concept drift, and designs the corresponding systems on Storm. The S-CVFDT algorithm uses a parallel window to predict gradual concept drift. Once the S-CVFDT system discovers gradual concept drift in the data flow, it adaptively changes the parallel window size and updates the model. The experimental results show that the S-CVFDT algorithm can effectively detect gradual concept drift and reduce wasted resources. The S-CVFDT system also outperforms the CVFDT system in both efficiency and model accuracy.

With the development of streaming media, the volume of video data is growing rapidly and can hardly be obtained all at once. Because video data is continuous, changeable and unbounded, the thesis presents the MCVFDT algorithm to resist sudden concept drift in streaming media. The MCVFDT algorithm is mainly used to predict hot spots in the server cache. This benefits the QoS of the media service, and it also helps with migrating data between servers and dynamically balancing server load.

Finally, traditional computing platforms run in stand-alone mode, which limits the performance of the algorithms. The systems presented in the thesis are implemented on Storm to achieve parallel computing. The experimental results show that this strengthens attribute computation and thereby improves classification efficiency when coping with big data.
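
The idea of shrinking a window once drift is detected can be illustrated with a minimal Python sketch. It is not the S-CVFDT algorithm from the thesis, whose details the abstract does not give. It assumes a single sliding window of 0/1 prediction errors and a heuristic test based on the Hoeffding bound (the bound used by VFDT-style learners). The class name `AdaptiveWindow` and the parameters `max_size`, `min_size` and `delta` are hypothetical.

```python
import math
from collections import deque


def hoeffding_bound(value_range, delta, n):
    """With probability 1 - delta, the true mean lies within this distance
    of the mean observed over n samples whose values span value_range."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


class AdaptiveWindow:
    """Sliding window of 0/1 prediction errors that shrinks on suspected drift.

    Illustrative only: the split-in-half test below is an assumption,
    not the detection rule used in the thesis.
    """

    def __init__(self, max_size=200, min_size=20, delta=0.05):
        self.max_size = max_size
        self.min_size = min_size
        self.delta = delta
        self.errors = deque()

    def add(self, error):
        """Record one error (1 = misclassified, 0 = correct).

        Returns True if drift is flagged on this step.
        """
        self.errors.append(error)
        if len(self.errors) > self.max_size:
            self.errors.popleft()

        n = len(self.errors)
        if n < 2 * self.min_size:
            return False  # too few samples to compare two halves

        items = list(self.errors)
        half = n // 2
        old_rate = sum(items[:half]) / half
        new_rate = sum(items[half:]) / (n - half)

        # Flag drift when the recent error rate exceeds the older one
        # by more than the combined uncertainty of both estimates.
        threshold = (hoeffding_bound(1.0, self.delta, half)
                     + hoeffding_bound(1.0, self.delta, n - half))
        if new_rate - old_rate > threshold:
            # Shrink the window: keep only the recent half.
            self.errors = deque(items[half:])
            return True
        return False
```

A caller would feed `add` one error indicator per classified example and retrain or update the model whenever it returns `True`. Discarding the older half is one simple way to adapt the window size; other policies are equally plausible.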
Keywords/Search Tags: big data, data mining, classification algorithm, concept drift