Design And Implementation Of The Data Analysis System Based On Storm

Posted on: 2015-05-16
Degree: Master
Type: Thesis
Country: China
Candidate: C H Sun
Full Text: PDF
GTID: 2298330467963358
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, big data processing has become a major trend. Only by using big data technology to uncover the potential value of data can businesses improve their decision-making, so big data processing technologies have become a research hotspot. Hadoop and other cloud computing technologies make batch processing of big data possible. However, demands for real-time and personalized processing are higher than ever, which led to the emergence of Storm. Research on Storm is still in its infancy. This thesis compares Storm with similar technologies, especially stream processing technologies, to show the advantages of each and the application scenarios each is suited to. It studies open questions about Storm and gives proposals for improvement. It proposes a novel method of building topologies by combining Spring with Storm, and a system that processes data and implements the K-Means clustering algorithm in Storm. The experimental results are presented and analyzed to demonstrate the effectiveness of the system and model, and they show that Storm can improve the processing speed of the clustering algorithm.

In summary, this thesis makes the following contributions. First, it analyzes and compares the advantages and disadvantages of popular big data processing technologies, especially stream processing technologies. Second, it summarizes the basic ideas of Storm performance optimization. Third, it uses the distributed message queue Kafka as Storm's spout, solving the problem that Storm's data source cannot be parallelized. Fourth, by combining Spring with Storm, it provides a new approach and a unified configuration model for building Storm topologies. Fifth, it implements a parallel K-Means algorithm in Storm to analyze the degree of aggregation. Finally, it designs a fully distributed data processing system based on Storm, which is tested and verified with GPS data.
Keywords/Search Tags: stream processing, Storm, Kafka, Spring, K-Means
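
The abstract does not describe how K-Means is parallelized in Storm, so the following is only a minimal sketch of one common way to split a K-Means iteration into independent per-worker steps plus a merge step. It is plain Python rather than Storm code, and the function names (`assign`, `partial_sums`, `merge_and_update`) and the mapping of steps to bolts are illustrative assumptions, not the thesis's implementation.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def assign(point, centroids):
    """Return the index of the centroid nearest to `point`."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def partial_sums(points, centroids):
    """Per-worker step (hypothetically one bolt instance per data partition):
    accumulate per-cluster coordinate sums and point counts."""
    dim = len(centroids[0])
    sums = defaultdict(lambda: [0.0] * dim)
    counts = defaultdict(int)
    for p in points:
        k = assign(p, centroids)
        counts[k] += 1
        for d, x in enumerate(p):
            sums[k][d] += x
    return sums, counts


def merge_and_update(partials, centroids):
    """Aggregation step (hypothetically a single downstream bolt):
    merge the workers' partial results and recompute the centroids."""
    dim = len(centroids[0])
    total_sums = [[0.0] * dim for _ in centroids]
    total_counts = [0] * len(centroids)
    for sums, counts in partials:
        for k, c in counts.items():
            total_counts[k] += c
            for d in range(dim):
                total_sums[k][d] += sums[k][d]
    new_centroids = []
    for k, old in enumerate(centroids):
        n = total_counts[k]
        if n == 0:
            # An empty cluster keeps its previous centroid.
            new_centroids.append(tuple(old))
        else:
            new_centroids.append(tuple(s / n for s in total_sums[k]))
    return new_centroids
```

Because each partition only needs the current centroids to produce its partial sums, the per-worker step can run on many partitions at once. Only the small per-cluster sums and counts have to be brought together in the merge step.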