
Design and Implementation of a Real-Time Processing Architecture for Big Data Based on Storm

Posted on: 2019-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: Z M Zhao
GTID: 2428330566497313
Subject: Software engineering
Abstract/Summary:
Massive data sets are being mined and used more and more frequently. In practical scenarios, real-time data often has to be processed and analyzed, with results fed back promptly. To respond quickly to business requirements, many enterprises initially relied on Redis's publish/subscribe messaging, combined with its List, Sorted Set, and Hash data structures for processing, and returned results over a socket. This approach relies heavily on in-memory storage, and as the amount of data grows, depending on machine memory is clearly not appropriate. Therefore, to meet application requirements for high concurrency, large data volumes, and low latency, this thesis designs and implements a real-time big data processing architecture suited to the current situation.

The work is driven by the functional requirements of three subsystems under the theme of marketing analysis in an actual business scenario: advertising analysis, promotion analysis, and coupon analysis. Following the data flow, the architecture is divided into five layers: the message middleware layer (data acquisition), the infrastructure layer (real-time processing), the data storage layer, the service layer, and the application layer. Technology selection is carried out around this five-layer structure, resulting in a processing architecture that is loosely coupled, highly extensible, and reusable. First, in the data acquisition phase, a message queue is built on Kafka to serve as a buffer, avoiding the data delays and loss caused by explosive data growth. Second, a stream processing framework is built on Storm, forming a distributed data processing network that addresses the complexity of controlling traditional message queues. Third, MySQL, HBase, and Elasticsearch are selected to provide combined storage across multiple data sources, taking data characteristics and economic cost into account. Finally, to optimize query efficiency, distributed SQL queries are implemented on Presto.

The architecture went through nearly a year of analysis, design, development, debugging, testing, and repeated verification. Since October of last year it has gradually replaced the previous solution in the production environment with good results, demonstrating its availability, stability, and high performance.
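The buffering idea behind the data acquisition layer can be illustrated with a minimal sketch. It is not the thesis's implementation: Python's standard-library `queue.Queue` stands in for the Kafka message queue, a single worker thread stands in for the Storm processing layer, and the names `SUBSYSTEMS`, `produce`, `consume`, and `run_pipeline` are hypothetical. The sketch only shows how a bounded buffer lets a burst of events be absorbed without dropping any of them.

```python
import queue
import threading
from collections import Counter

# Hypothetical event categories, named after the three subsystems in the abstract.
SUBSYSTEMS = ("advertising", "promotion", "coupon")
_STOP = object()  # sentinel telling the consumer that the stream has ended


def produce(buffer, events):
    """Acquisition side: push a burst of events into the bounded buffer."""
    for event in events:
        # put() blocks while the buffer is full, so no event is dropped.
        buffer.put(event)
    buffer.put(_STOP)


def consume(buffer, counts):
    """Processing side: drain the buffer and count events per subsystem."""
    while True:
        event = buffer.get()
        if event is _STOP:
            break
        counts[event["subsystem"]] += 1


def run_pipeline(events, capacity=8):
    """Run one producer and one consumer over a bounded in-memory buffer."""
    buffer = queue.Queue(maxsize=capacity)
    counts = Counter()
    worker = threading.Thread(target=consume, args=(buffer, counts))
    worker.start()
    produce(buffer, events)
    worker.join()
    return counts
```

Calling `run_pipeline` with a list of dictionaries that each carry a `"subsystem"` key returns the per-subsystem totals. In the architecture described above, the queue and the worker would be separate distributed systems, so the producer and consumer could scale independently and the buffer would survive a process restart, which this in-memory sketch does not attempt to model.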
Keywords/Search Tags: Big Data, Real-time processing, Storm, Kafka, Elasticsearch