Design And Implementation Of Real-time Hot Goods Analysis System Based On Storm

Posted on: 2019-11-13 | Degree: Master | Type: Thesis
Country: China | Candidate: C G Ma
GTID: 2428330569496102 | Subject: Computer technology
Abstract/Summary:
E-commerce stores now run more and more commodity spike (flash-sale) activities, such as Xiaomi's weekly 12 o'clock mobile phone spike, Taobao's Double 11 spike, and Jingdong's 618 product spike, and almost every e-commerce site has some kind of commodity spike activity. Although commodity spike systems are common, they face a range of problems: data isolation, highly concurrent requests, repeated requests from a single account or from many accounts, data consistency, and real-time hotspot discovery. To prevent the 1% of hotspot data from affecting the other 99% of normal data, spike systems typically isolate hotspot data by deploying a dedicated cache cluster and separate, higher-bandwidth servers for it. Even so, new real-time hotspot data may still emerge from the 99% of normal data. Because the system cannot know in advance which ordinary data will become hot, it cannot protect that data in advance. Real-time analysis of system data is therefore needed to discover newly emerging hot products as early as possible, so that corresponding adjustments can be made immediately and the system remains highly available.

This thesis focuses on discovering hot product data in real time in spike scenarios; it does not address how the system is tuned after a hotspot has been found. Because hotspot data is caused by user behavior such as purchasing, browsing, sharing, and searching, this thesis analyzes the user behavior logs of the spike system in real time. The main work is as follows:

1. Introduce the technologies used by the system. Flume is used for distributed log collection. Kafka serves as the log message queue, buffering logs so that no data is lost when logs are collected faster than the computing module can process them. Storm performs real-time data processing so that analysis keeps pace with incoming data. MySQL provides persistent data storage, and the Redis in-memory database provides fast data access.

2. Design a hotspot sorting algorithm based on multi-dimensional sorting. The design combines the rankings under several attributes with per-attribute weights. The system could rank hot products by any single user-behavior attribute, but rankings based on a single attribute give poor overall results and depend heavily on chance. This thesis compares how each attribute grows in normal scenarios versus spike scenarios to determine the weight of each behavior attribute, combines these weights with the product's rank under each single attribute to obtain a comprehensive score for the product, and finally sorts all products by their comprehensive scores.

3. Design and implement a real-time hot commodity analysis system based on Storm. The system integrates Flume, Kafka, Storm, MySQL, and Redis; it monitors log files effectively and reads log data in a timely manner. The logic of the sorting algorithm is built into Storm's data processing, so real-time hotspot data can be identified quickly from the log data. Data is also stored in MySQL for persistence, where it can be used for later offline processing of massive data.
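The abstract does not give the exact scoring formula, so the following is only a minimal Python sketch of the multi-dimensional sorting idea under stated assumptions: each behavior's weight is taken to be proportional to its growth ratio (spike-period volume divided by normal-period volume), a product's rank r among n products under a single behavior is mapped to the score (n - r + 1)/n, and the comprehensive score is the weighted sum of those per-behavior scores. The function names (`behavior_weights`, `attribute_ranks`, `rank_products`) and every number in the example are hypothetical illustrations, not figures from the thesis.

```python
from typing import Dict, List, Tuple

# Per-product event counts for each behavior, e.g. {"p1": {"purchase": 50, ...}}.
Counts = Dict[str, Dict[str, int]]


def behavior_weights(normal: Dict[str, float], spike: Dict[str, float]) -> Dict[str, float]:
    """Assumed weighting: proportional to each behavior's growth ratio
    (spike volume / normal volume), normalized so the weights sum to 1."""
    growth = {b: spike[b] / normal[b] for b in normal}
    total = sum(growth.values())
    return {b: g / total for b, g in growth.items()}


def attribute_ranks(counts: Counts, behavior: str) -> Dict[str, int]:
    """Rank products by a single behavior attribute (rank 1 = most events)."""
    ordered = sorted(counts, key=lambda p: counts[p].get(behavior, 0), reverse=True)
    return {product: rank for rank, product in enumerate(ordered, start=1)}


def rank_products(counts: Counts, weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Combine per-attribute ranks into a weighted comprehensive score, then sort."""
    n = len(counts)
    scores = {product: 0.0 for product in counts}
    for behavior, weight in weights.items():
        for product, rank in attribute_ranks(counts, behavior).items():
            # Assumed rank-to-score mapping: rank 1 -> 1.0, rank n -> 1/n.
            scores[product] += weight * (n - rank + 1) / n
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Illustrative, made-up volumes (not data from the thesis).
normal_volume = {"purchase": 100, "browse": 10000, "share": 50, "search": 2000}
spike_volume = {"purchase": 1000, "browse": 30000, "share": 100, "search": 4000}
window_counts: Counts = {
    "p1": {"purchase": 50, "browse": 900, "share": 3, "search": 40},
    "p2": {"purchase": 5, "browse": 2000, "share": 10, "search": 100},
    "p3": {"purchase": 20, "browse": 300, "share": 1, "search": 10},
}

weights = behavior_weights(normal_volume, spike_volume)
ranking = rank_products(window_counts, weights)
for product, score in ranking:
    print(f"{product}: {score:.3f}")
# p1: 0.863
# p2: 0.608
# p3: 0.529
```

In this made-up example p2 leads on browsing, sharing, and searching, yet p1 ranks first because purchases grow the most during the spike and therefore carry the largest weight. Ties within a single attribute are broken by input order here; a real implementation would need to decide how to handle them.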
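The abstract states that the sorting logic runs inside Storm's data processing but does not describe the topology. As an illustration only, the sketch below simulates in plain Python the kind of sliding-window counting a Storm counting bolt might perform before ranking: each parsed log event increments a per-(product, behavior) counter in the current time slot, and on every tick the window advances and the oldest slot is dropped. In an actual topology the advance step could be driven by Storm tick tuples. The comma-separated log format, the names `parse_log_line` and `SlidingBehaviorCounter`, and the slot count are assumptions; this is not Storm API code.

```python
from collections import Counter, deque
from typing import Deque, Dict, Tuple


def parse_log_line(line: str) -> Tuple[str, str]:
    """Parse one behavior-log line.

    Assumed (hypothetical) format: "timestamp,user_id,product_id,behavior".
    """
    _timestamp, _user_id, product_id, behavior = line.strip().split(",")
    return product_id, behavior


class SlidingBehaviorCounter:
    """Counts (product, behavior) events over the most recent `num_slots` slots.

    Plain-Python stand-in for a counting bolt: add() would run once per log
    tuple, advance() once per tick.
    """

    def __init__(self, num_slots: int) -> None:
        # deque(maxlen=...) discards the oldest slot automatically on append.
        self.slots: Deque[Counter] = deque([Counter()], maxlen=num_slots)

    def add(self, product_id: str, behavior: str) -> None:
        self.slots[-1][(product_id, behavior)] += 1

    def advance(self) -> None:
        """Start a new time slot, expiring the oldest one once the window is full."""
        self.slots.append(Counter())

    def snapshot(self) -> Dict[str, Dict[str, int]]:
        """Sum all slots into {product_id: {behavior: count}}."""
        totals: Counter = Counter()
        for slot in self.slots:
            totals.update(slot)
        result: Dict[str, Dict[str, int]] = {}
        for (product_id, behavior), count in totals.items():
            result.setdefault(product_id, {})[behavior] = count
        return result


# Made-up log lines, grouped by the tick interval in which they arrive.
ticks = [
    ["1,u1,p1,browse", "2,u2,p1,purchase"],
    ["3,u3,p2,browse"],
    ["4,u1,p1,purchase", "5,u4,p2,share"],
    ["6,u5,p2,search"],
]

counter = SlidingBehaviorCounter(num_slots=3)
for lines in ticks:
    for line in lines:
        counter.add(*parse_log_line(line))
    counter.advance()

print(counter.snapshot())
```

Because the window holds three slots and the loop advances after every tick, the final snapshot covers only the events from the last two ticks plus the fresh, empty slot; the earlier browse events and p1's first purchase have expired. The per-product dictionary returned by `snapshot()` is the sort of input a multi-dimensional ranking step could consume.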
Keywords/Search Tags: Spike System, Storm, Real-time Processing, Multi-dimensional Sorting