
Research And Implementation Of City One-Card Data Processing System Based On Lambda Architecture

Posted on: 2018-07-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
Full Text: PDF
GTID: 2348330518498946
Subject: Computer software and theory
Abstract/Summary:
This work originates from a smart park project in Beijing. As the smart city concept has spread, more and more intelligent services have been developed, bringing considerable convenience. The city one-card (smart card) system is one part of the smart city and is closely tied to people's daily lives. Analyzing the large volume of card usage data generated every day requires a focus on data analysis and management. A data analysis system can detect anomalies and issue timely warnings, freeze the account permissions of stolen or lost cards, and reduce users' losses, so such a system is necessary for a smart city card system.

This thesis studies and implements a city card data analysis system that analyzes card-swipe data in real time and detects abnormal cards, providing a one-stop solution for anomaly detection in card data. Because the city has a large population, cards are swiped frequently, and the volume of data users generate is more than traditional data processing methods can handle. The system can identify abnormal situations such as frequent swipes within a short period, excessively large spending amounts, too many failed swipes, and swipe behavior that does not match the expected pattern.

The system adopts the Lambda architecture and is built on the Spark in-memory computing framework: models are trained on historical card-swipe data, and the real-time stream processing framework Spark Streaming calls those models to process incoming swipe data. Swipe data is transmitted from card terminals to the card business system through a Web Service. After preprocessing, it is published to the Kafka publish-subscribe system, and Spark Streaming reads the preprocessed data from Kafka for analysis. Preprocessing divides each day into time periods and, for each period, computes statistics and extracts features from each user's swipe data.

Data analysis is based on clustering, and the clustering algorithm plays an important role in anomaly detection. According to the characteristics of the card data, this thesis divides the clustering models into two categories: a public model and private models. The public model is trained on a larger set of feature values and is shared by all users. A private model clusters on a small number of features that best capture a user's swipe behavior, which keeps the model simple, and each user has their own private model. The public model uses its rich feature set to screen out obvious outliers, while the private model judges whether swipe data is abnormal based on each user's own card usage. The results of the two models are combined to determine whether the card is in an abnormal state.

The experiments simulate the system using historical card-swipe data from a smart park in Beijing: the public and private models are trained on the historical data and then used to detect abnormal data, reaching a prediction accuracy of 80% or more. Because the models are iterated on new data, prediction accuracy improves steadily the longer the system runs. During peak swipe hours, the system keeps up with the data stored in the Kafka cluster with high stability and real-time performance, achieving the expected results.
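
The preprocessing and two-model judgment described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the thesis's implementation: the four time periods, the three features (swipe count, total amount, failure count), the single-centroid distance test standing in for the actual clustering algorithm, and the rule that a card is flagged when either model flags it. It uses plain Python rather than Spark so that it runs on its own.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean

# Hypothetical period boundaries (hours); the abstract does not state them.
PERIODS = [(0, 6), (6, 12), (12, 18), (18, 24)]


def period_of(hour):
    """Return the index of the time period that contains the given hour."""
    for idx, (start, end) in enumerate(PERIODS):
        if start <= hour < end:
            return idx
    raise ValueError(f"hour out of range: {hour}")


def extract_features(swipes):
    """Aggregate swipes of the form (user, day, hour, amount, succeeded).

    Returns {(user, day, period): [swipe_count, total_amount, failure_count]}.
    """
    feats = defaultdict(lambda: [0, 0.0, 0])
    for user, day, hour, amount, succeeded in swipes:
        f = feats[(user, day, period_of(hour))]
        f[0] += 1
        f[1] += amount
        f[2] += 0 if succeeded else 1
    return dict(feats)


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class CentroidModel:
    """Single-centroid stand-in for a clustering model (an assumption).

    A vector is an outlier when it lies farther from the centroid than
    `tolerance` times the mean training distance.
    """

    def __init__(self, vectors, tolerance=3.0):
        self.center = [mean(col) for col in zip(*vectors)]
        dists = [distance(v, self.center) for v in vectors]
        self.threshold = tolerance * (mean(dists) or 1.0)

    def is_outlier(self, vector):
        return distance(vector, self.center) > self.threshold


def train(history):
    """Train one public model on all features and one private model per user."""
    feats = extract_features(history)
    public = CentroidModel(list(feats.values()))
    per_user = defaultdict(list)
    for (user, _day, _period), f in feats.items():
        per_user[user].append(f[:2])  # private model uses fewer features
    private = {user: CentroidModel(vs) for user, vs in per_user.items()}
    return public, private


def is_abnormal(public, private, user, features):
    """Combine both judgments; OR-ing them is an assumed combination rule."""
    flagged_public = public.is_outlier(features)
    model = private.get(user)
    flagged_private = model is not None and model.is_outlier(features[:2])
    return flagged_public or flagged_private
```

In this sketch the public model catches vectors that are extreme for any user, while a private model can flag an amount that is ordinary for the population but unusual for one particular user. A production version would replace `CentroidModel` with a real clustering algorithm trained in Spark and applied from Spark Streaming.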
Keywords/Search Tags:Anomaly detection, Lambda architecture, Clustering, Machine learning, Spark, Big data distributed system