
Design And Implementation Of A Cloud Based Customer Retention System For China Unicom

Posted on: 2019-06-12
Degree: Master
Type: Thesis
Country: China
Candidate: Z Wei
Full Text: PDF
GTID: 2428330590975230
Subject: Software engineering
Abstract/Summary:
With the rapid development of cloud computing and Internet technology, improving or replacing existing systems with open-source software and distributed computing and storage has become a trend, and traditional carriers such as China Unicom are part of this technological shift. China Unicom's customer retention system is a subsystem of its centralized service support system. It performs statistical analysis on customer information, detailed customer bills, customer payment records, and other consumption data from the China Unicom database, builds statistical models of customer loyalty, satisfaction, and credit, and generates reports that identify influences on customers and predict customer churn, so that account managers can take steps to prevent churn and retain customers. The system currently runs on a traditional IT architecture, which scales poorly, overloads the database, imposes a hard performance ceiling, and delivers a poor user experience.

This thesis focuses on using the mainstream distributed computing engine Apache Spark, together with related components of the Hadoop ecosystem, to build a linearly scalable cloud computing platform that integrates distributed computing, distributed storage, and load balancing, and on migrating the existing Unicom customer retention system onto this cloud platform. The research work is divided into a real-time business part and a non-real-time business part. The main work is as follows.

The real-time business part covers data collection, stream computing, and data storage. The data collection component connects the existing system to the new cloud system. Because real-time business data has strict latency requirements, with only about one second of delay allowed, structured data replication software for Oracle backups extracts real-time increments from the existing system's data files, a Kafka cluster produces the incremental messages, and the Spark cluster of the new cloud system consumes the Kafka messages in real time, so that incremental changes are reflected immediately. The stream computing component uses Spark's streaming technology to process the data streams, including raw-data filtering, data parsing and processing, valid-data filtering, timed processing, and warehousing; for abnormal situations such as lost or erroneous data, a data recalculation process is provided. The data storage component uses an HBase-plus-Redis scheme, in which Redis holds temporary data for 48 hours and HBase holds the persistent data.

The non-real-time business part covers data collection, dynamic programming, process orchestration, and a process execution engine. Because non-real-time business is less demanding on data latency, Spark-SFTP automatically extracts data source files and distributes them to the HDFS (distributed file system) cluster of the new cloud system. The dynamic programming component uses Javassist bytecode technology to convert external data sources into DataFrame-structured data that Spark SQL can process, thereby restoring the business logic of the original SQL. The process orchestration component replaces the complex business logic of the existing system (stored procedures) with simple process configuration: an independent web system, built with the open-source Xiorkflow JS framework and Tapestry, lets business processes be designed through a simple, configurable interface. The process execution engine runs as a resident process that loads executable processes in real time, parses the task nodes of each process, orders the tasks as a directed acyclic graph, and executes the individual tasks in Spark's thread pool.

Finally, the Spark-based cloud system is used to verify the feasibility and advantages of the related research, demonstrating the effectiveness and practicality of this work.
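The stream-computing stage described above (raw-data filtering, parsing, validity checks, and a recalculation path for lost or erroneous records) can be sketched as plain functions. This is a minimal illustration only: the comma-separated message format, the field names, and the "empty user_id" error rule are assumptions for the sketch, not the thesis's actual schema.

```python
def parse_record(raw):
    """Parse one incremental message 'user_id,field,value'.

    Returns None for malformed input (the raw-data filtering step).
    """
    parts = raw.strip().split(",")
    if len(parts) != 3:
        return None  # drop malformed lines
    user_id, field, value = parts
    return {"user_id": user_id, "field": field, "value": value}


def process_batch(raw_records):
    """Filter, parse, and validate a micro-batch of messages.

    Records that parse but fail validation are routed to a
    recalculation queue, mirroring the abstract's handling of
    erroneous data.
    """
    valid, recalc = [], []
    for raw in raw_records:
        rec = parse_record(raw)
        if rec is None:
            continue  # malformed: filtered out entirely
        if rec["user_id"] == "":  # assumed error condition for the sketch
            recalc.append(raw)    # queue for recalculation
        else:
            valid.append(rec)     # ready for timed warehousing
    return valid, recalc


batch = ["1001,balance,35.5", "garbled line", ",balance,12.0"]
valid, recalc = process_batch(batch)
print(len(valid), len(recalc))  # prints: 1 1
```

In the actual system these steps would run inside Spark streaming over micro-batches consumed from Kafka; the pure-function form here only shows the filtering and routing logic.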
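The HBase-plus-Redis storage split, with Redis as a 48-hour hot store and HBase as the persistent store, implies a read path that tries the hot tier first and falls back on expiry. A toy model, with in-memory dicts standing in for Redis and HBase and an injectable clock (the key format and fallback rule are illustrative assumptions):

```python
import time

HOT_TTL_SECONDS = 48 * 3600  # Redis retention window from the abstract: 48 hours


class TieredStore:
    """Toy model of the Redis (hot, TTL-bounded) + HBase (persistent) split."""

    def __init__(self, now=time.time):
        self.now = now
        self.hot = {}         # key -> (value, written_at); stands in for Redis
        self.persistent = {}  # key -> value; stands in for HBase

    def put(self, key, value):
        # Writes go to both tiers: hot for fast recent reads, persistent for keeps.
        self.hot[key] = (value, self.now())
        self.persistent[key] = value

    def get(self, key):
        entry = self.hot.get(key)
        if entry is not None:
            value, written_at = entry
            if self.now() - written_at < HOT_TTL_SECONDS:
                return value          # hot hit within the 48-hour window
            del self.hot[key]         # expired: evict from the hot tier
        return self.persistent.get(key)  # fall back to the persistent store


store = TieredStore()
store.put("cust:1001", {"status": "at-risk"})
print(store.get("cust:1001"))  # prints: {'status': 'at-risk'}
```

Real Redis would enforce the 48-hour window itself via key TTLs, and HBase reads would go over its client API; the dict version only demonstrates the tiering decision.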
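The process execution engine orders task nodes as a directed acyclic graph before running them. The ordering step can be sketched with Kahn's topological sort; the task names and dependency map below are hypothetical, and execution in Spark's thread pool is elided.

```python
from collections import deque


def execution_order(tasks, deps):
    """Order the task nodes of a process DAG so that each task runs
    only after all of its dependencies (Kahn's algorithm)."""
    indegree = {t: 0 for t in tasks}
    dependents = {t: [] for t in tasks}  # dependency -> tasks that wait on it
    for t in tasks:
        for d in deps.get(t, []):
            indegree[t] += 1
            dependents[d].append(t)

    ready = deque(t for t in tasks if indegree[t] == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)  # in the real engine, submit t to the thread pool here
        for nxt in dependents[t]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(tasks):
        raise ValueError("process definition contains a cycle; not a DAG")
    return order


# Hypothetical process: extract -> transform -> {report, archive}
tasks = ["extract", "transform", "report", "archive"]
deps = {"transform": ["extract"], "report": ["transform"], "archive": ["transform"]}
print(execution_order(tasks, deps))  # prints: ['extract', 'transform', 'report', 'archive']
```

Independent tasks (here, report and archive) surface in the ready queue together, which is what lets the engine run them concurrently in a thread pool.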
Keywords/Search Tags: Spark, Kafka, Big data processing