
Design And Realization Of Information Analysis System Of Telecom Operator Mobile Terminal

Posted on: 2019-06-14
Degree: Master
Type: Thesis
Country: China
Candidate: M Yan
Full Text: PDF
GTID: 2348330545458537
Subject: Electronic Science and Technology
Abstract/Summary:
Classifying customers is one of the top strategic priorities of telecom operators, because it lets them apply different business strategies to different types of customers. Segmenting customers by the characteristics of their terminals and mining the latent information in that data has become an important part of operators' work. The volume of terminal-related data generated each day is huge: Hebei Unicom alone adds hundreds of thousands of new network terminal records per day, amounting to more than 1 GB of new data daily. The traditional architecture used by operators cannot support the ever-increasing storage demand or accurate analysis of this data.

This thesis therefore starts from the terminal SIM card data and user data held by operators. It designs dimension tables based on the existing data, then completes data integration and builds a data warehouse on the theme of terminal information analysis. It also improves traditional mining algorithms and implements them on the Spark platform. The project aims to improve how operators store their existing terminal data by introducing a distributed cluster; thanks to horizontal scalability, the data warehouse can support long-term storage of terabytes of data.

Because telecom terminal data is unlabeled, unsupervised learning methods were selected for mining. First, a Spark-based optimized K-means algorithm is proposed: the Canopy pre-clustering algorithm is introduced to optimize the selection of the initial K-means centers, and the optimized algorithm is implemented with the Spark RDD programming model. Second, because K-means handles non-convex data poorly, the DBSCAN clustering algorithm was also selected, with a KD-tree used to partition the data set to improve the efficiency of distributed clustering; this improved algorithm was likewise implemented on the Spark platform.

The two improved algorithms were tested on a subset of the data in the integrated data warehouse and compared with the stand-alone (serial) and unmodified versions. Compared with those, the improved K-means algorithm raises accuracy by about 5% on the telecom datasets and improves clustering efficiency for data sizes above 10^6. The improved DBSCAN algorithm also improves clustering efficiency at data sizes above 10^6. The big data terminal information analysis system built with this approach is more efficient and accurate than the traditional analysis system and can help telecom operators make better business strategies.
Keywords/Search Tags: telecom operator, data mining, clustering algorithm, Spark
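
The abstract describes using Canopy pre-clustering to choose the initial centers for K-means. The thesis's own Spark RDD implementation is not shown here, so the following is only a minimal single-machine sketch in plain Python of that general idea. The function names `canopy_centers` and `kmeans` and the threshold parameter `t2` are hypothetical. The sketch uses only the tight Canopy threshold (commonly called T2), because the loose threshold (T1) governs canopy membership rather than which points become centers.

```python
import math
import random


def canopy_centers(points, t2, seed=0):
    """Pick Canopy centers (hypothetical helper, not the thesis's code).

    Repeatedly take one remaining point as a center and discard every
    remaining point within distance t2 of it, until no points are left.
    """
    rng = random.Random(seed)
    remaining = list(points)
    rng.shuffle(remaining)
    centers = []
    while remaining:
        center = remaining.pop()
        centers.append(center)
        remaining = [p for p in remaining if math.dist(p, center) > t2]
    return centers


def kmeans(points, initial_centers, max_iter=100):
    """Plain Lloyd-style K-means started from the given centers.

    Returns the final centers and the clusters from the last
    assignment step.
    """
    centers = [tuple(c) for c in initial_centers]
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


if __name__ == "__main__":
    # Toy data: two well-separated groups of 2-D points.
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    seeds = canopy_centers(data, t2=3.0)
    final_centers, final_clusters = kmeans(data, seeds)
    print(len(seeds), final_centers)
```

In this sketch the number of Canopy centers also fixes the value of k, so k does not have to be guessed in advance. In a distributed setting the assignment and update steps would be expressed as map and reduce operations over partitions, which this example does not attempt.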