User Clustering Research Based On Cellular Network Data

Posted on: 2020-03-23 | Degree: Master | Type: Thesis
Country: China | Candidate: Z Yuan
GTID: 2428330572976410 | Subject: Information and Communication Engineering
Abstract/Summary:
With the development of the mobile Internet, the volume of data in cellular networks is exploding, while telecom operators' revenue from traditional services such as voice is shrinking and user stickiness to their own products is declining. How to use the massive user data in cellular networks to mine valuable user behavior patterns, and how to build suitable user behavior feature models that improve users' product experience and the accuracy of marketing, has become a hot research topic in recent years. As an unsupervised learning technique, clustering is well suited to exploring hidden patterns in data. Using massive cellular network service data collected from operators, this thesis studies user clustering along the time dimension and the spatial dimension. The main work of the thesis is as follows.

First, extraction of heavy users. By plotting the Lorenz curve of user traffic, the thesis finds that traffic usage is very unevenly distributed: about 21.2% of users, the heavy users, consume 81.85% of the traffic. The thesis then extracts these heavy users, who have a significant influence on the cellular network, and compares them with average users in terms of traffic usage, active duration, number of services, and mobility. The results show that heavy users exceed average users in traffic usage, active duration, and number of services, while there is no significant difference between the two in mobility.

Second, clustering users along the time dimension to explore the traffic usage patterns of heavy users. The thesis studies time-dimension user clustering and a feature vector that represents a user's preference for different time periods. The feature vector is built by dividing the 24 hours of the day into five periods according to daily routines, computing the ratio of each period's traffic to the whole day's traffic, and dividing that ratio by the number of hours in the period. K-means is then used for clustering, and three evaluation indices indicate that the optimal number of clusters K is 4. The four resulting types of users use more traffic in the bedtime, leisure, work, and commuting/dining periods, respectively, which gives operators a reference for network optimization and precision marketing. The results show that the proposed clustering process and feature-vector construction scheme can effectively mine the traffic usage patterns of different users.

Third, clustering user groups along the spatial dimension to discover groups containing potentially high-value users. The thesis studies spatial-dimension clustering of user groups and a feature vector that represents the value of a group. Users who share the same primary base station (the base station at which they used the most traffic over a period of time) are placed in the same group. The group feature vector is built by discretizing three continuous attributes of each user in the group: data traffic, mobility (number of visited base stations), and number of service types. Using the discretization results, each user is projected into one subspace of the three-dimensional attribute space, and the proportion of the group's users falling in each subspace is used as the group's feature vector. Evaluation shows that this feature vector does not yield good clustering results. After ruling out an improper choice of clustering algorithm or parameter settings as the cause, the feature vector is rebuilt using only two dimensions, data traffic and mobility (number of visited base stations), and hotspot base stations are extracted for study. Evaluation of the clustering results shows that the optimal number of clusters K is 3 or 4. Analysis of the clustering results reveals a cluster of user groups that contain potentially high-value users. The results show that the proposed clustering process and feature-vector construction scheme can effectively identify user groups containing potentially high-value users.
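The heavy-user extraction step rests on a Lorenz curve: sort users by traffic, accumulate, and read off what share of total traffic the heaviest users account for. The sketch below is a minimal illustration under stated assumptions; the function names `lorenz_curve`, `heavy_users`, and `traffic_share` are hypothetical, and treating heavy users as a fixed top fraction of users ranked by traffic is an assumption, since the abstract does not state the exact selection rule.

```python
def lorenz_curve(traffic):
    """Lorenz-curve points: (cumulative share of users, cumulative share of traffic)."""
    values = sorted(traffic)  # ascending, lightest users first
    total = sum(values)
    points = [(0.0, 0.0)]
    running = 0
    for i, v in enumerate(values, start=1):
        running += v
        points.append((i / len(values), running / total))
    return points


def heavy_users(traffic_by_user, user_fraction):
    """IDs of the heaviest `user_fraction` of users (assumed top-fraction rule)."""
    ranked = sorted(traffic_by_user, key=traffic_by_user.get, reverse=True)
    return ranked[: round(user_fraction * len(ranked))]


def traffic_share(traffic_by_user, user_ids):
    """Fraction of all traffic consumed by the given users."""
    return sum(traffic_by_user[u] for u in user_ids) / sum(traffic_by_user.values())
```

With toy data `{"u1": 500, "u2": 300, "u3": 100, "u4": 60, "u5": 40}`, `heavy_users(traffic, 0.4)` returns `["u1", "u2"]`, and `traffic_share` reports that these 40% of users consume 80% of the traffic. The numbers are illustrative only; the 21.2% / 81.85% figures come from the thesis's own data.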
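The time-dimension feature vector described in the second part of the work can be written per period p as f_p = (T_p / T_day) / h_p, where T_p is the user's traffic in period p, T_day the whole-day traffic, and h_p the period's length in hours. The sketch below assumes hourly traffic as input. The five-period split in `PERIODS` is hypothetical: the abstract only says the day is divided into five periods according to daily routines and names bedtime, leisure, work, and commuting/dining as periods the clusters favor, so the fifth period and every hour boundary here are assumptions.

```python
# Hypothetical five-period split of the day (hour h means hh:00 to hh:59).
PERIODS = {
    "late_night": [1, 2, 3, 4, 5, 6],
    "commute_dining": [7, 8, 12, 18],
    "work": [9, 10, 11, 13, 14, 15, 16, 17],
    "leisure": [19, 20, 21],
    "bedtime": [22, 23, 0],
}
# Sanity check: every hour of the day belongs to exactly one period.
assert sorted(h for hours in PERIODS.values() for h in hours) == list(range(24))


def time_feature_vector(hourly_traffic, periods=PERIODS):
    """Per-period traffic share divided by period length in hours."""
    if len(hourly_traffic) != 24:
        raise ValueError("expected 24 hourly traffic values")
    total = sum(hourly_traffic)
    if total == 0:
        return [0.0] * len(periods)  # inactive user: no preference to express
    return [
        (sum(hourly_traffic[h] for h in hours) / total) / len(hours)
        for hours in periods.values()
    ]
```

Dividing by the period length means a user with perfectly flat traffic gets 1/24 in every component, so a long period such as working hours does not dominate the vector merely by being long.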
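To illustrate the K-means step and how the number of clusters K might be selected, here is a dependency-free sketch of Lloyd's K-means together with the mean silhouette coefficient. The abstract does not name its three evaluation indices, so the silhouette coefficient is an assumed stand-in for one of them. The farthest-first initialization is a simplification chosen for determinism (k-means++ is the more common choice), and the names `kmeans`, `silhouette`, and `choose_k` are hypothetical.

```python
import math


def _dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100):
    """Lloyd's K-means with farthest-first initialisation; returns one label per point."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing seed.
        far = max(points, key=lambda p: min(_dist2(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid (ties go to the lower index).
        new_labels = [min(range(k), key=lambda c: _dist2(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, p in enumerate(points):
        dists = {c: [] for c in clusters}
        for j, q in enumerate(points):
            if j != i:
                dists[labels[j]].append(math.dist(p, q))
        own = dists[labels[i]]
        if not own:
            continue  # singleton cluster: its silhouette is defined as 0
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(sum(d) / len(d) for c, d in dists.items() if c != labels[i] and d)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)


def choose_k(points, candidates):
    """Pick the candidate K whose clustering has the highest mean silhouette."""
    return max(candidates, key=lambda k: silhouette(points, kmeans(points, k)))
```

On three well-separated toy blobs of four points each, `choose_k(points, range(2, 6))` selects 3. The silhouette computation is quadratic in the number of points, so on operator-scale data a sampled estimate would be more practical.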
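The spatial-dimension group feature vector from the third part of the work can be sketched in two steps: group users by their primary base station, then discretize each member's attributes and record what fraction of the group falls into each cell of the resulting grid. This is a minimal sketch under assumptions: the cut points in `edges` are hypothetical (the abstract does not say how the attributes were discretized), and the names `group_users` and `group_feature_vector` are illustrative. The same function covers both the original three-attribute vector and the revised two-attribute one (traffic and mobility).

```python
import bisect
import itertools
from collections import Counter, defaultdict


def group_users(users):
    """Group users by primary base station.

    users: {user_id: {station_id: traffic}}; a user's primary station is the one
    where they consumed the most traffic over the observation period.
    """
    groups = defaultdict(list)
    for uid, per_station in users.items():
        primary = max(per_station, key=per_station.get)
        groups[primary].append(uid)
    return dict(groups)


def group_feature_vector(members, edges):
    """Share of a group's users in each cell of a discretised attribute grid.

    members: one attribute tuple per user, e.g. (traffic, n_stations, n_service_types).
    edges: for each attribute, an ascending list of interior cut points (hypothetical);
           k cut points give k + 1 bins for that attribute.
    Returns a flat list with one proportion per grid cell, in row-major order.
    """
    if not members:
        raise ValueError("empty group")
    # Map each user to the grid cell (one bin index per attribute) it falls into.
    cells = Counter(
        tuple(bisect.bisect_right(cuts, value) for value, cuts in zip(attrs, edges))
        for attrs in members
    )
    shape = [range(len(cuts) + 1) for cuts in edges]
    return [cells[cell] / len(members) for cell in itertools.product(*shape)]
```

With one cut point per attribute, a three-attribute group yields an 8-component vector and the two-attribute revision yields a 4-component one; each vector sums to 1, so groups of different sizes remain comparable when clustered.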
Keywords/Search Tags: Cellular Network, Service Data, User Clustering, Feature Engineering