
Research On Multi-classification Model Of Abnormal Users Based On Telecommunication Data

Posted on: 2022-02-06    Degree: Master    Type: Thesis
Country: China    Candidate: C Lu    Full Text: PDF
GTID: 2518306764993409    Subject: Information and Post Economy
Abstract/Summary:
With the development of information and communication technology, the number of risky users, typified by fraudulent calls and sales calls, keeps increasing. Such users not only damage the rights and interests of operators but also seriously harm the property security and legitimate rights of the public. At the same time, China's major telecom operators have accumulated large amounts of user consumption, Internet behavior, location, and business data in the course of their operations. Operators generate hundreds of millions of billing records every day, producing a massive data set. How to use this data to identify risky users accurately and quickly has become an urgent problem for operators. Based on abnormal user detection theory, this thesis processes and analyzes real telecom user data, constructs a multi-classification model of abnormal users based on telecom bill data, and designs and implements a prototype system and a scenario-based business application. It also verifies that the accuracy and efficiency of this method are better than those of traditional algorithms. The main results are as follows.

First, multidimensional real business data is preprocessed. Starting from the massive business data of telecom operators, the data set was preprocessed with the MapReduce framework using distributed parallel data processing. Rule-based screening extracted 500,000 normal users and 500,000 suspected abnormal users from the 170 million daily call list records, and more than 20 basic features were combined with the reported real abnormal users to form the data set for model training.

Second, an imbalanced-data processing algorithm (ADASYN-Temoklink) is proposed that fuses nearest neighbor rule undersampling with adaptive oversampling. The problems caused by unbalanced data sets and the existing ways of handling them are analyzed and summarized. Working at the level of data samples, the thesis proposes a sampling method that mixes random synthetic minority class oversampling with nearest neighbor rule undersampling, which effectively samples the sparse regions of the minority class and makes the boundaries between different classes clear. In experiments, the fusion algorithm reaches an accuracy of 80.3% and a recall of 78.3%, both better than traditional methods for unbalanced data sets such as SMOTE and ENN.

Third, a multi-classification model (DLSVM) that fuses a decision tree with the least squares support vector machine (LSSVM) is studied and proposed. Each LSSVM classifier is treated as a node in the decision tree, and each node receives a feature subset specific to one category, so that the complex multi-classification problem is gradually decomposed into simple binary classification problems and classification accuracy improves. Building on the advantages of the decision tree algorithm, the resulting DLSVM algorithm performs multi-classification directly and solves the problem more conveniently and efficiently. In experiments, the fusion model achieves 87% classification accuracy, an improvement over three traditional multi-classification models and over traditional classification algorithms such as SVM, 1v1 SVM, and 1vR SVM, and it can meet the demand for multi-classification of abnormal telecom users.

In summary, a multi-classification algorithm for abnormal users based on telecom bills can make operators' policy analysis of abnormal users more complete and effectively avoid misjudging normal users. This makes the choice of policy thresholds more reasonable, improves the abnormal user detection system, and strengthens the recognition of abnormal user behavior. Subsequent verification of the abnormal users of different degrees identified by the model needs further study.
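The resampling idea in the second result (adaptive oversampling of the minority class followed by nearest neighbor undersampling) can be illustrated with a small sketch. This is not the thesis's ADASYN-Temoklink implementation, which the abstract does not specify. It assumes that the undersampling step removes Tomek links, handles only two classes, and uses hypothetical function names and parameters (`adasyn_oversample`, `remove_tomek_links`, `k`).

```python
import math
import random

def nearest(i, X, k):
    """Indices of the k nearest neighbours of X[i] (Euclidean), excluding i."""
    others = (j for j in range(len(X)) if j != i)
    return sorted(others, key=lambda j: math.dist(X[i], X[j]))[:k]

def adasyn_oversample(X, y, minority=1, k=3, rng=None):
    """ADASYN-style oversampling: minority points with more majority
    neighbours receive more synthetic samples (assumed binary labels)."""
    rng = rng or random.Random(0)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_needed = (len(y) - len(min_idx)) - len(min_idx)  # gap to balance
    if n_needed <= 0 or len(min_idx) < 2:
        return list(X), list(y)
    # Difficulty of each minority point: share of majority neighbours.
    ratio = [sum(y[j] != minority for j in nearest(i, X, k)) / k
             for i in min_idx]
    total = sum(ratio)
    if total == 0:  # no majority neighbours anywhere: spread evenly
        weights = [1 / len(min_idx)] * len(min_idx)
    else:
        weights = [r / total for r in ratio]
    X_min = [X[i] for i in min_idx]
    k_min = min(k, len(X_min) - 1)
    new_X, new_y = list(X), list(y)
    for pos, w in enumerate(weights):
        neighbours = nearest(pos, X_min, k_min)
        for _ in range(round(w * n_needed)):
            z = X_min[rng.choice(neighbours)]
            lam = rng.random()
            # Interpolate between the point and a minority neighbour.
            new_X.append(tuple(a + lam * (b - a)
                               for a, b in zip(X_min[pos], z)))
            new_y.append(minority)
    return new_X, new_y

def remove_tomek_links(X, y, majority=0):
    """Drop the majority member of every Tomek link, i.e. every pair of
    mutual nearest neighbours that carry different labels."""
    nn = [nearest(i, X, 1)[0] for i in range(len(X))]
    drop = set()
    for i, j in enumerate(nn):
        if nn[j] == i and y[i] != y[j]:
            drop.add(i if y[i] == majority else j)
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]
```

Calling `adasyn_oversample` and then `remove_tomek_links` on the result gives the oversample-then-clean order that the abstract describes. The brute-force neighbour search is quadratic, so a data set of the size mentioned above would need an indexed or distributed neighbour search instead.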
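The decomposition in the third result, where each decision tree node is a binary LSSVM trained on a category-specific feature subset, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's DLSVM: the tree is a simple chain in which each node separates one class from the classes still remaining, the kernel is an RBF with fixed width, and the class names, feature indices, and the `gamma` and `sigma` values are hypothetical. The LSSVM itself is trained in the standard way, by solving one linear system instead of a quadratic program.

```python
import math

def rbf(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two feature tuples."""
    return math.exp(-math.dist(a, b) ** 2 / (2 * sigma ** 2))

def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def train_lssvm(X, y, gamma=10.0):
    """Binary LSSVM with labels in {-1, +1}. Solves
    [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1],
    where Omega[i][j] = y_i * y_j * K(x_i, x_j)."""
    n = len(X)
    A = [[0.0] + [float(v) for v in y]]
    for i in range(n):
        row = [y[i] * y[j] * rbf(X[i], X[j]) + (1 / gamma if i == j else 0.0)
               for j in range(n)]
        A.append([float(y[i])] + row)
    sol = solve(A, [0.0] + [1.0] * n)
    b, alpha = sol[0], sol[1:]
    # Decision value; its sign is the predicted label.
    return lambda x: b + sum(a * yi * rbf(x, xi)
                             for a, yi, xi in zip(alpha, y, X))

def train_tree(X, labels, class_order, features):
    """Chain of nodes: node t separates class_order[t] from the classes
    still remaining, using only the feature indices in features[class]."""
    nodes, remaining = [], list(zip(X, labels))
    for c in class_order[:-1]:
        idx = features[c]
        Xn = [tuple(x[i] for i in idx) for x, _ in remaining]
        yn = [1 if label == c else -1 for _, label in remaining]
        nodes.append((c, idx, train_lssvm(Xn, yn)))
        remaining = [(x, label) for x, label in remaining if label != c]
    return nodes, class_order[-1]

def predict(tree, x):
    """Walk the chain; the first node that fires decides the class."""
    nodes, default = tree
    for c, idx, decide in nodes:
        if decide(tuple(x[i] for i in idx)) > 0:
            return c
    return default

# Toy data with hypothetical classes and two numeric features.
X = [(0.0, 0.0), (0.2, 0.1), (-0.2, -0.1),   # normal
     (0.1, 5.0), (-0.1, 5.1), (0.3, 4.9),    # sales
     (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]     # fraud
labels = ["normal"] * 3 + ["sales"] * 3 + ["fraud"] * 3
tree = train_tree(X, labels, ["fraud", "sales", "normal"],
                  {"fraud": [0], "sales": [1]})
```

Here the first node uses only feature 0 to split off the hypothetical "fraud" class, and the second node uses only feature 1 to separate "sales" from "normal", so a three-class problem becomes two binary ones. The order of the classes and the feature subset per node are design choices that the abstract does not detail.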
Keywords/Search Tags:telecom bill big data, abnormal users, unbalanced data set, multi-classification algorithm, machine learning