Detection Of Outliers Based On Social Relations And Geographical Location

Posted on: 2019-07-09
Degree: Master
Type: Thesis
Country: China
Candidate: X Luo
Full Text: PDF
GTID: 2428330548459151
Subject: Computer system architecture
Abstract/Summary:
With the rapid development of Internet technology and computer science, our everyday behavior is increasingly recorded as data. As the volume of data grows, research into the characteristics hidden in it has become a hot topic. Data mining is the process of building a model by studying data. As interest in data mining has grown, outlier detection, one of its important fields, has also attracted attention. An outlier, also called an anomaly, is a point that differs markedly from the majority of the data in a data set. An outlier is different from noise, and it cannot simply be labeled good or bad. Analyzing outliers has many applications in work and daily life. In recent years, a variety of methods have been used to analyze outliers, with results that have been applied in many fields. The development of anomaly detection technology has also brought new challenges. How to analyze large, high-dimensional data in a short time has become the research direction of most scholars in the field.

In this thesis, we propose a method for detecting outliers based on social relations and geographical location. Our method can quickly and accurately classify users' behavior data and find unusual users whose behavior does not match that of most users. The algorithm has three steps. First, we cluster the geographic locations to distinguish users' trajectory patterns under different densities. Second, we extract each user's social relations and merge them with the geographic location information to obtain the user's feature values. Because the feature space may be too large or the feature data too sparse, we use t-SNE to reduce the dimensionality of the data while preserving the characteristics of the original data. Finally, we cluster the data in the feature space with a clustering algorithm, find the abnormal clusters by model comparison, and thereby determine the outliers.

Using China Unicom's call records as sample data, we compared the advantages and disadvantages of detecting outliers based on social relations alone with those of detecting them based on features that combine social relations and geographical location. We found that adding the geospatial dimension lets us clearly distinguish the social relationships of users of different ages, which greatly facilitates the search for outliers. We also compared users' daily call data with their weekly call data, which showed that, with a large amount of data, our method obtains more accurate results by enriching the users' feature attributes. Finally, we compared two common clustering models, K-means and DBSCAN, within our method and found that, on our data set, clustering with K-means performs better than clustering with DBSCAN. Overall, our analysis of the method's principles and our experimental results demonstrate the feasibility of the method, which can serve as a reference for future work on outlier detection.
Keywords/Search Tags: Social connection, Geographical position, Anomaly detection
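
The final step described in the abstract (cluster users in feature space, then single out the abnormal clusters) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not say how abnormal clusters are identified, so the rule used here (a cluster holding less than `min_fraction` of all users is treated as abnormal) is an assumption, as are the function names, the deterministic farthest-point initialization, the two toy features and the user ids. The geographic clustering and t-SNE steps are omitted; the feature vectors are assumed to be already merged and low-dimensional.

```python
import math
from collections import Counter


def farthest_point_init(points, k):
    """Pick k starting centers deterministically: the first point, then
    repeatedly the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def flag_small_clusters(features, k, min_fraction):
    """Return ids of users whose cluster holds less than min_fraction of all
    users. This abnormal-cluster rule is an assumption, not the thesis's."""
    ids = list(features)
    labels = kmeans([features[u] for u in ids], k)
    sizes = Counter(labels)
    return [u for u, lab in zip(ids, labels) if sizes[lab] / len(ids) < min_fraction]


# Hypothetical per-user features: (calls per day, distinct location clusters visited).
features = {
    "u01": (5, 2), "u02": (6, 2), "u03": (5, 3), "u04": (6, 3), "u05": (4, 2),
    "u06": (20, 8), "u07": (21, 9), "u08": (60, 1), "u09": (19, 8),
    "u10": (20, 9), "u11": (22, 8),
}
suspects = flag_small_clusters(features, k=3, min_fraction=0.2)
print(suspects)
```

On this toy data the two dense groups each hold 5 of the 11 users, while `u08` ends up alone in its cluster and is the only id flagged. With real features on different scales, the columns would likely need to be normalized before computing distances, and both `k` and `min_fraction` would have to be chosen for the data set.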