Outlier Mining Algorithm Based On Overlapping Association Clustering And Optimal K-frequency Measure

Posted on: 2022-01-09 | Degree: Master | Type: Thesis
Country: China | Candidate: J Y Zhang
GTID: 2518306536496824 | Subject: Computer technology
Abstract/Summary:
Society has entered the data age, and outlier mining, an important branch of data mining, has attracted wide attention from researchers in recent years. Outlier mining helps people find information with markedly abnormal characteristics accurately and quickly in large amounts of data, which makes it a very effective data mining method. Researchers in China and abroad have proposed a variety of outlier mining methods, and the technology has been applied successfully in intrusion detection, fraud detection, medical health, ecological protection and other fields. This paper studies clustering quality in density-based outlier mining algorithms, the difficulty of setting the nearest-neighbor parameter k, and the sensitivity of the results to that parameter. The main research contents are as follows.

Firstly, density-based clustering algorithms are analyzed. Clustering results can mismatch boundary points when the parameter settings fit the distribution of the dataset poorly, and can suffer from information concealment and information inundation because of too many parameter settings. Building on the DBSCAN algorithm, the concept of overlapping clustering is introduced, and the data objects are preprocessed by clustering them twice. Combined with a distance-based outlier mining algorithm, a new outlier mining algorithm based on overlapping association clustering is proposed. A detailed description and a flow chart of the algorithm are given, and its correctness and time complexity are analyzed.

Secondly, density-based outlier mining algorithms are studied. The LOF algorithm requires the nearest-neighbor parameter k to be preset and is sensitive to its value. To address this, definitions are given for the optimal K value, the optimal K set and the local outlier of a data object. After a set of quasi-outliers has been obtained, a frequency measure is introduced and used to re-check the outlier degree of the data objects, and an outlier mining algorithm based on the optimal K-frequency measure is proposed. The algorithm is described in detail with a flow chart, and its correctness and time complexity are analyzed.

Finally, the two outlier mining methods proposed in this paper are applied to real UCI datasets and to simulated datasets. Comparison experiments against other outlier mining algorithms are carried out to verify the effectiveness of the two algorithms.
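
The abstract does not give the definitions of the optimal K value or of the frequency measure, so the following is only a rough sketch of the general idea behind the second algorithm: compute the standard LOF score for several values of k and count how often each object is flagged. The function names, the range of k values and the threshold of 1.5 are all assumptions made for illustration; this is not the algorithm proposed in the thesis.

```python
import math

def knn(points, i, k):
    """Return the k nearest neighbours of point i as (distance, index) pairs."""
    dists = sorted((math.dist(points[i], points[j]), j)
                   for j in range(len(points)) if j != i)
    return dists[:k]  # simplification: distance ties beyond k are ignored

def lof_scores(points, k):
    """Standard LOF score of every point for one value of k."""
    n = len(points)
    neigh = [knn(points, i, k) for i in range(n)]
    k_dist = [neigh[i][-1][0] for i in range(n)]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = sum(max(k_dist[j], d) for d, j in neigh[i])
        lrd.append(k / reach if reach > 0 else math.inf)
    scores = []
    for i in range(n):
        if math.isinf(lrd[i]):
            scores.append(1.0)  # duplicate points: treat as inliers
        else:
            scores.append(sum(lrd[j] for _, j in neigh[i]) / (k * lrd[i]))
    return scores

def flag_frequency(points, k_values, threshold=1.5):
    """Fraction of the k values for which each point's LOF exceeds threshold.

    Both the threshold and the use of a plain fraction are hypothetical
    stand-ins for the frequency measure mentioned in the abstract.
    """
    counts = [0] * len(points)
    for k in k_values:
        for i, score in enumerate(lof_scores(points, k)):
            if score > threshold:
                counts[i] += 1
    return [c / len(k_values) for c in counts]

if __name__ == "__main__":
    # A 5x5 grid of inliers plus one distant point (illustrative data only).
    data = [(float(x), float(y)) for x in range(5) for y in range(5)]
    data.append((20.0, 20.0))
    freq = flag_frequency(data, k_values=range(3, 7))
    print("flag frequency of the distant point:", freq[-1])
```

Objects that are flagged for most values of k are less likely to be artifacts of one particular parameter choice, which is the motivation for aggregating over a set of k values rather than fixing a single one.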
Keywords/Search Tags:data mining, outliers, clustering, optimal K value, frequency measurement