Outlier Mining Algorithm Based On Overlapping Association Clustering And Optimal K-frequency Measure

Posted on:2022-01-09

Degree:Master

Type:Thesis

Country:China

Candidate:J Y Zhang

Full Text:PDF

GTID:2518306536496824

Subject:Computer technology

Abstract/Summary:

Society has entered the data age, and outlier mining, an important branch of data mining, has attracted wide attention from researchers in recent years. Outlier mining helps people find information with markedly abnormal characteristics accurately and quickly in large volumes of data, and it is an effective data mining method. Researchers in China and abroad have proposed a variety of outlier mining methods, and the technology has been applied successfully in intrusion detection, fraud detection, medical health, ecological protection, and other fields. This thesis studies clustering quality, density-based outlier mining algorithms, the difficulty of setting the nearest-neighbor parameter K, and parameter sensitivity. The main research contents are as follows.

First, density-based clustering algorithms are analyzed. When the parameter settings match the distribution of the dataset poorly, boundary points are misassigned in the clustering results, and because too many parameters must be set, the clustering results also suffer from information concealment and information inundation. Building on the DBSCAN algorithm, the concept of overlapping clustering is introduced, and the data objects are preprocessed by clustering them twice. Combined with a distance-based outlier mining algorithm, a new outlier mining algorithm based on overlapping association clustering is proposed. A detailed description and flow chart of the algorithm are given, and its correctness and time complexity are analyzed.

Second, density-based outlier mining algorithms are studied. To address the difficulty of presetting the nearest-neighbor parameter K of the LOF algorithm and the algorithm's sensitivity to it, definitions of the optimal K value, the optimal K set, and the local outlier of a data object are given. After a set of quasi-outliers is obtained, a frequency measure is introduced and the outliers among the data objects are checked again using it, and an outlier mining algorithm based on the optimal K-frequency measure is proposed. This algorithm is likewise described in detail with a flow chart, and its correctness and time complexity are analyzed.

Finally, the two outlier mining methods proposed in this thesis are applied to real UCI datasets and to simulated datasets. Comparison experiments against other outlier mining algorithms verify the effectiveness of both methods.
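The abstract does not define the optimal K set or the frequency measure, so the following is only a rough sketch of the general idea behind the second method: compute LOF scores for several candidate values of K and keep the points that are flagged repeatedly. Everything here is an assumption made for illustration and is not the thesis's algorithm: the function names `lof_scores` and `k_frequency_outliers`, the parameters `k_values`, `top_n`, and `min_freq`, and the rule "count how often a point ranks among the top LOF scores". The LOF computation follows the standard definition and assumes the data contain no duplicate points.

```python
import math


def knn(points, i, k):
    """Return (k-distance, neighbor indices) of point i, keeping ties at the k-distance."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    k_dist = dists[k - 1][0]
    return k_dist, [j for d, j in dists if d <= k_dist]


def lof_scores(points, k):
    """Standard Local Outlier Factor for every point (assumes no duplicate points)."""
    n = len(points)
    info = [knn(points, i, k) for i in range(n)]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        _, neigh = info[i]
        reach = [max(info[j][0], math.dist(points[i], points[j])) for j in neigh]
        return len(neigh) / sum(reach)

    lrds = [lrd(i) for i in range(n)]
    # LOF: mean density of the neighbors divided by the point's own density.
    return [
        sum(lrds[j] for j in info[i][1]) / (len(info[i][1]) * lrds[i])
        for i in range(n)
    ]


def k_frequency_outliers(points, k_values, top_n, min_freq):
    """Hypothetical frequency rule: report points that rank among the top_n
    LOF scores for at least min_freq of the candidate k values."""
    counts = [0] * len(points)
    for k in k_values:
        scores = lof_scores(points, k)
        ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
        for i in ranked[:top_n]:
            counts[i] += 1
    return [i for i, c in enumerate(counts) if c >= min_freq]


if __name__ == "__main__":
    # A 3x3 grid of inliers plus one distant point at index 9.
    pts = [(float(x), float(y)) for x in range(3) for y in range(3)] + [(10.0, 10.0)]
    print(k_frequency_outliers(pts, k_values=[2, 3, 4], top_n=1, min_freq=3))
```

Counting across several K values is one simple way to make the result less dependent on any single choice of K, which is the sensitivity problem the abstract describes. How the thesis actually selects the optimal K set and weighs the frequencies is not stated in the abstract.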

Keywords/Search Tags:

data mining, outliers, clustering, optimal K value, frequency measurement

Related items

1	Research On The Outliers Detection Algorithm
2	Research On Outliers Mining Method To Web Content
3	K-distance-based Outliers And Clustering Algorithm
4	Research On Extended Knowledge Discovery In High-Dimension And Sparse Outliers Set
5	Research Of Outliers Mining Applied In Snort System Improvement
6	Optimal Subspace Outlier Mining Algorithm Based On Entropy Increment And Local Attribute Weighting
7	A Research On Outliers Mining Algorithm Based On Heat Metering Data
8	Mining Association Rules Among Outliers Based On Histogram And FP-growth
9	The Research Of Real-time Data Analysis Based On Data Mining
10	Study And Application Of Clustering Analysis In Data Mining