Research On Clustering Algorithm Based On Data Stream

Posted on:2018-06-08

Degree:Master

Type:Thesis

Country:China

Candidate:S L Ren

Full Text:PDF

GTID:2348330518973592

Subject:Computer Science and Technology

Abstract/Summary:

With the development and wide application of information technology, a new kind of research object has emerged in many research areas. Compared with traditional data, these data are massive, sequential, potentially unbounded, and change continuously at high speed; we call them data streams. Research on clustering algorithms for data streams is therefore of considerable significance. In real environments, limited machine accuracy and human factors affect the collected data and thus the final result, producing uncertain data, yet most existing data stream clustering algorithms are aimed at deterministic data. How to deal with uncertain data has therefore become a research hot spot. On the other hand, because data in a stream may arrive at any time, people pay more attention to recently arrived data, so a further question is how to distinguish recent data from the data accumulated since the beginning of the stream in order to achieve a better clustering effect. In view of the above, the main research work of this paper is as follows:

1. First, the paper introduces the emergence of data streams and the challenges they pose for data mining, then presents the research background and significance of data stream mining and the research status at home and abroad. It then summarizes the development prospects and applications of data stream mining, and introduces the research directions and the similarity calculation methods for data streams.

2. Most existing data stream clustering algorithms are based on deterministic data and therefore cannot handle the clustering of uncertain data streams. The classic clustering algorithms for uncertain data streams have the following disadvantages: the probability density function is hard to obtain, the computational complexity is high, and the practicability is poor. A clustering algorithm for uncertain data streams based on interval numbers, named UIDStream, is therefore proposed. The algorithm uses interval numbers to represent uncertain attribute values, defines a distance calculation method based on interval numbers, and from it defines the similarity between uncertain data. It introduces two concepts, the 2k-nearest neighbors micro cluster and the optimal 2k-nearest neighbors micro cluster, and completes the clustering of the uncertain data stream by combining 2k-nearest neighbors micro clusters. The experiments show that the proposed algorithm improves the quality and speed of uncertain data stream clustering and achieves good clustering results.

3. Data in a stream may arrive at any time, yet traditional clustering algorithms do not consider the weight of time, while in many real applications people pay more attention to data that arrived in the most recent period. This paper presents a time-dependent data stream clustering algorithm, TF-Stream. Because the influence of earlier-arriving data on the clustering result decays over time, a time fading function is added. On the basis of the DBSCAN algorithm, the concepts of neighborhood density and relative density are defined and applied to data stream clustering. The experiments show that the proposed algorithm is effective and has a good clustering effect.
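The abstract does not give the formulas behind UIDStream or TF-Stream, so the following is only a minimal sketch of the two ideas it names: a distance between records whose attributes are interval numbers, and a time fading weight applied to a DBSCAN-style neighborhood density. The endpoint-based Euclidean distance, the exponential fading form `2 ** (-decay * age)`, and the names `interval_distance`, `fading_weight`, `faded_density`, `decay`, and `radius` are all assumptions made for illustration, not the definitions used in the thesis.

```python
import math
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]   # (lower, upper) bound of one uncertain attribute
Record = Sequence[Interval]      # one uncertain data point


def interval_distance(x: Record, y: Record) -> float:
    """Assumed interval-number distance: Euclidean over both endpoints.

    For exact values (lower == upper) it reduces to the ordinary
    Euclidean distance.
    """
    total = 0.0
    for (x_lo, x_hi), (y_lo, y_hi) in zip(x, y):
        total += (x_lo - y_lo) ** 2 + (x_hi - y_hi) ** 2
    return math.sqrt(total / 2.0)


def fading_weight(arrival_time: float, now: float, decay: float = 0.1) -> float:
    """Assumed exponential fading: the weight halves every 1/decay time units."""
    return 2.0 ** (-decay * (now - arrival_time))


def faded_density(center: Record,
                  points: List[Tuple[Record, float]],
                  now: float,
                  radius: float,
                  decay: float = 0.1) -> float:
    """Time-weighted neighborhood density around `center`.

    Sums the fading weights of all (record, arrival_time) pairs whose
    interval distance to `center` is at most `radius`.
    """
    density = 0.0
    for record, arrival_time in points:
        if interval_distance(center, record) <= radius:
            density += fading_weight(arrival_time, now, decay)
    return density
```

With such a weighting, an old point inside the neighborhood contributes less to the density than a recent one, which is one way a DBSCAN-style core-point test could favor recently arrived data.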

Keywords/Search Tags:

data stream, clustering, uncertain data, time fading function, data mining

Related items

1	Research On The Algorithm For Mining Frequent Items From Data Streams
2	Research On Frequent Itemset Mining Algorithm Of Uncertain Data Stream
3	Research On Data Mining And Forecasting Methods Over Time Series Data With Complex Structure
4	Research On Uncertain Data Stream Database System
5	Study On Key Technologies Of Frequent Items Mining And Clustering On Data Streams
6	Clustering Analysis And Outlier Detection Algorithms On Uncertain Data
7	A Clustering Algorithm For Uncertain Data Stream: UStreamUKm
8	Mining Frequent Itemsets from Uncertain Data: Extensions to Constrained Mining and Stream Mining
9	Research On Algorithms For Mining Frequent Itemsets In Uncertain Data Streams
10	Research On Frequent Pattern Mining Of Uncertain Data