Research On Clustering Algorithm Based On Data Stream

Posted on: 2018-06-08
Degree: Master
Type: Thesis
Country: China
Candidate: S L Ren
Full Text: PDF
GTID: 2348330518973592
Subject: Computer Science and Technology
Abstract/Summary:
With the development and wide application of information technology, a new kind of research object has emerged in many fields. Compared with traditional data, these data are massive, sequential, potentially unbounded, and change continuously at high speed; we call them data streams. Research on clustering algorithms for data streams is therefore of great significance. In real environments, limited machine precision and human factors introduce errors that affect the final result and produce uncertain data, yet most existing data stream clustering algorithms target deterministic data, so handling uncertain data has become a research hot spot. In addition, because data in a stream can arrive at any time, people pay more attention to recently arrived data, which raises the question of how to distinguish recent data from the data accumulated since the beginning of the stream in order to cluster the stream more effectively. In view of the above, the main research work of this thesis is as follows:

1. The thesis first introduces the emergence of data streams and the challenges they pose for data mining, then presents the research background and significance of data stream mining and the state of research at home and abroad. It then summarizes the development prospects and applications of data stream mining, and introduces the research directions and similarity calculation methods for data streams.

2. Most existing data stream clustering algorithms are based on deterministic data and therefore cannot cluster uncertain data streams. The classic clustering algorithms for uncertain data streams have the following disadvantages: the probability density function is hard to obtain, computational complexity is high, and practicability is poor. A clustering algorithm for uncertain data streams based on interval numbers, named UIDStream, is therefore proposed. The algorithm uses interval numbers to represent uncertain attribute values, defines a distance calculation method based on interval numbers and, from it, the similarity between uncertain data, and introduces two concepts: the 2k-nearest neighbors micro cluster and the optimal 2k-nearest neighbors micro cluster. Clustering of the uncertain data stream is carried out with the 2k-nearest neighbors micro clusters. Experiments show that the proposed algorithm improves the quality and speed of uncertain data stream clustering and achieves good clustering results.

3. Because data in a stream can arrive at any time, traditional clustering algorithms cannot take the weight of time into account, while in many real applications people pay more attention to data that arrived in the most recent period. This thesis presents a time-dependent data stream clustering algorithm, TF-Stream. Considering that the influence of earlier-arriving data on the clustering of the stream decays over time, a time fading function is added. On the basis of the DBSCAN algorithm, the concepts of neighborhood density and relative density are defined and applied to data stream clustering. Experiments show that the proposed algorithm is effective and clusters well.
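
The abstract does not state the formulas behind items 2 and 3, so the following Python is only a minimal sketch under stated assumptions, not the UIDStream or TF-Stream definitions. It assumes each uncertain attribute is a (low, high) interval, uses a Euclidean-style distance over interval endpoints, and uses an exponential fading function of the form 2^(-decay * age), which is a common choice in the data stream clustering literature. The names `interval_distance`, `fading_weight`, `faded_density`, and the `decay` parameter are hypothetical.

```python
import math


def interval_distance(x, y):
    """Distance between two records whose attributes are (low, high) intervals.

    Assumed form: root of the mean squared endpoint differences, summed over
    attributes. The thesis may define its interval distance differently.
    """
    total = 0.0
    for (x_lo, x_hi), (y_lo, y_hi) in zip(x, y):
        total += ((x_lo - y_lo) ** 2 + (x_hi - y_hi) ** 2) / 2.0
    return math.sqrt(total)


def fading_weight(age, decay=0.25):
    """Weight of a point that arrived `age` time units ago (assumed form)."""
    return 2.0 ** (-decay * age)


def faded_density(center, points, now, radius, decay=0.25):
    """Sum of fading weights of the points within `radius` of `center`.

    `points` is an iterable of (arrival_time, record) pairs.
    """
    return sum(
        fading_weight(now - arrived, decay)
        for arrived, record in points
        if interval_distance(center, record) <= radius
    )
```

With a density of this kind, a DBSCAN-style neighborhood test could compare the time-weighted density against a threshold instead of a raw point count, so that older points contribute less to a cluster than recent ones.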
Keywords/Search Tags: data stream, clustering, uncertain data, time fading function, data mining