Research Of Block Data Clustering Algorithms Based On The Bag Of Word Model

Posted on: 2017-10-14
Degree: Master
Type: Thesis
Country: China
Candidate: W T Niu
GTID: 2348330512450938
Subject: Systems Engineering
Abstract/Summary:
In real applications, some objects cannot be described by a single feature vector. For example, an object describing a consumer's shopping behavior consists of several shopping records, and the number of records usually differs from one consumer to another. Objects described by multiple feature vectors pose new challenges for traditional clustering algorithms. In this thesis, we introduce the concept of a block data object to define this kind of data. The main contents of this thesis are summarized as follows:

(1) First, we define a new representation of block data objects based on the bag-of-words model. This representation addresses the fact that traditional clustering algorithms cannot cluster a block data set directly. For each dimension of a given block data set, we apply the DBSCAN algorithm to the set of that dimension's domain values and obtain the class distributions of virtual objects. We then design a block data clustering algorithm (the BWM-BDC algorithm) based on this new representation. Experimental results on both the Musk data set and weather data demonstrate the effectiveness of the algorithm.

(2) Second, we propose a method that improves the time efficiency of the BWM-BDC algorithm by using the F Leaders algorithm. For each dimension of a given block data set, the F Leaders algorithm partitions the set of domain values and yields a set of representative objects. After DBSCAN clusters the representative objects, every other object in that dimension is labeled with the class label of its corresponding representative object. Because DBSCAN then has fewer objects to cluster, the running time decreases. Experimental results on the Musk data set confirm this reduction in running time.

The contribution of this thesis is a new approach to block data clustering, which further enriches cluster analysis of categorical data and contributes positively to research on block data sets.
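The per-dimension clustering, the bag-of-words representation and the Leaders speed-up described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the thesis's BWM-BDC implementation. Records are assumed to be numeric tuples of equal length. A "word" is assumed to be a (dimension, DBSCAN cluster label) pair, and each block becomes a vector of word counts divided by its record count. The final grouping of block vectors uses K-means only because the keywords mention it. The Leaders step is the classic single-pass variant, which may differ from the thesis's F Leaders algorithm. All function names and the parameters `eps`, `min_pts` and `tau` are hypothetical.

```python
import random
from collections import Counter

def leaders(values, tau):
    """Classic single-pass Leaders: returns (leader values, owner index per value)."""
    reps, owner = [], []
    for v in values:
        for j, r in enumerate(reps):
            if abs(v - r) <= tau:  # follow the first leader within tau
                owner.append(j)
                break
        else:  # no leader close enough: v becomes a new leader
            reps.append(v)
            owner.append(len(reps) - 1)
    return reps, owner

def dbscan_1d(values, eps, min_pts):
    """Plain O(n^2) DBSCAN on 1-D values; label -1 marks noise."""
    n = len(values)
    labels = [None] * n

    def neighbors(i):
        return [j for j in range(n) if abs(values[i] - values[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise unless later reached as a border point
            continue
        cluster += 1
        labels[i] = cluster
        k = 0
        while k < len(seeds):
            j, k = seeds[k], k + 1
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(j_nbrs)
    return labels

def label_dimension(values, eps, min_pts, tau=None):
    """Cluster one dimension's values, optionally on Leaders representatives only."""
    if tau is None:
        return dbscan_1d(values, eps, min_pts)
    reps, owner = leaders(values, tau)
    rep_labels = dbscan_1d(reps, eps, min_pts)  # DBSCAN sees only the leaders
    return [rep_labels[o] for o in owner]  # followers inherit their leader's label

def bag_of_words(blocks, eps, min_pts, tau=None):
    """Map each block (a list of equal-length records) to a word-frequency vector.

    A word is a (dimension, cluster label) pair; noise (-1) counts as its own
    word. Dividing by the block's record count makes block sizes comparable.
    """
    counts = [Counter() for _ in blocks]
    for d in range(len(blocks[0][0])):
        # Pool dimension d's domain values over every record of every block.
        pairs = [(b, rec[d]) for b, block in enumerate(blocks) for rec in block]
        labels = label_dimension([v for _, v in pairs], eps, min_pts, tau)
        for (b, _), lab in zip(pairs, labels):
            counts[b][(d, lab)] += 1
    vocab = sorted(set().union(*counts))
    return [[c[w] / len(blocks[b]) for w in vocab] for b, c in enumerate(counts)]

def kmeans(vectors, k, iters=100, seed=0):
    """Lloyd's K-means on plain lists; returns one cluster index per vector."""
    centers = [list(v) for v in random.Random(seed).sample(vectors, k)]
    assign = None
    for _ in range(iters):
        new_assign = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centers[c])))
            for v in vectors
        ]
        if new_assign == assign:
            break  # assignments stable: converged
        assign = new_assign
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

if __name__ == "__main__":
    # Toy data: four "consumers" with different numbers of 2-D records.
    blocks = [[(1.0, 1.1), (1.2, 0.9)], [(0.9, 1.0), (1.1, 1.2), (1.0, 1.0)],
              [(10.0, 9.8), (10.2, 10.1)], [(9.9, 10.0), (10.1, 10.2), (10.0, 9.9)]]
    plain = bag_of_words(blocks, eps=0.5, min_pts=2)
    print(plain == bag_of_words(blocks, eps=0.5, min_pts=2, tau=0.15))  # True here
    print(kmeans(plain, k=2))  # blocks 0 and 1 share one label; blocks 2 and 3 the other
```

In this sketch, the time saving from the Leaders step comes from running the neighbor search over the representatives rather than over every domain value. One point worth checking against the thesis: each leader stands in for several values, so a `min_pts` chosen for the full value set may be too strict for the leader set. The toy `tau` above is kept small so that each group still yields at least two leaders.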
Keywords/Search Tags:Block data, Bag of word model, DBSCAN algorithm, K-means algorithm, Leaders clustering method