
Clustering Data Mining Applications In Department Store And K-means Clustering Algorithm Improvement

Posted on: 2006-11-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y Luo
Full Text: PDF
GTID: 2168360155972249
Subject: Systems Engineering
Abstract/Summary:
Clustering is an important method for partitioning or grouping data and is applied in a variety of fields, including data mining. It has been used in commerce, market analysis, biology, Web classification, and other areas. Existing clustering algorithms include partitioning, hierarchical, density-based, grid-based, and model-based algorithms, as well as fuzzy clustering.
Microsoft Analysis Services is an applied data mining tool based on a density-based clustering algorithm. In this paper it is applied to the OLTP data of Chongqing Liangbai Co. to create a data warehouse for analyzing sales data in the department store. Based on the warehouse, we first build a clustering model of customer characteristics and analyze the customers who shop in this store. We then co-cluster customer characteristics with the categories of commodities the customers bought, which yields information on the relationship between customer characteristics and commodity categories. This paper shows the whole process: confirming the analysis goal, building models in the warehouse, transferring data, confirming the mining model, running the mining, and analyzing the mining results. At present, most stores issue VIP cards to customers, so some customer information can be collected, and this information is useful for customer analysis. The application example in this paper can serve as a useful reference for this kind of application.
K-means is a partitioning algorithm that constructs a partition of a database of n objects into a set of K clusters, where K is an input parameter. It uses an iterative procedure: when the algorithm converges to one of numerous local minima, it terminates and outputs the result. The output is therefore especially sensitive to the initial conditions, that is, to the random selection of the K initial starting points, and inappropriately chosen initial points lead to bad solutions.
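The sensitivity of K-means to its starting points can be illustrated with a minimal sketch. The code below is not taken from the thesis: the function names `kmeans` and `sse`, the Euclidean distance, and the four-point data set are assumptions chosen only to show two different local minima on the same data.

```python
import math

def kmeans(points, centers, max_iter=100):
    """Lloyd-style K-means started from explicit centers (hypothetical helper)."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object joins its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:  # no object moved: a local minimum is reached
            break
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, labels

def sse(points, centers, labels):
    """Sum of squared distances from each point to its assigned center."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))

# Four corners of a wide rectangle; the natural clusters are left and right.
pts = [(0, 0), (0, 1), (4, 0), (4, 1)]

good_centers, good_labels = kmeans(pts, [(0, 0), (4, 0)])
bad_centers, bad_labels = kmeans(pts, [(0, 0), (0, 1)])

print(sse(pts, good_centers, good_labels))  # 1.0  (left/right split)
print(sse(pts, bad_centers, bad_labels))    # 16.0 (top/bottom split)
```

Both runs stop because no assignment changes, yet the second run settles on a much worse partition, which is the behaviour the abstract attributes to inappropriately chosen initial points.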
The agglomerative clustering algorithm is a bottom-up method that treats every object as a single group and repeatedly combines similar objects or groups. It ends when all objects have been combined into one group or a termination condition is met. The defect of this algorithm is that it costs a lot of computation time.
In this paper, I give an improved K-means algorithm that obtains proper initial starting points. The algorithm samples the database evenly to get a sub-sample that represents the character of the original database, and obtains K cluster centers from the sub-sample with the agglomerative clustering algorithm. The K points nearest to these K cluster centers are then chosen as the initial starting points of the K-means algorithm for the whole database. The method does not cost much time because the agglomerative step runs only on a sub-sample. The subsequent K-means run is much more efficient than the original K-means algorithm because the initial cluster centers are close to the true ones. A real example demonstrates that the improved K-means algorithm gets a better solution and is less sensitive to the initial starting points.
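The initialization procedure described above might be sketched as follows. The abstract does not give the details, so several choices here are assumptions rather than the author's method: sampling takes every `step`-th record, groups are merged by centroid distance, the nearest points are searched in the whole data set, and the names `centroid`, `agglomerate`, and `initial_points` are hypothetical.

```python
import math

def centroid(group):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(x) / len(group) for x in zip(*group))

def agglomerate(sample, k):
    """Bottom-up merging: start from singleton groups and merge the two
    closest groups (centroid distance, an assumed linkage) until k remain."""
    groups = [[p] for p in sample]
    while len(groups) > k:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = math.dist(centroid(groups[i]), centroid(groups[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        groups[i].extend(groups.pop(j))  # j > i, so index i stays valid
    return [centroid(g) for g in groups]

def initial_points(data, k, step):
    """Choose k K-means starting points from an evenly spaced sub-sample."""
    sample = data[::step]             # assumed form of "even" sampling
    centers = agglomerate(sample, k)  # costly step, but only on the sample
    # Replace each center by the actual data object closest to it.
    return [min(data, key=lambda p: math.dist(p, c)) for c in centers]

data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11)]
seeds = initial_points(data, k=2, step=2)
print(seeds)
```

On this toy data the two seeds fall in different groups, so a K-means run started from them begins close to the final centers. The pairwise search in `agglomerate` is deliberately naive, which is why the sketch restricts it to the sub-sample.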
Keywords/Search Tags: data mining, clustering, K-means algorithm