Clustering Data Mining Applications In Department Store And K-means Clustering Algorithm Improvement

Posted on:2006-11-16

Degree:Master

Type:Thesis

Country:China

Candidate:Y Luo

Full Text:PDF

GTID:2168360155972249

Subject:Systems Engineering

Abstract/Summary:

Clustering is an important method of data partitioning or grouping and an important application area in many fields, including data mining. It has been used in commerce, market analysis, biology, Web classification, and other areas. Existing clustering algorithms include partitioning, hierarchical, density-based, grid-based, and model-based algorithms, as well as fuzzy clustering.

Microsoft Analysis Services is an applied data mining tool whose clustering analysis is based on a density algorithm. In this paper it is applied to the OLTP data of Chongqing Liangbai Co. to build a data warehouse for analyzing sales data in the department store. Based on the warehouse, we first build a clustering model of customer characteristics and analyze the customers who shop in this store. We then co-cluster customer characteristics with the categories of commodities the customers bought, which reveals the relationship between customer characteristics and commodity categories. This paper presents the whole process: determining the analysis goal, building models in the warehouse, transferring data, determining the mining model, running the mining, and analyzing the mining results. At present, most stores issue VIP cards to customers, so some customer information can be collected, and this information is useful for customer analysis. The application example in this paper can serve as a useful reference for this kind of application.

K-means is a partitioning algorithm that constructs a partition of a database of n objects into a set of K clusters, where K is an input parameter. The clustering uses an iterative procedure; when the algorithm converges to one of numerous local minima, it terminates and outputs the result. The output is therefore especially sensitive to the initial conditions, namely the random selection of the K initial starting points, and inappropriately chosen initial points lead to poor solutions.
The agglomerative clustering algorithm is a bottom-up method that starts by treating every object as a single group and repeatedly merges similar objects or groups. It stops when all objects have been merged into one group or a termination condition is met. The drawback of this algorithm is that it is computationally expensive.

In this paper, I present an improved K-means algorithm that obtains suitable initial starting points. The algorithm samples the database uniformly to obtain a sub-sample that represents the characteristics of the original database, and obtains K cluster centers by applying the agglomerative clustering algorithm to that sub-sample. The K points nearest to these K cluster centers are then chosen as the initial starting points of the K-means algorithm on the whole database. The method does not cost much extra time because the agglomerative clustering runs only on the sub-sample. The subsequent K-means run is much more efficient than the original K-means algorithm because the initial cluster centers are close to the true ones. A real example demonstrates that the improved K-means algorithm obtains better solutions and is less sensitive to the initial starting points.
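The initialization described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. Several details are assumptions made only for illustration: uniform random sampling, centroid-distance merging in the agglomerative step, Euclidean distance, picking the nearest points from the sub-sample, and all function names. It also assumes K is no larger than the number of distinct sampled points.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def agglomerate(sample, k):
    """Bottom-up step: start from singleton groups and merge the two groups
    with the closest centroids until k groups remain (assumed linkage)."""
    groups = [[p] for p in sample]
    while len(groups) > k:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = math.dist(centroid(groups[i]), centroid(groups[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        groups[i].extend(groups.pop(j))
    return [centroid(g) for g in groups]


def nearest_distinct(points, centers):
    """For each center, pick the nearest point that was not picked already."""
    chosen = []
    for c in centers:
        candidates = [p for p in points if p not in chosen]
        chosen.append(min(candidates, key=lambda p: math.dist(p, c)))
    return chosen


def kmeans(data, centers, max_iter=100):
    """Standard K-means iteration from the given starting centers."""
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in data:
            idx = min(range(len(centers)),
                      key=lambda c: math.dist(p, centers[c]))
            clusters[idx].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [centroid(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


def improved_kmeans(data, k, sample_size, seed=None):
    """Sample, cluster the sample bottom-up, then seed K-means on all data."""
    rng = random.Random(seed)
    sample = rng.sample(data, min(sample_size, len(data)))
    start = nearest_distinct(sample, agglomerate(sample, k))
    return kmeans(data, start)
```

Calling `improved_kmeans(data, k=3, sample_size=50)` on a list of coordinate tuples returns the final centers and the clusters. The agglomerative step here is cubic in the sample size, which is why it is applied to the sub-sample only and never to the whole database.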

Keywords/Search Tags:

data mining, clustering, K-means algorithm

Related items

1	The Research Of Clustering Data Mining Based On Swarm Intelligence Algorithm
2	Clustering Data Mining Applications In Department Store And K-means Clustering Algorithm Improvement
3	Research And Improvement Of K-means Clustering Algorithm
4	The Improvement On The Fuzzy C-means Algorithm
5	Improvement And Empirical Study Of K-means Clustering Algorithm On Panel Data Analysis
6	Research Of Improving For K-means Clustering Algorithm
7	The Research Of K-means Clustering Algorithm In Data Mining
8	Study On K-means Clustering Algorithm Based On Summarized Information Of RTVU Students
9	Based On The Selection Of The Initial Point Of K-means Clustering Algorithm And Its Application
10	Ant Clustering Algorithm With K-harmonic Means Clustering