
Customer Transaction Data Clustering Analysis And Parallelism Based On Shared Nearest Neighbor

Posted on: 2022-07-13 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Li | Full Text: PDF
GTID: 2518306521496804 | Subject: Computer Science and Technology
Abstract/Summary:
Cluster analysis of customer transaction data yields better customer segmentation, which helps enterprises understand consumers more accurately and formulate precise marketing strategies. PurTreeClust is a recent clustering algorithm for customer transaction data. It defines a new metric, the PurTree distance, which can handle transaction data with a hierarchical tree structure. However, it assigns each purchase tree to the cluster of the nearest cluster center without considering the influence of neighboring points, so purchase trees are prone to being misallocated. This thesis proposes a clustering algorithm for customer transaction data that uses the shared nearest neighbor information between purchase trees, and studies its parallelization. The new algorithm can find more compact and clearly separated clusters, avoid misallocating purchase trees, and improve customer segmentation. The main research work is as follows:

(1) A clustering algorithm for customer transaction data is proposed that uses the shared nearest neighbor information between purchase trees. First, the PurTree distance is used to compute the shared nearest neighbor similarity between purchase trees, together with the local density and separation distance of each customer purchase tree. During cluster allocation, the shared nearest neighbor information is fully exploited: the subordinate purchase trees of each cluster are allocated first, and then the possibly subordinate purchase trees are allocated to complete the clustering. Finally, six real customer transaction data sets are used to verify the effectiveness of the algorithm.

(2) A parallel clustering algorithm for customer transaction data is proposed in the Spark environment. First, the parallel algorithm splits the data evenly so that each slave node holds roughly the same amount of data. Then, on each slave node, the PurTree distance, local density, and separation distance of the customer purchase trees are computed separately, and the final clustering result is obtained. Finally, experiments on real customer transaction data sets verify the scalability of the algorithm.
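The three quantities named in (1) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the exact formulas, so the definitions below are assumptions taken from common shared-nearest-neighbor and density-peak formulations (similarity as the number of shared k-nearest neighbors, local density as the summed similarity to a point's own neighbors, separation distance as the distance to the nearest point of higher density). A toy one-dimensional distance matrix stands in for the PurTree distance, and all function names are hypothetical.

```python
from typing import List, Set


def knn_sets(dist: List[List[float]], k: int) -> List[Set[int]]:
    """Return the k nearest neighbours of every point, excluding the point itself."""
    n = len(dist)
    neighbours = []
    for i in range(n):
        # Sort the other points by distance to i; ties are broken by index.
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: (dist[i][j], j))
        neighbours.append(set(order[:k]))
    return neighbours


def snn_similarity(knn: List[Set[int]], i: int, j: int) -> int:
    """Assumed definition: number of k-nearest neighbours shared by i and j."""
    return len(knn[i] & knn[j])


def local_density(knn: List[Set[int]], i: int) -> int:
    """Assumed definition: summed SNN similarity between i and its own neighbours."""
    return sum(snn_similarity(knn, i, j) for j in knn[i])


def separation_distance(dist: List[List[float]], rho: List[int], i: int) -> float:
    """Assumed definition: distance from i to the nearest point of higher density.

    A point with no denser point gets its largest distance to any other point.
    """
    higher = [dist[i][j] for j in range(len(rho)) if rho[j] > rho[i]]
    return min(higher) if higher else max(dist[i])


# Toy data: two well-separated groups on a line. The absolute difference is
# only a placeholder for the PurTree distance between purchase trees.
points = [0, 1, 2, 3, 10, 11, 12, 13]
dist = [[abs(a - b) for b in points] for a in points]

knn = knn_sets(dist, k=2)
rho = [local_density(knn, i) for i in range(len(points))]
delta = [separation_distance(dist, rho, i) for i in range(len(points))]

for i, p in enumerate(points):
    print(p, sorted(knn[i]), rho[i], delta[i])
```

In this toy setting, points from different groups share no neighbors, so their similarity is zero. That is the kind of neighborhood evidence a nearest-center assignment ignores.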
Keywords/Search Tags: Clustering, Transaction data, Customer segmentation, Purchase tree, Shared nearest neighbor