
Research And Application Of User Clustering Method Based On Mixed Type Data Analysis

Posted on: 2021-03-03
Degree: Master
Type: Thesis
Country: China
Candidate: H Yin
Full Text: PDF
GTID: 2428330614959261
Subject: Software engineering
Abstract/Summary:
The purpose of user clustering analysis is to characterize core user groups; its results can be applied to precision marketing, business decision-making, safety warning, and similar tasks. User data are generally mixed-type, comprising numerical features, categorical features, and multidimensional asymmetric features, called multi-valued discrete features, such as interests and hobbies. However, traditional clustering algorithms such as K-means cannot deeply mine the information in multi-valued discrete features when they are used for user clustering analysis. To address this problem, a clustering method that combines association rule mining with multi-valued discrete features is constructed, which improves the quality of user behavior clustering. In addition, because current user clustering analysis does not consider the correlation and importance of user behaviors, a user clustering algorithm based on the importance of user behaviors is proposed, which improves clustering accuracy. The main research work of this thesis is as follows:

1. A clustering method that combines association rules and multi-valued discrete features is constructed. Existing user clustering algorithms cannot effectively analyze the multi-valued discrete features in user data, which lowers data utilization and reduces the accuracy of user similarity calculation. First, association rules are introduced into the calculation of the Jaccard distance to construct a similarity measure between users. Then, the cluster-center update is improved based on the idea of the K-modes clustering algorithm. Experimental results on real data show that the ARMDKM algorithm outperforms the comparison algorithms in purity, entropy, and silhouette coefficient.

2. A two-layer user clustering method based on the importance of behavior is constructed. Different users behave differently, and in user behavior analysis different behaviors carry different importance. To account for the weights of user behavior features, an unsupervised feature selection method combining K-means++ with random forest is used. Feature importance is then assessed to obtain the behavior weights. Finally, the clustering results are obtained by cluster analysis of the user behavior data based on the idea of spectral clustering. Experimental results show that the proposed algorithm outperforms the other algorithms on several indexes.
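
The abstract does not state how association rules enter the Jaccard distance or how the cluster center is updated, so the following is only a minimal sketch under stated assumptions, not the ARMDKM algorithm itself. It assumes that each user's multi-valued feature is a set of items, that a rule is a hypothetical (antecedent, consequent) pair of item sets, that a rule "fires" by adding its consequent to any set containing its antecedent, and that a K-modes-style center keeps the items held by at least half of a cluster's members. All function names are illustrative.

```python
from collections import Counter


def jaccard_distance(a, b):
    """Plain Jaccard distance between two item sets."""
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def expand_with_rules(items, rules):
    """Assumed rule step: add the consequent of every rule whose
    antecedent is fully contained in the original item set."""
    expanded = set(items)
    for antecedent, consequent in rules:
        if antecedent <= items:
            expanded |= consequent
    return expanded


def rule_aware_distance(a, b, rules):
    """Jaccard distance computed on the rule-expanded item sets."""
    return jaccard_distance(expand_with_rules(a, rules),
                            expand_with_rules(b, rules))


def mode_center(members):
    """Assumed K-modes-style center for set-valued data: keep every
    item that appears in at least half of the cluster's members."""
    counts = Counter(item for member in members for item in member)
    threshold = len(members) / 2
    return frozenset(item for item, n in counts.items() if n >= threshold)


# Hypothetical data: one rule "hiking -> camping" and two users.
rules = [(frozenset({"hiking"}), frozenset({"camping"}))]
u1 = {"hiking", "photography"}
u2 = {"camping", "photography"}

plain = jaccard_distance(u1, u2)
rule_aware = rule_aware_distance(u1, u2, rules)
center = mode_center([{"a", "b"}, {"a", "c"}, {"a", "b"}])
```

In this toy case the rule brings the two users closer, because the user who likes hiking is treated as also related to camping. Whether the thesis expands sets, reweights items, or uses rule confidence in some other way is not recoverable from the abstract.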
Keywords/Search Tags: user clustering, association rules, feature selection, random forest, spectral clustering