Improvement Of Non-IID K-means Algorithm And Its Application In Player Data Analysis

Posted on: 2021-04-29
Degree: Master
Type: Thesis
Country: China
Candidate: P C Pan
Full Text: PDF
GTID: 2427330602497177
Subject: Software engineering
Abstract/Summary:
The Internet era inevitably produces large amounts of data, and data mining uses non-trivial methods to discover valuable information in these data. Cluster analysis, one of these non-trivial methods, is an important research area in data mining. Among clustering algorithms, K-means is one of the classics. It is simple and efficient, but it also has defects. For example, randomly selecting the initial cluster centers easily leads to unstable clustering results, and the algorithm is also affected by outliers, so the clustering results are often only locally optimal. In addition, the traditional K-means algorithm and its existing improvements all assume that the data are independent and identically distributed (IID). However, real-world data are often not independent and identically distributed (Non-IID); that is, couplings or interactions of varying strength exist among attribute values, attributes, and objects. If such relations are ignored, important information in the data may be lost, which affects the results of cluster analysis. Therefore, this thesis improves the K-means algorithm under the Non-IID concept and applies the optimized algorithm to NBA player data. The main work is as follows.

The theoretical research is divided into two parts. In the first part, to address the clustering instability caused by the random selection of initial cluster centers and the influence of outliers, an optimized K-means algorithm in the IID context (IIDOPK) is proposed, which combines the dual-domain idea with the maximum distance product method. Experimental results on UCI data sets show higher accuracy, better clustering quality, and fewer iterations. In the second part, to address the limitations of the IID assumption, the optimized K-means algorithm is combined with the Non-IID concept, and an optimized K-means algorithm in the Non-IID context (Non IID-OPK) is proposed. First, a modified Pearson correlation coefficient formula is used to calculate the coupling relations within each attribute and between different attributes. Then, the resulting coupling coefficients are expressed in matrix form and mapped onto the objects in the data set through a Taylor-like expansion, yielding a global coupling representation of the original data set. Finally, the new coupling representation is fed into the Non IID-OPK algorithm for cluster analysis. Experimental results show that higher accuracy can be obtained on the same UCI data sets.

In terms of application, as data mining and analysis technology develops, potentially valuable information can also be mined from NBA player data. Existing research methods are generally based on the IID assumption and therefore ignore the coupling relations among attributes, attribute values, and objects. This thesis applies the verified Non IID-OPK algorithm to NBA player data for cluster analysis, classifies players according to their positions, and takes some additional factors into account to give team managers decision-making suggestions on trades and signings, so that the team can obtain more benefit with less money.
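The initialization idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's exact procedure, so everything here is an assumption: the function names `max_distance_product_init` and `kmeans` are hypothetical, the first two centers are taken to be the farthest-apart pair of points, and each further center is the point whose product of distances to the already chosen centers is largest. The dual-domain step and any outlier handling from the thesis are not reproduced.

```python
import numpy as np


def max_distance_product_init(X, k):
    """Deterministically pick k initial centers (illustrative assumption,
    not the thesis's exact method)."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances, shape (n, n).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Start from the two points that are farthest apart.
    i, j = np.unravel_index(np.argmax(d), d.shape)
    chosen = [int(i), int(j)][:k]
    while len(chosen) < k:
        # Product of distances to all chosen centers; it is zero for
        # points that are already chosen, so they are not picked again.
        prod = d[:, chosen].prod(axis=1)
        chosen.append(int(np.argmax(prod)))
    return X[chosen]


def kmeans(X, k, max_iter=100):
    """Plain Lloyd iterations started from the deterministic centers."""
    X = np.asarray(X, dtype=float)
    centers = max_distance_product_init(X, k)
    for n_iter in range(1, max_iter + 1):
        # Assign every point to its nearest center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers, n_iter
```

Because the starting centers depend only on the data, repeated runs give the same clustering, which is the kind of stability the abstract is concerned with. Starting from the farthest pair does make this sketch sensitive to outliers, which is one reason the thesis adds further steps.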
Keywords/Search Tags:Non-IID, K-means, coupling relation, initial clustering center, player data