
Research And Application Of Data Mining Based On Rough Set In Clustering Discrimination

Posted on: 2019-03-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y Li
GTID: 2518306047461554
Subject: Applied Statistics
Abstract/Summary:
Data mining is a technology that has emerged in recent years. It extracts useful implicit knowledge from large amounts of data and helps decision-makers discover hidden correlations within the data. Cluster analysis has become an important research topic in data mining. It can be used on its own as a tool for discovering how data are distributed, or as a preprocessing step for other data mining algorithms.

In fuzzy clustering analysis, any number of variables can in theory be analyzed. However, it has been shown that the quality of the clustering solution does not grow in proportion to the number of variables. If the variables are poorly chosen or strongly correlated with one another, adding more of them makes multicollinearity more likely. Some variables then contribute little, yet they make the problem harder to analyze and solve, and the clustering solution becomes unreliable. Such variables must therefore be filtered out.

This thesis uses the Seeds data set, a widely used standard test data set from the UCI repository, as the experimental data.

First, a rough set method based on a genetic algorithm reduces the condition attributes, removing redundant or highly correlated variables. The Pearson correlation coefficient is then used to determine the variables finally used for clustering.

Second, hierarchical (systematic) clustering and K-means clustering are combined to obtain a convergent clustering result. To verify the stability of the solution given by Ward's minimum-variance (sum of squared deviations) method, the hierarchical clustering result is used to set the initial centers of the iterative K-means method.

Finally, the clustering result serves as the class variable for discriminant analysis. A linear discriminant function derived from distance discrimination then assigns new samples to classes.

The work has both theoretical and practical value. Redundant variables are removed from the optimal solution, and combining hierarchical clustering with K-means yields a convergent clustering. In the final prediction step with the discriminant function, this clustering result acts as a bridge. The prediction results indicate that the method is feasible, so it could be extended to other fields to solve practical problems.
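The first step described in the abstract, rough set attribute reduction searched by a genetic algorithm, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation.

The assumptions are:

- The condition attributes are already discretized. Rough sets need discrete values, and the Seeds attributes are continuous.
- The fitness function, population size, mutation rate and the final greedy pruning step are hypothetical choices.
- The decision table is made up.

The dependency degree gamma_B(D) = |POS_B(D)| / |U| measures the fraction of objects that the attribute subset B classifies consistently. The GA searches for a small subset that keeps the dependency degree of the full attribute set.

```python
import random
from collections import defaultdict


def dependency(rows, labels, attrs):
    """Rough-set dependency degree gamma_B(D) = |POS_B(D)| / |U| for attribute indices B."""
    block_labels = defaultdict(set)  # decision values seen in each B-indiscernibility class
    block_size = defaultdict(int)    # number of objects in each class
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        block_labels[key].add(label)
        block_size[key] += 1
    # POS_B(D): objects whose indiscernibility class carries a single decision value
    positive = sum(n for key, n in block_size.items() if len(block_labels[key]) == 1)
    return positive / len(rows)


def fitness(mask, rows, labels, full_gamma):
    """Hypothetical fitness: subsets that keep gamma_C(D) score >= 1, smaller ones score higher."""
    attrs = [i for i, bit in enumerate(mask) if bit]
    gamma = dependency(rows, labels, attrs)
    if gamma < full_gamma:
        return gamma  # always < 1, so it loses to any subset that preserves the dependency
    return 1.0 + (len(mask) - len(attrs)) / len(mask)


def ga_reduct(rows, labels, pop_size=20, generations=40, p_mut=0.1, seed=0):
    """Hypothetical GA search for an attribute reduct, followed by greedy pruning."""
    rng = random.Random(seed)
    n = len(rows[0])
    full_gamma = dependency(rows, labels, list(range(n)))

    def score(mask):
        return fitness(mask, rows, labels, full_gamma)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    pop[0] = [1] * n  # the full attribute set is always a valid starting point
    best = max(pop, key=score)
    for _ in range(generations):
        children = [best[:]]  # elitism: keep the best mask found so far
        while len(children) < pop_size:
            p1 = max(rng.sample(pop, 2), key=score)  # binary tournament selection
            p2 = max(rng.sample(pop, 2), key=score)
            cut = rng.randint(1, n - 1)              # one-point crossover
            child = [1 - b if rng.random() < p_mut else b for b in p1[:cut] + p2[cut:]]
            children.append(child)                   # bit-flip mutation applied above
        pop = children
        best = max(pop, key=score)

    # Greedy pruning so the returned subset is irreducible (a rough-set reduct).
    attrs = [i for i, bit in enumerate(best) if bit]
    for a in list(attrs):
        trial = [b for b in attrs if b != a]
        if dependency(rows, labels, trial) == full_gamma:
            attrs = trial
    return attrs, full_gamma


# Toy, already-discretized decision table (made-up values):
# column 1 duplicates column 0, columns 2 and 3 are noise.
rows = [
    (0, 0, 1, 0),
    (0, 0, 0, 1),
    (1, 1, 1, 1),
    (1, 1, 0, 0),
    (2, 2, 1, 0),
    (2, 2, 0, 1),
]
labels = ["A", "A", "B", "B", "C", "C"]
reduct, gamma_full = ga_reduct(rows, labels)
print(reduct, gamma_full)
```

In the toy table, the second attribute duplicates the first and the last two are noise. The reduct therefore keeps exactly one of the first two attributes. The pruning pass guarantees that no remaining attribute can be dropped without lowering the dependency degree, which the GA search alone does not ensure.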
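After the reduction, the abstract uses the Pearson correlation coefficient to settle the final clustering variables. One simple way to do this, assumed here rather than taken from the thesis, is a greedy filter. It walks through the variables in order and keeps a variable only if its absolute correlation with every variable already kept is at or below a threshold. The threshold of 0.9 is a hypothetical choice, and the column names and values are made up.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation coefficient (assumes neither column is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(columns, names, threshold=0.9):
    """Hypothetical greedy filter: keep a variable only if |r| <= threshold
    against every variable kept so far."""
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]


v1 = [1, 2, 3, 4, 5]
v2 = [2.1, 3.9, 6.2, 8.0, 9.9]  # almost a linear function of v1
v3 = [5, 1, 4, 2, 3]
print(drop_correlated([v1, v2, v3], ["v1", "v2", "v3"]))  # ['v1', 'v3']
```

The greedy filter depends on column order: of two strongly correlated variables, the earlier one survives. Ordering the columns by importance first, for example by the reduct from the rough set step, is one way to control which variable is kept.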
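The second step combines Ward's hierarchical clustering with K-means. The centroids of the Ward clusters seed K-means, and if K-means converges to the same partition, the Ward solution is taken as stable. The sketch below assumes numeric (ideally standardized) data and a chosen number of clusters k, and the function names are hypothetical. The naive merge loop recomputes every pairwise Ward cost at each step, so it only suits small data such as the made-up points here. A real analysis would more likely use a library implementation.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    d = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(d)]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward(points, k):
    """Naive agglomerative clustering with Ward's criterion, stopped at k clusters."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                ca = centroid([points[i] for i in clusters[a]])
                cb = centroid([points[i] for i in clusters[b]])
                # increase in the within-cluster sum of squares if a and b merge
                cost = na * nb / (na + nb) * sq_dist(ca, cb)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def kmeans(points, centers, max_iter=100):
    """Lloyd's K-means starting from the given centers; stops when labels stop changing."""
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(centroid(members) if members else centers[j])  # keep empty cluster's center
        centers = new_centers
    return labels, centers


def ward_then_kmeans(points, k):
    """Seed K-means with Ward centroids; report whether the partition stayed the same."""
    groups = ward(points, k)
    seeds = [centroid([points[i] for i in g]) for g in groups]
    ward_labels = [0] * len(points)
    for j, g in enumerate(groups):
        for i in g:
            ward_labels[i] = j
    labels, centers = kmeans(points, seeds)
    return labels, centers, labels == ward_labels


points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.1, 4.8), (4.9, 5.2)]
labels, centers, stable = ward_then_kmeans(points, 2)
print(labels, stable)
```

Seeding K-means this way removes its dependence on random initial centers. The final flag is one simple reading of "stability": the iterative method did not move any sample away from the cluster Ward assigned it to.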
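The final step uses the cluster labels as classes for distance discrimination. Assume the classes share a common covariance matrix S, estimated here by the pooled within-class covariance. The squared Mahalanobis distance (x - mu_j)' S^-1 (x - mu_j) then expands to x' S^-1 x - 2 mu_j' S^-1 x + mu_j' S^-1 mu_j. The first term is the same for every class, so assigning x to the nearest class is equivalent to maximizing the linear discriminant function W_j(x) = mu_j' S^-1 x - (1/2) mu_j' S^-1 mu_j.

The sketch below implements that rule in plain Python with made-up data. The helper names are hypothetical, and the thesis may estimate the covariance differently.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_lda(samples, labels):
    """Return {class: (a, const)} so that W_c(x) = a . x + const, with a = S^-1 mu_c."""
    classes = sorted(set(labels))
    d = len(samples[0])
    means = {}
    for c in classes:
        members = [s for s, lab in zip(samples, labels) if lab == c]
        means[c] = [sum(s[i] for s in members) / len(members) for i in range(d)]
    # pooled within-class covariance S (common-covariance assumption)
    pooled = [[0.0] * d for _ in range(d)]
    for s, lab in zip(samples, labels):
        diff = [s[i] - means[lab][i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                pooled[i][j] += diff[i] * diff[j]
    dof = len(samples) - len(classes)
    pooled = [[v / dof for v in row] for row in pooled]
    coefs = {}
    for c in classes:
        a = solve(pooled, means[c])  # a = S^-1 mu_c
        const = -0.5 * sum(ai * mi for ai, mi in zip(a, means[c]))  # -(1/2) mu_c' S^-1 mu_c
        coefs[c] = (a, const)
    return coefs


def predict(coefs, x):
    """Assign x to the class with the largest linear discriminant score."""
    return max(coefs, key=lambda c: sum(a * xi for a, xi in zip(coefs[c][0], x)) + coefs[c][1])


samples = [(1.0, 1.0), (1.5, 1.2), (1.2, 1.6), (5.0, 5.0), (5.4, 5.3), (5.1, 5.6)]
cluster_labels = [0, 0, 0, 1, 1, 1]  # e.g. labels produced by the clustering step
model = fit_lda(samples, cluster_labels)
print(predict(model, (1.1, 1.2)), predict(model, (5.2, 5.1)))  # 0 1
```

Because the quadratic term x' S^-1 x cancels, each class needs only a coefficient vector and a constant. This is why the rule is called a linear discriminant function even though it starts from a distance.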
Keywords/Search Tags: Data Mining, Clustering Analysis, Genetic Algorithm, Rough Set