Research And Application Of Data Mining Based On Rough Set In Clustering Discrimination

Posted on:2019-03-20

Degree:Master

Type:Thesis

Country:China

Candidate:Y Li

Full Text:PDF

GTID:2518306047461554

Subject:Applied Statistics

Abstract/Summary:

Data mining is an emerging technology that has developed in recent years. It extracts useful intrinsic knowledge from large amounts of data and helps decision-makers find implicit correlations in the data. Cluster analysis has recently become an important research topic in data mining. As one of its core functions, cluster analysis can be used on its own as a tool for discovering how data are distributed, or as a preprocessing step for other data mining algorithms. In fuzzy cluster analysis, an unlimited number of variables can in theory be analyzed, but it has been shown that the quality of the clustering solution does not improve in proportion to the number of variables. If inappropriate variables are chosen, or if the variables are strongly correlated, adding variables makes multicollinearity more likely. Some variables then contribute little and only complicate the analysis, which leads to unreliable clustering solutions. These variables must therefore be filtered out.

This thesis uses the Seeds data set, a standard test data set from the UCI repository, as the experimental data for the algorithm. First, a rough set method based on a genetic algorithm is used to reduce the conditional attributes and remove redundant or highly correlated variables, and the Pearson correlation coefficient is then used to determine the variables that are finally used for clustering. Second, hierarchical (system) clustering and K-means clustering are combined to obtain a convergent clustering result. To verify the stability of the clustering solution produced by Ward's sum-of-squared-deviations method, the classification result of the hierarchical clustering is used as the initial centers of the iterative (K-means) clustering. Finally, the clustering result serves as the input to discriminant analysis, and a linear discriminant function is used to assign new samples to a class by distance discrimination.

The work has both theoretical and practical value. Redundant variables are removed before the solution is computed, a convergent clustering result is obtained by combining hierarchical clustering with K-means clustering, and the clustering result acts as a bridge to the final prediction step, which uses the discriminant function. The final prediction results indicate that the method is feasible, so it can be extended to other fields to solve practical problems.
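The pipeline described in the abstract can be illustrated with a minimal sketch, written here in plain Python under several assumptions. The genetic-algorithm rough set reduction is not reproduced. The greedy filter `drop_correlated`, its 0.9 threshold, the toy `rows` data, and every function name are hypothetical and do not come from the thesis. The final step uses Euclidean distance to the cluster centers as a simplified stand-in for the linear discriminant function.

```python
import math
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_correlated(columns, threshold=0.9):
    """Keep a column only if |r| with every already-kept column is below threshold."""
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept

def centroid(points):
    return [mean(dim) for dim in zip(*points)]

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def ward_clusters(points, k):
    """Agglomerative clustering: repeatedly merge the pair of clusters whose
    merge gives the smallest increase in within-cluster sum of squares (Ward)."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ni, nj = len(clusters[i]), len(clusters[j])
                cost = ni * nj / (ni + nj) * sq_dist(
                    centroid(clusters[i]), centroid(clusters[j]))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def classify(x, centers):
    """Distance discrimination (simplified): index of the nearest center."""
    return min(range(len(centers)), key=lambda c: sq_dist(x, centers[c]))

def kmeans(points, centers, max_iter=100):
    """K-means started from the given centers instead of random ones."""
    labels = []
    for _ in range(max_iter):
        labels = [classify(p, centers) for p in points]
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(centroid(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Toy data (hypothetical): the third variable is exactly twice the first.
rows = [(1.0, 1.0, 2.0), (1.2, 3.0, 2.4), (0.8, 2.0, 1.6),
        (5.0, 1.2, 10.0), (5.3, 2.9, 10.6), (4.8, 2.1, 9.6)]

kept = drop_correlated(list(zip(*rows)))           # variable filtering
reduced = [tuple(r[j] for j in kept) for r in rows]
seeds = [centroid(c) for c in ward_clusters(reduced, 2)]  # Ward result as initial centers
labels, centers = kmeans(reduced, seeds)           # iterative refinement
new_label = classify((5.1, 2.0), centers)          # class of a new sample
```

Seeding K-means with the Ward centroids makes the run deterministic, so a K-means result that leaves the hierarchical partition unchanged can be read as a sign that the Ward solution is stable.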

Keywords/Search Tags:

Data Mining, Clustering Analysis, Genetic Algorithm, Rough Set

Related items

1	Clustering Analysis Of Multidimensional Data Based On Rough Set
2	Research On Clustering Algorithm Based On Genetic Algorithm And Rough Set Theory
3	Study Of Clustering And Outlier Detection Algorithm In Data Mining
4	The Research On Clustering Algorithm For Categorical Data Based-on Rough Set
5	Application And Research Of Large Database Mining Based On Rough Set And Genetic Algorithm
6	Research On Clustering Algorithm Based On Data Mining And Its Application
7	A Research On Spatial Data Mining
8	Based On Rough Set Data Mining Method
9	Research On Application Of Rough Set Theory In Data Mining
10	Research And Improvement On Clustering Analysis Algorithm In Data Mining