A Feature Weighted NIC Algorithm Based On ReliefF

Posted on: 2015-06-22   Degree: Master   Type: Thesis
Country: China   Candidate: X L Chen   Full Text: PDF
GTID: 2298330431996183   Subject: Computer software and theory
Abstract/Summary:
Cluster analysis is an unsupervised machine learning method. When the distribution of a data set is unknown, analysts typically choose an appropriate clustering algorithm and divide the data set into several categories in order to reveal the true distribution of the data. Cluster analysis is also one of the methods of multivariate statistical analysis. Its basic principle is that, in the absence of a priori knowledge and following the principle that birds of a feather flock together, it analyzes the distances and the degree of dispersion between vectors and groups samples by distance, so that similar samples fall into the same category and dissimilar samples fall into different ones. It is therefore an important branch of unsupervised pattern classification within statistical pattern recognition. This method can quantitatively determine how closely objects are related and so support a reasonable classification.
How to extract the most useful information from a vast and complex data set is a major problem in information processing. An effective way to address this problem is to cluster the data set. Clustering quality affects how efficiently people can use the information, and feature weighting can effectively improve clustering results.
The NIC algorithm is based on maximizing the mutual information between data points and clusters. It requires neither the data distribution nor a parametric model of the within-cluster distribution, and it uses a kNN entropy estimator to calculate the objective function. However, the algorithm assumes that all features of the samples contribute equally to the cluster analysis. In practice, the features of a data set may come from different sensors and differ in units of measurement, precision, and reliability, so each feature influences the clustering differently.
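The idea of scoring a clustering by nonparametric entropy estimates can be illustrated with a minimal sketch. It assumes the averaged-kNN ("mean nearest neighbor") form of the estimator, in which entropy is estimated, up to additive constants that the sketch ignores, from the mean log pairwise distance. The abstract does not state the exact objective, so the function names `mean_nn_entropy` and `conditional_entropy_score` are hypothetical. Because the entropy of the whole data set is fixed, maximizing mutual information between points and clusters corresponds to minimizing the size-weighted within-cluster entropy computed here.

```python
import numpy as np

def mean_nn_entropy(points):
    """Entropy estimate up to an additive constant (assumed form):
    d / (n * (n - 1)) * sum over i != j of log ||x_i - x_j||."""
    points = np.asarray(points, dtype=float)
    n, d = points.shape
    if n < 2:
        return 0.0  # no pairs to compare
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    off_diag = ~np.eye(n, dtype=bool)
    # A tiny floor guards against log(0) when duplicate points occur.
    log_d = np.log(np.maximum(dists[off_diag], 1e-12))
    return d / (n * (n - 1)) * log_d.sum()

def conditional_entropy_score(points, labels):
    """Size-weighted average of within-cluster entropy estimates.
    Lower values correspond to higher estimated mutual information."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    n = len(labels)
    score = 0.0
    for c in np.unique(labels):
        mask = labels == c
        score += mask.sum() / n * mean_nn_entropy(points[mask])
    return score
```

A clustering algorithm built on this score would search over label assignments, for example by moving one point at a time to the cluster that lowers the score most; that search loop is omitted here.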
Therefore, this thesis proposes a novel nonparametric feature weighted clustering algorithm based on ReliefF, named feature weighted NIC, which takes the differing contributions of the features into account. The new algorithm uses the ReliefF feature weighting technique to assign a weight to each feature and updates the weights iteratively. The features are then transformed according to their weights, so that good features draw similar samples together and push dissimilar samples apart. Cluster analysis runs after feature weighting and uses the nonparametric average entropy method to cluster the data set. This not only makes the clustering results superior to those of traditional clustering algorithms but also makes it possible to analyze the influence of each feature on the clustering. The algorithm uses entropy to measure features during clustering and reflects the importance of each feature in the data set.
To verify the rationality and validity of the feature weighted NIC algorithm based on ReliefF, this thesis designs two experiments, which examine the traditional NIC clustering algorithm and other classical clustering algorithms, respectively, and compare their results with those of the proposed improved algorithm. The experimental results demonstrate that the proposed method is superior to the traditional NIC algorithm and other classical clustering algorithms in precision, recall, and F-measure.
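The feature weighting step can be sketched as follows. This is a minimal illustration of standard ReliefF for numeric features, assuming the current cluster assignments serve as class labels; it is not the thesis's exact update rule, which the abstract does not give. The function names `relieff_weights` and `apply_weights`, the Manhattan distance, and the choice to clip negative weights to zero are all assumptions made for the example.

```python
import numpy as np

def relieff_weights(X, labels, k=5):
    """ReliefF weights for numeric features, using every sample once.
    A feature gains weight when it differs between a sample and its nearest
    neighbors from other classes, and loses weight when it differs between
    the sample and its nearest neighbors from the same class."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, d = X.shape
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0              # constant features: avoid division by zero
    Xn = X / span                      # absolute differences now lie in [0, 1]
    classes, counts = np.unique(labels, return_counts=True)
    prior = dict(zip(classes, counts / n))
    w = np.zeros(d)
    for i in range(n):
        diff = np.abs(Xn - Xn[i])      # per-feature difference to every sample
        dist = diff.sum(axis=1)        # Manhattan distance (assumed choice)
        dist[i] = np.inf               # exclude the sample itself
        for c in classes:
            idx = np.flatnonzero(labels == c)
            idx = idx[np.isfinite(dist[idx])]
            if idx.size == 0:
                continue
            nearest = idx[np.argsort(dist[idx])[:k]]
            mean_diff = diff[nearest].mean(axis=0)
            if c == labels[i]:
                w -= mean_diff / n     # nearest hits
            else:                      # nearest misses, weighted by class prior
                w += prior[c] / (1.0 - prior[labels[i]]) * mean_diff / n
    return w

def apply_weights(X, w):
    """Scale features by non-negative, normalized weights (assumed scheme)."""
    X = np.asarray(X, dtype=float)
    w = np.clip(w, 0.0, None)
    total = w.sum()
    return X if total == 0 else X * (w / total)
```

In an iterative scheme of the kind the abstract describes, one would alternate between clustering the weighted data and recomputing the weights from the new cluster labels until the assignments stop changing.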
Keywords/Search Tags: unsupervised, cluster, nonparametric information theoretic clustering algorithm, accuracy, feature weighting