
Detecting Basic Level Categories in Hierarchical Clustering

Posted on: 2020-03-07
Degree: Master
Type: Thesis
Country: China
Candidate: J Z Li
GTID: 2428330590961151
Subject: Software engineering
Abstract/Summary:
Clustering is an important research direction in data mining. Because most clustering algorithms take into account neither user preference nor the relationship between major and discrete categories, clustering results are often hard to interpret. Parameters such as the number of categories are also difficult to set without prior knowledge of the category distribution, so the quality of the clustering is hard to guarantee. The basic level category (BLC), a concept from cognitive psychology that reflects user preference, helps people classify data; this thesis introduces it and tries to solve these clustering problems by detecting BLCs. Category utility is a measure for finding basic level categories; in essence, it is a trade-off between intra-class similarity and inter-class dissimilarity.

Experimental results reveal two issues in the mining process. First, the feature distribution of the categories changes dynamically during clustering, and so do the feature weights. Ignoring this variation may discard valuable features and degrade the clustering result. Second, most clustering algorithms cannot avoid producing discrete categories and cannot separate them from BLCs, so the clustering result cannot be applied directly.

To address the first issue, this thesis proposes a BLC mining algorithm based on CUnew, which incorporates the inter-class distribution of features and dynamically assigns feature weights according to their distinguishing ability. The algorithm effectively mines category-distinguishing features, alleviates the influence of noisy features, and narrows the gap between the classification results and the real BLCs.

To address the second issue, this thesis proposes a BLC mining algorithm based on merging discrete categories. By analyzing the relationship between BLCs and discrete categories, the algorithm merges the categories most likely to be discrete. According to the change in global impact after merging, calculated with normalized mutual information (NMI), the categories are divided into original BLCs and discrete categories. Finally, the discrete data is merged through a classification method. The algorithm depends only weakly on data distribution and features, does not require parameters to be defined in advance, and does not require discrete classes to be found and merged manually. As a result, the number of categories is determined reasonably and more realistic and valuable BLCs are obtained.

To evaluate the performance of the proposed algorithm, experiments are run on data sets of different quality. The results show that the algorithm finds better BLCs regardless of the quality of the data set.
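As a point of reference for the category utility mentioned in the abstract, the following is a minimal sketch of the classic category utility for nominal features (the Gluck and Corter form). It is not the thesis's CUnew, whose feature weighting the abstract does not specify. The function name `category_utility` and the arguments `rows` and `labels` are illustrative assumptions, not names from the thesis.

```python
from collections import Counter


def category_utility(rows, labels):
    """Classic (unweighted) category utility for nominal features.

    rows   : list of equal-length tuples of nominal feature values
    labels : one cluster label per row

    CU = (1/K) * sum_k P(C_k) * [ sum_{i,v} P(f_i=v | C_k)^2
                                  - sum_{i,v} P(f_i=v)^2 ]
    """
    n = len(rows)
    if n == 0 or n != len(labels):
        raise ValueError("rows and labels must be non-empty and equal in length")
    n_features = len(rows[0])

    def sum_sq_probs(subset):
        # Sum over features i and values v of P(f_i = v)^2 within `subset`.
        m = len(subset)
        total = 0.0
        for i in range(n_features):
            counts = Counter(row[i] for row in subset)
            total += sum((c / m) ** 2 for c in counts.values())
        return total

    # Predictability of feature values without any category information.
    baseline = sum_sq_probs(rows)

    # Group rows by their cluster label.
    clusters = {}
    for row, label in zip(rows, labels):
        clusters.setdefault(label, []).append(row)

    # Weighted gain in predictability from knowing the category,
    # averaged over the number of categories K.
    gain = sum(
        (len(members) / n) * (sum_sq_probs(members) - baseline)
        for members in clusters.values()
    )
    return gain / len(clusters)


if __name__ == "__main__":
    data = [("red", "round"), ("red", "round"),
            ("blue", "square"), ("blue", "square")]
    print(category_utility(data, [0, 0, 1, 1]))  # partition matching the features
    print(category_utility(data, [0, 1, 0, 1]))  # partition ignoring the features
```

A partition whose clusters make feature values more predictable scores higher, while dividing by the number of categories penalizes over-splitting; this is the intra-class versus inter-class trade-off the abstract describes.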
Keywords/Search Tags:basic level category, category utility, discrete category, NMI, feature weighting