
Research On Non-parametric Clustering Algorithm Based On Category Utility And Its Improvement

Posted on: 2018-06-28
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Xu
Full Text: PDF
GTID: 2348330536977911
Subject: Software engineering
Abstract/Summary:
Most existing clustering methods do not take the hierarchy of categories or the perspective of users into consideration, so their clustering results are hard for users to interpret. In addition, these methods require users to supply sensitive parameters, which makes the quality of clustering difficult to control. To address these two problems, the paper introduces the concept of the basic-level category from cognitive psychology and transforms the clustering problem into the problem of finding basic-level categories. Based on the characteristics of basic-level categories, the paper combines category utility from cognitive psychology with hierarchical clustering to find basic-level categories in text, reduces the influence of noisy features and outliers by using CHI, BDC, and the division of scattered categories, and proposes a double text clustering algorithm based on category utility. The algorithm is non-parametric: it clusters text from the perspective of users and finds basic-level categories automatically. The paper also visualizes the relations between different basic-level categories, which makes the categories easier for users to analyze and provides decision support for dividing scattered categories. Because category utility is strongly influenced by noisy features and cannot be used to find basic-level categories in continuous data, the paper revises the definition of category utility and proposes a new entropy-based category utility function (ECU). It then uses ECU to find basic-level categories in data and, on that basis, proposes a clustering algorithm based on entropy-based category utility. This algorithm can be applied to both text data and continuous data; compared with category utility, ECU is less dependent on features and more adaptable. To verify the validity and superiority of the algorithms, experiments are conducted on two text data sets and six continuous data sets. The experimental results show that the proposed algorithms produce more natural clustering results than the other algorithms compared.
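As a rough illustration of the category utility measure the abstract builds on, the sketch below scores a partition of documents using the commonly cited form of category utility (the probability-weighted gain in squared feature probabilities, averaged over the number of clusters). It is an assumption that the thesis starts from this form; the thesis's exact definition and its ECU variant are not given in this abstract and are not reproduced here. The function name `category_utility` and the representation of each document as a set of terms are hypothetical choices made for the example.

```python
from collections import Counter


def category_utility(clusters):
    """Score a partition with the commonly cited category utility form.

    clusters: list of non-empty clusters; each cluster is a list of documents;
    each document is a set of terms (binary present/absent features).
    This layout is an assumption made for the example.
    """
    k = len(clusters)
    docs = [doc for cluster in clusters for doc in cluster]
    n = len(docs)
    vocab = set().union(*docs)

    def sum_squared_probs(group):
        # For every term, add P(term present)^2 + P(term absent)^2 in the group.
        counts = Counter(term for doc in group for term in doc)
        size = len(group)
        total = 0.0
        for term in vocab:
            p = counts[term] / size
            total += p * p + (1.0 - p) * (1.0 - p)
        return total

    baseline = sum_squared_probs(docs)
    # Weight each cluster's gain over the baseline by the cluster's share of documents.
    gain = sum(len(c) / n * (sum_squared_probs(c) - baseline) for c in clusters)
    return gain / k


# Toy data: two documents about pets and two about finance.
a, b = {"cat", "dog"}, {"cat", "dog"}
c, d = {"stock", "bond"}, {"stock", "bond"}
good = category_utility([[a, b], [c, d]])   # topics separated
mixed = category_utility([[a, c], [b, d]])  # topics mixed
```

In this toy case the partition that separates the two topics scores higher than the mixed one, which is the property that lets category utility rank candidate cuts of a hierarchy without a user-supplied number of clusters.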
Keywords/Search Tags:basic-level category, category utility, non-parametric, visualization, ECU