Research On Non-parametric Clustering Algorithm Based On Category Utility And Its Improvement

Posted on:2018-06-28

Degree:Master

Type:Thesis

Country:China

Candidate:J Y Xu

GTID:2348330536977911

Subject:Software engineering

Abstract/Summary:

Most existing clustering methods take neither the hierarchy of categories nor the user's perspective into account, so their results are hard for users to interpret. In addition, these methods require users to supply sensitive parameters, which makes clustering quality difficult to control. To address these two problems, this thesis introduces the concept of the basic-level category from cognitive psychology and recasts clustering as the problem of finding basic-level categories. Guided by the characteristics of basic-level categories, the thesis combines category utility, also taken from cognitive psychology, with hierarchical clustering to find basic-level categories in text. It reduces the influence of noisy features and outliers by applying chi and bdc and by dividing scattered categories, and on this basis proposes a double text clustering algorithm based on category utility. The algorithm is non-parametric: it clusters text from the user's perspective and finds basic-level categories automatically. The thesis also visualizes the relations between basic-level categories, which makes them easier for users to analyze and provides decision support for dividing scattered categories.

Because category utility is strongly affected by noisy features and cannot be used to find basic-level categories in continuous data, the thesis modifies its defining formula and proposes a new entropy-based category utility function (ECU). It then uses ECU to find basic-level categories and, on that basis, proposes a clustering algorithm based on entropy-based category utility. This algorithm applies to both text data and continuous data, and compared with category utility, ECU is less dependent on features and more adaptable. To verify the validity and advantages of the proposed algorithms, experiments were conducted on two text data sets and six continuous data sets; the results show that the proposed algorithms produce more natural clustering results than other algorithms.
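Category utility (CU) is the quantity the abstract relies on to locate basic-level categories. The sketch below is a minimal illustration only, assuming items described by nominal features and the widely used CU formulation from the COBWEB conceptual-clustering system: CU = (1/K) Σ_k P(C_k) [Σ_i Σ_j P(A_i = V_ij | C_k)² − Σ_i Σ_j P(A_i = V_ij)²]. It does not reproduce the thesis's text-specific variant, its chi/bdc filtering, or ECU, whose definitions are not given here. The function names, the toy data, and the idea of choosing the highest-CU candidate partition (for example, one candidate per level of a hierarchical clustering) are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: each item is a tuple of nominal feature values.
Item = Tuple[str, ...]
Partition = Sequence[Sequence[Item]]


def value_probs(items: Sequence[Item]) -> List[Dict[str, float]]:
    """Estimate P(A_i = v) for every feature i from relative frequencies."""
    n = len(items)
    probs: List[Dict[str, float]] = []
    for i in range(len(items[0])):
        counts = Counter(item[i] for item in items)
        probs.append({v: c / n for v, c in counts.items()})
    return probs


def expected_correct(probs: List[Dict[str, float]]) -> float:
    """Sum over features i and values v of P(A_i = v)^2."""
    return sum(p * p for feature in probs for p in feature.values())


def category_utility(partition: Partition) -> float:
    """COBWEB-style category utility of a partition into K non-empty clusters."""
    all_items = [item for cluster in partition for item in cluster]
    n = len(all_items)
    # Baseline: how predictable feature values are without any clustering.
    baseline = expected_correct(value_probs(all_items))
    total = 0.0
    for cluster in partition:
        p_cluster = len(cluster) / n
        gain = expected_correct(value_probs(cluster)) - baseline
        total += p_cluster * gain
    # Dividing by K penalises partitions that merely add more clusters.
    return total / len(partition)


def pick_partition(candidates: Sequence[Partition]) -> Partition:
    """Hypothetical selection step: return the candidate with the highest CU."""
    return max(candidates, key=category_utility)


if __name__ == "__main__":
    # Toy data invented for illustration: (body covering, number of legs).
    dog, cat = ("fur", "4"), ("fur", "4")
    robin, crow = ("feathers", "2"), ("feathers", "2")

    one_cluster = [[dog, cat, robin, crow]]
    two_clusters = [[dog, cat], [robin, crow]]
    singletons = [[dog], [cat], [robin], [crow]]

    for name, p in [("one", one_cluster), ("two", two_clusters), ("singletons", singletons)]:
        print(name, category_utility(p))
    print(pick_partition([one_cluster, two_clusters, singletons]) is two_clusters)
```

On the toy data, the two-cluster split scores 0.5, the single all-inclusive cluster 0.0, and the singletons 0.25, so CU favours the intermediate grouping without the user specifying the number of clusters. This trade-off between within-cluster predictability and the number of clusters is what makes CU a natural fit for a non-parametric search for basic-level categories.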

Keywords/Search Tags:

basic-level category, category utility, non-parametric, visualization, ECU

Related items

1	Basic Level Categories Detecting In Hierarchical Clustering
2	Research On Mapping Mechnism Of Learning Expression
3	Category-level Object Pose Estimation
4	Category-level 3D Object Tracking Model Based On Inter Frame Correspondence
5	Research On The Optimization Of Commodity Category Management Of Company A
6	Nearest Neighborhood-Based Rare Category Mining
7	Open-domain Named Entity Recognition And Hierarchical Category Acquisition
8	Automatic Recognition Research On Syntactic Category Of Common Words
9	"sikuquanshuzongmu" Zibu Classification Study
10	Characteristics Of Category Feature Characters