During the past few years, semi-supervised learning has attracted a great deal of attention. In this research field, the number of labeled samples can significantly affect the clustering result. However, how many labeled samples are ideal remains an open problem. In this paper, we investigate this question in the context of text clustering. Based on two state-of-the-art clustering algorithms, namely k-means and Affinity Propagation (AP), we implement five semi-supervised clustering algorithms (Seeded k-means (SK-means), Constrained k-means (CK-means), Loose Seeds Affinity Propagation (LSAP), Compact Seeds Affinity Propagation (CSAP), and Tri-Set Seeds Affinity Propagation (SAP)) to study the effect of labeled sample scale. We apply the five algorithms to two benchmark data sets in text mining: Reuters-21578 and NSF Research Award Abstracts 1990-2003. Numerical results show that increasing the number of labeled samples does not always help the clustering algorithms reach a better solution. When the labeled sample scale exceeds the check point of 35% for the k-means based algorithms, or 25% for the AP based algorithms, the learning ability of these algorithms stagnates or grows only slowly. These experimental results can guide semi-supervised clustering applications: researchers can select different algorithms according to their purposes.
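To make the seeded variants concrete, the following is a minimal sketch of the Seeded k-means idea (labeled samples serve as seeds that initialize the cluster centroids); the function name, argument layout, and the `-1` convention for unlabeled points are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def seeded_kmeans(X, seed_labels, n_iters=20):
    """Sketch of Seeded k-means (hypothetical interface, for illustration).

    X           : (n, d) array of data points
    seed_labels : (n,) array; cluster id for labeled (seed) points, -1 if unlabeled
    Returns the final assignment and centroids.
    """
    ks = np.unique(seed_labels[seed_labels >= 0])
    # Initialize each centroid as the mean of its seed points.
    centroids = np.stack([X[seed_labels == k].mean(axis=0) for k in ks])
    for _ in range(n_iters):
        # Assign every point (seeds included) to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Recompute centroids from the current assignment. In the Seeded
        # variant seeds may drift to other clusters; Constrained k-means
        # would instead keep the seed assignments fixed in this step.
        centroids = np.stack([X[assign == k].mean(axis=0)
                              for k in range(len(ks))])
    return assign, centroids
```

The only difference between the SK-means and CK-means variants in this sketch is whether the seed points are allowed to be reassigned during the update loop.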