
Research On Mid-Perpendicular Hyperplane Similarity Criterion Based On Pairwise Constraints With Applications

Posted on: 2012-03-03  Degree: Master  Type: Thesis
Country: China  Candidate: S Gao  Full Text: PDF
GTID: 2218330338996177  Subject: Computer software and theory
Abstract/Summary:
Measuring the similarity between data objects is one of the primary tasks for distance-based techniques in data mining and machine learning, e.g., distance-based clustering and distance-based classification. Choosing a similarity measure suited to a given problem makes that problem easier to solve. A growing body of research shows that fitting a similarity measure to a problem by means of pairwise constraints can significantly improve algorithmic performance. Current research on similarity measurement with pairwise constraints is mainly distance metric learning, which uses pairwise constraints to learn a distance matrix for classification or clustering. We propose a new similarity measure based on pairwise constraints, in particular cannot-link constraints. The main contributions of this thesis are summarized as follows:

Firstly, based on an analysis of the 1NN and SVM algorithms, the concept of the mid-perpendicular hyperplane is extracted. From the mid-perpendicular hyperplane of a cannot-link constraint, we propose the Mid-Perpendicular Hyperplane Similarity criterion and show how to compute the similarity measure on a toy problem.

Secondly, the criterion is applied to clustering, and we propose the clustering algorithm MPHS, short for Mid-Perpendicular Hyperplane Similarity. We develop MPHS separately for linear and non-linear datasets and propose several sub-algorithms. On several UCI datasets and several image datasets, MPHS outperforms other semi-supervised clustering algorithms.

Thirdly, ensemble learning is introduced to semi-supervised clustering, and experimental results show that our ensemble algorithm performs better than the others.

Finally, the criterion is applied to classification. After obtaining the similarity matrix through the MPHS-PCP process, we classify with 1NN and SVM respectively, and propose the algorithms MPHS-1NN and MPHS-SVM. We also introduce ensemble learning to classification, and propose the algorithms MPHS-1NN-bagging and MPHS-SVM-bagging. The experimental results on several UCI datasets show the effectiveness of our algorithms.
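
The abstract does not state the formula of the criterion, so the following Python sketch is only an illustration of the underlying geometry, not the thesis's method. It builds the mid-perpendicular (perpendicular bisector) hyperplane of a cannot-link pair and checks on which side a point falls. The function names and the "same-side fraction" score are assumptions introduced here for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def bisector_hyperplane(a: Vector, b: Vector) -> Tuple[List[float], float]:
    """Return (w, c) so that w . x = c is the perpendicular bisector of a and b."""
    w = [ai - bi for ai, bi in zip(a, b)]            # normal vector points from b to a
    mid = [(ai + bi) / 2 for ai, bi in zip(a, b)]    # the hyperplane passes through the midpoint
    c = sum(wi * mi for wi, mi in zip(w, mid))
    return w, c


def side(x: Vector, w: Vector, c: float) -> int:
    """Return +1 or -1 for the side of the hyperplane x lies on, 0 if on it."""
    s = sum(wi * xi for wi, xi in zip(w, x)) - c
    return 1 if s > 0 else (-1 if s < 0 else 0)


def same_side_fraction(x: Vector, y: Vector,
                       cannot_links: Sequence[Tuple[Vector, Vector]]) -> float:
    """Hypothetical score (an assumption, not the thesis's formula): the fraction
    of cannot-link bisector hyperplanes that do NOT separate x and y."""
    agree = 0
    for a, b in cannot_links:
        w, c = bisector_hyperplane(a, b)
        if side(x, w, c) * side(y, w, c) >= 0:
            agree += 1
    return agree / len(cannot_links)


# Toy 2-D problem with two cannot-link pairs; their bisectors are x = 1 and y = 1.
cannot_links = [((0, 0), (2, 0)), ((0, 0), (0, 2))]
score_close = same_side_fraction((0.5, 0.5), (0.2, 0.8), cannot_links)
score_split = same_side_fraction((0.5, 0.5), (1.5, 0.5), cannot_links)
score_far = same_side_fraction((0.5, 0.5), (1.5, 1.5), cannot_links)
```

In this toy setting, two points that no bisector separates receive the highest score, and the score drops with each cannot-link hyperplane that falls between them.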
Keywords/Search Tags: similarity measurement, pairwise constraint, distance metric learning, mid-perpendicular hyperplane similarity, ensemble learning, semi-supervised