Research On Semi-supervised Clustering Ensemble Approach And Its Application

Posted on: 2017-10-03
Degree: Master
Type: Thesis
Country: China
Candidate: S T Wei
GTID: 2348330488475452
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of information technology and networking, people's ability and channels to acquire information have expanded enormously. While enriching the information available to people, huge amounts of data bring enormous challenges in organizing, searching and analyzing it. How to extract useful information from large databases quickly and accurately has become a very valuable research topic.

On the one hand, this thesis studies clustering, which plays an important role in data analysis. Clustering searches for the underlying structures or rules among samples and predicts a partition based on the principle of maximizing the similarity between samples in the same cluster and minimizing the similarity between samples in different clusters. Because data collection methods have diversified and storage technology has developed rapidly, collecting data has become much easier. Most of the samples obtained have no labels, but a small amount of labeled data can easily be obtained in some practical applications. In addition, data objects have become so abstract and complex that more and more innovative theories and methods have been proposed to meet practical demands. Many advances in clustering have been made, especially in semi-supervised clustering and clustering ensembles. This thesis focuses mainly on semi-supervised clustering ensembles. On the other hand, images are an important product of the multimedia information era. Content-based image retrieval can store and manage image resources effectively, but it is constrained by the "semantic gap". Therefore, image annotation, which establishes a mapping to semantics, has become a major topic in the multimedia field. The performance of existing image annotation methods depends to a large extent on image segmentation or clustering technology.
However, image segmentation is difficult to improve substantially, and unsupervised clustering results are not good enough, so this thesis tries to uncover image semantic content through semi-supervised clustering and studies image annotation.

We analyze the research background and status of clustering and semi-supervised clustering, and then study the key technologies of semi-supervised clustering ensembles. Furthermore, we present the semi-supervised clustering ensemble approach and its application to automatic image annotation, and explain its theoretical basis and modeling process. We carry out a comparative analysis of the experimental results. Finally, we summarize our research work and discuss its prospects. The main achievements are as follows.

At present, two typical approaches to semi-supervised clustering, constraint-based and metric-based, have been studied extensively. Although the two kinds of methods each have their own focus, they are not completely separate; rather, they have a symbiotic relationship. This is because hybrid approaches, which aim to combine the advantages of both, can obtain more satisfactory results. Reviewing the previous literature, we found that most hybrid approaches add the two factors to the same objective function, but few combine them through an ensemble mechanism. This thesis presents a semi-supervised clustering ensemble approach that combines both kinds of methods. Several base clustering results are obtained using constraint-based and metric-based methods respectively, and an ensemble method is then used to obtain the final target clustering.

The pixel distance metric used in the previous literature is based merely on intrinsic properties. As is well known, a pixel and its surrounding pixels are tightly linked, so it is necessary and reasonable to incorporate spatial characteristics into the objective function.
However, existing methods always take various means or statistical operators over a designated area around a pixel as its spatial information, and the results still deviate more or less from the actual features. To mitigate this deviation, this thesis considers both the intrinsic features of a pixel and the spatial characteristics of its surroundings in the metric measure, which is not confined to a single perspective and reflects the similarity between pixels accurately. An accurate metric helps to improve clustering performance.

Generally, image content is complex, fuzzy, abstract and polysemous, so describing an image by its low-level features alone is far from convincing. Low-level features need to be mapped to high-level semantic concepts. We employ a keyword category method to obtain labeled regions, which are used as supervised information, combine the proposed semi-supervised clustering ensemble approach with the cross-media relevance model, and employ resampling and majority voting to annotate images in pursuit of improved performance.
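
The resampling and majority voting step mentioned above can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the `annotate` callable (a stand-in for whatever relevance-model annotator is used), and the parameters `n_runs`, `sample_ratio` and `top_k` are all assumptions made for illustration. The idea shown is only that several annotation runs on random subsets of an image's regions each propose keywords, and the keywords proposed by the most runs are kept.

```python
from collections import Counter
import random


def majority_vote(runs, top_k=2):
    """Return the top_k keywords proposed by the largest number of runs."""
    counts = Counter()
    for keywords in runs:
        # Each run votes at most once for a given keyword.
        counts.update(set(keywords))
    # Sort by vote count (descending), then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_k]]


def annotate_with_resampling(regions, annotate, n_runs=5, sample_ratio=0.8,
                             top_k=2, seed=0):
    """Annotate random subsets of regions and combine the runs by majority vote.

    `annotate` is a hypothetical callable: it takes a list of regions and
    returns a list of keywords. It is an assumption, not part of the thesis.
    """
    rng = random.Random(seed)
    size = min(len(regions), max(1, int(len(regions) * sample_ratio)))
    runs = [annotate(rng.sample(regions, size)) for _ in range(n_runs)]
    return majority_vote(runs, top_k)


# Usage: three hypothetical annotation runs for one image.
example_runs = [["sky", "sea"], ["sky", "beach"], ["sea", "sky"]]
print(majority_vote(example_runs, top_k=2))  # ['sky', 'sea']
```

Counting each keyword once per run keeps a single run from dominating the vote; whether the thesis weights votes this way is not stated in the abstract.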
Keywords: semi-supervised clustering ensemble, pairwise constraints, metric measure, keyword category, majority voting