Research On Multi-way Information Bottleneck Method

Posted on: 2019-08-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Q Yan
GTID: 1368330572957888
Subject: Software engineering
Abstract/Summary:
The information bottleneck (IB) method has been applied in domains such as information coding, machine learning, image processing and pattern recognition because of its solid theoretical foundation and its data analysis ability. With the advent of the big data era, however, data records usually appear as multi-source heterogeneous information. Novel IB methods that handle such complex multi-source heterogeneous data effectively are urgently needed, and developing them is a general trend in IB research. Remedying the limitations of current IB solutions on multi-way heterogeneous information is therefore of both theoretical significance and application value, and the work in this study aims to fill this gap in the literature.

To remedy these limitations, this thesis proposes the multi-way information bottleneck method. Specifically, the relevant models and algorithms of multi-way IB are investigated for the following problems: multi-feature fusion, heterogeneous feature integration, multi-task collaboration, and cross-media shared and private information maximization. The main contributions of this thesis are summarized as follows.

(1) To integrate multiple feature representations, we propose a novel feature collaborative information bottleneck (FC-IB) model, which can discover more reasonable pattern structures based on multiple relevant variables. First, by maximally preserving the relevant information between data patterns and the corresponding relevant variables, the proposed model integrates multiple information cues into the final clustering results. Second, FC-IB takes information loss as the criterion for extracting pattern structures, and adopts information-theoretic optimization to guarantee that the objective function converges to a local maximum. The experimental results demonstrate that FC-IB outperforms typical clustering methods and other feature collaborative clustering methods.

(2) To leverage the multiple heterogeneous features of the source data, we propose a novel consensus information bottleneck (CIB) model. First, CIB uses multiple original features to characterize the data from different views, while exploiting basic clusterings to relieve the conflicts among heterogeneous features. Then, CIB formulates the problem of joint multi-view and ensemble clustering as a mutual information maximization problem. Finally, a novel "draw-and-merge" optimization method is proposed to optimize the CIB objective function. Extensive experiments on several practical tasks show that CIB outperforms state-of-the-art multi-view and ensemble clustering methods.

(3) Existing methods ignore the relationships among multiple related data sources. To address this problem, we propose a novel multi-task information bottleneck (MTIB) model that shares information across multiple tasks. First, MTIB formulates the problem as an information loss minimization function. Second, the shared information is quantified by the distributional correlation of clusters in different tasks, which is based on a high-level common vocabulary constructed through a novel agglomerative information maximization method. Finally, a rotational "draw-and-merge" solution is proposed to solve the optimization problem by updating the data partition. Extensive experiments on several realistic datasets show that MTIB consistently and significantly outperforms other state-of-the-art single-task and multi-task clustering methods.

(4) Most existing approaches to cross-media data rely heavily on a shared latent feature space to construct the relationships between multiple modalities, while ignoring the private information hidden in each modality. To address this problem, we propose a novel share-private information maximization (SPIM) model for cross-media data clustering. First, we present two models for constructing shared information: the hybrid word model, which ensures the statistical correlation between the low-level features of multiple modalities, and the clustering ensemble model, which ensures the semantic correlation of the high-level clustering partitions. Second, the SPIM model integrates the shared information of multiple modalities and the private information of individual modalities into a unified objective function. Finally, SPIM is optimized by a sequential "draw-and-merge" procedure, which guarantees that the objective function converges to a local maximum. The experimental results on six cross-media datasets show that the proposed approach compares favorably with existing state-of-the-art cross-media clustering methods.
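The abstract refers several times to a "draw-and-merge" optimization that converges to a local maximum. The following is a minimal sketch of that general idea for the plain single-variable case only, not the thesis's FC-IB, CIB, MTIB or SPIM algorithms. It assumes a hard clustering T of X that greedily maximizes I(T;Y), and it re-evaluates the objective directly for every candidate move rather than using an incremental merge cost. The function names, the toy distribution and the parameters k, sweeps and seed are all hypothetical.

```python
import math
import random

def mutual_information(joint):
    """Mutual information (in nats) of a joint distribution given as rows p(t, y)."""
    row = [sum(r) for r in joint]
    col = [sum(c) for c in zip(*joint)]
    mi = 0.0
    for t, r in enumerate(joint):
        for y, p in enumerate(r):
            if p > 0:
                mi += p * math.log(p / (row[t] * col[y]))
    return mi

def cluster_joint(p_xy, labels, k):
    """Aggregate p(x, y) into p(t, y) under the hard assignment labels[x] = t."""
    n_y = len(p_xy[0])
    joint = [[0.0] * n_y for _ in range(k)]
    for x, t in enumerate(labels):
        for y in range(n_y):
            joint[t][y] += p_xy[x][y]
    return joint

def draw_and_merge(p_xy, k, sweeps=20, seed=0):
    """Assumed simplification: draw one item out of its cluster, then merge it
    into whichever cluster gives the largest I(T;Y); repeat until no move helps."""
    rng = random.Random(seed)
    n = len(p_xy)
    labels = [i % k for i in range(n)]
    rng.shuffle(labels)
    best = mutual_information(cluster_joint(p_xy, labels, k))
    history = [best]  # objective value at the start and after each accepted move
    for _ in range(sweeps):
        changed = False
        for x in rng.sample(range(n), n):
            old = labels[x]
            best_t, best_val = old, best
            for t in range(k):
                if t == old:
                    continue
                labels[x] = t  # tentatively merge x into cluster t
                val = mutual_information(cluster_joint(p_xy, labels, k))
                if val > best_val + 1e-12:
                    best_t, best_val = t, val
            labels[x] = best_t  # keep the best assignment found for x
            if best_t != old:
                changed = True
                best = best_val
                history.append(best)
        if not changed:
            break  # no single move improves the objective: local maximum
    return labels, history

if __name__ == "__main__":
    # Hypothetical toy joint distribution p(x, y) over 4 items and 2 relevant values.
    toy = [[0.20, 0.05], [0.20, 0.05], [0.05, 0.20], [0.05, 0.20]]
    labels, history = draw_and_merge(toy, k=2)
    print(labels, history[-1])
```

Because a move is accepted only when it strictly increases I(T;Y), and the objective is bounded above by I(X;Y), the procedure stops at a local maximum; it gives no guarantee of finding the global one.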
Keywords/Search Tags: IB method, multi-source heterogeneous data, feature fusion, heterogeneous feature integration, multi-task collaboration, cross-media computation, information sharing