Research And Application Of Matrix Factorization Algorithms For Multi-source Heterogeneous Data

Posted on: 2020-08-12 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: S D Huang | Full Text: PDF
GTID: 1368330623458272 | Subject: Computer Science and Technology
Abstract/Summary:
Because matrix factorization learns the structures hidden in data efficiently and effectively, it has been widely investigated in machine learning and achieves promising performance. Real-world data are often collected from multiple channels or consist of different representations (i.e., views). For instance, an image can be described by different visual descriptors. Different views usually capture different aspects of the data, and each of them can be used on its own to mine knowledge. Moreover, the information encoded in different views is both consistent and complementary, which helps produce better performance. It is therefore preferable to exploit multiple views rather than rely on a single view. This motivates a dedicated learning paradigm, multi-view learning, that analyzes such heterogeneous features efficiently and improves the accuracy and robustness of learning algorithms.

In this dissertation, we first briefly review the basic form of classical matrix factorization. We then address, in an unsupervised setting, the following problems that arise when dealing with multi-source heterogeneous data.

1. Traditional matrix factorization algorithms are generally sensitive to noise and outliers. Because multi-source heterogeneous data are collected from multiple channels or modalities, they naturally contain more noise and outliers, which can severely degrade an algorithm's performance in practical applications. To address this problem, this dissertation proposes a novel robust multi-view clustering method that integrates heterogeneous representations of data. To make the method robust to noise and outliers, especially extreme outliers, its objective computes residuals with a capped norm. The proposed method has low computational complexity, on the same order as the classic k-means algorithm, which is a major advantage for unsupervised learning.

2. A main disadvantage of most existing matrix factorization methods is that their optimization problems are non-convex and thus prone to getting stuck in bad local minima. To alleviate this issue, a novel multi-view clustering method based on self-paced learning is proposed. Self-paced learning, which has been applied to various machine learning methods, first trains a model on easy samples and then progressively includes more complex samples until all samples are selected. The proposed model first learns the multi-view clustering model from easy examples and then progressively considers complex ones from each view. In addition, a soft weighting scheme is designed to further reduce the negative impact of outliers and noise.

3. Because they uncover the hidden structures of data efficiently, graph-regularized approaches have been widely investigated for various multi-view learning tasks. However, measuring similarity is challenging in these methods, since the construction of the similarity graph is affected by several factors, such as the scale of the data, the neighborhood size, the choice of similarity metric, and noise and outliers. Moreover, real-world datasets usually contain nonlinear relationships, which most existing methods do not consider. To address these challenges, this dissertation proposes a novel model that simultaneously performs multi-view clustering and learns similarity relationships in kernel spaces. Since performance is often sensitive to the choice of input kernel matrix, the model is further extended with multiple kernel learning.

4. Existing multi-view learning methods usually use a single-layer formulation. Since the mapping between the learned representation and the original data contains complex hierarchical information with implicit lower-level hidden attributes, it is desirable to fully explore the hidden structures hierarchically. This dissertation proposes a novel deep multi-view clustering model that uncovers the hierarchical semantics of the input data layer by layer. Using a novel deep matrix decomposition framework, it learns hidden representations with respect to different attributes and collaboratively learns the hierarchical semantics obtained at each layer. Instances from the same class are forced closer together layer by layer in the low-dimensional space, which benefits the subsequent clustering task.
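The capped-norm idea in the first contribution can be illustrated with a minimal sketch. It assumes one plausible form of the objective, not necessarily the dissertation's exact model: a cluster label shared across views, one set of centroids per view, and a capped squared residual min(||x_i^(v) - c_k^(v)||^2, eps_v), so an extreme outlier contributes at most eps_v per view. The function names, the per-view caps `eps`, and the alternating update scheme are hypothetical. Each iteration costs O(n·k·d) per view, the same order as one k-means iteration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def capped_cost(views, centers, eps, i, c):
    """Capped residual of sample i w.r.t. cluster c, summed over all views."""
    return sum(min(sq_dist(X[i], C[c]), e)
               for X, C, e in zip(views, centers, eps))


def capped_multiview_kmeans(views, k, eps, init=None, n_iter=20, seed=0):
    """Hypothetical capped-norm multi-view clustering sketch.

    views[v][i] is the feature vector of sample i in view v; eps[v] caps the
    squared residual in view v; init optionally fixes the seed samples.
    """
    n = len(views[0])
    if init is None:
        init = random.Random(seed).sample(range(n), k)
    # One centroid set per view, all seeded from the same samples.
    centers = [[list(X[i]) for i in init] for X in views]
    labels = [0] * n
    for _ in range(n_iter):
        # Assignment: one shared label per sample, minimising the capped cost.
        labels = [min(range(k),
                      key=lambda c: capped_cost(views, centers, eps, i, c))
                  for i in range(n)]
        # Update: average only samples whose residual is below the cap, so a
        # sample at the cap (an outlier) cannot drag the centroid.
        for v, (X, e) in enumerate(zip(views, eps)):
            for c in range(k):
                inliers = [X[i] for i in range(n)
                           if labels[i] == c and sq_dist(X[i], centers[v][c]) < e]
                if inliers:
                    centers[v][c] = [sum(col) / len(inliers)
                                     for col in zip(*inliers)]
    return labels, centers


# Usage: two clean clusters in two views plus one extreme outlier (index 8).
view1 = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
         (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1), (100, 100)]
view2 = [(0,), (0.1,), (0.2,), (0.1,), (3,), (3.1,), (3.2,), (3.1,), (-50,)]
labels, centers = capped_multiview_kmeans([view1, view2], k=2,
                                          eps=[1.0, 1.0], init=[0, 4])
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, 0]
```

Because every residual is capped, the outlier costs exactly 2.0 under either cluster (the tie goes to cluster 0) and never enters a centroid update; with an uncapped squared loss it would pull one centroid far away from its cluster.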
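The self-paced mechanism in the second contribution can be sketched as follows. The abstract does not give the exact weighting functions, so this sketch assumes the classic hard self-paced rule (weight 1 if a sample's loss is below the age parameter λ, else 0) and, as a stand-in for the dissertation's soft weighting scheme, the common linear soft weighting max(0, 1 - loss/λ). The names `lam0`, `mu`, `n_ages`, and the weighted k-means model itself are hypothetical. Weights are computed separately in each view, so a sample can be easy in one view and still excluded in another.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hard_spl_weights(losses, lam):
    """Classic self-paced weights: select a sample iff its loss is below lam."""
    return [1.0 if loss < lam else 0.0 for loss in losses]


def soft_spl_weights(losses, lam):
    """Linear soft weights: ~1 for easy samples, falling to 0 at loss = lam."""
    return [max(0.0, 1.0 - loss / lam) for loss in losses]


def nearest(views, centers, i):
    """Shared label of sample i: cluster with the smallest summed error."""
    k = len(centers[0])
    return min(range(k), key=lambda c: sum(
        sq_dist(X[i], C[c]) for X, C in zip(views, centers)))


def self_paced_multiview_kmeans(views, init, lam0, mu=1.3, n_ages=10):
    """Hypothetical sketch: weighted multi-view k-means whose per-view sample
    weights come from soft_spl_weights; lam grows by mu at every age."""
    n, k = len(views[0]), len(init)
    centers = [[list(X[i]) for i in init] for X in views]
    labels, lam = [0] * n, lam0
    for _ in range(n_ages):
        labels = [nearest(views, centers, i) for i in range(n)]
        for v, X in enumerate(views):
            # Losses are measured per view, so "easy" is decided view by view.
            losses = [sq_dist(X[i], centers[v][labels[i]]) for i in range(n)]
            w = soft_spl_weights(losses, lam)
            for c in range(k):
                idx = [i for i in range(n) if labels[i] == c and w[i] > 0]
                total = sum(w[i] for i in idx)
                if total > 0:
                    centers[v][c] = [sum(w[i] * X[i][d] for i in idx) / total
                                     for d in range(len(X[0]))]
        lam *= mu  # a larger lam admits harder samples at the next age
    return labels, centers


view1 = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
         (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1), (100, 100)]
view2 = [(0,), (0.1,), (0.2,), (0.1,), (3,), (3.1,), (3.2,), (3.1,), (-50,)]
labels, centers = self_paced_multiview_kmeans([view1, view2], init=[0, 4], lam0=1.0)
print(soft_spl_weights([0.0, 1.0, 2.0, 4.0], 2.0))  # [1.0, 0.5, 0.0, 0.0]
```

In this run the outlier is still assigned to a cluster, but it keeps weight 0 in both views: its loss stays in the thousands while λ only reaches 1.3^9 ≈ 10.6 at the last age, so it never influences a centroid.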
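For the third contribution, one widely used way to learn a similarity graph in a kernel space is self-expression in the feature space induced by a kernel; it is assumed here for illustration and is not necessarily the dissertation's model. Each mapped sample φ(x_i) is reconstructed from the others, and the coefficient matrix Z serves as the learned similarity. With the kernel matrix K = φ(X)^T φ(X), the objective involves only kernel evaluations and, without further constraints, has a closed-form minimizer. A multiple-kernel variant replaces K with a weighted combination of candidate kernels K_p, with weights w_p (hypothetical notation) learned jointly.

```latex
\begin{aligned}
&\min_{Z}\; \|\phi(X) - \phi(X)Z\|_F^2 + \alpha\|Z\|_F^2
  = \min_{Z}\; \operatorname{Tr}\big(K - 2KZ + Z^{\top}KZ\big) + \alpha\|Z\|_F^2,
  \qquad K = \phi(X)^{\top}\phi(X), \\
&\nabla_Z:\; -2K + 2KZ + 2\alpha Z = 0
  \;\Longrightarrow\; Z^{\ast} = (K + \alpha I)^{-1}K, \\
&\text{multiple kernels: } K = \sum_{p=1}^{r} w_p K_p, \quad w_p \ge 0, \quad \sum_{p=1}^{r} w_p = 1.
\end{aligned}
```

The trace form uses the symmetry of K (so Tr(Z^T K) = Tr(KZ)), and K + αI is invertible for any α > 0 because K is positive semidefinite.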
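The layer-wise decomposition in the fourth contribution can be illustrated with the deep semi-NMF form of deep matrix factorization, assumed here because the abstract does not state the dissertation's objective. For view v with data matrix X^(v), the factorization is peeled off layer by layer, and in a multi-view setting the last-layer representation H_m is shared by all views and weighted by hypothetical view weights α^(v) with exponent γ > 1. In deep semi-NMF the intermediate representations H_i^(v) = Z_{i+1}^(v) ⋯ Z_m^(v) H_m are also kept nonnegative, while the factors Z_i^(v) are unconstrained; the shared H_m is then clustered.

```latex
\begin{aligned}
&X^{(v)} \approx Z_1^{(v)} H_1^{(v)}, \quad
 H_1^{(v)} \approx Z_2^{(v)} H_2^{(v)}, \quad \dots, \quad
 H_{m-1}^{(v)} \approx Z_m^{(v)} H_m, \\
&\min_{\{Z_i^{(v)}\},\,H_m,\,\alpha}\;
 \sum_{v=1}^{V} \big(\alpha^{(v)}\big)^{\gamma}
 \big\|X^{(v)} - Z_1^{(v)} Z_2^{(v)} \cdots Z_m^{(v)} H_m\big\|_F^2 \\
&\text{s.t. } H_m \ge 0, \quad \sum_{v=1}^{V} \alpha^{(v)} = 1, \quad \alpha^{(v)} \ge 0 .
\end{aligned}
```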
Keywords/Search Tags: Matrix Factorization, Multi-source Heterogeneous Data, Unsupervised Learning, Multiple Kernel Learning, Deep Learning