Application Research Of Multi-View Multi-Task Classification/Clustering Algorithms For Large Scale Data

Posted on: 2017-05-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Q Huang
GTID: 1108330488480590
Subject: Light Industry Information Technology and Engineering
Abstract/Summary:
Machine learning has become one of the most important research topics in artificial intelligence, and pattern classification and cluster analysis are two of its fundamental techniques. Both have been widely used in fields such as natural language processing, biometrics, computer vision, speech recognition, and image recognition. Machine learning methods for large scale datasets have been studied extensively, and many of the resulting key technologies have been successfully and widely applied in practice. Nevertheless, many issues in classification and clustering remain to be explored, e.g. a universal fuzziness index in fuzzy clustering, improving the generalization performance of kernelized large scale data classification, and loss functions for large scale data classification. We therefore focus on large scale datasets and address these issues. The main contributions are as follows:

(1) The fuzziness index m plays an important role in the clustering results of fuzzy clustering algorithms. To avoid the fuzziness index m of the CA (competitive agglomeration) clustering algorithm, which is based on the FCM (fuzzy C-means) clustering framework, being fixed at the usual value 2, a more universal fuzzy clustering algorithm is proposed. First, a fuzzy clustering algorithm named EIC-FCM (entropy index constraint FCM), whose clustering performance is comparable to that of the classical FCM algorithm, is presented by introducing an entropy index r into the constraints with m = 1. Introducing the entropy index r effectively transforms the constraint m > 1 on the fuzziness index into the constraint 0 < r < 1 on the entropy index. In addition, a universal competitive agglomeration clustering algorithm called EICCA (entropy index constraint CA) is proposed by introducing a competitive term, similar to that in the CA clustering algorithm, into the EIC-FCM objective function. Experimental results on synthetic datasets and UCI machine learning datasets show that EICCA, built on the EIC-FCM framework, can effectively find the optimal number of clusters for a dataset, with more adaptive parameter choices than the classical CA clustering algorithm, which admits only the fuzziness index m = 2.

(2) To improve the generalization performance of the kernelized two-class L2-SVM, a multi-view pattern classification algorithm called Multi-view L2-SVM is presented by introducing multi-view learning into the kernelized two-class L2-SVM. Multi-view L2-SVM is then equivalently formulated as a Center Constrained Minimum Enclosing Ball (CCMEB) problem, which leads to a novel classification method named Multi-view Core Vector Machine (MvCVM). Both Multi-view L2-SVM and MvCVM can obtain an overall consensus classification result on each view, because both classifiers consider the differences as well as the associations between views. An extensive set of experiments on synthetic and real-world multi-view datasets demonstrates the effectiveness of the proposed methods.

(3) To improve the generalization performance of logistic regression (LR), a soft margin classification model, v-SMLRC, is presented by introducing margin parameters into the LR model, and a soft margin multi-task classification model, v-SMMTL-LR, is presented by using multi-task learning to introduce a regularization parameter and margin parameters into the LR model. The duals of v-SMLRC and v-SMMTL-LR can be regarded as CDdual problems with an equality constraint, and two new large scale pattern classification methods, v-SMLRC-CDdual and v-SMMTL-LR-CDdual, are proposed accordingly. The proposed v-SMLRC-CDdual and v-SMMTL-LR-CDdual can maximize the inner-class margin and effectively enhance the generalization performance of LR. Empirical results on large scale document datasets demonstrate that the proposed methods are effective and comparable to related methods.
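
As background for contribution (1), the role of the fuzziness index m can be illustrated with the classical FCM algorithm. The following is a minimal sketch of textbook FCM only, assuming squared Euclidean distances and a random initial membership matrix; it does not reproduce EIC-FCM or EICCA, whose objective functions are not given in this abstract. The function name `fcm` and all of its parameters are illustrative assumptions.

```python
import numpy as np

def fcm(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy C-means (illustrative sketch, not the thesis's EIC-FCM).

    X: (n, d) data matrix, c: number of clusters, m > 1: fuzziness index.
    Returns the membership matrix U (n, c) and the cluster centers V (c, d).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships; each row is normalized to sum to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Center update: mean of the data weighted by u_ik^m.
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared Euclidean distance from every point to every center.
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        # Membership update: u_ik proportional to d_ik^(-2/(m-1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V

# Usage on two well-separated synthetic blobs (hypothetical data).
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(0.0, 0.1, (20, 2)),
                    rng.normal(5.0, 0.1, (20, 2))])
U_demo, V_demo = fcm(X_demo, c=2, m=2.0)
labels_demo = U_demo.argmax(axis=1)
```

The exponent -1/(m - 1) in the membership update is undefined at m = 1, which is why classical FCM requires m > 1; larger values of m give softer memberships.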
Keywords/Search Tags: Fuzziness index, Entropy index, Competitive agglomeration, Multi-view learning, Core vector machine, Soft margin support vector machine, Logistic regression, Multi-task learning