
Distributed Information-Theoretic Learning

Posted on: 2017-05-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P C Shen
Full Text: PDF
GTID: 1108330488491029
Subject: Electronic Science and Technology
Abstract/Summary:
Recently, owing to the rapid development of distributed applications such as wireless sensor networks and distributed computing systems, distributed data processing has been attracting increasing attention from researchers. Distributed data processing is a common issue across different types of distributed applications, whose system architectures can be modeled as a network of nodes capable of computation and communication. Distributed processing deals with signals or data over such a network. In the distributed processing mechanism considered in this thesis, each node not only performs local computation on its own data but also exchanges necessary information with its neighboring nodes, so as to obtain results in a fully decentralized manner that are nevertheless globally meaningful.

Many distributed data processing algorithms have been proposed on top of this mechanism, covering various problems in distributed signal processing and machine learning. For most of these algorithms, the objective functions are built on second-order statistics of the data. However, when non-Gaussian noise is present in the environment, or when the data to be processed have complicated distributions, second-order statistics cannot capture the information in the data sufficiently. In such cases, we design objective functions for distributed learning problems based on information-theoretic measures, so as to exploit higher-order statistics of the data beyond second-order ones and thereby improve learning performance. This is nontrivial under the constraints of the distributed environment, and no prior work has addressed it. In this thesis, we systematically study three main kinds of learning problems in the distributed environment, namely supervised learning, unsupervised learning, and semi-supervised learning, and we propose effective distributed information-theoretic learning algorithms for each.

Specifically, we develop distributed minimum-error-entropy algorithms for parameter estimation, which belongs to supervised learning. To estimate the entropy, we employ two kinds of estimators: an estimator of quadratic Rényi entropy and an estimator of a bound on Shannon entropy. Correspondingly, we obtain two kinds of distributed estimation algorithms. Simulation results show that, under non-Gaussian noise, the proposed algorithms achieve more accurate parameter estimates than the distributed least-mean-square algorithm.

We develop distributed maximum-mutual-information algorithms for clustering, which belongs to unsupervised learning. In these algorithms, we model the cluster boundaries with parameterized discriminant functions and compute the mutual information from those functions. Each node performs cooperative clustering by exchanging the parameters of its discriminant functions with its neighbors. We test the proposed algorithms on both synthetic and real data. Simulation results show that the distributed algorithms achieve clustering results similar to those of the corresponding centralized algorithms.
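Both criteria rest on sample-based information-theoretic estimators rather than assumed densities. The following minimal sketch illustrates the two quantities involved, assuming a Gaussian Parzen window for the quadratic Rényi entropy and a softmax over the discriminant-function outputs for the cluster posteriors; both choices are illustrative assumptions, not necessarily the exact constructions used in the thesis.

```python
import numpy as np

def quadratic_renyi_entropy(errors, sigma=1.0):
    """Parzen-window estimate of quadratic Renyi entropy:
    H2(e) = -log( (1/N^2) * sum_ij G_{sigma*sqrt(2)}(e_i - e_j) ).
    Minimizing H2 of the estimation errors is the MEE criterion."""
    e = np.asarray(errors, dtype=float).reshape(-1, 1)
    diffs = e - e.T                        # pairwise differences e_i - e_j
    var = 2.0 * sigma ** 2                 # variance of the convolved kernel
    kernel = np.exp(-diffs ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    information_potential = kernel.mean()  # V(e) = (1/N^2) * sum_ij G(...)
    return -np.log(information_potential)

def empirical_mutual_information(scores):
    """Empirical I(X; C) between samples and cluster labels, where
    `scores` (shape N x K) holds discriminant-function outputs that a
    softmax turns into posteriors p(c | x_i)."""
    z = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    p_cx = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    p_c = p_cx.mean(axis=0)                          # marginal p(c)
    ratio = np.clip(p_cx / p_c, 1e-12, None)
    return float(np.mean(np.sum(p_cx * np.log(ratio), axis=1)))
```

Minimizing H2 (equivalently, maximizing the information potential) sharpens the error distribution, while maximizing the empirical mutual information pushes the discriminant functions toward confident, balanced cluster assignments.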
In addition, when the data distributions are complicated, the proposed algorithms obtain better clustering results than the distributed K-means algorithm.

Moreover, we develop a distributed KL-divergence-based vector quantization algorithm, which also belongs to unsupervised learning. In designing its objective function, we devote particular care to the choice of the divergence's type and direction. The resulting distributed quantization algorithm lets each node cooperatively learn, in an online manner, reproduction vectors close to those obtained by centralized processing. Simulation results show that, when the data contain outliers, the proposed algorithm outperforms the distributed LBG algorithm and the distributed SOM algorithm.

Finally, we develop distributed semi-supervised metric learning algorithms based on information-theoretic measures. We employ two kinds of distributed cooperative strategies to develop two corresponding frameworks for distributed semi-supervised metric learning. In particular, by choosing the loss functions and regularization terms of the frameworks' objective functions according to the centralized SERAPH algorithm, we obtain two distributed information-theoretic metric learning algorithms. Simulation results show that both algorithms learn metric matrices similar to those learned by centralized SERAPH, making them good distributed approximations to centralized metric learning.
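All of the algorithms above share the cooperation pattern described at the outset: each node first adapts its local parameters using its own data, then combines the intermediate estimates received from its neighbors. The sketch below shows a generic adapt-then-combine diffusion loop; the names `local_grad`, `neighbors`, and `weights` are hypothetical placeholders, and in the thesis's setting the gradient would come from the information-theoretic objectives discussed above.

```python
import numpy as np

def diffusion_atc(w0, neighbors, weights, local_grad, step=0.01, iters=100):
    """Generic adapt-then-combine (ATC) diffusion over a network.

    w0:         dict node -> initial parameter vector
    neighbors:  dict node -> list of neighbors (each node includes itself)
    weights:    dict (l, k) -> combination weight a_lk, with
                sum_l a_lk = 1 for every node k
    local_grad: callable (node, w) -> gradient of that node's local cost
    """
    w = {k: np.asarray(v, dtype=float) for k, v in w0.items()}
    for _ in range(iters):
        # Adapt: each node takes a gradient step on its own objective.
        psi = {k: w[k] - step * local_grad(k, w[k]) for k in w}
        # Combine: each node averages its neighbors' intermediate estimates.
        w = {k: sum(weights[(l, k)] * psi[l] for l in neighbors[k]) for k in w}
    return w
```

When the network is connected and the combination weights are chosen appropriately, the nodes' estimates stay close to one another while jointly descending the sum of the local objectives, which is what makes the fully decentralized results globally meaningful.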
Keywords/Search Tags: Distributed processing, information-theoretic learning, parameter estimation, clustering, vector quantization, metric learning