Machine learning has achieved considerable success in numerous fields, but its theoretical foundation remains incomplete. For instance, neural networks (referred to in this thesis as learning machines) are often considered black boxes because their inner workings are not clearly understood. Moreover, a theoretical framework for choosing hyperparameters, such as the number of hidden units, is also lacking. As machine learning is applied in an ever wider range of domains, there is an urgent need to deepen our understanding of how learning machines work internally. This thesis explores the inner workings of learning machines using statistical mechanics and information theory, and proposes new interpretable and efficient learning machines based on the results of this research. The main research content and results are summarized as follows.

First, we propose the maximal relevance principle to shed light on the inner workings of learning machines from an information-theoretic perspective. In a machine learning task, the hidden layers of a learning machine extract useful information from the input data, which allows the machine to perform predictions or classifications accurately. Useful information here means information that captures the features of the data rather than the noise. To quantify the useful information extracted by the hidden layers, we define the relevance within the framework of machine learning: it measures the amount of useful information as the entropy of the energy of the hidden layers. Building on this notion of relevance, we formulate the maximal relevance principle, which yields the theoretical maximum of the relevance attainable by the hidden layers. Numerical experiments demonstrate that the relevance of all well-trained learning machines approaches this theoretical maximum. This observation implies that trained machines extract the maximal amount of useful information from the data, which explains their high efficiency in performing tasks.

Then, based on the maximal relevance principle, we explore new probabilistic networks for unsupervised learning. In these networks, the distribution of the hidden layer is fixed a priori to a probability distribution of maximal relevance, and only the conditional distribution of the data given the hidden states is learned during training. Our numerical analysis demonstrates that these new learning machines are simpler, more efficient, and more interpretable than conventional learning machines, precisely because the distribution of the hidden layer is fixed and given in advance. Furthermore, the new machines require only a small number of hidden units to achieve optimal performance.

Finally, to understand the inner workings of learning machines from the perspective of statistical mechanics, we study their statistical properties using a random energy ensemble approach, in which each hidden layer is described as a random energy model with a stretched exponential distribution of energies. We find that, in order to propagate the dependence on the data to the deep layers of a learning machine, each hidden layer should be tuned to the critical point of the corresponding random energy model.
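To make the notion of relevance used above concrete, the following is a minimal sketch (not the thesis implementation) of how the relevance of a hidden layer can be estimated from a sample of its activation patterns, under the common assumption that the energy of a hidden state is proxied by its negative empirical log-frequency: states observed equally often share one energy level, and the relevance is the entropy of the resulting energy distribution. The function name and the toy data are illustrative only.

```python
# Minimal sketch (an assumption for illustration, not the thesis code) of
# estimating the relevance of a hidden layer from sampled activation patterns.
# The energy of a hidden state is proxied by its negative empirical
# log-frequency, so states observed the same number of times share one energy
# level, and the relevance is the entropy of the distribution over levels.
from collections import Counter
import numpy as np

def relevance(hidden_states):
    """Entropy (in nats) of the energy levels of a sample of hidden states.

    hidden_states: iterable of hashable hidden-layer configurations,
                   e.g. tuples of binary unit activations.
    """
    counts = Counter(hidden_states)           # how often each state occurs
    M = sum(counts.values())                  # total number of samples
    # m_k = number of distinct states observed exactly k times; all of them
    # share the same empirical energy -log(k / M).
    degeneracy = Counter(counts.values())
    H = 0.0
    for k, m_k in degeneracy.items():
        p_k = k * m_k / M                     # probability mass at this level
        H -= p_k * np.log(p_k)
    return H

# Example: relevance of 10,000 samples of a 20-unit binary hidden layer
# with independent, unbiased units (a low-relevance baseline).
rng = np.random.default_rng(0)
sample = [tuple(rng.integers(0, 2, size=20)) for _ in range(10_000)]
print(relevance(sample))
```

Under this estimator, a layer whose states all occur with the same frequency has a single energy level and hence zero relevance, whereas a broad spectrum of state frequencies populates many energy levels and yields high relevance.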
The maximal relevance principle and the random energy ensemble analysis proposed in this thesis can serve as general frameworks for analyzing the internal mechanisms of learning machines and for designing innovative and efficient learning machines. Moreover, the new learning machines proposed in this thesis, which are founded on the maximal relevance principle, are simpler, more effective, and more interpretable than conventional learning machines.
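As a purely schematic illustration of the structure of the new machines described above (a fixed hidden-layer prior combined with a learnable decoder), the sketch below trains a toy model in which the prior p(s) is fixed and never updated, while only the conditional distribution p(x|s) of the data given the hidden states is fitted. The exponential form chosen for the fixed prior, the EM-style update, and all variable names are assumptions made for this illustration and are not taken from the thesis.

```python
# Schematic sketch (illustrative assumptions, not the thesis code) of a
# generative machine whose hidden-layer distribution p(s) is fixed a priori,
# so that training fits only the decoder p(x|s). The exponential prior below
# is one simple choice with a broad energy spectrum, used here only to show
# the structure: fixed prior, learnable conditional distribution.
import numpy as np

rng = np.random.default_rng(1)
n_hidden, n_visible, n_samples = 8, 16, 2000

# All 2^n_hidden binary hidden states, enumerated once.
states = np.array([[int(b) for b in np.binary_repr(k, n_hidden)]
                   for k in range(2 ** n_hidden)], dtype=float)

# Fixed prior: p(s) proportional to exp(-g * number of active units).
# It is chosen once and never updated during training.
g = 1.0
log_prior = -g * states.sum(axis=1)
prior = np.exp(log_prior - log_prior.max())
prior /= prior.sum()

# Toy binary data generated from a "true" random decoder, for illustration.
W_true = rng.normal(0.0, 1.5, size=(n_visible, n_hidden))
s_idx = rng.choice(len(states), size=n_samples, p=prior)
p_true = 1.0 / (1.0 + np.exp(-states[s_idx] @ W_true.T))
X = (rng.random((n_samples, n_visible)) < p_true).astype(float)

# Learnable decoder p(x|s): factorized Bernoulli with logits W s.
W = np.zeros((n_visible, n_hidden))
lr = 1.0

for step in range(200):
    P = 1.0 / (1.0 + np.exp(-states @ W.T))           # (n_states, n_visible)
    P = np.clip(P, 1e-12, 1.0 - 1e-12)
    # Posterior responsibilities p(s|x) use the *fixed* prior (E-step).
    log_lik = X @ np.log(P.T) + (1.0 - X) @ np.log(1.0 - P.T)
    log_post = log_lik + log_prior
    log_post -= log_post.max(axis=1, keepdims=True)   # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    # Gradient ascent on the expected log-likelihood updates only W (M-step).
    A = post.T @ X                                     # sum_n p(s|x_n) x_n
    c = post.sum(axis=0)                               # sum_n p(s|x_n)
    grad_W = (A - c[:, None] * P).T @ states / n_samples
    W += lr * grad_W

print("final decoder weights shape:", W.shape)
```

Because the hidden-layer distribution never changes, interpreting such a model reduces to inspecting what each hidden state generates through the decoder, which is one way to read the abstract's claim that these machines are simpler and more interpretable.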