With the rapid development of artificial intelligence, deep learning and deep neural networks (DNNs) have been widely applied to real-world problems and have achieved remarkable results, attracting extensive attention from academia, industry, and society at large. The superior performance of DNNs comes from automatic learning and feature extraction from data, and depends heavily on massive amounts of high-quality training data. Training is implemented mainly by fitting inputs to output targets, optimizing loss functions, and minimizing the fitting errors. The design of data-driven DNNs overemphasizes empirical performance while neglecting theoretical evidence, which makes it difficult to address key issues such as low interpretability, weak overfitting prevention, and high uncertainty. Probability and statistics are foundational tools for optimizing and analyzing DNNs. Probabilistic models, as important components of statistical models and learning foundations, belong to a family of distribution-based models that reflect the randomness of observed data and embody statistical assumptions about data generation. Because the probabilistic relation between inputs and outputs is built interpretably into their original design, probabilistic models can effectively describe the distribution of data or features and offer strong interpretability. They are therefore effective techniques for addressing the current key issues of DNNs.

This thesis focuses on probabilistic model-based DNNs. It studies and explores the application of Bayesian optimization theory and gradient-based optimization algorithms to the optimization of the aforementioned probabilistic models. By introducing probabilistic models into DNNs to improve their structures, the feature spaces of DNNs can be represented probabilistically. This further enhances the robustness of DNNs and their ability to cope with uncertainty in the input data, making their decisions more reliable. The contributions of this thesis are summarized as follows.

1. To address the issues that conventional visual attention mechanisms lack theoretical foundations and that their basic operations suffer from low interpretability, this thesis studies probabilistic modeling of the channel attention mechanism based on Gaussian processes and introduces a non-Gaussian probabilistic model into the visual attention module to learn the channel correlations of convolutional neural networks (CNNs) in a probabilistic way. In the proposed Gaussian process-embedded channel attention (GPCA) module, the output channel attention masks are assumed to follow a beta distribution. Because the beta distribution cannot be integrated into the end-to-end training of CNNs with a mathematically tractable solution, an approximation of the beta distribution (i.e., a sigmoid-Gaussian approximation) is adopted to obtain a closed-form solution and to improve the computational efficiency of estimating the beta distribution parameters. In the sigmoid-Gaussian approximation, Gaussian-distributed variables are mapped into the interval [0, 1] by a sigmoid function, with the prior distribution set as a Gaussian process in order to model correlations between channels. In this way, the GPCA module models the channel attention mechanism intuitively and reasonably in a probabilistic manner and improves the interpretability and robustness of visual attention mechanisms in DNNs. Experimental results show that the GPCA module achieves superior performance on five image classification datasets and also performs well in weakly supervised object localization, object detection, and semantic segmentation tasks. Visualization results and interpretability analysis further demonstrate its advantages on
interpretability as well.

2. Conventional dropout regularization techniques apply specific probability distributions to the dropout masks, which introduces systematic bias into model training and ultimately restricts the modeling ability of DNNs. This thesis studies a dropout regularization technique based on a non-Gaussian prior in DNNs and proposes an advanced dropout technique. The proposed advanced dropout consists of two key parts, a model-free framework and a parametric prior, where the model-free framework contains a model-free distribution, a seed variable, and a mapping function. The model-free framework is more flexible than existing dropout techniques with specific explicit distributions; it reduces the systematic bias and subsumes the conventional dropout techniques as special cases. This thesis also designs a parametric prior for the model-free distribution to adaptively adjust the dropout rates based on input features during DNN training; the parametric prior can be integrated into the end-to-end training of DNNs. During training, the advanced dropout is optimized by stochastic gradient variational Bayes (SGVB) inference and adaptively estimates the probability distributions of the dropout masks to improve overfitting prevention in practice. Experimental results show that the proposed advanced dropout can be employed in various base models and outperforms nine recently proposed dropout techniques on seven widely used datasets. It also achieves the best cost-effectiveness ratio on most of the datasets and demonstrates good properties in other respects, including adaptive dropout rates and overfitting prevention ability.

3. The output results of current DNNs commonly suffer from high uncertainty, which cannot guarantee the stability and reliability of DNN performance, and DNNs cannot reflect changes in the data well. To address these issues, this
thesis studies an uncertainty inference method for output features based on a mixture model and proposes a dual-supervised uncertainty inference (DS-UI) method. This thesis introduces a mixture of Gaussian mixture models (MoGMM) into the output layer of a DNN as a probabilistic interpreter of the features and proposes the MoGMM-fully connected (MoGMM-FC) layer, which combines the classifier in the output layer with an MoGMM. To enhance the learning ability of the MoGMM-FC layer, this thesis also proposes a dual-supervised SGVB (DS-SGVB) algorithm that comprehensively considers the influence of both positive and negative samples on model learning and feature extraction by simultaneously reducing intra-class distances and enlarging inter-class margins. The proposed DS-UI method improves the probabilistic representation ability of DNN output features for confidence estimation. Experimental results show that the DS-UI method achieves significant performance improvements in misclassification detection and open-set out-of-domain/-distribution detection, outperforming state-of-the-art uncertainty inference methods. Visualization results further illustrate its effectiveness.

4. To analyze the joint effect of the three proposed methods on a whole DNN model, this thesis applies them in one model, in the convolutional layers, the fully connected layers, and the original output layer, respectively, and evaluates the performance improvement of the resulting model in image classification, misclassification detection, and open-set out-of-domain/-distribution detection. Experimental results show that the whole DNN model achieves statistically significant improvements over the individual methods in these tasks and has good effectiveness and adaptability.
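The sigmoid-Gaussian approximation described in contribution 1 can be sketched as follows: Gaussian-distributed channel variables are drawn under a Gaussian-process prior and squashed into (0, 1) by a sigmoid to form the channel attention masks. This is a minimal NumPy sketch under stated assumptions, not the GPCA module itself; the RBF kernel over channel indices, the per-channel mean statistic, and all shapes are illustrative choices not fixed by this abstract.

```python
import numpy as np

def rbf_kernel(n_channels, length_scale=2.0):
    # Illustrative Gaussian-process prior covariance over channel indices,
    # so that nearby channels are modelled as correlated.
    idx = np.arange(n_channels)
    sq_dist = (idx[:, None] - idx[None, :]) ** 2
    return np.exp(-sq_dist / (2.0 * length_scale ** 2)) + 1e-6 * np.eye(n_channels)

def gpca_attention(features, rng):
    # features: (channels, height, width).
    # Draw Gaussian-distributed channel variables under the GP prior,
    # then map them into (0, 1) with a sigmoid to obtain attention masks.
    n_channels = features.shape[0]
    mean = features.mean(axis=(1, 2))            # per-channel summary statistic
    logits = rng.multivariate_normal(mean, rbf_kernel(n_channels))
    masks = 1.0 / (1.0 + np.exp(-logits))        # sigmoid -> interval (0, 1)
    return features * masks[:, None, None], masks

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))               # toy feature map
reweighted, masks = gpca_attention(x, rng)
```

The key point of the approximation is visible in the last two lines of `gpca_attention`: the masks are sigmoid-transformed Gaussians, so they stay in (0, 1) like beta-distributed variables while remaining differentiable and tractable.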
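The model-free framework of the advanced dropout in contribution 2 pairs a seed variable with a mapping function to produce per-unit keep probabilities, and a parametric prior adapts those probabilities to the input features. A minimal sketch, assuming a Gaussian seed variable, a sigmoid mapping function, and a linear parametric prior on the features; all three are illustrative stand-ins for the thesis's actual choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def advanced_dropout(x, w, b, rng, train=True):
    # Parametric prior (illustrative): the seed variable's mean is a linear
    # function of the input features, so dropout rates adapt to the data.
    mu = x @ w + b
    if not train:
        return x, sigmoid(mu)
    # Model-free framework (sketch): a Gaussian seed variable pushed through
    # a sigmoid mapping function yields per-unit keep probabilities.
    seed = rng.standard_normal(mu.shape)
    keep_prob = sigmoid(mu + seed)
    mask = (rng.random(x.shape) < keep_prob).astype(x.dtype)
    # Inverted-dropout rescaling keeps the expected activation unchanged.
    return x * mask / keep_prob, keep_prob

rng = np.random.default_rng(1)
x = rng.standard_normal((5, 3))
w = 0.1 * rng.standard_normal((3, 3))            # hypothetical prior parameters
b = np.zeros(3)
dropped, keep_prob = advanced_dropout(x, w, b, rng)
```

In the thesis the parameters of the prior are learned end to end via SGVB rather than sampled as above; the sketch only shows how a seed variable plus a mapping function replaces a fixed Bernoulli or Gaussian dropout distribution.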
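Contribution 3's MoGMM places one Gaussian mixture per class over the output features, so the per-class log-likelihoods can serve as a probabilistic interpreter of a feature vector. The following is a minimal sketch with diagonal Gaussians; the dimensions, the diagonal covariance, and the use of the negative total log-likelihood as an uncertainty score are illustrative assumptions, not the DS-SGVB training procedure itself:

```python
import numpy as np

def logsumexp(a, axis):
    # Numerically stable log-sum-exp along the given axis.
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def diag_gaussian_logpdf(x, mean, var):
    # Log density of a diagonal Gaussian, summed over the feature dimension.
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var, axis=-1)

class MoGMM:
    """Mixture of Gaussian mixture models: one GMM per class serves as a
    probabilistic interpreter of an output-feature vector."""
    def __init__(self, means, variances, weights, class_priors):
        # means, variances: (classes, components, dim); weights: (classes, components).
        self.means, self.variances = means, variances
        self.weights, self.class_priors = weights, class_priors

    def class_log_joint(self, x):
        # log p(x, c) = log prior(c) + log sum_k w_ck * N(x | mean_ck, var_ck)
        comp = diag_gaussian_logpdf(x[None, None, :], self.means, self.variances)
        return np.log(self.class_priors) + logsumexp(np.log(self.weights) + comp, axis=-1)

    def uncertainty(self, x):
        # A low total log-likelihood log p(x) flags an uncertain or
        # out-of-distribution feature (one simple way to read the MoGMM).
        return -logsumexp(self.class_log_joint(x), axis=-1)

rng = np.random.default_rng(2)
C, K, D = 3, 2, 4                                # toy sizes: classes, components, dim
model = MoGMM(
    means=rng.standard_normal((C, K, D)),
    variances=np.ones((C, K, D)),
    weights=np.full((C, K), 0.5),
    class_priors=np.full(C, 1.0 / C),
)
feature = rng.standard_normal(D)
scores = model.class_log_joint(feature)
```

In the MoGMM-FC layer these mixture parameters sit alongside the classifier in the output layer and are optimized by the DS-SGVB algorithm; the sketch only shows how per-class densities yield scores usable for misclassification and out-of-distribution detection.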