
Research And Application Of Ante-Hoc Interpretability Learning Of Neural Networks

Posted on: 2024-10-06 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: W Han | Full Text: PDF
GTID: 1528307373968929 | Subject: Computer Science and Technology
Abstract/Summary:
As powerful but complex machine learning models, deep neural networks are increasingly involved in many areas of everyday life, yet their black-box nature has raised concerns about the trustworthiness of model decisions in practical applications. Neural network interpretability has therefore become a prominent topic in both academia and industry. It aims to provide intuitive explanations of a model's decision-making process and internal semantic patterns, thereby improving the credibility and safety of the model. Although a number of interpretability methods have been proposed, many challenges remain. First, the enormous number of parameters in neural networks, combined with the non-linearity introduced by activation functions, makes the information flow between layers highly non-linear. This complexity leads to intricate, hard-to-interpret decision pathways, disordered internal representations that lack functional partitioning, and difficulty in uncovering the hidden semantics of individual neurons. In addition, practical interpretability applications suffer from a lack of human-friendly interactive information, which hampers their ultimate adoption. To address these issues, this dissertation conducts research from two perspectives: interpretability learning algorithms for neural network decision pathways, internal representations, and neuron hidden semantics; and human-friendly neural network interpretability applications. The main contributions and innovations of this dissertation are as follows:

1. To address unclear decision-making paths in neural networks, this dissertation focuses on transparent model design and introduces a method for the automatic construction of tree-shaped neural networks, making the information pathways intuitively interpretable within a finite set of semantic branch modules. Existing approaches to building tree-shaped neural networks rely on offline clustering to form a category hierarchy, in which each category cluster groups similar semantics and is used to train a different branch module; these approaches are cumbersome and lack effective validation of the resulting hierarchy. In contrast, this dissertation makes category clustering differentiable through category prototype representations and, under a meta-learning paradigm, learns the hierarchy jointly with the classifier from the classification error. This enables the category hierarchy to be constructed automatically and the tree-shaped network to be trained end to end. Qualitative interpretation analysis illustrates the transparent decision paths of the constructed tree-shaped networks, and experiments on multiple image classification datasets show that the proposed model outperforms related methods in classification accuracy, with improvements of 1%-4% on complex tasks.
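As an illustration only, and not the dissertation's actual implementation, the following minimal sketch shows how differentiable category-prototype routing into branch modules could be wired up so that the clustering is learned end to end from the classification loss; all names, shapes, and hyper-parameters (e.g. `PrototypeRoutedTree`, `n_branches`, the temperature 0.1) are assumptions for illustration.

```python
# Hypothetical sketch: learnable prototypes softly assign samples to branch
# modules, so the "category hierarchy" emerges from the classification loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeRoutedTree(nn.Module):
    def __init__(self, feat_dim=512, n_branches=4, n_classes=100):
        super().__init__()
        self.backbone = nn.Sequential(                 # shared trunk (placeholder)
            nn.Linear(3 * 32 * 32, feat_dim), nn.ReLU())
        # one learnable prototype per branch; categories cluster around them
        self.prototypes = nn.Parameter(torch.randn(n_branches, feat_dim))
        # each branch module specialises in one semantic cluster of classes
        self.branches = nn.ModuleList(
            [nn.Linear(feat_dim, n_classes) for _ in range(n_branches)])

    def forward(self, x):
        h = self.backbone(x.flatten(1))                               # (B, D)
        # soft assignment of each sample to branches by prototype similarity
        sim = F.normalize(h, dim=-1) @ F.normalize(self.prototypes, dim=-1).T
        gate = F.softmax(sim / 0.1, dim=-1)                           # (B, K)
        logits = torch.stack([b(h) for b in self.branches], dim=1)    # (B, K, C)
        # gated mixture of branch predictions, trained with plain cross-entropy,
        # so prototypes (i.e. the clustering) are learned jointly with the tree
        return (gate.unsqueeze(-1) * logits).sum(dim=1)

# usage: loss = F.cross_entropy(model(images), labels); loss.backward()
```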
2. To address disorganized internal representations in neural networks, this dissertation works from the perspective of interpretability constraints and proposes a brain-inspired neuron synchronization algorithm that allows functional modules to form spontaneously among neurons during training. It also extends the dimensions along which the interpretability of neuron activation features is measured. Unlike existing feature-interpretability constraints formulated from an informatics perspective, the proposed approach is inspired by neuroscientific studies of the brain. Mirroring the phenomenon in which neurons gradually synchronize with age to form distinct functional brain areas that respond collectively to a pattern and carry out higher-order functions, the algorithm uses a synchronization mechanism to aggregate multiple neurons with simple semantics so that together they represent a higher-order semantic. This encourages the spontaneous emergence of specific functional modules inside the network during training. To verify the interpretability of neuron activation-map features, the dissertation measures not only the accuracy of explanations but also their purity, stability, and diversity; the proposed method achieves an average improvement of 10% across these metrics, confirming its superior feature interpretability.
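The next sketch is a hypothetical, simplified rendering of what a synchronization-style constraint could look like as an auxiliary loss, rather than the dissertation's algorithm: neurons are split into fixed groups, and the loss rewards correlated activation within a group and decorrelation across groups, nudging functional modules to form during training. The group assignment, weighting, and function name `synchronization_loss` are assumptions.

```python
# Hypothetical sketch of a neuron-synchronization regulariser.
import torch

def synchronization_loss(acts: torch.Tensor, n_groups: int) -> torch.Tensor:
    """acts: (batch, n_neurons) activations of one layer."""
    b, n = acts.shape
    z = (acts - acts.mean(0)) / (acts.std(0) + 1e-6)   # standardise per neuron
    corr = (z.T @ z) / b                               # (n, n) correlation matrix
    group = torch.arange(n, device=acts.device) // (n // n_groups)
    same = (group[:, None] == group[None, :]).float()
    # encourage high correlation inside a group, low correlation between groups
    within = (corr * same).sum() / same.sum()
    between = (corr.abs() * (1 - same)).sum() / (1 - same).sum()
    return between - within          # minimise: sync within, de-sync across

# usage (assumed): total_loss = task_loss + 0.1 * synchronization_loss(h, 8)
```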
3. To address the confusing latent semantics of individual neurons, this dissertation proposes a method for embedding the hidden semantics of neurons, enabling a soft description and dissection of neuron functions, and it discusses quantitative evaluation metrics for local and global interpretation of hidden semantics. Previous neuron-semantics methods matched neuron activations against image semantic annotations and assigned a semantic label to any neuron whose response exceeded a threshold; they depend heavily on fine-grained annotations and explain only a subset of neurons. In contrast, this dissertation introduces a new representation of neuron function: following an idea similar to word embedding, it trains a function embedding for each neuron using only the neuron's activations in response to samples. On this basis, function-embedding-guided hard routing keeps neurons responding specifically to their corresponding semantics during forward propagation, while a synchronization mechanism compresses the enormous number of individual neuron grey-box explanations. Qualitative experiments show the distribution of aggregated function modules in each layer and the top-down information pathways. On quantitative metrics of hidden-semantic interpretability, the proposed method achieves the best accuracy in global semantic interpretation and performs excellently on the fidelity, robustness, and complexity metrics for semantic attribution explanations of local samples.

4. To address the lack of effective human-machine interaction information in interpretability applications, this dissertation proposes a human-machine connectivity framework based on aligning neurons' hidden-semantic embeddings and translating information with large language models. Beyond model-to-human connectivity, it further explores a complete interconnection framework covering human-to-model and model-to-model connections, providing intuitive and effective explanations of model internals. Compared with current grey-box explanation methods aimed at experts, the proposed method uses natural language as the interaction medium: semantic embeddings are aligned and large language models serve as the information transformation mechanism, yielding an intuitive and effective human-machine communication pathway. The framework is completed by a reverse process that maps human or model information back into the corresponding models. To validate its effectiveness, the dissertation evaluates global interpretability metrics of the hidden-semantic embeddings at every layer of the network. On top of high interpretation accuracy, qualitative experiments on individual neuron explanations, cross-layer neuron function correlations, and class-specific neuron semantic combinations further demonstrate the intuitiveness, accuracy, and comprehensiveness of the explanations. Human-to-model link experiments show that human-guided modification of model parameters can produce directed changes in classification accuracy on specific categorization tasks, and strong results on knowledge distillation in the model-to-model link confirm that neuron semantic embeddings, serving as static knowledge, enable efficient knowledge transfer between models.
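To make the model-to-human direction of such a framework concrete, the following is a minimal, purely illustrative sketch (not the dissertation's system): neuron function embeddings are matched against a small concept vocabulary in a shared space, and the nearest concepts are packed into a natural-language prompt for a large language model to rephrase as a readable explanation. The concept vocabulary, the alignment, and the `query_llm` helper are all assumptions.

```python
# Hypothetical sketch: translate neuron function embeddings into an LLM prompt.
import numpy as np

CONCEPTS = ["fur texture", "wheel", "sky", "text", "human face"]   # toy vocabulary

def nearest_concepts(neuron_emb: np.ndarray, concept_embs: np.ndarray, k: int = 3):
    """Cosine similarity between one neuron embedding and all concept embeddings."""
    a = neuron_emb / np.linalg.norm(neuron_emb)
    b = concept_embs / np.linalg.norm(concept_embs, axis=1, keepdims=True)
    sims = b @ a
    top = np.argsort(-sims)[:k]
    return [(CONCEPTS[i], float(sims[i])) for i in top]

def build_prompt(layer: int, unit: int, matches) -> str:
    desc = ", ".join(f"{c} ({s:.2f})" for c, s in matches)
    return (f"Neuron {unit} in layer {layer} is most aligned with: {desc}. "
            "Explain in one sentence what visual function this neuron likely serves.")

# usage (assumed): explanation = query_llm(build_prompt(3, 17, matches)),
# where query_llm wraps whatever large-language-model interface is available.
```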
Keywords/Search Tags:Deep neural network, interpretability, modularization, synchronization mechanism, large language model