
Research On The Interpretability And Security Of Text Classification Models In Deep Learning

Posted on: 2023-02-17  Degree: Doctor  Type: Dissertation
Country: China  Candidate: J C Xu  Full Text: PDF
GTID: 1528307316451094  Subject: Software engineering
Abstract/Summary:
In natural language processing, text classification has long been an active research topic and has developed over several decades. With the rise of the third wave of artificial intelligence, deep learning has driven the technical progress of text classification, delivered one breakthrough after another, and become the mainstream approach. At the same time, the rapid growth of deep learning brings a series of new tasks and challenges. On the one hand, as end-to-end learners, deep models extract features through deep neural networks with very large numbers of parameters; they cannot explain the reasons behind their decisions and appear highly complex and abstract from the outside, so they suffer from a serious lack of interpretability. This black-box property is a major obstacle to the development and application of such models and has aroused widespread concern in both industry and academia. On the other hand, with the extensive deployment of deep learning models in real-world applications, adversarial examples pose potential risks to model security: although they are very similar to real examples, they can fool a model into giving a completely different prediction. Adversarial attacks, which can be used to evaluate and improve model security, have therefore begun to attract research attention.

From the perspective of software engineering, interpretability and security are non-functional requirements of models that contribute significantly to scientific research and engineering applications. In this dissertation, we focus on deep learning models for text classification and shed light on their interpretability and security, in the hope of providing a solid theoretical basis for the development of text classification in the new era. The research contents and contributions are threefold.

Explanatory methods for specific text classification models: We propose the FFD (Feed Forward Decomposition) algorithm for model-specific interpretability. Two representative text classification models, FastText and TextCNN, are selected as research targets, and FFD is refined into the FFD-FT (FFD on FastText) and FFD-TC (FFD on TextCNN) algorithms, which translate a model prediction into a score matrix that quantitatively describes how the input features affect the prediction for the target label. Based on the explanatory results of these algorithms, we propose text classification methods built on score matrices and frequency statistics, as well as a meta-learning-based few-shot text classification method. Extensive experiments demonstrate the effectiveness of FFD-FT and FFD-TC, reveal the internal mechanisms of the target models, and comprehensively assess the performance of the proposed text classification methods.
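As a rough illustration of the kind of feed-forward decomposition that FFD-FT performs, the sketch below assumes a FastText-style model that averages word embeddings and applies a linear classifier; under that assumption the class logits split exactly into additive per-word scores, which form a score matrix. The variable names and model structure are illustrative assumptions, not the dissertation's actual FFD-FT implementation.

```python
import numpy as np

# Hypothetical FastText-style model: averaged word embeddings -> linear classifier.
# E: (vocab_size, dim) embedding table; W: (dim, n_classes); b: (n_classes,).
def decompose_prediction(word_ids, E, W, b):
    """Split the class logits into additive per-word scores.

    Because the model is linear in the averaged embeddings, the logit of
    class c decomposes exactly as sum_i S[i, c] + b[c], where S[i, c] is
    the share contributed by the i-th input word.
    """
    n = len(word_ids)
    embeddings = E[word_ids]        # (n, dim)
    S = (embeddings @ W) / n        # (n, n_classes) score matrix
    logits = S.sum(axis=0) + b      # identical to the model's own logits
    return S, logits

# Toy usage: 5-word vocabulary, 4-dim embeddings, 3 classes.
rng = np.random.default_rng(0)
E = rng.normal(size=(5, 4))
W = rng.normal(size=(4, 3))
b = np.zeros(3)
S, logits = decompose_prediction([0, 2, 4], E, W, b)
print(S.shape, logits)              # (3, 3) score matrix and the logits
```

For TextCNN the max-pooling step is non-linear, so an exact additive split of this simple form no longer holds; the sketch is only meant to convey the score-matrix idea.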
Explanatory methods for arbitrary text classification models: From a local view, we clarify the scope of local explanations, define the related research problems with the necessary mathematical proofs, and propose the LURLocator (Locally Universal Rules Locator) algorithm for local explanations. LURLocator uses a genetic algorithm to search, in linear time, for a feasible solution to the underlying NP (non-deterministic polynomial time) complete problem. From a global view, we examine the limitations of today's popular gradient-based explanatory methods and propose the SignedGI (Signed Gradient × Input) algorithm for global explanations. We supplement an easy-to-follow derivation of the word-level gradient and analyze in particular how its sign affects the explanatory results. The validity of LURLocator and SignedGI is demonstrated by both theoretical analysis and empirical results.

A framework and algorithms for two-stage explanation-based adversarial attacks: Inspired by the findings on the interpretability of deep learning models, we propose the TEAA (Two-stage Explanation-based Adversarial Attacks) framework. Under its guidance, we propose two non-targeted adversarial attack algorithms, AdvLRP (Adversarial Attacks with Layer-wise Relevance Propagation) and AdvFFD (Adversarial Attacks with Feed Forward Decomposition), thereby bridging current interpretability methods and the task of adversarial attacks. We then propose a comprehensive adversarial attack algorithm, TextTricker, which supports both targeted and non-targeted attacks and offers two configurable implementations for different scenarios. Extensive experiments validate the effectiveness of these attack algorithms, analyze the semantic characteristics of the generated adversarial examples, and evaluate the capability of different defense methods.

It is worth noting that the interpretability and the security of deep learning models are not independent of each other but complementary and mutually reinforcing. On the one hand, research on interpretability helps to uncover model weaknesses, which enables more advanced adversarial attack algorithms and pushes model architectures to evolve towards better security. On the other hand, adversarial examples are an important means of understanding how a model operates; they provide deeper insight into its intrinsic nature and advance research on model interpretability.
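To make the gradient-based global explanation mentioned in the second contribution more concrete, the following sketch computes a word-level gradient × input attribution that keeps the sign, for a PyTorch classifier that is assumed to accept pre-embedded input. It illustrates the general gradient × input technique only and should not be read as the dissertation's SignedGI algorithm.

```python
import torch

def signed_gradient_x_input(model, embedded_input, target_class):
    """Word-level gradient x input attribution that keeps the sign.

    `model` is assumed to map an embedded sequence of shape
    (1, seq_len, dim) to class logits. Summing the element-wise product
    over the embedding dimension (instead of taking an absolute value or
    a norm) preserves whether each word pushes the target logit up or down.
    """
    embedded_input = embedded_input.clone().detach().requires_grad_(True)
    logits = model(embedded_input)                               # (1, n_classes)
    logits[0, target_class].backward()                           # gradient w.r.t. the embeddings
    scores = (embedded_input.grad * embedded_input).sum(dim=-1)  # (1, seq_len), signed
    return scores.squeeze(0)                                     # one signed score per word
```

Keeping the sign distinguishes words that push the target logit up from words that push it down, which is precisely the aspect of the explanatory results whose impact the dissertation analyzes.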
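Similarly, the two-stage idea behind TEAA-style attacks described in the third contribution can be sketched as follows: the first stage ranks word positions with an explanation score, and the second stage greedily substitutes the highest-ranked words until the predicted label changes. The `predict`, `explain`, and `substitutes` callables are placeholders; the concrete scoring in AdvLRP, AdvFFD, and TextTricker differs, and TextTricker's targeted mode is not shown here.

```python
def two_stage_attack(predict, explain, substitutes, words, max_changes=5):
    """Sketch of a non-targeted, explanation-guided two-stage attack."""
    original_label = predict(words)
    # Stage 1: one explanation score per word position, highest first.
    scores = explain(words, original_label)
    ranked = sorted(range(len(words)), key=lambda i: scores[i], reverse=True)

    adversarial = list(words)
    changed = 0
    for i in ranked:                        # Stage 2: greedy substitution
        if changed >= max_changes:
            break
        last = None
        for candidate in substitutes(adversarial[i]):
            trial = adversarial[:i] + [candidate] + adversarial[i + 1:]
            if predict(trial) != original_label:
                return trial                # label flipped: attack succeeded
            last = candidate
        if last is not None:
            adversarial[i] = last           # keep the perturbation (crude heuristic)
            changed += 1
    return None                             # attack failed within the budget
```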
Keywords/Search Tags: deep learning, text classification, explanatory methods, adversarial attacks