
Research and Implementation of Interpretable AI Identification of Fraudulent Information

Posted on: 2022-04-10; Degree: Master; Type: Thesis
Country: China; Candidate: Y Ding; Full Text: PDF
GTID: 2518306494481054; Subject: Computer technology
Abstract/Summary:
In recent years, fraud has been rampant in China and has become a major social problem that undermines people's sense of security and well-being. China actively carries out "anti-fraud" work, but it is difficult to crack down on fraud in a timely and accurate manner, so preventing people from being cheated has become the top priority of that work. Automated identification of fraudulent information can help people spot fraud traps in time, which gives it practical social value. Fraudulent information identification is essentially a binary text classification task, and most of the methods that currently perform well are based on deep learning. Such methods first encode text into word vectors and then apply deep learning models (e.g. CNN, RNN) for end-to-end identification. Their shortcoming is that the models consider only local features or one-directional word order, so much information is lost. In addition, fraudulent information identification is still at the research stage and is difficult to deploy in practice. One important reason is that deep learning models are not interpretable: users will not trust the judgment of a black-box model that cannot explain itself. Having both identification ability and explanation ability is therefore a necessary condition for deploying a model in the fraudulent information identification scenario. To address these issues, this paper conducts research in the following areas:

(1) The pre-trained model RoBERTa is applied to the task of fraudulent information identification. RoBERTa discards the traditional feature extractors CNN and RNN and uses the Transformer encoder as the main part of the model. The self-attention mechanism takes the full context of the text into account at once, and the positional encoding added to the input vector preserves word order information. During pre-training, RoBERTa has already learned semantic and syntactic knowledge from a large corpus, so it only needs to be fine-tuned on the downstream task addressed in this paper. The data are first preprocessed, which includes removing stop words and balancing the dataset by oversampling. The preprocessed dataset is then used to fine-tune RoBERTa so that it performs better in fraudulent information identification scenarios. The experimental results show that RoBERTa outperforms the three baseline models on the fraudulent information identification task.

(2) The model-agnostic, local, post-hoc interpretation method LIME is optimized to make its interpretation results reproducible by setting an initial random seed, and the predictions of RoBERTa are interpreted with the optimized LIME. Perturbation sampling is performed around a single instance, and the samples are weighted according to their cosine distance from the instance to obtain a perturbed dataset. A linear model is then trained locally to fit the predictions of RoBERTa. With the initial random seed fixed, LIME generates the same perturbed dataset for the same input and therefore produces a consistent interpretation. The experimental results show that the optimized LIME identifies the features most relevant to RoBERTa's decisions, which makes RoBERTa interpretable. This lays the foundation for the practical deployment of RoBERTa in fraudulent information identification scenarios.

(3) An interpretable AI system for identifying fraudulent information is designed and implemented. It provides the main functions of user management, model management, fraudulent information data management, and interpretable AI identification of fraudulent information. The main functional interfaces of the system are then shown.
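The seeded, LIME-style procedure described in (2) can be illustrated with a minimal sketch. It is not the thesis's implementation and does not call the LIME library: `toy_fraud_score` is a hypothetical stand-in for the fine-tuned classifier, and `explain`, `num_samples`, `kernel_width`, and the exponential kernel are assumptions chosen for illustration. It shows perturbation sampling around one instance, weighting by cosine distance, a locally fitted linear model, and a fixed seed that makes the explanation repeatable.

```python
import numpy as np


def toy_fraud_score(tokens):
    """Hypothetical black box (NOT RoBERTa): pseudo-probability of fraud."""
    suspicious = {"transfer": 2.0, "urgent": 1.0, "password": 1.5}
    logit = sum(suspicious.get(t, 0.0) for t in tokens) - 1.5
    return 1.0 / (1.0 + np.exp(-logit))


def explain(tokens, predict_fn, num_samples=500, kernel_width=0.25, seed=0):
    """Return (token, weight) pairs from a locally fitted linear surrogate."""
    # A fixed seed yields the same perturbed dataset for the same input.
    rng = np.random.default_rng(seed)
    n = len(tokens)

    # Perturbation sampling: each row is a binary mask (1 = token kept).
    masks = rng.integers(0, 2, size=(num_samples, n))
    masks[0, :] = 1  # keep the unperturbed instance as the first sample

    # Query the black box on every perturbed token list.
    preds = np.array([
        predict_fn([t for t, keep in zip(tokens, row) if keep])
        for row in masks
    ])

    # Cosine distance between each mask and the all-ones original mask.
    # The cosine similarity simplifies to sqrt(kept / n); an empty mask gets 0.
    kept = masks.sum(axis=1)
    distance = 1.0 - np.sqrt(kept / n)

    # Assumed exponential kernel: closer samples receive larger weights.
    weights = np.exp(-(distance ** 2) / kernel_width ** 2)

    # Weighted least squares with an intercept column.
    X = np.hstack([masks, np.ones((num_samples, 1))])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], preds * sw, rcond=None)

    return [(tok, float(w)) for tok, w in zip(tokens, coef[:n])]


if __name__ == "__main__":
    sample = ["urgent", "please", "transfer", "money", "today"]
    for token, weight in explain(sample, toy_fraud_score, seed=42):
        print(f"{token:10s} {weight:+.3f}")
```

Calling `explain` twice with the same `seed` returns identical weights, which is the reproducibility property the abstract attributes to the fixed initial seed. Without a fixed seed, each call would draw different masks and the weights could shift between runs.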
Keywords/Search Tags: fraudulent information, text classification, interpretability, reproducibility