Design And Implementation Of Lightweight Fraudulent Text Recognition Algorithm

Posted on: 2024-04-20

Degree: Master

Type: Thesis

Country: China

Candidate: Y H Zhou

Full Text: PDF

GTID: 2568306914458224

Subject: Artificial Intelligence

Abstract/Summary:

As the Internet develops rapidly and smartphones become more widely used, people obtain information in increasingly diverse ways. While the Internet has brought convenience, it has also created opportunities for criminals, who mix harmful content into the massive amount of information online and pose significant threats to people’s property and to social stability and harmony. Fraudulent texts of many forms are widespread in text messages and on social media platforms. They are updated frequently to evade network supervision, with deliberate misspellings and fraud-related words replaced by substitutes. Traditional fraudulent text recognition methods cannot respond dynamically to these changes, so they identify new types of fraud and new forms of text obfuscation with low accuracy, while high-accuracy algorithms suffer from slow inference. To address these problems, this thesis analyzes the misspelling substitutions in fraudulent texts and conducts in-depth research on spelling correction, type recognition, and model compression. The main contributions and innovations of this thesis are as follows:

1. This thesis constructs a new Chinese fraudulent text dataset. Existing Chinese fraudulent text datasets are relatively outdated and differ significantly from current forms of fraud, so this thesis supplements public datasets with data obtained from real scenarios and annotates the data for three tasks: spelling correction, binary classification (fraud or not), and multi-class intention recognition. The resulting dataset is more realistic than previous public datasets and helps researchers analyze Chinese fraudulent texts more comprehensively.

2. This thesis proposes a Chinese spelling correction algorithm based on gated feature fusion. The algorithm detects and corrects misspellings in fraudulent texts and in texts from other domains. It uses gate networks to selectively fuse the semantic, pronunciation, and glyph information of Chinese characters, which improves both correction quality and the model's ability to explain errors. Experiments show that the algorithm achieves better correction results than the baseline models on both the fraudulent text dataset and the SIGHAN dataset. Ablation experiments verify the effectiveness of each module, and an analysis of typical cases verifies that the algorithm can effectively explain the reasons for errors.

3. This thesis proposes a fraudulent text recognition algorithm based on prompts and spelling checking. Building on the spelling correction model, the algorithm uses prompt learning to align the fraudulent text classification task with the spelling correction task, so that both tasks are optimized together and no additional classifier has to be built. The algorithm can effectively handle obfuscated texts and new types of fraud. In comparative experiments against the baseline models, it shows excellent performance on the fraudulent text recognition task. An analysis of the model’s attention weights verifies that the algorithm attends to suspicious words during prediction.

4. This thesis designs a lightweight scheme for the fraudulent text recognition model based on knowledge distillation. The method compresses the model by distilling knowledge from four perspectives: gate vectors, hidden layer outputs, attention matrices, and backbone outputs. In comparative experiments, the distilled student model outperforms the BERT-based fraudulent text recognition model while using about one-fifth of the parameters. Ablation experiments verify the effectiveness of each part of the loss function used in the knowledge distillation method.
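
To illustrate the gated feature fusion idea named in the second contribution, the following is a minimal sketch in plain Python. The abstract does not give the gate formulation, so everything here is an assumption made for illustration: each of the three feature vectors (semantic, pronunciation, glyph) gets its own sigmoid gate computed from the concatenation of all three, and the fused vector is the gate-weighted sum. The function names, parameter layout, and dimensions are hypothetical and are not taken from the thesis.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def gated_fusion(semantic, phonetic, glyph, gate_weights, gate_biases):
    """Fuse three equal-length feature vectors with per-feature gates.

    Assumed formulation (not from the thesis):
        gate_m = sigmoid(W_m . [semantic; phonetic; glyph] + b_m)
        fused  = sum over m of gate_m * feature_m   (elementwise)

    gate_weights: three matrices, each of shape (dim, 3 * dim)
    gate_biases:  three vectors, each of length dim
    Returns the fused vector and the three gate vectors.
    """
    features = [semantic, phonetic, glyph]
    concat = semantic + phonetic + glyph  # list concatenation
    fused = [0.0] * len(semantic)
    gates = []
    for feature, weights, bias in zip(features, gate_weights, gate_biases):
        pre_activation = matvec(weights, concat)
        gate = [sigmoid(z + b) for z, b in zip(pre_activation, bias)]
        gates.append(gate)
        fused = [f + g * h for f, g, h in zip(fused, gate, feature)]
    return fused, gates


if __name__ == "__main__":
    dim = 2
    # Zero-initialized gate parameters: every gate value is sigmoid(0) = 0.5.
    zero_weights = [[[0.0] * (3 * dim) for _ in range(dim)] for _ in range(3)]
    zero_biases = [[0.0] * dim for _ in range(3)]
    fused, gates = gated_fusion(
        [1.0, 2.0], [3.0, 4.0], [5.0, 6.0], zero_weights, zero_biases
    )
    print("fused:", fused)
    print("gates:", gates)
```

In a sketch like this, the gate values can be inspected per character to see which information source (meaning, sound, or shape) dominated a correction, which is one plausible reading of the error-explanation capability the abstract mentions. In a trained model the gate parameters would be learned rather than set by hand.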

Keywords/Search Tags:

Fraudulent Text Recognition, Feature Fusion, Prompt Learning, Knowledge Distillation

Related items

1	Scene Text Recognition Based On Attention Mechanism And Knowledge Distillation
2	Research On Automatic Scoring Methods For Cross-prompt
3	Research On Action Recognition Method Based On Spatiotemporal Feature Fusion And Knowledge Distillation Technology
4	Research On Sentiment Classification With Ensembled Knowledge Distillation
5	Research On Rumor Detection Method Via On Knowledge Distillation
6	Research Of Knowledge Distillation Based On Multiple Feature Matching Mechanism
7	Research On Multitype Signal Processing Algorithms Based On Precoding And Knowledge Distillation
8	Boundary Regression Of Complex Text Regions In Natural Scenes
9	Research On Object Detection Algorithm Based On Knowledge Distillation
10	Human Action Recognition Based On Spatial Temporal Feature Fusion