
Improvement Of Text Classification Algorithm Based On Few-shot Learning

Posted on: 2024-09-04  Degree: Master  Type: Thesis
Country: China  Candidate: Y Q Wang  Full Text: PDF
GTID: 2568307079461504  Subject: Statistics
Abstract/Summary:
As a carrier of information, data plays an important role in the development of the Internet. Among the many data types, text data offers advantages such as fast upload and download and diverse processing methods, making it an important object of study in information processing. In text classification, few-shot learning, where only a small amount of data is available for a given category, is a common and challenging problem. If text data spanning many categories can be classified from only a few samples per category, deep learning models can learn new concepts far more effectively, which is of great practical importance. Mainstream solutions to few-shot classification suffer from, among other issues, inconsistent reliability of augmented samples and loss functions that take little account of how data are distributed across categories. This thesis addresses these two shortcomings, augmented-sample quality and loss-function design, and its main contributions are as follows:

(1) Reliability verification and quantification of augmented samples via expert samples and metric synthesis. This thesis designs filtering algorithms that assess the reliability of augmented samples using expert samples and semantic-analysis methods, and discards samples with low reliability. This addresses the problem that augmentation algorithms can alter the semantics of text data or introduce additional noise, which in turn causes the coarse processing and low accuracy that follow from samples of inconsistent reliability. In addition, the thesis combines several reliability measures into a new, more comprehensive metric, Verdict-Value, which resolves the large scoring discrepancies among individual indicators. Together these components form a more reliable data-augmentation scheme that alleviates the sample-imbalance problem central to few-shot learning. Experimental results on two real datasets show that the method not only saves substantial manual labor but also effectively filters out high-quality generated samples, expanding a small sample set with a batch of high-quality data and ultimately improving model performance.

(2) A data-fusion method based on ensemble learning with model confidence. Drawing on the idea of model fusion in machine learning and combining it with ensemble learning, this thesis designs a new data-fusion algorithm that merges the outputs of different augmentation algorithms to improve model performance, addressing the lack of uniformity and the shallow degree of fusion among existing data-augmentation algorithms. The method uses as little time and as few resources as possible, and allocates mixing ratios more reasonably when fusing the real text in the original dataset with the text generated by the several augmentation methods.

(3) An optimized loss function for few-shot classification. Building on the output of data augmentation, this thesis designs a new loss function based on three ingredients: the effective sample number, classification difficulty, and an added perturbation term. The loss focuses on the more reliable samples within few-shot categories, remedying the fact that mainstream loss functions take no account of the distribution of augmented samples. Experiments on four real datasets validate the proposed loss function, and the results show that it helps the model predict labels closer to the true labels.
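The filtering idea in contribution (1) can be sketched as follows. The two component scores (semantic similarity to expert samples and a fluency score), the min-max normalization, the weighting used to form the composite Verdict-Value, and the threshold are all illustrative assumptions for this sketch, not the thesis's exact formulation:

```python
def min_max_normalize(scores):
    """Rescale a list of raw scores to [0, 1] so that differently
    scaled reliability indicators become comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def verdict_value(similarity, fluency, w_sim=0.6, w_flu=0.4):
    """Composite reliability score: a weighted combination of
    normalized component metrics (weights are assumed here)."""
    return w_sim * similarity + w_flu * fluency

def filter_augmented(samples, sim_scores, flu_scores, threshold=0.5):
    """Keep only augmented samples whose composite score clears
    the reliability threshold."""
    sims = min_max_normalize(sim_scores)
    flus = min_max_normalize(flu_scores)
    return [s for s, a, b in zip(samples, sims, flus)
            if verdict_value(a, b) >= threshold]
```

In practice the similarity score would come from comparing sentence embeddings of each augmented sample against the expert samples, and the fluency score from a language model; both are stubbed out here as precomputed lists.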
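The confidence-based fusion in contribution (2) can be illustrated with a sketch that allocates mixing ratios in proportion to the mean confidence a classifier assigns to each augmentation source's samples. The proportional-allocation rule, the `budget` parameter, and the assumption that each source's texts are pre-sorted by confidence are illustrative, not the thesis's actual algorithm:

```python
def fusion_ratios(confidences):
    """Allocate mixing ratios to augmentation sources in proportion
    to the mean model confidence on their generated samples."""
    means = {src: sum(c) / len(c) for src, c in confidences.items()}
    total = sum(means.values())
    return {src: m / total for src, m in means.items()}

def fuse(real_texts, generated, confidences, budget):
    """Combine all real texts with a confidence-weighted draw of
    generated texts; `budget` is the total number of generated
    samples to admit across all sources."""
    ratios = fusion_ratios(confidences)
    fused = list(real_texts)
    for src, texts in generated.items():
        k = round(budget * ratios[src])
        fused.extend(texts[:k])  # assumes texts sorted by confidence, high first
    return fused
```

The design choice here is that a source whose outputs the model trusts more contributes more samples to the fused training set, so low-quality augmentation methods are automatically down-weighted rather than discarded outright.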
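Contribution (3) combines an effective-sample-number weight, a classification-difficulty factor, and a perturbation term. A plausible sketch is a class-balanced loss in the style of Cui et al.'s effective number of samples, modulated by a focal-style difficulty factor; the exact weighting, the choice of `gamma` and `beta`, and the modeling of the perturbation as simple probability smoothing are assumptions of this sketch and may differ from the thesis's formulation:

```python
import math

def effective_number_weights(class_counts, beta=0.999):
    """Class-balanced weights: w_c proportional to (1 - beta) / (1 - beta^n_c),
    so rare classes get larger weights; normalized to mean 1."""
    weights = [(1.0 - beta) / (1.0 - beta ** n) for n in class_counts]
    mean_w = sum(weights) / len(weights)
    return [w / mean_w for w in weights]

def weighted_focal_loss(probs, labels, class_counts,
                        gamma=2.0, beta=0.999, epsilon=1e-3):
    """Cross-entropy re-weighted by effective sample number and
    modulated by a focal-style factor (1 - p)^gamma, with a small
    epsilon smoothing the predicted probability."""
    w = effective_number_weights(class_counts, beta)
    total = 0.0
    for p_row, y in zip(probs, labels):
        p = min(max(p_row[y], epsilon), 1.0 - epsilon)  # smoothed probability
        total += -w[y] * ((1.0 - p) ** gamma) * math.log(p)
    return total / len(labels)
```

The difficulty factor `(1 - p)^gamma` shrinks the loss on easy, confidently classified samples, while the effective-number weight counteracts the class imbalance that augmentation only partially relieves.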
Keywords/Search Tags:Data augmentation, Few-shot learning, Confidence, Loss function