Research on a Few-Shot Text Categorization Method Based on SLDA and Prototypical Networks

Posted on: 2022-12-07
Degree: Master
Type: Thesis
Country: China
Candidate: S F Zhang
GTID: 2518306608990389
Subject: Books intelligence
Abstract/Summary:
Text categorization, a basic and key task in natural language processing, sorts, summarizes, and classifies document resources, and is applied in scenarios such as information filtering, public opinion detection, and news document classification. In recent years, as algorithms have grown more complex and architectures more hierarchical, deep learning, which relies on big data, has shown excellent robustness and generalization. Deep learning is now widely used in text categorization tasks with remarkable results. In some specific fields, however, collecting and annotating data is expensive. When samples are scarce or annotated data is lacking, supervised deep learning tends to over-fit, and it is difficult for the model to perform well. Data augmentation and regularization can alleviate over-fitting, but they do not solve it completely.

Humans, by contrast, can learn the essential characteristics of things from a small amount of knowledge and quickly generalize to the recognition and prediction of new tasks. Inspired by this, researchers proposed few-shot learning, which aims to learn solutions to problems from a small amount of data. Few-shot learning methods based on small data sets have therefore become the key technology for such problems. How to obtain good performance when training data or high-quality annotated data is insufficient remains the central difficulty of current few-shot learning research. To address the poor feature representation and low generalization performance caused by the lack of labeled samples and the interference of noisy data in few-shot learning, this thesis approaches the problem from the perspectives of supervised topic models and multi-level feature representation, and focuses on the following research contents:

(1) Meta-learning is used to simulate few-shot tasks, and a new Dynamic Routing Prototypical Network based on SLDA (DRP-SLDA) is proposed by combining a supervised topic model with a dynamic routing algorithm. The SLDA topic model builds a semantic mapping between words and categories to enhance the category distribution characteristics of words, and the semantic representation of samples is encoded at word granularity. A Dynamic Routing Prototype Network (DR-Proto) is proposed, which exploits the semantic relationships between samples by extracting cross-features and uses a dynamic routing algorithm to iteratively generate dynamic prototypes that are representative of their categories, with the aim of solving the feature expression problem. Experimental results on the FewRel, 20 Newsgroups, and Sogou data sets show that the DRP-SLDA model effectively extracts the category distribution characteristics of words and obtains dynamic prototypes that improve category recognition, thereby improving the generalization performance of few-shot text categorization.

(2) Within the meta-learning framework, a two-way network structure is constructed to introduce topic information, and a Feature Integration model based on SLDA (FI-SLDA) is proposed. The SLDA model maps samples into a fine-grained, semantically rich topic space. A two-way feature integration network is then proposed. The Hierarchical Network uses a multi-layer convolutional neural network to extract local features and obtain the feature representation of support set samples. The Cross Network takes into account the interaction features between support set and query set samples, and uses an attention mechanism to obtain a global feature representation of the document set. The features extracted by the two networks are fused to obtain category vectors with distinct feature distributions, which makes the classification boundary clearer. Experimental results show that the FI-SLDA model significantly improves classification accuracy compared with six existing few-shot learning models. With noise ratios of 0%, 10%, 30%, and 50% set on the 20 Newsgroups and Sogou data sets, the two-way network effectively extracts local and global features, indicating that the FI-SLDA model fully considers the importance of both the support set and the query set to classification and thereby improves the accuracy of text categorization.
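
The abstract does not specify how DR-Proto computes its prototypes, so the following is only a minimal illustrative sketch, not the thesis's implementation. It contrasts the mean prototype of a standard prototypical network with a prototype refined by capsule-style dynamic routing, in which support samples that agree with the current prototype receive more weight. The function names, the three routing iterations, the `squash` non-linearity, and the use of cosine similarity for classification are all assumptions made for illustration; the toy 2-D vectors stand in for encoded text samples.

```python
import math


def mean_prototype(support):
    """Average the support embeddings of one class (standard prototypical network)."""
    dim = len(support[0])
    return [sum(vec[i] for vec in support) / len(support) for i in range(dim)]


def squash(vec):
    """Capsule-style squash: keeps the direction, maps the norm into [0, 1)."""
    sq_norm = sum(x * x for x in vec)
    if sq_norm == 0.0:
        return [0.0] * len(vec)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in vec]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def routed_prototype(support, iterations=3):
    """Assumed routing scheme: re-weight support samples by agreement with the prototype."""
    dim = len(support[0])
    logits = [0.0] * len(support)  # routing logits, one per support sample
    proto = [0.0] * dim
    for _ in range(iterations):
        weights = softmax(logits)
        weighted = [sum(w * vec[i] for w, vec in zip(weights, support))
                    for i in range(dim)]
        proto = squash(weighted)
        # Agreement (dot product) with the prototype raises a sample's logit.
        logits = [b + sum(p * x for p, x in zip(proto, vec))
                  for b, vec in zip(logits, support)]
    return proto


def classify(query, prototypes):
    """Return the label whose prototype has the highest cosine similarity to the query."""
    def cosine(a, b):
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0
    return max(prototypes, key=lambda label: cosine(query, prototypes[label]))


# Toy episode: the third "sports" support sample plays the role of a noisy sample.
sports = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
finance = [[0.0, 1.0], [0.2, 0.8]]
prototypes = {"sports": routed_prototype(sports), "finance": mean_prototype(finance)}
predicted = classify([0.9, 0.1], prototypes)
```

In this toy episode the routed prototype leans less toward the noisy sample than the plain mean does, which is the intuition behind using iteratively generated prototypes when support sets contain noise.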
Keywords/Search Tags: Few-Shot Learning, Prototypical Network, Meta-Learning, Topic Model, Text Categorization