Crowdsourced text is one of the most common ways to acquire information from current internet channels, and it is replete with rich content and diverse perspectives. Integrating crowdsourced text can distill key content and attitudes, providing valuable feedback and decision-making references for information collectors. However, crowdsourced text data often span multiple domains, and new domains offer little data and few reference summaries. This precludes the use of conventional supervised learning to train automatic summarization models based on deep neural networks, and thus constrains the capacity to aggregate large volumes of crowdsourced text. To solve this problem, this paper proposes a crowdsourced text integration solution based on domain-transfer characteristics. By improving a deep-neural-network summarization model and exploiting existing labeled data (text paired with reference summaries) in related source domains, summaries can be generated for crowdsourced text in the target domain, avoiding the high cost of annotating target-domain data. Accordingly, this paper focuses on transfer-based summary generation under multi-domain, small-sample conditions and carries out the following two research tasks.

(1) The Semantic Feature Transduction Transfer Method Based on Data Characteristics is designed to address discrepancies in data distribution across domains and the scarcity of data in new domains. First, source- and target-domain data are aligned by minimizing the distance between the distributions of their word-embedding vectors in a reproducing kernel Hilbert space. Then, a direct semantic feature transduction method is used to improve and train a deep-neural-network abstractive summarization model that takes the semantic features of the text as input. The model learns semantic correlations between data from different domains, so that unannotated data in the target domain can be associated with annotated data in the relevant source domains. Finally, empirical validation on the publicly available PENS news dataset demonstrates the efficacy of the proposed method for transductive text generation.

(2) The Fast Adaptation Fine-Tuning Method for Domains with Few Samples is devised to address fast generalization to the target domain and fine-tuning on small data. First, an adapter layer is added to both the encoder and decoder sides of the deep neural network so that the model can adjust to the small amount of reference-labeled data in the target domain. The model is then trained with meta-learning methods over a set of tasks, which enables it to adapt and generalize to the target domain quickly and stably even when fine-tuned on only a small amount of labeled target-domain data. Experimental validation on the publicly accessible Amazon review dataset shows that the proposed method outperforms other advanced transductive text generation methods in summary quality and maintains stable performance even with minimal fine-tuning data.
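The distribution alignment described in task (1) rests on measuring the discrepancy between source- and target-domain word-embedding distributions in a reproducing kernel Hilbert space. The sketch below is a minimal illustration of that idea using the standard Maximum Mean Discrepancy with a Gaussian kernel; it is not the thesis code, and the kernel bandwidth, embedding dimension, and batch sizes are illustrative assumptions.

```python
# Minimal sketch: Maximum Mean Discrepancy (MMD) between two batches of
# word-embedding vectors, i.e. the distance between their mean embeddings
# in the RKHS induced by a Gaussian (RBF) kernel.
import torch


def gaussian_kernel(x: torch.Tensor, y: torch.Tensor, bandwidth: float = 1.0) -> torch.Tensor:
    """Pairwise Gaussian kernel matrix between the rows of x and y."""
    sq_dists = torch.cdist(x, y, p=2).pow(2)          # squared Euclidean distances
    return torch.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd_loss(source_emb: torch.Tensor, target_emb: torch.Tensor, bandwidth: float = 1.0) -> torch.Tensor:
    """Biased empirical MMD^2 between source- and target-domain embeddings."""
    k_ss = gaussian_kernel(source_emb, source_emb, bandwidth).mean()
    k_tt = gaussian_kernel(target_emb, target_emb, bandwidth).mean()
    k_st = gaussian_kernel(source_emb, target_emb, bandwidth).mean()
    return k_ss + k_tt - 2.0 * k_st


if __name__ == "__main__":
    # Toy batches standing in for word embeddings from the two domains.
    src = torch.randn(128, 300)          # source-domain embedding vectors
    tgt = torch.randn(128, 300) + 0.5    # target-domain embeddings with a shifted distribution
    print(float(mmd_loss(src, tgt)))     # would be added to the summarization loss during training
```

In a training loop, this term would typically be weighted and added to the summarization objective so that the encoder is pushed to produce domain-invariant representations.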
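Task (2) combines adapter layers with meta-learning so that only a small set of parameters needs to be fine-tuned on the few labeled target-domain examples. The sketch below is a hypothetical illustration, not the thesis implementation: it shows a residual bottleneck adapter of the kind commonly inserted into Transformer encoder and decoder layers, together with a Reptile-style outer loop over sampled summarization tasks; `sample_task`, the inner loss, and all hyperparameters are placeholders.

```python
# Minimal sketch: bottleneck adapter + Reptile-style meta-training of the
# adapter parameters only, so the model can adapt from a handful of
# labeled target-domain examples.
import copy
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Residual bottleneck adapter inserted after a Transformer sub-layer."""

    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.ReLU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Project down, apply a non-linearity, project back up, add the residual.
        return hidden_states + self.up(self.act(self.down(hidden_states)))


def reptile_meta_train(adapters: nn.ModuleList, sample_task, inner_steps: int = 5,
                       inner_lr: float = 1e-3, meta_lr: float = 0.1, meta_iters: int = 100):
    """Reptile-style outer loop over summarization tasks; only adapters are updated.

    `sample_task` is a placeholder that returns a callable computing the
    summarization loss of a sampled task given a set of adapter modules.
    """
    for _ in range(meta_iters):
        task_loss_fn = sample_task()                  # one task's support-set loss function
        fast = copy.deepcopy(adapters)                # task-specific fast weights
        opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
        for _ in range(inner_steps):                  # inner adaptation on the sampled task
            opt.zero_grad()
            task_loss_fn(fast).backward()
            opt.step()
        # Reptile meta-update: move the slow weights toward the adapted fast weights.
        with torch.no_grad():
            for slow, fast_p in zip(adapters.parameters(), fast.parameters()):
                slow.add_(fast_p - slow, alpha=meta_lr)
```

After meta-training, fine-tuning on the target domain repeats only the inner loop on the few available labeled examples, which is what allows fast and stable adaptation with frozen backbone weights.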