
Deep Learning-based Representation Of Text Features At Different Granularities

Posted on: 2022-12-01    Degree: Master    Type: Thesis
Country: China    Candidate: J Q Hao    Full Text: PDF
GTID: 2518306608990399    Subject: Automation Technology
Abstract/Summary:
Text feature extraction and representation is a foundational problem in text mining and a prerequisite for natural language processing tasks such as text classification and summarization. With the rapid development of artificial intelligence, text feature representation based on deep neural networks has become the mainstream approach in natural language processing. For tasks at different granularities, however, a single-model text feature representation cannot supply sufficiently precise feature information to downstream tasks. At the sentence level, single-model methods tend to focus only on local text features and therefore lose important global information. At the document level, the relationships among words, sentences, and documents must all be considered to obtain more specific document-level features. This thesis proposes targeted text feature representation methods for natural language processing tasks at different granularities. The main contributions are as follows:

(1) A sentence-level text feature representation method combining BiReGU and a Capsule network: the BiReGU-Capsule model. BiReGU serves as the global feature extraction module, implemented as a two-layer BiReGU with an attention mechanism, so the captured global information is more comprehensive and specific. The Capsule network, also equipped with attention, serves as the local feature extraction module: multi-head attention is first applied to reduce the influence of noise capsules. Once the global features are obtained, an attention-based interactive fusion step lets the model attend to global information while extracting local features. The BiReGU-Capsule model was evaluated on text classification, where it outperformed the baseline models; on multi-class datasets in particular, its macro-F1 score improved by 2.6% over the traditional Capsule model.

(2) A document-level text feature representation method based on BigBird: the BigBird Sum model. The model uses a hierarchical feature representation to capture document information at different granularities, with a sentence encoding layer and a document encoding layer. The sentence encoding layer uses the sparse-attention BigBird model, which reduces the model's time and space complexity while increasing the admissible input sequence length, so that longer textual context receives full attention. The document encoding layer uses a Transformer that takes the sentence features produced by the sentence encoding layer as input and extracts the relationships between sentences and the document, so that the whole model attends to word-, sentence-, and document-level relationships simultaneously. The BigBird Sum model was evaluated on extractive summarization and outperformed the baseline extraction models. On the NYK50 dataset in particular, its ROUGE-L score was 3.47% and 2.50% higher than the Transformer and BERT baselines, respectively.
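The abstract does not give implementation details for the attention-based interactive fusion between the Capsule module's local features and BiReGU's global features. As a rough illustration only, the sketch below assumes the fusion is a standard scaled dot-product cross-attention in which each local capsule vector queries the global feature sequence; all shapes and the concatenation step are hypothetical choices, not the thesis's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fusion(local_caps, global_feats):
    """Fuse local capsule features with global sequence features.

    Each capsule queries the global feature sequence via scaled
    dot-product attention; the attention-weighted global context is
    concatenated onto the capsule's own vector.
    """
    d = global_feats.shape[-1]
    scores = local_caps @ global_feats.T / np.sqrt(d)  # (n_caps, seq_len)
    weights = softmax(scores, axis=-1)                 # rows sum to 1
    context = weights @ global_feats                   # (n_caps, d)
    return np.concatenate([local_caps, context], axis=-1)

rng = np.random.default_rng(0)
caps = rng.normal(size=(10, 16))   # 10 local capsule vectors, dim 16
glob = rng.normal(size=(32, 16))   # 32 global timestep features, dim 16
fused = attention_fusion(caps, glob)
print(fused.shape)  # (10, 32): capsule dim + attended global context dim
```

In this reading, the fused vectors carry both the local pattern each capsule detected and a summary of the global context most relevant to it, which is one plausible way to "pay attention to global information while extracting local feature information."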
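The complexity reduction attributed to BigBird comes from replacing full self-attention with a sparse pattern that combines sliding-window, global, and random attention. The sketch below builds such a mask in numpy to show why the attended fraction stays far below the dense n-squared cost; the window size, number of global tokens, and random links are illustrative values, not the thesis's configuration.

```python
import numpy as np

def bigbird_mask(n, window=3, n_global=2, n_random=2, seed=0):
    """Build a BigBird-style sparse attention mask (True = may attend).

    Combines the three BigBird patterns:
      - sliding window: each token attends `window` neighbours per side
      - global tokens: the first `n_global` tokens attend to, and are
        attended by, every position
      - random: each token attends `n_random` extra random positions
    """
    rng = np.random.default_rng(seed)
    mask = np.zeros((n, n), dtype=bool)
    idx = np.arange(n)
    for offset in range(-window, window + 1):      # sliding window
        j = idx + offset
        valid = (j >= 0) & (j < n)
        mask[idx[valid], j[valid]] = True
    mask[:n_global, :] = True                      # global rows
    mask[:, :n_global] = True                      # global columns
    for i in range(n):                             # random links
        mask[i, rng.choice(n, size=n_random, replace=False)] = True
    return mask

mask = bigbird_mask(128)
density = mask.sum() / mask.size
print(f"attended fraction: {density:.3f}")  # small fraction of full attention
```

Because the window and random terms grow linearly in n while only the few global tokens touch every position, the number of attended pairs is O(n) rather than O(n^2), which is what lets the sentence encoder accept longer input sequences.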
Keywords/Search Tags: Multi-grain, BiReGU, Capsule, Attention mechanisms, BigBird, Transformer