With the continuous development of Bidirectional Encoder Representations from Transformers (BERT), Chinese sequence labeling models based on BERT have attracted increasing attention from both academia and industry, and many models have adopted BERT to improve their labeling performance. This paper focuses on how to use BERT to improve the performance of traditional Chinese sequence labeling models, addressing the following three sub-problems.

The first sub-problem is the fusion of character and word (lexicon) features. To address this issue, a BERT model with a cosine-similarity adapter (Cosine-Similarity Adapter BERT, CSBERT) is proposed. Using cosine similarity, a different weight is assigned to each word relevant to a given character, so the model can fuse or discard the feature information of different words according to the similarity score. Experiments show that CSBERT outperforms the baseline models, and this method of fusing character and word features offers a useful reference for other Chinese language processing tasks.

The second sub-problem is that some entity boundaries in Chinese sequences may be unclear or ambiguous during representation learning. To address this issue, a compressed boundary-aware vocabulary-enhanced BERT (C-BABERT) model is proposed, in which a boundary-aware mechanism injects boundary information into the training of vocabulary-enhanced BERT. Experiments show that, compared with the baseline models, C-BABERT improves performance while reducing training time, and it performs particularly well when entity boundaries are unclear.

The final sub-problem is the complex relationships between characters and words in Chinese sequences. To address this issue, a vocabulary-enhanced BERT model based on a graph convolutional network (GCN) is proposed. By learning the local and global relationships of nodes on a graph through the GCN, the complex relationships between characters and words can be captured explicitly. Experiments show that the LEBERT-GCN model outperforms the baseline models and is more robust, especially on more complex Chinese sequence data.
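As a rough illustration of the cosine-similarity weighting described for CSBERT, the sketch below fuses the embeddings of lexicon words matched to a character into that character's BERT hidden state, weighting each word by its cosine similarity and discarding words below a threshold. The function name, tensor shapes, threshold, and additive fusion are all assumptions made for illustration; the abstract does not specify CSBERT's exact adapter architecture.

```python
import torch
import torch.nn.functional as F

def cosine_similarity_fusion(char_hidden, word_embs, threshold=0.0):
    """Fuse matched lexicon-word features into one character's representation,
    weighted by cosine similarity (illustrative sketch, not CSBERT's exact adapter).

    char_hidden: (d,)   BERT hidden state of one character
    word_embs:   (k, d) embeddings of the k lexicon words matching the character
    """
    # Cosine similarity between the character and each matched word
    sims = F.cosine_similarity(char_hidden.unsqueeze(0), word_embs, dim=-1)  # (k,)
    # Words below the threshold are treated as irrelevant and discarded
    keep = sims > threshold
    if not keep.any():
        return char_hidden  # no sufficiently similar word: keep the character as-is
    weights = F.softmax(sims.masked_fill(~keep, float("-inf")), dim=-1)
    fused = (weights.unsqueeze(-1) * word_embs).sum(dim=0)  # weighted word feature
    return char_hidden + fused  # additive fusion (an assumption)

# Example: fuse two candidate words into a character state (d = 768)
c = torch.randn(768)
w = torch.randn(2, 768)
out = cosine_similarity_fusion(c, w, threshold=0.1)
```

Thresholding before the softmax is one way to realize "fuse or discard": low-similarity words contribute zero weight rather than merely a small one.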
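The abstract does not say how C-BABERT's boundary-aware information enters training; one common way to inject boundary awareness is an auxiliary boundary-prediction loss, sketched below as a hypothetical joint objective. The class name, the B/I/E/O tag set, and the weighting factor `alpha` are illustrative assumptions, not C-BABERT's actual design.

```python
import torch.nn as nn

class BoundaryAwareHead(nn.Module):
    """Hypothetical auxiliary head: alongside the main sequence-labeling loss,
    predict for each character whether it begins, is inside, ends, or lies
    outside an entity, so boundary information guides training."""

    def __init__(self, hidden_size, num_labels, num_boundary_tags=4):
        super().__init__()
        self.label_head = nn.Linear(hidden_size, num_labels)        # main labels
        self.boundary_head = nn.Linear(hidden_size, num_boundary_tags)  # B/I/E/O
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, hidden, labels, boundary_tags, alpha=0.5):
        # hidden: (batch, seq_len, hidden_size); labels/boundary_tags: (batch, seq_len)
        label_logits = self.label_head(hidden)
        boundary_logits = self.boundary_head(hidden)
        main_loss = self.loss_fn(label_logits.flatten(0, 1), labels.flatten())
        boundary_loss = self.loss_fn(boundary_logits.flatten(0, 1), boundary_tags.flatten())
        # Joint objective: labeling loss plus weighted boundary-awareness loss
        return main_loss + alpha * boundary_loss
```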
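For the third sub-problem, a minimal GCN layer over a character-word graph is sketched below, assuming nodes are characters plus matched lexicon words and edges link each word to the characters it covers. It shows the general mechanism by which a GCN propagates local relational information (and, when stacked, global information); it is not the exact LEBERT-GCN architecture, and mean aggregation is used in place of symmetric normalization for brevity.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """Minimal GCN layer over a character-word graph (illustrative sketch).
    node_feats holds character and word node features; adj encodes which
    lexicon words cover which characters, with self-loops added."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, node_feats, adj):
        # node_feats: (n, in_dim); adj: (n, n) adjacency matrix with self-loops
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1)  # node degrees
        agg = (adj @ node_feats) / deg  # mean-aggregate neighbor features
        return torch.relu(self.linear(agg))
```

Stacking two or three such layers lets each character node absorb information not only from the words that cover it but also from other characters those words span, which is one plausible reading of capturing "local and global relationships" on the graph.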