Research On Tibetan Word Segmentation And Part-of-speech Tagging Based On GNN

Posted on: 2024-01-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y R Wang
GTID: 2555307079492864
Subject: Electronic Information · Computer Technology (Professional Degree)
Abstract/Summary:
Tibetan word segmentation and part-of-speech tagging are fundamental tasks in Tibetan information processing. They strongly affect more sophisticated applications, including Tibetan syntactic analysis, semantic analysis, intelligent question answering, and machine translation. Deep learning is now widely used for both tasks and has produced good results, but it still does not resolve segmentation ambiguity or ambiguous part-of-speech categories. This study combines the characteristics of the Tibetan language with those of the two tasks to handle ambiguity, and words that can take more than one part of speech, more effectively. Word segmentation and part-of-speech tagging are recast as a node classification problem based on a GNN-CRF model, so that the model can also use word boundary information and the structural information of the text. The main research is as follows:

(1) To better address segmentation ambiguity and ambiguous part-of-speech categories, this paper adds global virtual nodes. This improves both the graph structure built for the two tasks and the model's ability to learn multi-level information.

(2) To improve the aggregation method of the graph neural network model, and thereby the model's performance, this paper incorporates a multi-head attention mechanism. According to the experimental results, the graph neural network model used in this paper performs word segmentation and part-of-speech tagging more accurately than the BiLSTM-CRF and IDCNN-CRF models. It also handles Tibetan segmentation ambiguity and ambiguous part-of-speech categories more effectively.

(3) For data augmentation, this research addresses the small size of the public datasets for Tibetan word segmentation and part-of-speech tagging, as well as the imbalanced graph data and sparse graph structure generated from them. The method randomly adds and deletes edges and transposes graphs. This increases the amount of training data, addresses the imbalance and sparsity of the graph data, and improves the model's generalization and node classification performance. According to the experiments, the proposed data augmentation technique improves the model's performance on the word segmentation task.

In summary, this study converts Tibetan data into graph-structured data based on the characteristics of Tibetan, uses a graph neural network model to study Tibetan word segmentation and part-of-speech tagging, and implements a graph data augmentation method suited to both tasks. Together these offer a solution for Tibetan lexical analysis and data augmentation.
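The abstract names three graph operations without giving their details: adding a global virtual node, randomly adding and deleting edges, and transposing the graph. The sketch below shows one possible reading of them on a plain edge set. It is an illustration under assumptions, not the thesis's implementation. The assumptions are that nodes are sequence units (for example syllables) linked to their right-hand neighbours, that the virtual node is connected to every other node in both directions, and that the names `build_graph`, `transpose`, `perturb_edges`, `p_drop`, and `p_add` are hypothetical.

```python
import random


def build_graph(num_units, window=1):
    """Build directed edges over a sequence plus one global virtual node.

    Assumption: each unit i points to the next `window` units, and the
    virtual node (index num_units) is linked to every unit both ways.
    """
    edges = set()
    for i in range(num_units):
        for j in range(i + 1, min(i + window, num_units - 1) + 1):
            edges.add((i, j))
    virtual = num_units  # index of the global virtual node
    for i in range(num_units):
        edges.add((virtual, i))
        edges.add((i, virtual))
    return edges, virtual


def transpose(edges):
    """Reverse the direction of every edge."""
    return {(dst, src) for src, dst in edges}


def perturb_edges(edges, total_nodes, p_drop=0.1, p_add=0.1, seed=None):
    """Randomly delete existing edges and add new ones.

    Each edge is dropped with probability p_drop; then roughly
    p_add * len(edges) edges absent from the original graph are added.
    """
    rng = random.Random(seed)
    kept = {e for e in sorted(edges) if rng.random() >= p_drop}
    candidates = [
        (i, j)
        for i in range(total_nodes)
        for j in range(total_nodes)
        if i != j and (i, j) not in edges
    ]
    n_add = min(int(p_add * len(edges)), len(candidates))
    return kept | set(rng.sample(candidates, n_add))


if __name__ == "__main__":
    edges, virtual = build_graph(4)
    augmented = [
        transpose(edges),
        perturb_edges(edges, total_nodes=virtual + 1, seed=0),
    ]
    print(len(edges), [len(g) for g in augmented])
```

Each augmented edge set would be paired with the original node labels to form an extra training graph. Whether the virtual node's edges should be exempt from deletion is a design choice that the abstract does not settle.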
Keywords/Search Tags: Tibetan, word segmentation, part-of-speech tagging, graph neural networks, data augmentation