Research On Tibetan Word Segmentation Technique Of BiLSTM_CRF Integrating Attention

Posted on:2024-09-25

Degree:Master

Type:Thesis

Country:China

Candidate:F Y Chang

Full Text:PDF

GTID:2558307067468354

Subject:Software engineering

Abstract/Summary:

The Internet's increasing popularity has caused a swift rise in the quantity of text information, so natural language processing tasks such as proofreading, classification, translation, and information retrieval must be performed more accurately and efficiently. The word is the smallest unit that expresses semantic information, so the word sequence obtained by segmentation carries semantic information. For a computer to understand human language, text must first be segmented so that its important content can be extracted. Word segmentation is therefore an essential task in natural language processing. This thesis studies Tibetan word segmentation in terms of the design of the word-position tag set and the design and implementation of the word segmentation model. The main contents are:

(1) An eight-word-tag labeling method. To learn more semantic information and improve segmentation performance, an eight-word-tag set is obtained by expanding the four-word-tag set, and annotation rules for the eight-word-tag set are formulated. Tibetan word segmentation based on word-position tagging requires a large experimental corpus, and manual annotation of the corpus is inefficient and error-prone. An algorithm that labels a corpus with the eight-word-tag set is therefore designed.

(2) A BiLSTM_CRF Tibetan word segmentation model incorporating Attention. BiLSTM_CRF, which combines BiLSTM and CRF, automatically captures contextual information and takes the relationship between output tags into account, but it cannot highlight locally important information and loses history features over long sequences. To address these problems, this thesis proposes Attention_BiLSTM_CRF, a Tibetan word segmentation model that adds Attention to BiLSTM_CRF. The model first obtains global contextual information through BiLSTM, then uses Attention to strengthen locally important information and mitigate the loss of history information, and finally uses CRF to learn the relationship between tags so that illegitimate tag sequences are avoided.

(3) Experimental verification of the eight-word-tag labeling method and the Attention_BiLSTM_CRF Tibetan word segmentation model. To verify the effectiveness of both, experiments were conducted with the CRF, BiLSTM, BiLSTM_CRF, and Attention_BiLSTM_CRF Tibetan word segmentation models under the four-word-tag set, the six-word-tag set, and the eight-word-tag set. The experiments show that the Attention_BiLSTM_CRF model performs best when the Attention type is Sparse Self and Attention is embedded after the forward LSTM and the reverse LSTM.
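
As a rough illustration of word-position tagging, the sketch below labels each syllable of an already segmented sentence with the common four-tag scheme B/M/E/S (begin, middle, end, single). The abstract does not list the tags of the thesis's four-, six-, or eight-tag sets, so the tag names, the function names, and the transition table here are assumptions for illustration only, not the author's algorithm. The legality check mirrors the kind of tag-to-tag constraint that a CRF layer is meant to learn; the sample syllables are Wylie transliterations chosen only as an example.

```python
from typing import List, Tuple

# Assumed four-tag scheme: B = word-initial syllable, M = middle,
# E = word-final, S = single-syllable word. Not taken from the thesis.
LEGAL_TRANSITIONS = {
    ("B", "M"), ("B", "E"),
    ("M", "M"), ("M", "E"),
    ("E", "B"), ("E", "S"),
    ("S", "B"), ("S", "S"),
}


def tag_word(syllables: List[str]) -> List[str]:
    """Return one position tag per syllable of a single word."""
    n = len(syllables)
    if n == 0:
        return []
    if n == 1:
        return ["S"]
    return ["B"] + ["M"] * (n - 2) + ["E"]


def tag_sentence(words: List[List[str]]) -> List[Tuple[str, str]]:
    """Flatten a segmented sentence into (syllable, tag) pairs."""
    pairs: List[Tuple[str, str]] = []
    for word in words:
        pairs.extend(zip(word, tag_word(word)))
    return pairs


def is_legal(tags: List[str]) -> bool:
    """Check that a tag sequence can be decoded into whole words."""
    if not tags:
        return True
    if tags[0] not in ("B", "S") or tags[-1] not in ("E", "S"):
        return False
    return all(pair in LEGAL_TRANSITIONS for pair in zip(tags, tags[1:]))


def tags_to_words(syllables: List[str], tags: List[str]) -> List[List[str]]:
    """Rebuild words from syllables and tags; a word closes at E or S."""
    words: List[List[str]] = []
    current: List[str] = []
    for syllable, tag in zip(syllables, tags):
        current.append(syllable)
        if tag in ("E", "S"):
            words.append(current)
            current = []
    if current:  # keep a trailing unfinished word instead of dropping it
        words.append(current)
    return words


if __name__ == "__main__":
    sentence = [["bkra", "shis"], ["bde", "legs"], ["zhu"]]
    for syllable, tag in tag_sentence(sentence):
        print(syllable, tag)
```

A larger tag set would change only `tag_word` and the transition table; the decoding step stays the same as long as every tag still marks whether a word ends at that syllable.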

Keywords/Search Tags:

Natural language processing, Tibetan word segmentation, Attention, BiLSTM, CRF

Related items

1	Research On Chinese Word Segmentation Methods Based On Deep Learning
2	Research On Tibetan Word Vector Representation Technology Based On Deep Learning
3	Research On Tibetan Word Representation Techniques
4	The Key Technologies Of Representation Of Tibetan Word Vector
5	Research On Attention Mechanism For Natural Language Processing
6	The Methodology And Implementation Of Chinese Natural Language Query In Databases
7	Research On Dependency Parsing Of Tibetan Language Based On Deep Learning
8	Chinese Word Segmentation Model Based On Improved Bidirectional LSTM-CRF
9	Research On E-Commerce Commodity Title Category Classification Algorithm Based On Natural Language Processing Technology
10	Research On Multi-modal Word Segmentation Method Integrating Speech Features