Syntax analysis plays a very important role in natural language processing. Dependency parsing is a natural language processing technique used mainly to analyze the dependency relationships between the words of a sentence. Dependency parsing produces a graph structure that represents the dependencies between words: each word is represented as a node, and the dependencies between words are represented by edges. For example, in the sentence "(?)", "(?)" is the subject, "(?)" is the predicate, and "(?)" is the object. There is therefore an edge between "(?)" and "(?)" representing the dependency between subject and predicate, and an edge between "(?)" and "(?)" representing the dependency between predicate and object.

Dependency parsing is widely used in natural language processing tasks such as machine translation, information retrieval, text classification, and named entity recognition. In machine translation, dependency parsing helps translation systems better understand the grammatical structure of source-language sentences, thereby improving translation quality. In information retrieval, dependency parsing enables deeper analysis of query statements so that related documents can be matched more accurately. In text classification and named entity recognition, dependency parsing helps algorithms better exploit the contextual information of a text, improving the accuracy of classification and recognition.

Common tools for dependency parsing include Stanford Parser, Berkeley Parser, Malt Parser, and others. These tools are based on different algorithms and models, and can be trained and tested with different languages, corpora, and features. Among them, Stanford Parser is one of the most popular dependency parsing tools; it uses a neural-network-based dependency parsing model and supports analysis of many languages. There are also some common
techniques and methods used when applying dependency parsing. For example, by pruning and filtering the parsing results, redundant dependencies can be removed, improving the clarity and readability of the output. In addition, techniques such as tagging-sequence prediction can be used for dependency parsing; these take both the local and the global structure of a sentence into account, allowing its dependency relationships to be analyzed more accurately. In summary, dependency parsing is an important natural language processing technique that helps machines better understand the grammatical structure of natural language text. Tibetan dependency parsing requires selecting appropriate tools and methods, and fully considering issues such as data volume, feature selection, and algorithm optimization in order to achieve good analysis results.

The main research work and contributions of this article are summarized as follows:

1. Tibetan word segmentation and part-of-speech tagging. Syntax analysis serves as a connecting link in natural language processing. Because there are no explicit separators between words in the collected Tibetan corpus, both word segmentation and part-of-speech tagging must be performed before the dependency treebank can be annotated. This article proposes a hybrid ELMo and Transformer encoder model for Tibetan part-of-speech tagging. ELMo is a bidirectional pre-trained model that can extract deeper features of Tibetan words. The pre-trained word embeddings are combined with the Transformer's self-attention mechanism to extract the semantic features of Tibetan sentences and predict the part-of-speech tag of each word. This method achieves an accuracy of about 97% on the dataset used in this article.

2. Establishment of Tibetan dependency relationships. Establishing Tibetan dependency relationships is a key issue in building a Tibetan dependency treebank. Among existing research results, there is no unified dependency
relationship scheme: current Tibetan dependency schemes variously define 15, 24, 25, 33, or 36 different types of dependency relationship, which causes many difficulties in annotating data. Therefore, building on previous research, we have re-established 32 Tibetan dependency relationships and provided a theoretical basis for each one.

3. Formulation of annotation specifications. This article constructs a Tibetan dependency treebank following the Universal Dependencies treebank format. The construction includes establishing the granularity of Tibetan dependencies, establishing the dependency relationships, and strengthening the annotation principles of the treebank. After comprehensively and systematically describing the construction steps of the Tibetan dependency syntax treebank, the Tibetan dependency annotation scheme, and the annotation specifications, the selected corpus was syntactically annotated. Currently, 12,000 sentences have been annotated.

4. Research on Tibetan dependency parsing. A simple graph-based dependency parser using neural attention and biaffine classifiers is applied to the constructed Tibetan dependency treebank to predict arcs and labels. The parser achieves state-of-the-art or near state-of-the-art performance on the Tibetan UD-format standard treebank, reaching 89.25% UAS and 88.30% LAS on this dataset.
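The ELMo-plus-Transformer tagger of contribution 1 rests on self-attention over pre-trained word embeddings. A minimal numpy sketch of that idea follows, with random vectors standing in for the pre-trained (ELMo-style) embeddings and a single attention head in place of the full Transformer encoder; all dimensions and weight matrices here are illustrative assumptions, not the trained model:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Scaled dot-product self-attention over one sentence of word vectors.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
    return weights @ V

def tag_logits(X, Wq, Wk, Wv, Wtag):
    # Contextualized word vectors -> per-word POS tag scores.
    return self_attention(X, Wq, Wk, Wv) @ Wtag

rng = np.random.default_rng(0)
n, d, n_tags = 6, 16, 10            # illustrative sizes
X = rng.normal(size=(n, d))         # stands in for pre-trained embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Wtag = rng.normal(size=(d, n_tags))
tags = tag_logits(X, Wq, Wk, Wv, Wtag).argmax(axis=1)  # one tag per word
```

In the real model, `X` would come from the ELMo encoder and the attention layer would be stacked, multi-headed, and trained; the sketch only shows how self-attention turns word vectors into contextualized vectors before tag prediction.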
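The graph-based parser of contribution 4 scores every head-dependent word pair with a biaffine transformation and selects the highest-scoring head for each word. A minimal sketch under the same assumptions (random vectors in place of learned encoder states; a trained parser would also decode a well-formed tree rather than pick heads greedily):

```python
import numpy as np

def biaffine_arc_scores(H_dep, H_head, U, b):
    # score[i, j] = H_dep[i] @ U @ H_head[j] + H_head[j] @ b
    # i.e. a bilinear term for the pair plus a bias term for the head alone.
    return H_dep @ U @ H_head.T + H_head @ b

rng = np.random.default_rng(0)
n, d = 5, 8                       # illustrative sentence length and dimension
H = rng.normal(size=(n, d))       # stands in for encoder word representations
U = rng.normal(size=(d, d))
b = rng.normal(size=(d,))
scores = biaffine_arc_scores(H, H, U, b)   # (n, n) head-selection scores
pred_heads = scores.argmax(axis=1)         # greedy head choice per word
```

A second biaffine classifier with its own parameters would score dependency labels for each chosen arc in the same fashion.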
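The reported UAS and LAS figures can be computed as follows: UAS counts a token as correct when its predicted head matches the gold head, while LAS additionally requires the dependency label to match. The toy gold and predicted trees below are purely illustrative:

```python
def uas_las(gold_heads, gold_labels, pred_heads, pred_labels):
    # UAS: fraction of tokens whose predicted head is correct.
    # LAS: fraction whose head AND dependency label are both correct.
    n = len(gold_heads)
    uas = sum(g == p for g, p in zip(gold_heads, pred_heads)) / n
    las = sum(gh == ph and gl == pl
              for gh, gl, ph, pl in zip(gold_heads, gold_labels,
                                        pred_heads, pred_labels)) / n
    return uas, las

gold_heads  = [2, 0, 2, 3]          # CoNLL-U style: 0 marks the root
gold_labels = ["nsubj", "root", "obj", "case"]
pred_heads  = [2, 0, 2, 2]          # last token attached to the wrong head
pred_labels = ["nsubj", "root", "obj", "case"]
uas, las = uas_las(gold_heads, gold_labels, pred_heads, pred_labels)
# uas = 0.75, las = 0.75
```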