Dependency parsing is one of the core tasks of natural language understanding, supporting research in cross-disciplinary fields such as cognitive science and computational linguistics. As a longstanding, formally concise, and cross-linguistically consistent syntactic formalism, dependency parsing helps humans and artificial intelligence better understand syntactic constituents such as “subject, predicate, object, determiner, adverbial, complement” by parsing the dependency relations between the words of a sentence, thus enabling more accurate and intelligent natural language understanding. With the continuous development of globalization and the strengthening of connections across human societies, the demand for multilingual text processing keeps growing. Therefore, owing to its rich research and application value, multilingual dependency parsing has become one of the NLP hotspots of recent years.

Due to the imbalance in language research, resources such as dependency treebanks, linguistic typology, and natural language text form a “pyramid” structure. Languages fall into three categories according to the richness of their resources: high-resource (with large-scale treebanks for supervised training, e.g., Chinese), low-resource (without treebanks but with linguistic typology research, e.g., Afrikaans), and zero-resource (with only unlabeled natural language text, e.g., Bambara). Traditional methods either focus on the performance of high-resource languages or prioritize the usability of low- and zero-resource languages. This seriously undermines the overall performance of multilingual dependency parsing, which not only discourages downstream tasks from using dependency trees but is also inconsistent with the idea of a human community with a shared future. To address these shortcomings, we improve both the overall performance and the multilingual usability of dependency parsing, proposing solutions to the research challenges in each of the three scenarios.

1. For high-resource languages, we focus on how novel neural networks and architecture designs can improve the effectiveness of learning from treebanks. To address the problem of manually designed high-order features in graph-based methods, we employ graph neural networks, which efficiently encode full high-order information, significantly improving performance at an additional computational cost of only 1%. To address the challenge of encoding a dynamic and complex transition system in transition-based methods, we propose a structural indicator network, which extracts complete features for the first time. Although the accuracy of the transition-based method is slightly lower than that of the graph-based method, its O(n) parsing complexity is faster than the O(n³) of the graph-based method, so each of the two methods has its own application scenarios.

2. For low-resource languages: although they lack time-consuming and costly dependency treebanks, they have typological features identified by linguists, such as subject-verb-object order and adjective-noun order. We propose a typology-guided Transformer position encoding that, by exploiting typological information more accurately, achieves better performance for both low- and high-resource languages.

3. For zero-resource languages, we assume that only unlabeled text is available and propose a Transformer encoder with a bag-of-words input module and a reordering module that restores word order. For zero-resource text that the parser has never seen before, the bag-of-words input module first removes the unfamiliar word order, and the reordering module then automatically restores the word order most familiar to the parser. Our parser not only has the broadest multilingual applicability but also significantly improves performance on zero-resource languages, thus filling the research gap on zero-resource parsing.

4. For downstream applications of dependency parsing, to address the lack of user-friendly multilingual parsers, we construct a multilingual dependency parser, “Ant Parser”. It consists of two basic
architectures (graph-based or transition-based) suited to different speed requirements, and two multilingual enhancement schemes suited to different resource conditions. The system is user-friendly and contributes to community development. To address the lack of integration with general-purpose intelligent language models (such as GPT), we propose a syntax-controlled text generation model that integrates the transition-based method with the autoregressive language model GPT. Our method not only improves the fluency and grammatical correctness of GPT, but also allows GPT to be controlled to generate sentences with a specific dependency syntax. This opens up new applications of dependency parsing.
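As a rough illustration of the O(n) complexity contrast above: in a transition system such as arc-standard, every word is SHIFTed exactly once and removed by exactly one arc action, so a sentence of n words is parsed in 2n transitions. The minimal sketch below makes this concrete; the function names and the toy right-branching policy are illustrative assumptions, not the thesis's structural indicator network.

```python
def parse(words, choose_action):
    """Run the arc-standard transition system.

    words: list of tokens (index 0 is an implicit ROOT).
    choose_action: callback (stack, buffer) -> "SHIFT" | "LEFT" | "RIGHT";
                   in a real parser this is the learned scoring model.
    Returns a list of (head_index, dependent_index) arcs.
    """
    stack = [0]                                    # indices into [ROOT] + words
    buffer = list(range(1, len(words) + 1))
    arcs = []
    while buffer or len(stack) > 1:
        action = choose_action(stack, buffer)
        if action == "SHIFT" and buffer:
            stack.append(buffer.pop(0))            # each word shifted once
        elif action == "LEFT" and len(stack) >= 2:
            dep = stack.pop(-2)                    # second-from-top is removed
            arcs.append((stack[-1], dep))
        elif action == "RIGHT" and len(stack) >= 2:
            dep = stack.pop()                      # top is removed
            arcs.append((stack[-1], dep))
        else:
            raise ValueError(f"illegal action: {action}")
    return arcs

def right_branching(stack, buffer):
    """Toy deterministic policy: attach everything right-branching."""
    return "SHIFT" if buffer else "RIGHT"

arcs = parse(["I", "saw", "her"], right_branching)
# n SHIFTs plus n arc actions: 2n transitions in total, hence O(n) parsing.
```

A learned model would replace `right_branching` when choosing actions; this linear transition bound is what gives the method its speed advantage over O(n³) graph-based decoding.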
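The bag-of-words idea for zero-resource languages can be caricatured in a few lines: the input module maps any word order to the same representation, and the reordering module maps that representation back to an order the parser knows. In this toy sketch, the function names and the per-word priority table are hypothetical stand-ins for the learned modules, not the thesis implementation.

```python
from collections import Counter

def bag_of_words(tokens):
    """Order-free sentence representation: a multiset of tokens.
    Any permutation of the same words yields the same bag."""
    return Counter(tokens)

def reorder(bag, priority):
    """Restore a word order the parser is familiar with; a toy per-word
    priority score stands in for the learned reordering module."""
    tokens = [t for t, count in bag.items() for _ in range(count)]
    return sorted(tokens, key=lambda t: priority.get(t, 0))

# The unfamiliar order of an unseen language is removed by the bag...
assert bag_of_words(["her", "saw", "I"]) == bag_of_words(["I", "saw", "her"])

# ...and the reordering module imposes a canonical (here SVO-like) order.
priority = {"I": 0, "saw": 1, "her": 2}   # hypothetical learned scores
assert reorder(bag_of_words(["her", "saw", "I"]), priority) == ["I", "saw", "her"]
```

The point of the sketch is the invariance in the first assertion: because the representation is order-free, the parser never sees an unfamiliar word order at all.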