Research On Chinese Abstract Semantic Representation System, Resource Construction And Application

Posted on:2021-03-20

Degree:Doctor

Type:Dissertation

Country:China

Candidate:R B Dai

Full Text:PDF

GTID:1365330647453193

Subject:Linguistics and Applied Linguistics

Abstract/Summary:

Semantic representation is among the most significant and challenging problems in natural language processing and has long been a focus of academic research. As linguistic computing shifts from the syntactic to the semantic level, existing language resources have developed to varying degrees in concept semantics, frame semantics, and situation semantics. Integrating multi-level, multi-type semantic resources to construct a language knowledge base with deep semantic representation has become one of the most urgent problems at this stage. In both linguistic theory and automatic parsing, research on representation has gradually moved from syntax to semantics. The representation of syntactic structure has likewise evolved from tree structures to non-projective tree structures, and then to the first attempts at and applications of graph structures, a development that can be summarized as moving from tree to graph.

Abstract Meaning Representation (AMR), a new semantic representation method, abstracts the semantics of a sentence into a single-rooted directed acyclic graph. AMR combines syntactic and semantic information and expresses meaning with a graph structure, which reveals the argument sharing that a tree structure cannot represent and gives a clearer expression of sentence semantics. However, the lack of alignment information between the words of a sentence and the concepts of an AMR graph affects, to a certain extent, the results of automatic parsing and the quality of corpus annotation. Moreover, there is no large-scale AMR corpus for Chinese.

Drawing on the principles of English AMR, taking the characteristics of Chinese into account, and integrating concept-to-word alignment information, this dissertation proposes an integrated syntactic and semantic representation method for Chinese: the Concept-to-word Alignment Chinese Abstract Meaning Representation (CA-CAMR) system. Its specific contents include the use of graph structure to handle argument sharing, the integration of concept-to-word alignment information to improve semantic representation ability, and the representation of Chinese-specific structures and compound sentence relations as specified in the CA-CAMR annotation system. By comparing the AMR semantic representations of English and Chinese, we summarize what CA-CAMR inherits from AMR and how it develops it, demonstrate the advantages of the proposed CA-CAMR system for describing Chinese semantics, and establish the value of integrating concept-to-word alignment information for linguistic research and for the design of automatic parsing algorithms. The CA-CAMR system lays the foundation for the further development of a Chinese abstract meaning representation corpus with concept-to-word alignment.

Based on the above, the CA-CAMR corpus is constructed in this work. Under the guidance of the CA-CAMR specification, a human-machine annotation method is applied to build the corpus on the CAMR Anno Kit platform. The CA-CAMR corpus presently contains 20149 sentences drawn from The Little Prince, online media texts from the Penn Chinese Treebank (CTB) 8.0, and PEP Chinese textbooks for elementary school. This dissertation describes the corpus annotation in detail, offers a resolution strategy for the inconsistencies in the test corpus, and conducts a systematic statistical analysis of the corpus data, covering graph structure, argument sharing, and the annotation of special syntactic structures in Chinese. The statistical results suggest that the CA-CAMR corpus has reached a certain scale, has advantages in representing deep semantics and special syntactic structures, and integrates syntax and semantics, so that it can provide corpus resources for related research.

Finally, the application value of the CA-CAMR system and corpus is explored in two areas: language ontology research and natural language processing. Ellipsis is a common phenomenon in Chinese, yet traditional syntactic and semantic representations often ignore structures that carry ellipsis information. This dissertation uses the CA-CAMR corpus to investigate the distribution of Chinese semantic ellipsis structures in large-scale real texts and gives a broad overview of ellipsis in Chinese. The "de" construction with semantic ellipsis, which accounts for the highest proportion (47.3%), is then taken as the research object, and a set of experimental schemes is established for its automatic recognition and automatic head completion. The experimental results indicate that this approach can effectively identify and complete the "de" construction with semantic ellipsis in the CA-CAMR corpus, and they validate the research value of the CA-CAMR system and corpus for representing deep semantic relations in Chinese.
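
The graph properties named in the abstract (a single root, acyclicity, argument sharing, and concept-to-word alignment) can be illustrated with a minimal sketch. The sentence below is the familiar English AMR illustration "The boy wants to go", not an item from the CA-CAMR corpus, and the dictionary and tuple layout, along with all function names, are assumptions made for this example rather than the CA-CAMR annotation format.

```python
from collections import defaultdict

# Illustrative sentence (an assumption for this sketch, not corpus data).
tokens = ["The", "boy", "wants", "to", "go"]

# Concept nodes: variable -> (concept label, index of the aligned token or None).
nodes = {
    "w": ("want-01", 2),
    "b": ("boy", 1),
    "g": ("go-01", 4),
}

# Labeled directed edges: (source variable, relation, target variable).
edges = [
    ("w", ":ARG0", "b"),
    ("w", ":ARG1", "g"),
    ("g", ":ARG0", "b"),  # "boy" is also the agent of "go": argument sharing
]


def in_degrees(nodes, edges):
    """Count incoming edges for every node."""
    deg = {v: 0 for v in nodes}
    for _, _, tgt in edges:
        deg[tgt] += 1
    return deg


def roots(nodes, edges):
    """Nodes with no incoming edge; a well-formed graph here has exactly one."""
    return [v for v, d in in_degrees(nodes, edges).items() if d == 0]


def is_acyclic(nodes, edges):
    """Kahn's algorithm: the graph is acyclic iff every node can be removed."""
    deg = in_degrees(nodes, edges)
    children = defaultdict(list)
    for src, _, tgt in edges:
        children[src].append(tgt)
    queue = [v for v, d in deg.items() if d == 0]
    removed = 0
    while queue:
        v = queue.pop()
        removed += 1
        for c in children[v]:
            deg[c] -= 1
            if deg[c] == 0:
                queue.append(c)
    return removed == len(nodes)


def shared_arguments(nodes, edges):
    """Nodes with more than one parent, which a tree cannot express."""
    return [v for v, d in in_degrees(nodes, edges).items() if d > 1]


def aligned_words(nodes, tokens):
    """Map each concept variable to the word it is aligned with."""
    return {v: tokens[i] for v, (_, i) in nodes.items() if i is not None}
```

In this sketch, `shared_arguments(nodes, edges)` picks out `b`, the node with two parents, and `aligned_words(nodes, tokens)` links every concept back to a word in the sentence, which is the kind of information the abstract says is missing from unaligned AMR graphs.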

Keywords/Search Tags:

semantic representation, concept-to-word alignment, corpus annotation, graph structure, semantic ellipsis, "de" construction

Related items

1	CAMR Semantic Library Construction And Statistical Analysis Based On Conceptual Relationship Alignment
2	Research On Ellipsis Based On Abstract Semantic Representation
3	Research On Deep Semantic Annotated Corpus Of Modern Chinese
4	Annotation And Measurement Of Multi-round Dialogue And Construction Based On Chinese Abstract Semantic Representation
5	Study On Semantic Relation Of The Modern Chinese Serial Verb Structure Based On The Corpus
6	Research On The "Vocabulary - Syntactic Semantics" Interface Of Hand Movements Based On Annotation Corpus
7	A Comparative Study On The Effects Of The Semantic Annotation Of Dunhuang Mural Digital Images From The Perspective Of Users
8	Prediction Of Children's Abstract Concept Semantic Distance Using Word Co-occurrence Model Based On Children's Corpus
9	A Study On Chinese Mongolian Word Alignment And The Related Technologies
10	Construction Of Formal Semantic Conceptual Representation Of Primary English Modal Verbs