
Design And Realization Of Connective Identification And Comparison System For Both English And Chinese Languages

Posted on: 2019-11-25
Degree: Master
Type: Thesis
Country: China
Candidate: D F Zhu
GTID: 2428330578979122
Subject: Computer science and technology
Abstract/Summary:
Discourse analysis is one of the fundamental research directions in natural language processing. Its main task is to analyze, at the level of the whole text, the semantic relations among the text's structures and their component units, and to understand the text through its context. Discourse analysis is important for many natural language processing applications, such as machine translation, question answering, automatic summarization and text generation. Determining the relation between two discourse units is the core component of discourse analysis. Discourse relations fall into two categories: explicit relations, which are signaled by an overt connective such as "because" or "however", and implicit relations, which are not. Connectives directly express the semantic and structural relations between discourse units and therefore play an important role in analyzing text structure. This thesis focuses on identifying discourse connectives in both Chinese and English. The specific research work is as follows:

1) Construct a unified connective identification platform for Chinese and English based on the conditional random field (CRF) model. Connective identification is cast as a sequence tagging task, and a CRF-based platform covering both languages is built using the English PDTB corpus and the Chinese CDTB corpus. Experiments examine the effects of different tag sets and feature templates.

2) Propose a bilingual approach to Chinese connective identification. Based on an analysis of the differences between Chinese and English connectives, a method that incorporates bilingual information is designed and implemented to address the diverse ways in which Chinese connectives are expressed and to improve Chinese connective identification. Comparison experiments evaluate the effect of the proposed approach.

3) Build a Chinese connective identification platform using a BiLSTM + Self-Attention + CRF framework. To reduce the dependence on manual feature engineering, a BiLSTM encodes the context sequence and a CRF decodes it, yielding a complete Chinese connective identification platform. Experiments evaluate the effect of this approach.

4) Design and implement a comparison system for English and Chinese. The system can compare and analyze the connective identification results of different models on the same language, and can also compare and analyze the identification results across the two languages.

On the one hand, this work builds a unified Chinese-English connective identification framework and improves Chinese connective identification through several methods. On the other hand, the comparison system supports performance analysis of different models and analysis of the differences between Chinese and English connective identification, which can further help improve Chinese connective identification from a bilingual perspective.
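Work item 1) treats connective identification as sequence tagging, but the abstract does not list the tag sets or feature templates that were compared. The following is a minimal Python sketch assuming a BIO tag set (`B-CONN`, `I-CONN`, `O`) and a simple word-window template; the function names `spans_to_bio` and `token_features` and the example sentence are hypothetical. Per-token feature dictionaries of this shape are the input format that CRF toolkits such as sklearn-crfsuite accept, although which toolkit the thesis used is not stated.

```python
from typing import Dict, List, Tuple

def spans_to_bio(num_tokens: int, spans: List[Tuple[int, int]]) -> List[str]:
    """Convert connective spans [start, end) into BIO tags (assumed tag set)."""
    tags = ["O"] * num_tokens
    for start, end in spans:
        tags[start] = "B-CONN"              # first token of the connective
        for i in range(start + 1, end):
            tags[i] = "I-CONN"              # remaining tokens of the connective
    return tags

def token_features(tokens: List[str], i: int, window: int = 1) -> Dict[str, str]:
    """Hypothetical feature template: lowercased words at offsets -window..+window."""
    feats = {"bias": "1"}
    for off in range(-window, window + 1):
        j = i + off
        feats[f"w[{off}]"] = tokens[j].lower() if 0 <= j < len(tokens) else "<PAD>"
    # One bigram feature joining the previous and current word
    prev = tokens[i - 1].lower() if i > 0 else "<PAD>"
    feats["w[-1]|w[0]"] = f"{prev}|{tokens[i].lower()}"
    return feats

tokens = "She left as soon as the rain stopped".split()
tags = spans_to_bio(len(tokens), [(2, 5)])    # "as soon as" is the connective
print(list(zip(tokens, tags)))
print(token_features(tokens, 2))
```

A multi-word connective such as "as soon as" becomes one `B-CONN` followed by `I-CONN` tags, which is what lets a linear tagger recover span boundaries rather than isolated words.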
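Both the CRF platform in 1) and the BiLSTM + Self-Attention + CRF framework in 3) end in a linear-chain CRF, whose standard decoding procedure is the Viterbi algorithm. The sketch below shows decoding only (training the weights is omitted) and assumes a BIO tag set; the emission scores stand in for what a feature-based CRF or a neural encoder would produce, and all numbers and names are hypothetical.

```python
from typing import Dict, List, Sequence

TAGS = ["O", "B-CONN", "I-CONN"]
NEG = -1e9  # a very large penalty that effectively forbids a transition

def viterbi(emissions: Sequence[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]],
            tags: Sequence[str]) -> List[str]:
    """Highest-scoring tag sequence under a linear-chain CRF (decoding only).

    emissions[t][y]: score of tag y at position t
    transitions[a][b]: score of tag a followed by tag b
    Start/end transition scores are omitted for brevity.
    """
    best = [dict(emissions[0])]        # best[t][y]: best path score ending in y at t
    back: List[Dict[str, str]] = []    # back[t-1][y]: best previous tag for y at t
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        ptrs: Dict[str, str] = {}
        for y in tags:
            prev = max(tags, key=lambda a: best[-1][a] + transitions[a][y])
            scores[y] = best[-1][prev] + transitions[prev][y] + emissions[t][y]
            ptrs[y] = prev
        best.append(scores)
        back.append(ptrs)
    # Start from the best final tag and follow the back-pointers
    path = [max(tags, key=lambda y: best[-1][y])]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    return path[::-1]

transitions = {a: {b: 0.0 for b in TAGS} for a in TAGS}
transitions["O"]["I-CONN"] = NEG  # in BIO, "I-" may not directly follow "O"

# Toy emission scores for the tokens "left", "because", "it" (hypothetical)
emissions = [
    {"O": 2.0, "B-CONN": 0.5, "I-CONN": 0.0},
    {"O": 0.5, "B-CONN": 1.0, "I-CONN": 1.5},
    {"O": 2.0, "B-CONN": 0.0, "I-CONN": 0.5},
]
greedy = [max(TAGS, key=lambda y: e[y]) for e in emissions]
print(greedy)                                  # ['O', 'I-CONN', 'O']
print(viterbi(emissions, transitions, TAGS))   # ['O', 'B-CONN', 'O']
```

Choosing the best tag independently at each position yields an invalid BIO sequence here; the transition scores let Viterbi prefer `B-CONN` for "because". Capturing such tag-to-tag dependencies is the usual reason for placing a CRF layer on top of a neural encoder.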
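Item 3) names a self-attention layer, but the abstract does not say which variant is used or where it sits in the network. Assuming standard scaled dot-product self-attention applied to the BiLSTM output vectors, and omitting the learned query/key/value projections for brevity, each position's output is a softmax-weighted average of all positions, `softmax(H H^T / sqrt(d)) H`. This pure-Python sketch uses hypothetical names and toy vectors.

```python
import math
from typing import List

Vector = List[float]

def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(h: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with Q = K = V = h (no learned projections)."""
    d = len(h[0])
    out: List[Vector] = []
    for q in h:
        # Similarity of this position to every position, scaled by sqrt(d)
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in h]
        weights = softmax(scores)
        # Attention-weighted average of all value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, h)) for j in range(d)])
    return out

# Hypothetical BiLSTM outputs for a three-token sentence (dimension 2)
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in self_attention(states):
    print([round(x, 3) for x in row])
```

Because every output mixes information from the whole sentence, a token such as a candidate connective can draw on distant context before the CRF assigns its tag; how the thesis actually combines the attention output with the BiLSTM states is not specified in the abstract.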
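Work item 4) describes a comparison system but not the metrics it reports. A minimal sketch, assuming exact-match span-level precision, recall and F1 over BIO output, plus a listing of gold connectives that only one of two models recovers; all function names and toy tag sequences are hypothetical. For the bilingual comparison, the same scores could be computed separately on the English and Chinese outputs and set side by side.

```python
from typing import Dict, List, Optional, Set, Tuple

Span = Tuple[int, int, int]  # (sentence index, start token, end token exclusive)

def bio_to_spans(sent_id: int, tags: List[str]) -> Set[Span]:
    """Extract connective spans from one sentence's BIO tags."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    for i, tag in enumerate(tags + ["O"]):  # the trailing "O" closes a final span
        if tag == "O" or tag.startswith("B-"):
            if start is not None:
                spans.add((sent_id, start, i))
                start = None
            if tag.startswith("B-"):
                start = i
        # An "I-" tag extends the open span; a stray "I-" with no open span is ignored.
    return spans

def collect(corpus: List[List[str]]) -> Set[Span]:
    spans: Set[Span] = set()
    for sent_id, tags in enumerate(corpus):
        spans |= bio_to_spans(sent_id, tags)
    return spans

def prf(gold: Set[Span], pred: Set[Span]) -> Dict[str, float]:
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"P": p, "R": r, "F1": f}

def compare(gold: List[List[str]], a: List[List[str]], b: List[List[str]]) -> Dict[str, object]:
    g, pa, pb = collect(gold), collect(a), collect(b)
    return {
        "A": prf(g, pa),
        "B": prf(g, pb),
        "only_A_correct": sorted((pa - pb) & g),
        "only_B_correct": sorted((pb - pa) & g),
    }

gold = [["O", "B-CONN", "O"], ["B-CONN", "I-CONN", "O", "O"]]
model_a = [["O", "B-CONN", "O"], ["O", "O", "O", "O"]]
model_b = [["O", "O", "O"], ["B-CONN", "I-CONN", "O", "B-CONN"]]
print(compare(gold, model_a, model_b))
```

Listing the connectives that only one model gets right, rather than only the aggregate scores, is one way such a system could support the per-model error analysis the abstract mentions.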
Keywords/Search Tags: Connective Identification, Linear Tagging, Self-Attention Mechanism, Comparison System