
Encoding Background Knowledge Into Discourse Analysis

Posted on: 2017-01-23 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: M Y Zhang
GTID: 1108330503969767 | Subject: Computer Science and Technology
Abstract/Summary:
Discourse analysis aims to capture discourse-level semantic information through discourse structure analysis and content analysis, and it has attracted considerable attention recently. Existing research on this topic mainly focuses on how to extract information from the source document. However, research in cognitive psychology indicates that a document cannot be understood correctly without relevant knowledge, so a lack of background knowledge limits understanding of the source document. Encoding background knowledge is therefore necessary for better discourse analysis. In this work, we first address the problem of finding background knowledge for a given document, proposing a search-engine-based model and then a distributional-semantics-based model. Second, we incorporate the background knowledge into the analysis of the source document, covering both discourse structure analysis and content analysis. For discourse structure analysis, we study the task of discourse relation recognition; for content analysis, we study the task of coherence evaluation. Our research is summarized below.

1. Search-engine-based background knowledge ranking at the discourse level
We propose a search-engine-based background knowledge ranking model that uses the "Subject, Predicate, Object" triple as the basic unit of knowledge. The knowledge is drawn from existing knowledge bases such as YAGO and from an automatically extracted knowledge base. We propose a triple-graph-based document representation that combines source document information with background knowledge. A search engine is then introduced into a weight-propagation-based model to evaluate the relevance between background knowledge and the source document. For evaluation, we treat the task as a ranking problem and annotate the results manually. Experiments show a MAP of 0.676 and a P@20 of 0.417, indicating good performance.

2. Distributional-semantics-based background knowledge ranking at the discourse level
Because the search-engine-based model and its evaluation are inefficient, we further propose a distributional-semantics-based model for background knowledge ranking. We use a topic model and deep learning to convert each triple into a real-valued vector and compute the relevance between triples by cosine similarity. We then apply an improved weight-propagation-based model to rank the background knowledge, using document classification as a task-based evaluation. Experiments show that the model achieves a MAP of 0.649 and a P@5 of 0.5596 in the ranking setup, and improves document classification performance by 2.55%.

3. Chinese discourse relation analysis with background knowledge
We then encode background knowledge into discourse structure analysis, an important part of discourse analysis, by studying the task of discourse relation recognition. Since there is no widely agreed task definition for Chinese discourse relation analysis, we first propose a task framework together with a sense hierarchy, and then construct a large-scale Chinese discourse relation corpus. Using a three-level annotation process, we annotate 1,096 documents containing more than 20,000 instances. We then incorporate background knowledge into discourse relation recognition. The recognized relations are further introduced into a sentiment analysis task, where they improve the performance of the corresponding model.

4. Local coherence evaluation with background knowledge
Beyond discourse relation recognition, we further encode background knowledge into content analysis, another important part of discourse analysis, by studying the task of coherence evaluation. Given a document to be analyzed, we first retrieve its background knowledge with the ranking model introduced above, and then incorporate that knowledge into a graph-based unsupervised model and an entity-grid-based supervised model.
We test the models on sentence ordering and summary coherence rating. Experiments show that incorporating background knowledge significantly improves the performance of the corresponding models, demonstrating the benefit of background knowledge.

In summary, this work explores how to obtain background knowledge and incorporate it into discourse analysis. We hope that it will inspire researchers working on related topics. For better comparison with prior work, part of this work is carried out on English corpora, but our method is language-independent and applicable to any language. We believe that, as natural language processing advances, background knowledge ranking and document analysis technologies will keep improving and benefit related research such as machine translation, automatic question answering, sentiment analysis, natural language generation, and automatic summarization.
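The ranking-and-evaluation loop described in items 1 and 2 can be illustrated with a minimal Python sketch. It assumes that the document and each candidate triple have already been mapped to real-valued vectors (the thesis uses a topic model and deep learning for this; the toy vectors below are hypothetical). As a simplification, each triple is scored only by its cosine similarity to a single document vector. The thesis instead computes similarity between triples and feeds it into a weight-propagation model, which is not reproduced here. The ranking is then scored with average precision, MAP, and P@k under their standard definitions. All function names are illustrative and not taken from the thesis.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_triples(doc_vec: Sequence[float],
                 triple_vecs: Dict[str, Sequence[float]]) -> List[str]:
    """Return triple ids sorted by descending cosine similarity to doc_vec.

    Simplification: no weight propagation between triples is performed here.
    """
    return sorted(triple_vecs,
                  key=lambda t: cosine(doc_vec, triple_vecs[t]),
                  reverse=True)


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked items that are relevant (denominator is k)."""
    return sum(1 for t in ranked[:k] if t in relevant) / k


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Sum of precision@i at each rank i holding a relevant item,
    divided by the total number of relevant items."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for i, t in enumerate(ranked, start=1):
        if t in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)


def mean_average_precision(runs: List[Tuple[List[str], Set[str]]]) -> float:
    """MAP: mean of the per-document average precision values."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    # Hypothetical 3-dimensional vectors standing in for learned triple embeddings.
    doc = [1.0, 0.0, 1.0]
    triples = {
        "(Beijing, capitalOf, China)": [0.9, 0.1, 0.8],
        "(Great Wall, locatedIn, China)": [0.5, 0.5, 0.5],
        "(Python, createdBy, Guido van Rossum)": [0.0, 1.0, 0.0],
    }
    ranked = rank_triples(doc, triples)
    # Hypothetical manual relevance annotation for this document.
    gold = {"(Beijing, capitalOf, China)", "(Great Wall, locatedIn, China)"}
    print(ranked)
    print("AP:", average_precision(ranked, gold))
    print("P@2:", precision_at_k(ranked, gold, 2))
```

Dividing average precision by the total number of relevant items, rather than by the number retrieved, means that relevant triples missing from the ranking lower the score. This is the usual convention when reporting MAP. P@k always divides by k, so short rankings are penalized in the same way.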
Keywords/Search Tags: background knowledge, association, discourse analysis, document classification, coherence evaluation