Research on Text Analysis and Data Annotation for NCEE Geography Question Answering

Posted on: 2018-12-12 | Degree: Master | Type: Thesis
Country: China | Candidate: L R Tang
GTID: 2348330512997174 | Subject: Computer technology
Abstract:
Artificial intelligence is changing the world rapidly. In the field of natural language processing, more and more research on automatic question answering has been carried out. Efficient and intelligent QA (Question Answering) systems aim to give users more direct and precise answers. They can retrieve information from a large-scale knowledge base and make deductions automatically, which frees users from searching and filtering large quantities of text and then extracting the answers themselves. In 2011, IBM's QA system Watson took part in the quiz show Jeopardy!, beat the top human players, and became the champion. Once again, QA systems attracted the attention of the world.

For the majority of Chinese high school students, the National College Entrance Examination (NCEE) is arguably the most important test, and it can be seen as a high-level question-and-answer process. The background of this thesis is a question answering system for the geography questions of the NCEE, with a focus on multiple-choice questions. Developing this system raises challenges that traditional QA systems do not face. First, the style of the questions is quite different. Second, the questions are much more flexible, so a question can hardly be matched directly against the original texts in the knowledge base.

As the first step of automatic question answering, question understanding plays a key role in the whole system, and it is the main subject of this thesis. Our strategy is as follows: for each choice, we join it with the question to obtain a complete sentence, which serves as the basic unit of analysis. There are two common types of text comprehension: discourse analysis between clauses, and deep semantic parsing of sentences. We therefore work on improving sentence understanding in two respects: (1) classifying the relationship between the parts separated by commas in long and complicated sentences; and (2) using AMR for deep semantic parsing.

For sentence splitting, we propose a method that splits at the commas in the choices, based on the characteristics of multiple-choice questions. Where possible, a long original sentence is transformed into several semantically equivalent shorter sentences, which improves the performance of the subsequent processing stages. We put forward a two-stage method that uses a maximum entropy classifier in the first stage and a rule-based method in the second. First, we decide whether a comma in the choice can be treated as a splitter. Second, we find the right border of the common prefix of the coordinate structure in the sentence.

AMR (Abstract Meaning Representation) is a powerful, recently proposed semantic representation. It represents the meaning of a sentence as a rooted, directed, and connected graph, and it focuses on the abstract semantics of the sentence rather than its surface syntax. However, research on AMR has only just started, so state-of-the-art automatic AMR parsing algorithms are still not satisfactory, and the Chinese AMR corpus is relatively small at the moment. As far as we know, no prior research or application has applied AMR parsing to a Chinese corpus. Our work is based on an English AMR parsing tool. In this thesis, we modify this tool to process Chinese and verify the performance of its algorithm on a Chinese corpus. For geography questions, we also build a small AMR-annotated corpus and run the algorithm on it.

To support the two parts of the work described above, we developed an annotation tool for the NCEE geography questions. With the help of this tool, we built a high-quality corpus of geography questions. In addition to sentence splitting and AMR annotation, the tool also supports annotation of word segmentation, part-of-speech tags, named entities, geography terms, question template representations, and syntactic parses.
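The two-stage splitting idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function name `split_sentence`, the `is_splitter` callback (standing in for the maximum entropy classifier of stage one), and the `prefix_end` index (standing in for the rule-based border detection of stage two) are all hypothetical, and the English example sentence is invented for illustration.

```python
from typing import Callable, List


def split_sentence(
    sentence: str,
    prefix_end: int,
    is_splitter: Callable[[str, int], bool],
) -> List[str]:
    """Split a question-plus-choice sentence into shorter sentences.

    sentence:    the question joined with one choice.
    prefix_end:  index where the common prefix of the coordinate
                 structure ends (hypothetical stand-in for stage two).
    is_splitter: decides whether the comma at a given position in the
                 remainder is a real splitter (hypothetical stand-in
                 for the stage-one classifier).
    """
    prefix = sentence[:prefix_end]
    body = sentence[prefix_end:]

    clauses = []
    start = 0
    for i, ch in enumerate(body):
        # Handle both ASCII and full-width (Chinese) commas.
        if ch in ",，" and is_splitter(body, i):
            clauses.append(body[start:i])
            start = i + 1
    clauses.append(body[start:])

    # Re-attach the shared prefix to every non-empty clause.
    return [prefix + c.strip() for c in clauses if c.strip()]


sentence = "The region has a humid climate, abundant rainfall"
prefix = "The region has "
shorter = split_sentence(sentence, len(prefix), lambda body, i: True)
```

When the callback accepts the comma, the sentence is rewritten as two shorter sentences that share the prefix; when it rejects every comma, the sentence is returned unchanged.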
Keywords: Question Comprehension, Sentence Splitting, Semantic Parsing, AMR, Annotation Tool