Research On Key Technology In Classical Chinese Translation And Reading Comprehension

Posted on: 2016-08-10
Degree: Master
Type: Thesis
Country: China
Candidate: Q Yang
Full Text: PDF
GTID: 2295330479490038
Subject: Computer Science and Technology
Abstract/Summary:
In the long history of China, numerous books have been written in classical Chinese. Statistical machine translation has developed greatly in recent years, and many open-source tools such as Moses can train a translation system from a bilingual parallel corpus. At the same time, the development of other natural language processing techniques inspires us to solve artificial intelligence problems in real life. Our purpose is to explore the key technologies of classical Chinese translation and reading comprehension. To this end, we conduct our research on the following aspects.

Firstly, we construct a classical-modern Chinese parallel corpus. We use parallel classical-modern Chinese websites and webpages to build the corpus. Parallel corpus acquisition is divided into two steps: the first obtains the main content of the parallel webpages, and the second aligns the sentences. For content extraction, our method is an improved version of the DOM-based text density method, in which we propose punctuation density to replace text density. In the experiment on obtaining the main content of parallel webpages, our method improves on the comparison method to some degree. For sentence alignment, we introduce sentence length relation, match pattern, and cognateness into a log-linear model to estimate the score of source and target sentences. Using different frameworks, we define 10 cognateness measures. The result of our model is far better than that of the length-based method.

Secondly, we study the optimization of a classical-modern Chinese translation system based on Moses. We use Moses to optimize the translation system with the classical-modern Chinese parallel sentences. The language model and the translation model are the two components we focus on. For the language model, we consider different corpora, smoothing methods, and hybrid models.
For the translation model, we consider the impact of word segmentation. Our method improves the performance of the translation system significantly.

Thirdly, we explore answering techniques for classical Chinese reading comprehension. We study three selected kinds of questions. We use similarity as a measure of the correctness of each choice, and then determine the answer from the question and the similarity. For the translation identification problem and the summarizing-analyzing problem, we define 24 similarities, which are based on bag of words, longest common subsequence, edit distance, cosine similarity, and N-grams. For the word sense identification problem, we define 7 similarities, which are derived from bag of words, the translation phrase table, and word similarities. Using our method, we achieve reasonable accuracy on all three kinds of problems. In addition, we derive 8 features from the similarities for the word sense identification problem. In 3-fold cross-validation with SVM-rank, we obtain a higher accuracy.
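
The abstract names the families of similarity used for answer selection but does not list the 24 individual measures. The following Python sketch is therefore only an illustration, not the thesis method: it assumes character-level comparison, one generic measure per family, an unweighted mean as the combined score, and a hypothetical `choose` helper that picks the most (or least) similar choice depending on how the question is phrased.

```python
from collections import Counter
import math


def char_ngrams(text, n):
    """Return the list of character n-grams of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def bag_overlap(a, b):
    """Bag-of-words style overlap: Jaccard index of the two character sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a, b):
    """LCS length normalised by the longer string."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0


def edit_distance(a, b):
    """Levenshtein distance with unit insert, delete, and substitute costs."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    """One minus the edit distance normalised by the longer string."""
    longest = max(len(a), len(b))
    return 1.0 - edit_distance(a, b) / longest if longest else 1.0


def cosine_similarity(a, b, n=1):
    """Cosine of the angle between character n-gram count vectors."""
    va, vb = Counter(char_ngrams(a, n)), Counter(char_ngrams(b, n))
    dot = sum(va[g] * vb[g] for g in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * \
        math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def ngram_overlap(a, b, n=2):
    """Fraction of a's character n-grams that also occur in b (clipped counts)."""
    ga, gb = Counter(char_ngrams(a, n)), Counter(char_ngrams(b, n))
    total = sum(ga.values())
    return sum((ga & gb).values()) / total if total else 0.0


# Assumption: one measure per family, combined with equal weight.
SIMILARITIES = [bag_overlap, lcs_similarity, edit_similarity,
                cosine_similarity, ngram_overlap]


def score(reference, candidate):
    """Unweighted mean of the measures above (an illustrative choice)."""
    return sum(f(reference, candidate) for f in SIMILARITIES) / len(SIMILARITIES)


def choose(pairs, want_most_similar=True):
    """Return the index of the selected choice.

    pairs holds one (reference, candidate) tuple per choice. Questions that
    ask for the incorrect choice would pass want_most_similar=False.
    """
    scores = [score(ref, cand) for ref, cand in pairs]
    pick = max if want_most_similar else min
    return pick(range(len(scores)), key=scores.__getitem__)
```

For a translation identification question, each pair could hold a reference rendering of the classical sentence (for example, the output of the translation system) and the modern Chinese translation offered in one choice. In a learned setting such as the SVM-rank experiment mentioned above, the individual measures would serve as features rather than being averaged.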
Keywords/Search Tags: content extraction, sentence alignment, classical Chinese translation, reading comprehension