Research On Term Extraction Method Based On Comparable Corpora

Posted on: 2016-11-30
Degree: Master
Type: Thesis
Country: China
Candidate: G D Shi
Full Text: PDF
GTID: 2308330473462641
Subject: Computer Science and Technology
Abstract/Summary:
Term extraction is critical for text information processing and is a basic task in Natural Language Processing. The term extraction discussed in this thesis includes the extraction of common bilingual words. In general, there are two main kinds of term extraction methods: methods based on a third (intermediate) language and bilingual lexicons, and methods based on bilingual corpora. The first kind works only at the level of individual words and does not consider the correlation between words, whereas the second kind makes use of that correlation. This thesis studies both kinds of methods and combines them to improve the quality of the extracted lexicon. The dissertation includes:
(1) An investigation of traditional term extraction methods based on comparable corpora, focusing on the selection of comparable corpora, the construction of a bilingual term extraction model, and the similarity measure. Building on the traditional single-directional context-based vector model, a bi-directional context-based vector model is proposed.
(2) The construction of a Chinese-Swedish comparable corpus. Traditional methods are used to build a Chinese-Swedish comparable corpus from financial data, which is used in the subsequent research.
(3) A new term extraction method based on both a third (intermediate) language and comparable corpora. The proposed method uses word co-occurrence to optimize the lexicon and obtain better results. Chinese is the source language, Swedish is the target language, and English is used as the bridge. Compared with prior work, the proposed method performs better, which shows its advantage.
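The combination described in item (3) can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the seed lexicons, the window size, the cosine similarity, and the simple averaging of the two directions are all assumptions made for illustration, and the corpora are toy data. Candidates are first generated by composing a Chinese-English and an English-Swedish lexicon through the bridge language, then re-ranked by comparing co-occurrence context vectors in both directions.

```python
from collections import Counter, defaultdict
from math import sqrt


def pivot_candidates(src_to_pivot, pivot_to_tgt):
    """Compose source->pivot and pivot->target lexicons into source->target candidates."""
    candidates = defaultdict(set)
    for src, pivots in src_to_pivot.items():
        for pivot in pivots:
            candidates[src].update(pivot_to_tgt.get(pivot, ()))
    return dict(candidates)


def context_vectors(sentences, window=2):
    """Count the words co-occurring within +/- `window` tokens of each word."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return dict(vectors)


def translate_vector(vec, seed):
    """Map a context vector into the other language through a seed lexicon."""
    out = Counter()
    for word, count in vec.items():
        for translation in seed.get(word, ()):
            out[translation] += count
    return out


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def bidirectional_score(src, tgt, src_vecs, tgt_vecs, seed_st, seed_ts):
    """Average the source->target and target->source similarities (an assumed combination)."""
    src_vec = src_vecs.get(src, Counter())
    tgt_vec = tgt_vecs.get(tgt, Counter())
    forward = cosine(translate_vector(src_vec, seed_st), tgt_vec)
    backward = cosine(translate_vector(tgt_vec, seed_ts), src_vec)
    return (forward + backward) / 2


def rank(src, candidates, src_vecs, tgt_vecs, seed_st, seed_ts):
    """Return (score, candidate) pairs, best first."""
    scored = [
        (bidirectional_score(src, tgt, src_vecs, tgt_vecs, seed_st, seed_ts), tgt)
        for tgt in candidates
    ]
    return sorted(scored, reverse=True)


# Toy data (hypothetical): English "bank" is ambiguous between the financial
# sense (Swedish "bank") and the shore sense (Swedish "strand").
zh_corpus = [["银行", "贷款", "利率"]]
sv_corpus = [["bank", "lån", "ränta"], ["strand", "sand", "hav"]]
seed_zh_sv = {"贷款": ["lån"], "利率": ["ränta"]}
seed_sv_zh = {"lån": ["贷款"], "ränta": ["利率"]}

candidates = pivot_candidates({"银行": ["bank"]}, {"bank": ["bank", "strand"]})
ranking = rank(
    "银行",
    candidates["银行"],
    context_vectors(zh_corpus),
    context_vectors(sv_corpus),
    seed_zh_sv,
    seed_sv_zh,
)
print(ranking)
```

In this toy setting the bridge language alone cannot separate the two Swedish candidates, while the co-occurrence contexts from the finance-like sentences favour the financial sense. A real system would need much larger corpora, word segmentation for Chinese, and a weighting scheme for the context counts.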
Keywords/Search Tags:term extraction, comparable corpora, bilingual lexicon, third-intermediate language