Technique Of Recognizing And Translating Chinese & English Time And Numeral And Quantifier

Posted on: 2012-10-26
Degree: Master
Type: Thesis
Country: China
Candidate: H Zheng
Full Text: PDF
GTID: 2218330362950465
Subject: Computer Science and Technology
Abstract/Summary:
Named entity recognition and translation is an important part of a machine translation system. This thesis uses a corpus-based method to mine the expression patterns of numerals, dates and times, and quantifiers, and then converts those patterns into language rules. On this basis, it implements recognition and bidirectional translation of numerals, dates and times, and quantifiers between Chinese and English. The main content of this thesis consists of four parts:
(1) The thesis first gives a brief introduction to automata theory, which provides the theoretical basis for abstracting and applying the rules, and illustrates the equivalence of deterministic finite automata, nondeterministic finite automata and regular expressions. The expression patterns of named entities in Chinese and English are mined and then converted into language rules, i.e. regular expressions.
(2) The thesis adopts synchronous context-free grammar (SCFG) to parse and translate dates and times. An SCFG extends a context-free grammar with translation rules, so that parsing and translation proceed synchronously. The CYK+ algorithm is used to parse with the SCFG.
(3) A numeral-classifier compound is defined in this thesis as "numeral + quantifier + noun". The numeral part reuses the numeral recognition and translation system described above as a module. The quantifier part draws on material compiled by linguists. For the noun part, a data mining strategy is used: Chinese-English translation pairs of numeral-classifier compounds are extracted from the phrase table of a large corpus.
(4) The thesis explains in detail the recognition and translation criteria for numerals, dates and times, and numeral-classifier compounds in Chinese and English. Experiments show a good F-measure for recognition and a good accuracy rate for translation.
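The rule-based idea in part (1) can be illustrated with a minimal sketch: a regular expression recognizes a Chinese numeral span, and a small routine converts it to an integer as a first step toward translation. This is not the thesis's actual rule set. The names `NUMERAL_RE`, `chinese_to_int` and `find_numerals` are hypothetical, and the sketch assumes simple cardinal numerals whose large units (亿, 万) appear in descending order.

```python
import re

# Hypothetical lookup tables; the thesis's real rules are not given in the abstract.
DIGITS = {"零": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
BIG_UNITS = {"万": 10**4, "亿": 10**8}

# A regular language over numeral characters: one or more digits or units.
NUMERAL_RE = re.compile("[零一二两三四五六七八九十百千万亿]+")


def chinese_to_int(text):
    """Convert a simple Chinese cardinal numeral to an int."""
    total = 0    # value of the sections already closed by 万 or 亿
    section = 0  # value accumulated inside the current section
    digit = 0    # pending digit that still waits for its unit
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare unit such as "十" in "十五" counts as 1 * 10.
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch in BIG_UNITS:
            # Close the current section and scale it by 万 or 亿.
            total += (section + digit) * BIG_UNITS[ch]
            section = 0
            digit = 0
        else:
            raise ValueError(f"not a numeral character: {ch!r}")
    return total + section + digit


def find_numerals(sentence):
    """Return (matched text, integer value) for each numeral span found."""
    return [(m.group(), chinese_to_int(m.group()))
            for m in NUMERAL_RE.finditer(sentence)]


if __name__ == "__main__":
    print(find_numerals("他买了三百二十五本书"))
```

A pattern this permissive also matches numeral characters inside ordinary words (for example 一 in 一起), which is one reason a practical system would need rules mined from real corpus patterns and not a single character class.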
Keywords/Search Tags:named entity recognition, named entity translation, synchronous context free grammar, regular language