Research On The Recognition And Translation Of Japanese-Chinese Numerical And Temporal Expressions

Posted on: 2018-02-22 | Degree: Master | Type: Thesis
Country: China | Candidate: J H Guo | Full Text: PDF
GTID: 2348330512993285 | Subject: Computer technology
Abstract/Summary:
The recognition and translation of Named Entities (NE) are basic and essential tasks in Natural Language Processing (NLP). The Numerical-Temporal Expression (NTE), a special type of named entity, carries critical information, so its recognition and translation have both theoretical significance and practical value. The recognition and analysis of NTE underpin NLP tasks such as information retrieval, event extraction, event detection and tracking, and question answering. In multilingual tasks such as machine translation in particular, the alignment and translation of NTE are important factors in system performance. Research on NTE recognition and translation therefore matters for improving machine translation systems and for the development of artificial intelligence more broadly.

Based on the characteristics of Japanese-Chinese bilingual NTE, this thesis combines linguistic and statistical methods. Through extensive data analysis and experiments, we study recognition and translation methods for bilingual NTE in depth and apply them to a machine translation system. The main components of this thesis are as follows:

(1) Based on the latest TIMEX3 time annotation specification and general number classification methods, and considering the similarities and differences between Japanese and Chinese linguistic knowledge, we build separate keyword repositories for Japanese and Chinese NTE, including trigger words and boundary words. We bring words meaning "approximate number" into the scope of NTE recognition so that the recognized NTE carry richer meaning. Regular-expression matching is then employed to recognize NTE. Finally, rule-based and statistics-based methods are combined to recognize Japanese and Chinese NTE respectively. The experimental results show that this recognition method performs well on both Japanese and Chinese.

(2) Bilingual alignment of NTE is integrated into traditional word alignment, yielding a bi-directional NTE alignment method based on a position constraint and a similarity measure. According to the experimental results, this method effectively improves the performance of bilingual word alignment and optimizes the translation models of the machine translation system.

(3) According to the translation characteristics of Japanese-Chinese NTE, an NTE translation rule repository is established, dedicated to translating NTE independently. The recognition and alignment information for bilingual NTE and the translation rule repository are integrated into an existing statistical machine translation system. The experimental results show that the integrated system improves the translation accuracy of NTE and of the words near them, and hence the overall translation quality.

To sum up, the innovations of this thesis are mainly: designing recognition and translation rules for temporal words based on the TIMEX3 specification, according to the characteristics of Japanese-Chinese NTE; bringing words meaning "approximate number" into the scope of NTE recognition; proposing a bi-directional NTE alignment method based on a position constraint and a similarity measure; and establishing a translation rule repository for Japanese-Chinese NTE. Finally, these three lines of research are applied to a machine translation system, and the experimental results verify that the methods effectively improve its overall performance.
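
As an illustration of the rule-based recognition step described in (1), the following is a minimal Python sketch of regular-expression matching driven by trigger words and "approximate number" words. It is not the thesis's implementation: the keyword lists (`UNITS`, `APPROX_PREFIX`, `APPROX_SUFFIX`), the `recognize` function, and the pattern itself are hypothetical stand-ins, since the abstract does not give the contents of the actual keyword repositories or describe the statistical component.

```python
import re

# Hypothetical keyword lists, for illustration only. The thesis's actual
# trigger-word and boundary-word repositories are not given in the abstract.
DIGITS = "0-9〇一二三四五六七八九十百千万"
UNITS = ["年", "月", "日", "時", "时", "分", "秒", "円", "元", "人"]
APPROX_PREFIX = ["約", "约", "大约", "およそ"]
APPROX_SUFFIX = ["ぐらい", "くらい", "ほど", "左右", "前後", "前后"]


def _alternation(words):
    # Longest keywords first, so the regex prefers the longest match.
    ordered = sorted(words, key=len, reverse=True)
    return "|".join(re.escape(w) for w in ordered)


def build_pattern():
    # Core: one or more "<number><unit>" groups, e.g. 2018年2月22日.
    core = f"(?:[{DIGITS}]+(?:{_alternation(UNITS)}))+"
    # Optional approximate-number words before and after the core.
    prefix = f"(?:{_alternation(APPROX_PREFIX)})?"
    suffix = f"(?:{_alternation(APPROX_SUFFIX)})?"
    return re.compile(prefix + core + suffix)


NTE_PATTERN = build_pattern()


def recognize(text):
    """Return (start, end, surface string) for each candidate NTE in text."""
    return [(m.start(), m.end(), m.group()) for m in NTE_PATTERN.finditer(text)]


if __name__ == "__main__":
    for sentence in ["会議は2018年2月22日に約30人で行われた", "大约三百元左右"]:
        print(sentence, recognize(sentence))
```

A pattern this simple over-generates (for example, it would treat 十分 in the sense of "enough" as a numeric expression), which is one reason a purely rule-based pass is typically paired with a statistical method, as the abstract describes.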
Keywords/Search Tags: Named Entity, Temporal Expression, Numerical Expression, Rule, Alignment, Machine Translation