Research On Auto Translation Of Large-scale Chinese Organization Name And Address

Posted on: 2011-09-17
Degree: Master
Type: Thesis
Country: China
Candidate: S S Liu
Full Text: PDF
GTID: 2178330338479964
Subject: Computer Science and Technology
Abstract/Summary:
With economic development, foreign companies increasingly need timely access to information about domestic manufacturers in order to do business with them. The exchange of economic information has therefore become very important, especially organization code information, which is regarded as an "enterprise ID". Drawing on existing natural language processing research to break the bottleneck in the international exchange of organization codes has become feasible.

The main research content is named entity translation technology for organization names and addresses. Named entity translation has been studied widely in recent years, but general machine translation, simple transliteration, and NE alignment techniques each have limitations that are difficult to overcome.

Our research objects have particular characteristics. They are organization names and addresses that do not occur in long texts but are registered in the national organization code management center. They are therefore short records that follow certain regularities. The data volume is large, the names come from all walks of life, the addresses are complex, and many unknown words occur. Based on these characteristics, this thesis presents a method for recognizing and translating Chinese organization names based on template matching, and a method for translating Chinese organization addresses based on a combination of template matching and rules.

In detail, this thesis is arranged as follows:

1. By analyzing the structure of organization names, we identified the rules governing their composition. We then apply two segmentation methods in parallel: AP-based forward maximum matching segmentation and PPOP-based reverse maximum matching segmentation. The two segmentation results are merged using a method based on POS assignment, ambiguities are eliminated according to certain disambiguation rules, and the translation is generated by translating each segmentation node.
2. Based on a summary of how organization addresses are composed, an address can be divided into four kinds of location units. Legal units and long units are recognized and segmented first, and the remainder is divided using a unit-based segmentation method. The translation of the address is obtained by combining the translations of its location units.
3. Guided by this research, we designed and implemented a Chinese-English translation system for organization names and addresses.
4. By analyzing how the Knowledge Base is used, we developed knowledge rules for selecting optimal rules and avoiding rule conflicts.
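
The two parallel segmentation passes in item 1 can be illustrated with a minimal sketch of plain dictionary-based forward and reverse maximum matching. This is not the thesis's implementation: the abstract does not describe the AP and PPOP resources or the POS-based merge, so the single toy lexicon, the function names, and the `max_len` parameter below are all assumptions made for illustration.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Scan left to right, taking the longest lexicon entry at each position."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def reverse_max_match(text, lexicon, max_len=4):
    """Scan right to left, taking the longest lexicon entry ending at each position."""
    tokens = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()  # tokens were collected from the end of the string
    return tokens


# Hypothetical toy lexicon; the real knowledge base is not given in the abstract.
LEXICON = {"北京", "北京市", "市场", "调查", "有限公司"}
NAME = "北京市场调查有限公司"

forward = forward_max_match(NAME, LEXICON)
reverse = reverse_max_match(NAME, LEXICON)
print(forward)  # ['北京市', '场', '调查', '有限公司']
print(reverse)  # ['北京', '市场', '调查', '有限公司']
```

The two passes disagree on where "北京市场" splits, which is the kind of ambiguity that a later merge and disambiguation step would have to resolve before each node is translated.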
Keywords/Search Tags: named entity translation, organization name translation, address translation, knowledge base maintenance