
Design and Implementation of a Multi-level Feature Index Retrieval Algorithm for a Large-Scale Corpus

Posted on: 2005-10-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yu
GTID: 2208360125953739
Subject: Computer applications
Abstract/Summary:
Artificial intelligence (AI) is defined in contrast to human intelligence. Machine translation, an important branch of AI, is the conversion of expressions in one natural language into another natural language. Beyond its use in translation itself, it is, more importantly, a key technique for enabling machines to understand natural language.

Example-Based Machine Translation (EBMT) applies the case-based reasoning (CBR) theory of AI. EBMT has become popular for several reasons: the translation knowledge it needs does not have to be abstracted into rules and is easy to acquire, its translations are of high quality and readable, and the system can learn more easily. EBMT nevertheless has open problems: the methods for computing similarity between examples are simplistic; the corpus must cover a comprehensive range of linguistic phenomena; a large-scale corpus must be organized and stored; an index must be built; and the search space must be reduced while convergence is sped up.

This thesis proposes a design for a large-scale corpus together with a pattern-matching algorithm over it. The corpus, built with a hierarchical feature retrieval index and an optimization algorithm, not only overcomes the difficulty of organizing and storing a traditional large-scale corpus but also makes later insert, delete, and update operations convenient. The index-based pattern-matching algorithm uses word similarity, length, and entropy as matching conditions. This avoids the weakness of traditional algorithms that rely on a single strategy and noticeably improves both efficiency and accuracy. In the implemented system, using the hierarchical feature retrieval index with suitably tuned parameter values yields good translations.
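The abstract does not specify the levels of the hierarchical feature index. As a minimal sketch, assuming two hypothetical levels (first a sentence-length bucket, then an inverted list from each source word to example ids), the Python below shows how such an index can narrow the search space to nearby length buckets while keeping insert, delete, and update cheap. The class name `HierarchicalIndex`, the `bucket_size` and `spread` parameters, and whitespace tokenization are illustrative assumptions, not the thesis's actual design.

```python
from collections import defaultdict


class HierarchicalIndex:
    """Hypothetical two-level index: length bucket -> source word -> example ids."""

    def __init__(self, bucket_size=5):
        self.bucket_size = bucket_size
        self.examples = {}  # example id -> (source tokens, target sentence)
        # Level 1: length bucket; level 2: word -> set of example ids.
        self.levels = defaultdict(lambda: defaultdict(set))
        self.next_id = 0

    def _bucket(self, tokens):
        return len(tokens) // self.bucket_size

    def _add(self, eid, source, target):
        tokens = source.split()  # assumption: whitespace-tokenized source text
        self.examples[eid] = (tokens, target)
        level = self.levels[self._bucket(tokens)]
        for word in set(tokens):
            level[word].add(eid)

    def insert(self, source, target):
        eid = self.next_id
        self.next_id += 1
        self._add(eid, source, target)
        return eid

    def delete(self, eid):
        tokens, _ = self.examples.pop(eid)
        bucket = self._bucket(tokens)
        level = self.levels[bucket]
        for word in set(tokens):
            level[word].discard(eid)
            if not level[word]:
                del level[word]  # drop empty posting lists
        if not level:
            del self.levels[bucket]  # drop empty length buckets

    def update(self, eid, source, target):
        # Re-index under the same id, since the length bucket may change.
        self.delete(eid)
        self._add(eid, source, target)

    def candidates(self, query, spread=1):
        """Ids of examples sharing words with the query, searched only in
        length buckets within +/- spread of the query's bucket, ranked by
        the number of shared words (ties broken by id)."""
        tokens = query.split()
        center = self._bucket(tokens)
        hits = defaultdict(int)
        for b in range(center - spread, center + spread + 1):
            if b not in self.levels:  # membership test does not create a bucket
                continue
            level = self.levels[b]
            for word in set(tokens):
                for eid in level.get(word, ()):
                    hits[eid] += 1
        return sorted(hits, key=lambda eid: (-hits[eid], eid))


idx = HierarchicalIndex(bucket_size=5)
tea = idx.insert("I like green tea", "我喜欢绿茶")
coffee = idx.insert("I like black coffee very much", "我非常喜欢黑咖啡")
weather = idx.insert("the weather is fine", "天气很好")
print(idx.candidates("I like tea"))  # -> [0, 1]
```

Because each example lives in exactly one length bucket, deleting or updating it touches only that bucket's posting lists rather than the whole corpus.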
Moreover, as the system learns automatically and the corpus keeps growing, less user involvement is needed, and the amount and difficulty of post-editing decrease accordingly, which raises the machine's level of intelligence to some extent.
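The abstract states that the index-based matcher uses word similarity, length, and entropy together as conditions instead of a single strategy, but it gives no formulas. The sketch below assumes three stand-in measures, all hypothetical: Dice overlap of word sets for word similarity (the thesis may instead use a thesaurus-based word similarity), the shorter-to-longer length ratio, and, for the entropy criterion, the share of the query's self-information (-log2 p(w), estimated from corpus word frequencies) that the matched words cover. An example passes only if it clears every threshold, and survivors are ranked by the mean of the three scores. The function names and the thresholds `min_word`, `min_len`, and `min_info` are illustrative, not tuned values from the thesis.

```python
import math
from collections import Counter


def word_similarity(q, e):
    """Dice coefficient over word sets (assumed stand-in measure)."""
    qs, es = set(q), set(e)
    if not qs and not es:
        return 1.0
    return 2 * len(qs & es) / (len(qs) + len(es))


def length_similarity(q, e):
    """Ratio of the shorter sentence length to the longer one."""
    if not q and not e:
        return 1.0
    return min(len(q), len(e)) / max(len(q), len(e))


def info_similarity(q, e, freq, total):
    """Share of the query's self-information (-log2 p) covered by shared words."""
    def info(w):
        # Add-one smoothing keeps unseen words finite (and highly informative).
        return -math.log2((freq[w] + 1) / (total + len(freq) + 1))
    q_info = sum(info(w) for w in set(q))
    if q_info == 0:
        return 0.0
    return sum(info(w) for w in set(q) & set(e)) / q_info


def match(query, corpus, min_word=0.3, min_len=0.5, min_info=0.3):
    """Keep examples passing all three conditions, best mean score first."""
    q = query.split()
    tokenized = [(src.split(), tgt) for src, tgt in corpus]
    freq = Counter(w for toks, _ in tokenized for w in toks)
    total = sum(freq.values())
    results = []
    for toks, tgt in tokenized:
        ws = word_similarity(q, toks)
        ls = length_similarity(q, toks)
        es = info_similarity(q, toks, freq, total)
        if ws >= min_word and ls >= min_len and es >= min_info:
            results.append(((ws + ls + es) / 3, tgt))
    results.sort(key=lambda r: r[0], reverse=True)
    return results


corpus = [
    ("I like green tea", "我喜欢绿茶"),
    ("I like black coffee very much", "我非常喜欢黑咖啡"),
    ("the weather is fine", "天气很好"),
]
print([tgt for _, tgt in match("I like tea", corpus)])
# -> ['我喜欢绿茶', '我非常喜欢黑咖啡']
print([tgt for _, tgt in match("I like tea", corpus, min_len=0.6)])
# -> ['我喜欢绿茶']
```

Raising `min_len` to 0.6 drops the coffee example even though it shares two words with the query, which illustrates why combining several conditions can filter candidates more sharply than ranking on word overlap alone.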
Keywords/Search Tags: Example-Based Machine Translation, large-scale corpus, CBR, AI