
Research On Intelligent Cross-linguistic Agricultural Intellectual Property Retrieval Model And Algorithms

Posted on: 2015-07-28
Degree: Master
Type: Thesis
Country: China
Candidate: L Wei
Full Text: PDF
GTID: 2298330434964987
Subject: Computer application technology
Abstract/Summary:
The volume of information on the Internet is expanding rapidly, and retrieval technology for agricultural intellectual property has matured. Chinese agricultural intellectual property information can already be retrieved efficiently with single-language search, but growing retrieval needs make it necessary to search related information in English as well. This dissertation therefore focuses on cross-language agricultural intellectual property information retrieval. To address the poor efficiency of cross-language agricultural intellectual property retrieval, it studies phrase-based statistical machine translation and latent semantic cross-language query expansion, and completes the following main tasks:

(1) Construction of a phrase-based statistical machine translation model. Common machine translators have low accuracy when translating agricultural intellectual property information, so a phrase-based statistical machine translation model that includes a professional agricultural corpus is used to improve it. First, an agricultural bilingual corpus is obtained through information extraction, and the model is trained on it through corpus preprocessing, word alignment, and phrase extraction. Then the test sentences are decoded with isi-decoder. Next, the decoder output is post-processed to obtain the final translation results. Finally, the translation results are evaluated with the BLEU and NIST criteria and the evaluation document is output. The evaluation results show that the translation quality of agricultural intellectual property literature improved by 16.9% after the agricultural corpus was added to the phrase-based statistical machine translation model.

(2) Study on optimization of cross-language query expansion based on latent semantic analysis.
In cross-language information retrieval, query words often match the target documents poorly, which lowers precision; a cross-language query expansion method is introduced to address this. Earlier work uses singular value decomposition to build the bilingual space, which can produce negative values in the space matrix and distort the corpus representation. We improve on this by adding non-negative matrix factorization. Once the bilingual space matrix is built, the number of dimensions must be reduced to cut computation. Researchers usually set this number directly. However, if it is too large, computation is not reduced, and if it is too small, so much semantic information is lost that the representation becomes meaningless. We therefore build a merit-based selection model: several candidate dimension values are established and the optimal one is selected with the Akaike information criterion. Cross-language expansion also requires tighter text clustering, for which the k-means clustering algorithm is usually used. However, this method can greatly affect clustering accuracy, so the k-medoid clustering method is chosen instead. Experiments show that retrieval accuracy improved by 9.8% after adding non-negative matrix factorization, by 18.2% after adding the merit-based model, and by 3.8% after improving on k-means clustering with the k-medoid method.
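The contrast between singular value decomposition and non-negative matrix factorization described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the toy term-document matrix, the `nmf` helper, the rank of 2, and the use of Lee-Seung multiplicative updates are all assumptions chosen for illustration. It only shows that truncated SVD factors contain negative entries on a non-negative matrix while NMF factors do not.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V as W @ H with W, H >= 0.

    Uses multiplicative updates for the Frobenius-norm objective
    (an assumed, textbook choice; the source does not name a solver).
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so signs never flip.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical term-document counts: rows are terms, columns are documents.
V = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
    [1, 0, 0, 1],
], dtype=float)

W, H = nmf(V, rank=2)

# Rank-2 truncated SVD of the same matrix, for comparison.
U, s, Vt = np.linalg.svd(V, full_matrices=False)
U2, Vt2 = U[:, :2], Vt[:2]

nmf_has_negative = bool((W < 0).any() or (H < 0).any())
svd_has_negative = bool((U2 < 0).any() or (Vt2 < 0).any())
nmf_error = float(np.linalg.norm(V - W @ H))

print("NMF factors contain negatives:", nmf_has_negative)
print("SVD factors contain negatives:", svd_has_negative)
print("NMF reconstruction error:", round(nmf_error, 4))
```

Because every NMF factor entry stays non-negative, each latent dimension can be read as an additive combination of terms, which is one common motivation for preferring it over SVD in a latent semantic space.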
In summary, retrieval accuracy reached 61.28%, an increase of 40.6%.

(3) Combining the phrase-based statistical machine translation model with the cross-language query expansion method based on latent semantic analysis, a platform for cross-language retrieval of agricultural intellectual property was constructed. Platform testing shows that the system has good usability, robustness, and maintainability.
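The BLEU criterion used to evaluate translations in task (1) can be sketched as follows. This is a simplified, single-reference, sentence-level variant limited to unigrams and bigrams, written only to show the clipped n-gram precision and brevity penalty at the core of the metric; the function names and example sentences are hypothetical, and the dissertation's actual evaluation setup (and its NIST scoring) is not reproduced here.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram is credited
    at most as many times as it occurs in the reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / total

def bleu(candidate, reference, max_n=2):
    """Simplified BLEU: geometric mean of modified precisions for
    n = 1..max_n, multiplied by the brevity penalty."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    # Candidates shorter than the reference are penalized.
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity_penalty * math.exp(log_mean)

# Hypothetical reference translation and system output.
reference = "the plant variety right is protected".split()
candidate = "the plant variety is protected".split()

score = bleu(candidate, reference)
print("simplified BLEU:", round(score, 4))
```

Standard BLEU is computed over a whole test corpus with n-grams up to length 4 and possibly several references per sentence, so scores from this sketch are not comparable with published figures.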
Keywords/Search Tags: agricultural intellectual property, statistical phrase, machine translation, latent semantic analysis, query expansion