
Design And Implementation Of Multilingual Information Retrieval System Based On Latent Semantic Analysis

Posted on: 2020-06-28; Degree: Master; Type: Thesis
Country: China; Candidate: K B Xu; Full Text: PDF
GTID: 2428330572489370; Subject: Computer technology
Abstract/Summary:
The Internet stores huge amounts of information resources in diverse languages, which makes it difficult for people to interpret the resources they obtain. Finding the information they want within such large-scale collections has become a top priority, and cross-language information retrieval has gradually become an important research direction in information processing. At present, most cross-language information retrieval is realized by machine translation followed by monolingual retrieval. Bilingual dictionaries are also used. Although this approach improves recall through query expansion, training the translation model requires a large-scale, high-quality, well-translated parallel corpus, and such corpora are still difficult to acquire. This dissertation proposes a cross-language information retrieval model based on latent semantic analysis. The main work is as follows. Firstly, we collected and collated a parallel corpus of Chinese, Korean, and English abstracts of scientific and technological literature. Using the latent semantic space model, the corpus was partitioned according to the resource limits of the SVD computation, and a separate dictionary was established for each resulting sub-corpus. Secondly, for user-provided queries, the latent semantic subspace to be retrieved (the target subspace) was located according to word co-occurrence criteria. The original query was expanded using a Word2vec model, and a new spatial dimension was introduced to handle unknown words. Multilingual retrieval results were then obtained by searching in the target subspace. Finally, based on the methods proposed in this dissertation, a cross-language information retrieval system for scientific and technological documents in Chinese, Korean, and English was developed. The results of the experiments and of system operation show that the precision and recall of the designed system meet the design requirements. Under the latent semantic space model, the similarity of queries is above 85%. The latent semantic space model can better represent the semantic information of documents in cross-language information retrieval, which supports the accuracy and validity of the cross-language retrieval system.
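The core idea of the abstract, retrieval in a latent semantic space built by SVD over a parallel corpus, can be illustrated with a minimal sketch. This is not the dissertation's implementation: it assumes the common cross-language LSI setup in which each training document merges an abstract with its translation, it uses NumPy, and the toy corpus, the `zh_` placeholder tokens, and all function names are hypothetical. Corpus partitioning, target-subspace location, and Word2vec query expansion are not covered.

```python
import numpy as np

# Toy parallel corpus (hypothetical): each entry merges an English abstract
# with a stand-in translation whose tokens carry a "zh_" prefix.
PARALLEL_DOCS = [
    "retrieval query index zh_retrieval zh_query zh_index",
    "retrieval query ranking zh_retrieval zh_query zh_ranking",
    "protein gene cell zh_protein zh_gene zh_cell",
    "protein gene sequence zh_protein zh_gene zh_sequence",
]


def build_term_doc_matrix(docs):
    """Return a raw-count term-document matrix and the term -> row index map."""
    vocab = sorted({term for doc in docs for term in doc.split()})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for term in doc.split():
            matrix[index[term], j] += 1.0
    return matrix, index


def fit_lsa(matrix, k):
    """Truncated SVD: keep the k largest singular triplets."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Rows of vt[:k].T are the documents' coordinates in the latent space.
    return u[:, :k], s[:k], vt[:k, :].T


def fold_in(text, index, u_k, s_k):
    """Project a query into the latent space: q_hat = q^T U_k S_k^{-1}."""
    q = np.zeros(u_k.shape[0])
    for term in text.split():
        if term in index:  # terms outside the vocabulary are ignored here
            q[index[term]] += 1.0
    return (q @ u_k) / s_k


def rank(query_vec, doc_vecs):
    """Return document indices sorted by descending cosine similarity."""
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    sims = (doc_vecs @ query_vec) / np.where(norms == 0, 1.0, norms)
    return np.argsort(-sims), sims


if __name__ == "__main__":
    matrix, index = build_term_doc_matrix(PARALLEL_DOCS)
    u_k, s_k, doc_vecs = fit_lsa(matrix, k=2)
    for query in ("protein gene", "zh_protein zh_gene"):
        order, sims = rank(fold_in(query, index, u_k, s_k), doc_vecs)
        print(query, "->", order.tolist(), np.round(sims, 3).tolist())
```

In this toy setup, a query in either language is folded into the same latent space, so both queries should score the two biology documents highest even though they share no surface terms. A real system would replace raw counts with a weighting such as TF-IDF and choose `k` empirically.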
Keywords/Search Tags:latent semantic indexing, cross-language information retrieval, query expansion, Word2vec