
Mongolian Query Expansion And Information Retrieval System Establishment

Posted on: 2019-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Wen
Full Text: PDF
GTID: 2428330596456140
Subject: Computer Science and Technology
Abstract/Summary:
Mongolian, the language of the Mongolian people, is a valuable cultural asset. As the number of Mongolian websites grows, so does the demand for Mongolian information retrieval. To address vocabulary mismatch between queries and documents, query expansion can be added to the retrieval process so that the expanded search terms yield more comprehensive results. This thesis studies Mongolian query expansion, building on pseudo-relevance feedback. First, it applies the traditional pseudo-relevance feedback query expansion algorithms TF-IDF and TextRank to Mongolian query expansion, and proposes Km-TF-IDF, an algorithm that combines TF-IDF with K-means and effectively suppresses query drift in Mongolian queries. Second, it applies deep learning to Mongolian query expansion and proposes a word-vector-based Mongolian query expansion algorithm, Word2vec, which addresses the over-reliance of traditional pseudo-relevance feedback algorithms on the initial retrieval results. The experimental results show that this algorithm outperforms the traditional query expansion algorithms in information retrieval. Furthermore, the thesis proposes a hierarchical word-vector Mongolian query expansion algorithm, TFIDF-Word2vec, which considers both document-internal information and word-vector information. The experimental results show that it further improves the accuracy of the retrieval system, verifying that it is effective for Mongolian query expansion. Finally, the thesis takes the structure of Mongolian word formation into account and trains word vectors on a corpus in which affixes have been segmented. The experimental results show that word vectors trained after affix segmentation reflect the relatedness between words and address the data sparsity problem, further improving the performance of the retrieval system.
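The pseudo-relevance feedback step that the abstract builds on can be illustrated with a minimal sketch. It is not the thesis's implementation: the whitespace tokenizer, the smoothed IDF formula, the function names (`tokenize`, `rank`, `expand_query`) and the parameters `top_docs` and `n_terms` are all assumptions made for illustration. A real Mongolian system would need its own tokenization and affix handling. The idea is to run the original query, treat the top-ranked documents as relevant, and append their highest-weighted TF-IDF terms to the query.

```python
import math
from collections import Counter


def tokenize(text):
    # Placeholder whitespace tokenizer (an assumption); Mongolian text
    # would need proper word and affix segmentation instead.
    return text.lower().split()


def rank(query_terms, docs_tokens, idf):
    # Score each document by the summed TF-IDF weight of the query terms
    # and return the indices of matching documents, best first.
    scored = []
    for index, tokens in enumerate(docs_tokens):
        tf = Counter(tokens)
        score = sum(tf[t] * idf.get(t, 0.0) for t in query_terms)
        scored.append((score, index))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for score, index in scored if score > 0]


def expand_query(query, docs, top_docs=2, n_terms=2):
    docs_tokens = [tokenize(d) for d in docs]
    n_docs = len(docs_tokens)

    # Document frequency and a smoothed IDF (one common variant).
    df = Counter(t for tokens in docs_tokens for t in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}

    query_terms = tokenize(query)

    # Initial retrieval: the top-ranked documents are assumed relevant.
    feedback = rank(query_terms, docs_tokens, idf)[:top_docs]

    # Accumulate TF-IDF weights of candidate terms in the feedback documents.
    weights = Counter()
    for index in feedback:
        for term, count in Counter(docs_tokens[index]).items():
            if term not in query_terms:
                weights[term] += count * idf[term]

    # Append the highest-weighted candidates to the original query.
    best = sorted(weights.items(), key=lambda pair: (-pair[1], pair[0]))
    return query_terms + [term for term, _ in best[:n_terms]]


if __name__ == "__main__":
    corpus = [
        "horse racing festival naadam horse",
        "naadam festival wrestling archery",
        "weather forecast rain",
        "horse breeding steppe",
    ]
    print(expand_query("naadam", corpus))
```

Because the expansion terms come only from the initially retrieved documents, a poor first retrieval feeds poor terms back into the query. This is the dependence on the initial retrieval result that the abstract's word-vector approach is meant to reduce.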
Keywords/Search Tags:Mongolian information retrieval, Query expansion, Word vector, Neural network language model