
Research In Information Retrieval Based On LightLDA

Posted on: 2018-01-27 | Degree: Master | Type: Thesis
Country: China | Candidate: X Han
GTID: 2348330518982354 | Subject: Computer software and theory
Abstract/Summary:
Today, the amount of data on the Internet is growing exponentially, and every kind of data keeps growing day by day. In such a vast ocean of data, retrieving the information users need quickly and accurately has become an increasingly urgent problem and a major challenge for current information retrieval technology. At present, the mainstream way to bring semantic information into information retrieval is to train a topic model with the machine learning method LDA (Latent Dirichlet Allocation). Although integrating LDA topic information can improve retrieval performance, LDA training is computationally expensive, so it is easily constrained by the size of the corpus and the number of topics. As a result, LDA cannot adequately address retrieval problems in today's era of Big Data. Microsoft's 2015 open-source release of LightLDA, a distributed, high-performance topic modeling tool, offered a promising way to address these problems. This thesis focuses on these problems and studies the feasibility and effectiveness of LightLDA in information retrieval. The main work covers the following two aspects.

First, we apply LightLDA to information retrieval models. We use LightLDA to train topic information on several large-scale TREC datasets, then integrate the trained topic information into the language modeling framework to construct a retrieval model called LLBDM. Building on this, we use the concept of information entropy to construct a second retrieval model, LMLIE. Finally, we compare the retrieval effectiveness of these two models against the baseline and analyze how the relevant parameters affect them. Experiments verify the feasibility and effectiveness of LightLDA in retrieval models.

Second, we apply LightLDA to pseudo relevance feedback.
We use LightLDA to train topic information on the pseudo relevance feedback documents and, combining this topic information with the Rocchio pseudo relevance feedback framework, construct a pseudo relevance feedback model called Rocchio-LightLDA. We then compare Rocchio-LightLDA with the baseline model for pseudo relevance feedback and analyze how the relevant parameters affect the model. Experiments verify the feasibility and effectiveness of LightLDA in pseudo relevance feedback.

Through the research on these two aspects, we have successfully applied LightLDA to information retrieval, providing a feasible solution for retrieval tasks in today's era of Big Data. The work also offers a useful reference for information retrieval on massive data.
Keywords/Search Tags: LightLDA, Information Retrieval, Retrieval Model, Pseudo Relevance Feedback
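The abstract does not spell out how LLBDM or Rocchio-LightLDA combine topic information with term statistics. The Python sketch below shows one plausible reading under stated assumptions. LLBDM is assumed to follow the LDA-based document model (LBDM) of Wei and Croft (2006), which interpolates a Dirichlet-smoothed document model with the topic-model estimate P_lda(w|d) = sum over z of P(w|z) P(z|d). The Rocchio variant is assumed to add, for each feedback document, a mix of normalized term frequencies and that same topic-based term distribution. The function names, the parameters `mu`, `lam`, `alpha`, `beta`, `gamma` and `k`, and the dictionary layout for the topic distributions are all hypothetical; real LightLDA output would first have to be converted into document-topic (`theta`) and topic-word (`phi`) probabilities. The entropy-based LMLIE model is not sketched because the abstract gives no details of it.

```python
import math
from collections import Counter


def lda_term_prob(word, theta_d, phi):
    """Topic-model estimate P_lda(w|d) = sum over z of P(w|z) * P(z|d).

    theta_d: {topic_id: P(z|d)}; phi: {topic_id: {word: P(w|z)}}.
    This dict layout is a hypothetical input format, not LightLDA's own.
    """
    return sum(p_z * phi[z].get(word, 0.0) for z, p_z in theta_d.items())


def lbdm_score(query, doc, coll_prob, theta_d, phi, mu=1000.0, lam=0.7):
    """Log query likelihood under an LBDM-style mixture (assumed LLBDM form)."""
    tf = Counter(doc)
    score = 0.0
    for w in query:
        # Dirichlet-smoothed document model; tiny floor for unseen words.
        p_dir = (tf[w] + mu * coll_prob.get(w, 1e-9)) / (len(doc) + mu)
        # Interpolate with the topic-based estimate.
        p = lam * p_dir + (1.0 - lam) * lda_term_prob(w, theta_d, phi)
        score += math.log(p)
    return score


def rocchio_topic_expand(query, fb_docs, fb_thetas, phi,
                         alpha=1.0, beta=0.5, gamma=0.5, k=5):
    """Rocchio-style expansion over pseudo-relevant documents.

    Each feedback vector mixes normalized term frequency (weight gamma)
    with the topic-based distribution (weight 1 - gamma). This mixing
    is an assumption, not a confirmed description of Rocchio-LightLDA.
    """
    q_tf = Counter(query)
    # alpha-weighted original query vector (normalized term frequencies).
    new_q = Counter({w: alpha * c / len(query) for w, c in q_tf.items()})
    topic_vocab = set().union(*(dist.keys() for dist in phi.values()))
    for doc, theta_d in zip(fb_docs, fb_thetas):
        tf = Counter(doc)
        for w in topic_vocab | set(tf):
            v = gamma * tf[w] / len(doc) + (1.0 - gamma) * lda_term_prob(w, theta_d, phi)
            # beta-weighted centroid of the feedback document vectors.
            new_q[w] += beta * v / len(fb_docs)
    # Keep the original query terms plus the k highest-weighted new terms.
    expansion = [w for w, _ in new_q.most_common() if w not in q_tf][:k]
    return {w: new_q[w] for w in list(q_tf) + expansion}


if __name__ == "__main__":
    # Toy data (hypothetical): two topics over a tiny vocabulary.
    phi = {0: {"topic": 0.5, "model": 0.5}, 1: {"search": 0.6, "query": 0.4}}
    d1, d2 = ["topic", "model", "lda"], ["search", "engine", "query", "query"]
    t1, t2 = {0: 0.9, 1: 0.1}, {0: 0.1, 1: 0.9}
    counts = Counter(d1 + d2)
    coll = {w: c / sum(counts.values()) for w, c in counts.items()}
    q = ["topic", "model"]
    print(lbdm_score(q, d1, coll, t1, phi), lbdm_score(q, d2, coll, t2, phi))
    print(rocchio_topic_expand(["topic"], [d1], [t1], phi, k=2))
```

In this toy run `mu` is large relative to the document lengths, so the Dirichlet part stays close to the collection model and the topic component (weight `1 - lam`) is what mainly separates the two documents. In practice `mu`, `lam` and the Rocchio weights would be tuned per dataset, which corresponds to the parameter analysis the abstract mentions.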