
Exploiting Document Boltzmann Machine In Query Extension

Posted on: 2018-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: L M Huang
GTID: 2348330542484888
Subject: Software engineering
Abstract/Summary:
Most work on query extension (QE) assumes that the terms in a document are independent, and many QE models use the multinomial distribution to model feedback documents. We argue that in QE methods, the relevance model (RM) that generates the feedback documents should be modeled with a more suitable distribution, one that naturally handles the term associations in feedback documents. The Document Boltzmann Machine (DBM) was recently proposed for document modeling in information retrieval. It relaxes the independence assumption and can therefore capture term dependencies naturally. DBM has been shown to be a generalization of the traditional unigram language model and to achieve better ad hoc retrieval performance.

In this paper, we replace the multinomial distribution in the traditional unigram RM method with DBM, while leaving the main QE framework nearly unchanged to keep the model simple. The relevance model is thus estimated by a DBM trained on the feedback documents, called the relevance DBM (rDBM). The extended query is generated from the learnt rDBM, and the final extended query likelihood is derived from the parameter values of the rDBM.

One difficulty in learning the rDBM is data sparseness, which can lead to an overfitted rDBM and harm retrieval performance. To address this, we adopt Confident Information First (CIF) as the model selection principle to reduce the complexity of the rDBM, which makes the proposed query extension method more efficient and practical. Experiments on several standard TREC collections show the effectiveness of our QE method with DBM and model selection.

In addition, we optimize the Document Boltzmann Machine using the Akaike information criterion (AIC). This reduces the complexity of the model, addresses the data sparseness that can lead to overfitting, and improves retrieval performance on several standard TREC collections.
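For reference, the traditional unigram relevance model that the thesis modifies can be sketched as below. This is a minimal sketch of the standard RM1/RM3 formulation, not the thesis's code. It assumes Dirichlet-smoothed multinomial document models, a uniform document prior, and interpolation with the original query. The function names, the toy documents, and the parameters `mu`, `k` and `lam` are illustrative.

```python
import math
from collections import Counter

# Sketch of a standard unigram relevance model (RM1) and its query
# interpolation (RM3). Assumptions: Dirichlet smoothing, uniform doc prior.

def dirichlet_lm(doc, coll, coll_len, mu):
    """Multinomial document model with Dirichlet smoothing: returns P(w|d)."""
    counts, dlen = Counter(doc), len(doc)
    return lambda w: (counts[w] + mu * coll[w] / coll_len) / (dlen + mu)

def rm1(query, feedback_docs, mu=100.0):
    """P(w|R) proportional to sum_d P(w|d) * prod_q P(q|d)."""
    coll = Counter(t for d in feedback_docs for t in d)
    coll_len = sum(coll.values())
    q_terms = [q for q in query if coll[q] > 0]  # unseen terms would give log(0)
    if not q_terms:
        raise ValueError("no query term occurs in the feedback documents")
    lms = [dirichlet_lm(d, coll, coll_len, mu) for d in feedback_docs]
    # Query-likelihood weight of each document, computed in log space.
    log_w = [sum(math.log(lm(q)) for q in q_terms) for lm in lms]
    top = max(log_w)
    weights = [math.exp(lw - top) for lw in log_w]
    rel = {w: sum(wt * lm(w) for wt, lm in zip(weights, lms)) for w in coll}
    total = sum(rel.values())
    return {w: p / total for w, p in rel.items()}

def rm3(query, feedback_docs, k=5, lam=0.5, mu=100.0):
    """Interpolate the original query with the top-k relevance-model terms."""
    rel = rm1(query, feedback_docs, mu)
    top = sorted(rel.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    z = sum(p for _, p in top)
    expanded = Counter()
    for w, c in Counter(query).items():
        expanded[w] += lam * c / len(query)
    for w, p in top:
        expanded[w] += (1 - lam) * p / z
    return dict(expanded)

# Hypothetical feedback documents, already tokenized.
docs = [
    "boltzmann machine term dependency model".split(),
    "query extension relevance model feedback".split(),
    "boltzmann machine relevance feedback model".split(),
]
expanded_query = rm3(["boltzmann", "relevance"], docs, k=4)
```

Each `P(w|d)` here treats terms as independent draws, which is the assumption the thesis replaces by estimating the relevance model with a DBM.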
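To make the independence claim concrete, the sketch below assumes that a DBM is a fully visible Boltzmann machine over binary term-occurrence variables, with first-order (bias) and second-order (pairwise) parameters. The abstract does not spell out the parameterization, so treat this as an assumption. The names `log_score`, `fvbm_distribution`, `marginal` and `joint`, and the toy weights, are hypothetical. The distribution is computed exactly by enumerating all states, which only works for a handful of terms.

```python
import itertools
import math

# Assumed model: P(x) proportional to exp(sum_i b_i x_i + sum_{i<j} W_ij x_i x_j),
# x in {0,1}^n marks which vocabulary terms occur. Not the thesis's code.

def log_score(x, b, W):
    """Unnormalized log-probability of a binary term-occurrence vector x."""
    n = len(x)
    first = sum(b[i] * x[i] for i in range(n))
    second = sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n))
    return first + second

def fvbm_distribution(b, W):
    """Exact distribution over all 2**n states (feasible only for tiny n)."""
    states = list(itertools.product((0, 1), repeat=len(b)))
    scores = [log_score(x, b, W) for x in states]
    top = max(scores)                      # subtract max for numerical stability
    unnorm = [math.exp(s - top) for s in scores]
    z = sum(unnorm)                        # partition function, scaled by exp(-top)
    return {x: u / z for x, u in zip(states, unnorm)}

def marginal(dist, i):
    """P(term i occurs)."""
    return sum(p for x, p in dist.items() if x[i] == 1)

def joint(dist, i, j):
    """P(terms i and j both occur)."""
    return sum(p for x, p in dist.items() if x[i] == 1 and x[j] == 1)

# Hypothetical 3-term vocabulary and weights.
b = [0.5, -1.0, 0.2]
no_coupling = [[0.0] * 3 for _ in range(3)]
coupled = [[0.0, 2.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

indep = fvbm_distribution(b, no_coupling)  # pairwise weights all zero
dep = fvbm_distribution(b, coupled)        # positive weight between terms 0 and 1
```

With every pairwise weight at zero, `indep` factorizes into independent per-term factors with P(term i occurs) equal to the logistic function of `b[i]`, which is the term-independence assumption. A positive `W[0][1]` makes terms 0 and 1 co-occur more often than independence would predict.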
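The abstract does not say how AIC is applied to the DBM. As one hedged illustration, assuming the same fully visible parameterization over binary term occurrences, AIC = 2k - 2 ln L can compare a bias-only (independent) model against one that also has pairwise weights, and the candidate with the lower score is kept. The toy data, the helper names, and the training loop are hypothetical. The loop is plain gradient ascent with exact expectations, which is feasible only for a few terms. This is not CIF and not the thesis's procedure.

```python
import itertools
import math

# Hypothetical AIC comparison of two candidate structures for an assumed
# fully visible Boltzmann machine over binary term-occurrence vectors.

def log_score(x, b, W):
    n = len(x)
    return (sum(b[i] * x[i] for i in range(n))
            + sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n)))

def log_partition(b, W):
    scores = [log_score(x, b, W) for x in itertools.product((0, 1), repeat=len(b))]
    top = max(scores)
    return top + math.log(sum(math.exp(s - top) for s in scores))

def log_likelihood(data, b, W):
    log_z = log_partition(b, W)
    return sum(log_score(x, b, W) - log_z for x in data)

def fit(data, pairwise, steps=2000, lr=0.5):
    """Maximum-likelihood fit starting from the closed-form bias-only MLE,
    b[i] = logit(mean[i]); every term must occur in some but not all docs."""
    n, N = len(data[0]), len(data)
    mean = [sum(x[i] for x in data) / N for i in range(n)]
    b = [math.log(m / (1.0 - m)) for m in mean]
    W = [[0.0] * n for _ in range(n)]
    if not pairwise:
        return b, W                        # independent model: already the MLE
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    pair_mean = {(i, j): sum(x[i] * x[j] for x in data) / N for i, j in pairs}
    states = list(itertools.product((0, 1), repeat=n))
    for _ in range(steps):
        log_z = log_partition(b, W)
        probs = [math.exp(log_score(x, b, W) - log_z) for x in states]
        # Gradient of the average log-likelihood: data moments minus model moments.
        grad_b = [mean[i] - sum(p * x[i] for p, x in zip(probs, states))
                  for i in range(n)]
        grad_w = {(i, j): pair_mean[(i, j)]
                  - sum(p * x[i] * x[j] for p, x in zip(probs, states))
                  for i, j in pairs}
        for i in range(n):
            b[i] += lr * grad_b[i]
        for (i, j), g in grad_w.items():
            W[i][j] += lr * g
    return b, W

def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * log_lik

# Hypothetical term-occurrence vectors: 3 terms in 8 feedback documents.
data = [(1, 1, 0), (1, 1, 1), (1, 1, 0), (0, 0, 1),
        (1, 0, 0), (0, 0, 1), (1, 1, 1), (0, 1, 0)]
n = len(data[0])
results = {}
for name, pairwise in (("independent", False), ("pairwise", True)):
    b, W = fit(data, pairwise)
    k = n + (n * (n - 1) // 2 if pairwise else 0)   # number of free parameters
    ll = log_likelihood(data, b, W)
    results[name] = {"k": k, "log_lik": ll, "aic": aic(ll, k)}
best = min(results, key=lambda name: results[name]["aic"])
```

The pairwise model always fits the training data at least as well. AIC charges it for its extra parameters, which is the trade-off that matters when feedback data are sparse.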
Keywords/Search Tags: Document Boltzmann Machines, Query Extension, Model Selection, CIF, AIC