The Research Of Digital Signal Processing Based Models For Information Retrieval

Posted on: 2020-07-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z W Ying
Full Text: PDF
GTID: 1368330605957458
Subject: Management Science and Engineering
Abstract/Summary:
With the advent of the big data era, users of information retrieval (IR) systems in many occupations need to obtain information from massive data accurately and efficiently, and therefore have a strong demand for high-precision IR systems. In recent decades, the vast majority of research in the IR field has focused on implementing and optimizing the main families of IR models: probabilistic models, statistical language models and vector space models. Only a few studies have involved other categories of models or frameworks. According to recent academic papers, newly proposed models in these main families outperform their baselines by only a small margin. The development of traditional IR models has reached a bottleneck, and the search for new theoretical frameworks has become intense.

In recent years, several newly proposed IR models and frameworks have attracted considerable attention. Among them, IR models based on digital signal processing (DSP), which introduce theories and concepts from the DSP field into IR, deserve more attention. According to the experimental results of recent research, these models still have flaws in several respects and leave considerable room for improvement. The IR framework and the corresponding models proposed in this thesis are likewise based on theories and concepts from DSP, and they improve on the existing DSP-based frameworks and models in several respects. The main contributions of this thesis are as follows.

(1) This thesis proposes a DSP-based IR framework from a new perspective, denoted DSPF (Digital Signal Processing based Framework). In the existing DSP-based framework, the documents are treated as a set of filters represented in the frequency domain, whereas the query terms are treated as a set of signals represented in the time domain. To perform the filtering calculation, the signals and the documents must be represented in the same domain, so the existing framework has to transform the signals from the time domain to the frequency domain, which is considered very complicated to implement. In addition, the existing framework treats every query term as a single category of signal with no hyper-parameter that can be tuned for better performance, which causes the framework to fail in some cases. By contrast, the framework proposed in this thesis treats every query term as a spectrum whose envelope is the curve of one of seven kernel functions: Gaussian, Triangle, Circle, Cosine, Quartic, Epanechnikov and Triweight. Moreover, in DSPF the representation of the spectrum has a hyper-parameter, so the framework can perform better when the hyper-parameters of the spectra and the filters are adjusted simultaneously.

(2) This thesis proposes a new model, DSPF-BM25, by improving the term weighting method of the probabilistic models and introducing it into DSPF. DSPF-BM25 is combined with each of the seven kernel functions mentioned above. To verify its effectiveness, the model is tested on five standard news datasets and two datasets of documents crawled from the Web, using several metrics of retrieval precision, chiefly Mean Average Precision (MAP). The experimental results show that DSPF-BM25 combined with the Gaussian and Cosine kernel functions outperforms all the baseline models on all the datasets in terms of MAP. The baselines include the classic probabilistic models BM25 and BM25+ and LSPR-BM25, the most effective existing DSP-based IR model.

(3) This thesis proposes a new model, DSPF-DLM, by improving the term weighting method of the statistical language models and introducing it into DSPF. DSPF-DLM is combined with each of the seven kernel functions and evaluated on the same seven datasets with the same metrics. The experimental results show that DSPF-DLM combined with the Gaussian and Cosine kernel functions outperforms the baseline model DLM in most cases across all the datasets in terms of MAP.

(4) This thesis proposes a new model, DSPF-MATF, by improving the term weighting method of the vector space models and introducing it into DSPF. DSPF-MATF is combined with each of the seven kernel functions and evaluated on the same seven datasets with the same metrics. The experimental results show no obvious difference in the MAP of DSPF-MATF across the kernel functions on any of the datasets. On almost all the datasets, DSPF-MATF outperforms almost all the baseline models in terms of MAP. The baselines include BM25, DLM, LSPR-BM25, DSPF-BM25, DSPF-DLM and MATF, one of the most effective existing vector space models.

(5) Based on the models proposed in this thesis, this research implements a simple DSP-based IR system for medical literature. The system retrieves medical literature on diagnosis, treatment and nursing effectively and efficiently according to the queries submitted by medical staff, and can serve as a reference for medical staff in many cases when they provide medical services.
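
The seven kernel functions named in the abstract can be illustrated with a short sketch. The abstract does not give the thesis's exact formulas, so the definitions below are the common textbook forms, scaled so that each kernel peaks at 1; the names `KERNELS` and `envelope`, and the use of a single bandwidth `sigma` as the hyper-parameter, are assumptions made for illustration and are not taken from the thesis.

```python
import math


def _bounded(shape):
    """Wrap a shape function so the kernel is zero outside |distance| <= sigma."""
    def kernel(distance, sigma):
        u = abs(distance) / sigma
        return shape(u) if u <= 1.0 else 0.0
    return kernel


def gaussian(distance, sigma):
    """Gaussian kernel; unbounded support, so no cut-off is applied."""
    return math.exp(-(distance ** 2) / (2.0 * sigma ** 2))


# Textbook kernel shapes (assumed forms, each equal to 1 at distance 0).
KERNELS = {
    "gaussian": gaussian,
    "triangle": _bounded(lambda u: 1.0 - u),
    "circle": _bounded(lambda u: math.sqrt(1.0 - u * u)),
    "cosine": _bounded(lambda u: 0.5 * (1.0 + math.cos(math.pi * u))),
    "quartic": _bounded(lambda u: (1.0 - u * u) ** 2),
    "epanechnikov": _bounded(lambda u: 1.0 - u * u),
    "triweight": _bounded(lambda u: (1.0 - u * u) ** 3),
}


def envelope(name, sigma, half_width):
    """Sample a kernel at integer offsets -half_width..half_width."""
    kernel = KERNELS[name]
    return [kernel(d, sigma) for d in range(-half_width, half_width + 1)]


print(envelope("triangle", 2.0, 2))  # [0.0, 0.5, 1.0, 0.5, 0.0]
```

In this sketch a larger `sigma` widens the envelope, which is one plausible reading of how a tunable hyper-parameter could change the shape of a query term's spectrum; how DSPF actually parameterizes the spectra and the filters is not specified in the abstract.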
Keywords/Search Tags: Information retrieval models, Digital signal processing, Retrieval precision, Kernel functions, Information retrieval system