
An Extended Research On Information Retrieval Model Based On Document Relation

Posted on: 2021-03-09 | Degree: Master | Type: Thesis
Country: China | Candidate: D D He | Full Text: PDF
GTID: 2428330620970576 | Subject: Cyberspace security
Abstract/Summary:
With the rapid growth of the Internet, the volume of online information is increasing explosively. How to quickly obtain useful information from this mass of data has become an urgent problem, and information retrieval is one of the core technologies for solving it. During retrieval, users typically enter only a few query words, which often fail to express their real query intent, so the query results are unsatisfactory. Working at the level of words, some scholars measure the relationships between words to mine terms related to the query words and use them as expansion terms, which improves retrieval performance. Working at the level of documents, retrieval performance can also be improved by making reasonable use of the relationships between documents, but there is little research in this direction.

To address these problems, this thesis studies how to extend and improve basic information retrieval models from the perspective of document relationships. Because the belief network retrieval model has a flexible framework and the vector space model is classic and convenient, the thesis takes these two models as examples, uncovers the implicit relationships between documents, and proposes the following two models:

(1) Belief network retrieval model based on document relationship expansion: A layer of document nodes is added to the basic belief network retrieval model, and whether an arc exists between the two document layers is determined by the similarity between documents. That is, for any document, its similarity to every other document is calculated, and the top-ranked documents with the highest similarity are taken as its similar documents, that is, as its parent documents. Then, combining the document similarity with the number of parent documents of each document node, the probability derivation of the basic belief network retrieval model is modified to give a more reasonable calculation of the document retrieval probability.

(2) Improved vector space model based on document relationship: First, the highly relevant documents ranked at the top of the initial search results are grouped into a benchmark set. The similarity between each document in the initial result set and the benchmark set is then calculated and used to correct the similarity between that document and the query. The corrected value is taken as the final similarity of the document, which improves the vector space model.

The thesis uses a small Chinese information retrieval data set to verify the validity of this work. First, all documents in the data set are preprocessed. Then the two proposed models are compared experimentally with their basic models. Finally, retrieval performance is evaluated using discounted cumulative gain (DCG) and the precision-recall curve. The experimental results show that, compared with the basic models, the two new models rank relevant documents more reasonably and improve precision while maintaining recall.
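The abstract does not give the formulas behind the two models, so the following Python sketch only illustrates the general ideas it describes: choosing the most similar documents as "parents", correcting query similarity with similarity to a benchmark set of top-ranked documents, and scoring a ranking with DCG. The cosine measure on raw term frequencies, the averaging over the benchmark set, the linear mixing weight `alpha`, and all function names are assumptions made for illustration, not the thesis's actual method.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector of a whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def parent_documents(doc_vecs, k):
    """Map each document index to its k most similar other documents."""
    parents = {}
    for i, vi in enumerate(doc_vecs):
        sims = [(cosine(vi, vj), j) for j, vj in enumerate(doc_vecs) if j != i]
        sims.sort(key=lambda s: (-s[0], s[1]))  # highest similarity first
        parents[i] = [j for _, j in sims[:k]]
    return parents


def rerank(query_vec, doc_vecs, benchmark_size, alpha):
    """Correct query similarity with similarity to a benchmark set.

    The benchmark set is the top `benchmark_size` documents of the
    initial ranking. The final score is an assumed linear mix:
    alpha * sim(query, doc) + (1 - alpha) * mean sim(doc, benchmark).
    """
    base = [cosine(query_vec, d) for d in doc_vecs]
    order = sorted(range(len(doc_vecs)), key=lambda i: (-base[i], i))
    benchmark = order[:benchmark_size]
    if not benchmark:
        return order, base
    final = []
    for i, d in enumerate(doc_vecs):
        bench_sim = sum(cosine(d, doc_vecs[j]) for j in benchmark) / len(benchmark)
        final.append(alpha * base[i] + (1 - alpha) * bench_sim)
    new_order = sorted(range(len(doc_vecs)), key=lambda i: (-final[i], i))
    return new_order, final


def dcg(relevances):
    """Discounted cumulative gain of graded relevances in ranked order."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


# Toy corpus (hypothetical data, not from the thesis's data set).
docs = ["belief network retrieval", "belief network model", "vector space model"]
doc_vecs = [tf_vector(d) for d in docs]
parents = parent_documents(doc_vecs, k=1)
order, scores = rerank(tf_vector("belief network"), doc_vecs,
                       benchmark_size=2, alpha=0.5)
```

In this toy corpus the third document shares no term with the query, so a plain vector space model scores it zero, yet it receives a small positive corrected score because it overlaps with a benchmark document. That is the kind of effect a document-relationship correction is meant to capture; how the thesis actually weights the two similarities is not stated in the abstract.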
Keywords/Search Tags: Information retrieval, Document relationship, Belief network, Vector space model, Document similarity