An Extended Research On Information Retrieval Model Based On Document Relation

Posted on:2021-03-09

Degree:Master

Type:Thesis

Country:China

Candidate:D D He

Full Text:PDF

GTID:2428330620970576

Subject:Cyberspace security

Abstract/Summary:

With the rapid growth of the Internet, the volume of online information is increasing explosively. How to quickly obtain useful information from this mass of data has become an urgent problem, and information retrieval is one of the core technologies for solving it. In practice, users tend to issue short queries that often fail to express their real intent, so the words do not convey the intended meaning and the query results are unsatisfactory. From the perspective of words, some scholars measure the relationships between words to mine terms related to the query words and use them as expansion terms, which improves retrieval performance. From the perspective of documents, retrieval performance can also be improved by making reasonable use of the relationships between documents, but there has been little research in this direction. To address this gap, this thesis studies how to extend and improve basic information retrieval models from the perspective of document relationships. Because the belief network retrieval model has a flexible framework and the vector space model is classic and convenient, the thesis takes these two models as examples, looks for the implicit relationships between documents, and proposes the following two models.

(1) Belief network retrieval model based on document relationship expansion. A layer of document nodes is added to the basic belief network retrieval model, and whether an arc exists between the two document layers is determined by the similarity between documents. That is, for any document, its similarity to every other document is calculated, and the first few documents with the highest similarity are taken as its similar documents, that is, its parent documents. Then, combining the document similarity with the number of parent documents of each document node, the probability derivation of the basic belief network retrieval model is modified to give a more reasonable calculation of the document retrieval probability.

(2) Improved vector space model based on document relationship. First, the highly relevant documents ranked at the top of the initial search results are grouped into a benchmark set. The similarity between each document in the initial result set and the benchmark set is then calculated and used to correct the similarity between that document and the query. The corrected value serves as the document's final similarity, which improves the vector space model.

The thesis uses a small Chinese information retrieval data set to verify the validity of the proposed approach. First, all documents in the data set are preprocessed. Then the two new models are compared experimentally with their basic models. Finally, the performance of the models is evaluated using discounted cumulative gain (DCG) and the precision-recall curve. The experimental results show that, compared with the basic models, the two new models rank relevant documents more reasonably and improve precision while maintaining recall.
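
The benchmark-set correction in model (2) and the DCG measure can be illustrated with a minimal Python sketch. The abstract does not state how the similarity is corrected, so the linear mix controlled by `alpha`, the benchmark size `k`, the summed-vector representation of the benchmark set, and the function names `cosine`, `rerank`, and `dcg` are all assumptions made for illustration, not the thesis's actual formulas.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank(query_vec, doc_vecs, k=2, alpha=0.7):
    """Re-rank documents against a benchmark set built from the top-k results.

    ASSUMPTION: the final score is a linear mix of the query similarity and
    the benchmark-set similarity; the source does not give the real rule.
    """
    # Initial vector space model scores: similarity of each document to the query.
    initial = {d: cosine(query_vec, v) for d, v in doc_vecs.items()}
    top = sorted(initial, key=initial.get, reverse=True)[:k]

    # ASSUMPTION: represent the benchmark set by the sum of its members' vectors.
    benchmark = Counter()
    for d in top:
        benchmark.update(doc_vecs[d])

    # Correct each document's query similarity with its benchmark similarity.
    final = {
        d: alpha * initial[d] + (1 - alpha) * cosine(doc_vecs[d], benchmark)
        for d in doc_vecs
    }
    return sorted(final, key=final.get, reverse=True), final


def dcg(gains):
    """Discounted cumulative gain of a ranked list of graded relevance values."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))
```

Under these assumptions, a document that shares no terms with the query but resembles the top-ranked documents receives a non-zero final score, so it is ranked above documents unrelated to both the query and the benchmark set.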

Keywords/Search Tags:

Information retrieval, Document relationship, Belief network, Vector space model, Document similarity

Related items

1	Xml Document Information Retrieval Techniques And Realization
2	Application Of Extended Belief Network Model For Scientific Document Retrieval
3	Research Of Chinese Information Retrieval System And Document Reranking
4	The Research Of Enterprise Document Retrieval Model Based On Ontology
5	Phrase-based vector space model in document retrieval
6	Research On The Chinese Science And Technology Document Information Retrieval System Based On The Vector Space
7	Research On Cross-language Document Sorting Learning Method Based On Bilingual Document Similarity
8	Research On Information Retrieval Models Based On Reference Document
9	Research On Query Optimization And Vectorization Technique In Document Retrieval
10	Computing Document Similarity For The Legal Case Retrieval