The Study On The Relatedness Computation For Structural Documents In The Information Retrieval

Posted on:2008-06-25

Degree:Master

Type:Thesis

Country:China

Candidate:Y Zhao

GTID:2178360212994645

Subject:Computer system architecture

Abstract/Summary:

As society develops, the need for and dependence on information grows ever stronger. Many experts have focused on how to obtain useful information quickly from large volumes of data. In its early days, information retrieval dealt with text. New kinds of objects, such as graphs, images, audio, and video, are now increasing rapidly and have all been brought into the scope of retrieval.

The automatic management of documents in information retrieval involves many problems, such as document retrieval, automatic classification and clustering, and the design of the document retrieval engine in a QA system. At the core of these problems are similarity computation and relatedness computation. However, relatedness is usually replaced by similarity because models and algorithms for measuring it are lacking. Evaluating documents by similarity alone is inadequate for information retrieval, so this paper introduces a new method for computing document relatedness.

The paper mainly discusses theoretical models and algorithms for the relatedness between two structural documents. First, it analyzes the difference between document relatedness, which concerns the semantic content of documents, and document similarity, which concerns the features of documents, and it proposes tree isomorphism as a measure of document relatedness. Second, to improve precision, it considers the case in which an ordered labeled tree contains more than two nodes with the same tag. Third, the main idea of edit distance is the cost of transforming one tree into another through edit operations such as adding, deleting, and changing tags. Edit distance usually emphasizes this cost over node weights during the transformation. Because node weights have been shown to differ by level, this paper adds weight as a necessary term in the computation. The importance of a node's position is reflected by its weight, and, for the first time, a factor is assigned to unmatched nodes according to the different cases. Finally, a synthesized formula is proposed for computing the relatedness of structural documents. Experiments show that this method is well suited to partitioning user requests and documents that are fuzzy or approximate.
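
The idea of matching two ordered labeled trees while weighting nodes by level can be illustrated with a small sketch. The code below is not the synthesized formula from the thesis, which the abstract does not state. It is a simple top-down ordered tree matching under assumed choices: the `Node` class, the function names, the level weight `decay ** depth`, and the normalization are all hypothetical, and the special factor for unmatched nodes is omitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node of an ordered labeled tree (hypothetical helper type)."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def total_weight(node: Node, depth: int = 0, decay: float = 0.5) -> float:
    """Sum the assumed level weight decay**depth over every node in the subtree."""
    own = decay ** depth
    return own + sum(total_weight(c, depth + 1, decay) for c in node.children)


def matched_weight(a: Node, b: Node, depth: int = 0, decay: float = 0.5) -> float:
    """Weight of the best top-down, order-preserving matching of a and b.

    Two nodes can match only if their tags are equal. Children are aligned
    with an LCS-style dynamic program, so repeated sibling tags are handled.
    """
    if a.tag != b.tag:
        return 0.0
    m, n = len(a.children), len(b.children)
    # best[i][j]: best matched weight using the first i children of a
    # and the first j children of b.
    best = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = matched_weight(a.children[i - 1], b.children[j - 1],
                                  depth + 1, decay)
            best[i][j] = max(best[i - 1][j], best[i][j - 1],
                             best[i - 1][j - 1] + pair)
    return decay ** depth + best[m][n]


def relatedness(a: Node, b: Node, decay: float = 0.5) -> float:
    """Matched weight normalized to [0, 1] by the two trees' total weights."""
    shared = matched_weight(a, b, 0, decay)
    return 2 * shared / (total_weight(a, 0, decay) + total_weight(b, 0, decay))


# Two small documents that differ by one paragraph node.
doc_a = Node("article", [Node("title"),
                         Node("section", [Node("para"), Node("para")])])
doc_b = Node("article", [Node("title"),
                         Node("section", [Node("para")])])
print(relatedness(doc_a, doc_b))
```

With this weighting, a mismatch near the root lowers the score more than a mismatch deep in the tree, which is one way to read the abstract's point that weights differ by level.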

Keywords/Search Tags:

Information Retrieval, Document Similarity, Structure Relativity

Related items

1	An Extended Research On Information Retrieval Model Based On Document Relation
2	Xml Document Information Retrieval Techniques And Realization
3	Research On Cross-language Document Sorting Learning Method Based On Bilingual Document Similarity
4	Information Retrieval Using Categorization Structures
5	The Research Of Enterprise Document Retrieval Model Based On Ontology
6	Computing Document Similarity For The Legal Case Retrieval
7	Research On Information Need Domain Of Information Retrieval
8	Research On Pseudo Relevance Feedback Based On Document Similarity
9	The Role of Document Structure and Citation Analysis in Literature Information Retrieval
10	Design And Implementation Of Domain-based Document Retrieval System