The Study On The Relatedness Computation For Structural Documents In The Information Retrieval

Posted on: 2008-06-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhao
Full Text: PDF
GTID: 2178360212994645
Subject: Computer system architecture
Abstract/Summary:
As society develops, the need for and dependence on information keep growing. Many researchers have focused on how to obtain useful information rapidly from large collections. In its early days, information retrieval dealt with text. Now new kinds of objects, such as graphs, images, audio and video, are increasing rapidly and have all been brought into retrieval.

The automatic management of documents in information retrieval involves many problems, such as document retrieval, automatic classification and clustering, and the design of the document retrieval engine in a QA system. At the core are similarity computation and relatedness computation. However, relatedness is usually replaced by similarity because models and algorithms for measuring it are lacking. Evaluating documents by similarity alone is inadequate in information retrieval, so this paper introduces a new method for computing document relatedness.

The paper mainly discusses theoretical models and algorithms for the relatedness between two structural documents. First, it analyzes the difference between the relatedness of documents, which concerns their semantic content, and the similarity of documents, which concerns their features, and it proposes tree isomorphism as a measure of document relatedness. Second, to improve precision, it considers the case in which an ordered labeled tree contains more than two nodes with the same tag. Third, it revisits edit distance, whose main idea is the cost of transforming one tree into another through edit operations such as adding, deleting and changing tags. Edit distance usually gives more consideration to the cost of the operations than to the weights of the nodes involved in the transformation. Since it has been shown that node weights differ according to their levels, this paper adds the weight as a necessary term in the computation. The weight of a node reflects the importance of its position, and, for the first time, a factor is assigned to each unmatched node according to the case it falls into. Finally, a synthesized formula is proposed for computing the relatedness of structural documents. Experiments show that this methodology is well suited to partitioning user requests and documents that are fuzzy or approximate.
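The idea of weighting nodes by level when comparing two ordered labeled trees can be illustrated with a small Python sketch. It is not the thesis's synthesized formula, which the abstract does not state. The `Node` class, the geometric `decay` weight per level, the top-down matching rule and the normalization are all assumptions made for illustration, and unmatched nodes simply contribute zero instead of receiving a case-dependent factor.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node of an ordered labeled tree (hypothetical helper type)."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def total_weight(node: Node, depth: int = 0, decay: float = 0.5) -> float:
    """Sum of level weights over the tree; a node at depth d weighs decay**d (assumed)."""
    return decay ** depth + sum(
        total_weight(child, depth + 1, decay) for child in node.children
    )


def matched_weight(a: Node, b: Node, depth: int = 0, decay: float = 0.5) -> float:
    """Weight of the best order-preserving, top-down match between two subtrees."""
    if a.tag != b.tag:
        return 0.0  # assumption: a tag mismatch blocks the whole subtree
    m, n = len(a.children), len(b.children)
    # best[i][j]: best matched weight using the first i children of a
    # and the first j children of b (an LCS-style alignment, so repeated
    # sibling tags are paired in order).
    best = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = matched_weight(a.children[i - 1], b.children[j - 1], depth + 1, decay)
            best[i][j] = max(best[i - 1][j], best[i][j - 1], best[i - 1][j - 1] + pair)
    return decay ** depth + best[m][n]


def relatedness(a: Node, b: Node, decay: float = 0.5) -> float:
    """Matched weight normalized by the total weight of both trees, in [0, 1]."""
    return 2 * matched_weight(a, b, 0, decay) / (
        total_weight(a, 0, decay) + total_weight(b, 0, decay)
    )


doc_a = Node("article", [Node("title"), Node("section", [Node("para"), Node("para")])])
doc_b = Node("article", [Node("title"), Node("section", [Node("para")])])
print(relatedness(doc_a, doc_b))
```

With `decay` below 1, a mismatch near the root costs more than one near the leaves, which is one simple way to make the position of a node matter. A full tree edit distance would also allow relabeling and matching across levels, which this sketch deliberately leaves out.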
Keywords/Search Tags: Information Retrieval, Document Similarity, Structure Relativity