Document similarity based on concept tree distance

Posted on: 2008-09-17
Degree: M.S.
Type: Thesis
University: University of Kansas
Candidate: Lakkaraju, Praveen
Full Text: PDF
GTID: 2448390005474720
Subject: Computer Science
Abstract/Summary:
The Web is fast moving from an era of search engines to an era of discovery engines. Discovery engines help you find things that you never knew existed or did not know how to ask for. One of the ways this can be done is by automatically computing and displaying objects that are similar to the object in which the user is currently expressing interest. In this paper, we present a new approach to compute interdocument similarity that is based on a tree-matching algorithm. We represent each document as a concept tree using the concept associations obtained from a classifier. We make use of a tree-matching algorithm called the tree edit distance to compute similarities between these concept trees. Experiments on a subset of documents from the CiteSeer collection showed that our algorithm performed better than the document similarity based on the traditional vector space model.
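
To make the tree-matching idea concrete, here is a minimal sketch of an edit distance between two small concept trees. It is not the thesis's implementation. It assumes ordered trees, unit costs for insert, delete, and relabel, and it ignores any concept weights a classifier might supply. The tree encoding as nested tuples, the function names, the example concept labels, and the normalization used in `similarity` are all hypothetical choices made for illustration.

```python
from functools import lru_cache

# A tree is a pair (label, children), where children is a tuple of trees.
# A forest is a tuple of trees. Tuples are hashable, so results can be cached.

def size(forest):
    """Count the nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_dist(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f and not g:
        return 0
    if not f:
        return size(g)          # insert every node of g
    if not g:
        return size(f)          # delete every node of f
    v_label, v_children = f[-1]  # rightmost root of f
    w_label, w_children = g[-1]  # rightmost root of g
    # Delete v: its children take its place in the forest.
    delete = forest_dist(f[:-1] + v_children, g) + 1
    # Insert w: symmetric to deletion.
    insert = forest_dist(f, g[:-1] + w_children) + 1
    # Match v with w, relabeling if the concepts differ.
    match = (forest_dist(v_children, w_children)
             + forest_dist(f[:-1], g[:-1])
             + (0 if v_label == w_label else 1))
    return min(delete, insert, match)

def tree_edit_distance(t1, t2):
    return forest_dist((t1,), (t2,))

def similarity(t1, t2):
    """Assumed normalization: 1 means identical, 0 means nothing shared."""
    total = size((t1,)) + size((t2,))
    return 1 - tree_edit_distance(t1, t2) / total

# Two hypothetical concept trees for two documents.
doc_a = ("Science", (("CS", (("AI", ()), ("DB", ()))),))
doc_b = ("Science", (("CS", (("AI", ()),)),))

print(tree_edit_distance(doc_a, doc_b))
print(similarity(doc_a, doc_b))
```

This plain recursion is only practical for small trees. A polynomial-time algorithm such as Zhang-Shasha would be the usual choice for larger ones.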
Keywords/Search Tags: Document, Similarity, Concept, Tree