Document similarity based on concept tree distance

Posted on:2008-09-17

Degree:M.S.

Type:Thesis

University:University of Kansas

Candidate:Lakkaraju, Praveen


GTID:2448390005474720

Subject:Computer Science

Abstract/Summary:

The Web is fast moving from an era of search engines to an era of discovery engines. Discovery engines help you find things that you never knew existed or did not know how to ask for. One way to do this is to automatically compute and display objects that are similar to the object in which the user is currently expressing interest. In this paper, we present a new approach to computing interdocument similarity that is based on tree matching. We represent each document as a concept tree using the concept associations obtained from a classifier. We then use a tree-matching measure, the tree edit distance, to compute similarities between these concept trees. Experiments on a subset of documents from the CiteSeer collection showed that our approach outperformed document similarity based on the traditional vector space model.
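To make the idea of comparing concept trees concrete, here is a minimal Python sketch of an ordered tree edit distance with unit costs. It is an illustration only, not the thesis's implementation: the tuple-based tree representation, the helper names (`tree`, `forest_distance`, `similarity`), the unit cost for every insert, delete, and rename, the size-based normalization, and the example concept labels are all assumptions made for this sketch. The abstract does not specify the cost model or how sibling order and concept weights are handled.

```python
from functools import lru_cache

# Assumption: a concept tree is (label, children), with children a tuple of trees.
def tree(label, *children):
    return (label, tuple(children))

def forest_size(forest):
    """Number of nodes in a forest (a tuple of trees)."""
    return sum(1 + forest_size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Edit distance between two ordered forests, unit cost per operation."""
    if not f1:
        return forest_size(f2)  # insert every remaining node
    if not f2:
        return forest_size(f1)  # delete every remaining node
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    rest1, rest2 = f1[:-1], f2[:-1]
    # Delete the rightmost root of f1; its children take its place.
    delete = forest_distance(rest1 + kids1, f2) + 1
    # Insert the rightmost root of f2; its children take its place.
    insert = forest_distance(f1, rest2 + kids2) + 1
    # Match the two rightmost roots, renaming if the labels differ.
    rename = (forest_distance(kids1, kids2)
              + forest_distance(rest1, rest2)
              + (0 if label1 == label2 else 1))
    return min(delete, insert, rename)

def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))

def similarity(t1, t2):
    """Assumed normalization: 1 - distance / (total node count), in [0, 1]."""
    total = forest_size((t1,)) + forest_size((t2,))
    return 1.0 - tree_edit_distance(t1, t2) / total

# Hypothetical concept trees for two documents.
doc_a = tree("Computer Science",
             tree("Information Retrieval", tree("Search Engines")),
             tree("Machine Learning"))
doc_b = tree("Computer Science",
             tree("Information Retrieval", tree("Digital Libraries")),
             tree("Databases"))

print(tree_edit_distance(doc_a, doc_b))  # two renames are needed
print(similarity(doc_a, doc_b))
```

The memoized recursion is easy to read but slow on large trees. A dynamic-programming algorithm such as Zhang-Shasha would be the usual choice at scale, and a real system would likely weight operations by concept depth or classifier confidence rather than charge a flat cost.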

Keywords/Search Tags:

Document, Similarity, Concept, Tree

Related items

1	Research On Semantic Similarity Computation And Applications
2	Research On Concept Similarity Of Web Information Retrieval
3	Study On Text Clustering Based On Concept Semantic Tree
4	Web Page Structure Similarity Algorithms And Applications
5	Research Of P2P Document Query Based On Semantic Similarity
6	The Research On Computational Verb Decision Tree Classification Algorithm With Concept Similarity And Its Application On The Futures Market
7	A concept map-based approach to document indexing and navigation
8	A Concept Base Model For Intelligent Searcher And Its Document Evaluation
9	Research On Constructing Method Of Iceberg Concept Lattice And Fuzzy Concept Similarity
10	Research Of Hownet Based Word Semantic Computation And Application