Similarity Search Based On Textual Content

Posted on:2011-03-04

Degree:Master

Type:Thesis

Institution:University

Candidate:Clotilde Uwimana

GTID:2178360308468554

Subject:Computer Science and Technology

Abstract/Summary:

Today, search engines are the primary way for people to search for and locate information. Web searching has become one of the major tasks performed by a large share of web users. Although web searching has been a great success and an effective means of retrieving information, methods for retrieving different kinds of information are still needed for various applications. Understanding the goals behind web searches provides an outlook for future improvements to web search engines. Users need recommendation information to support decision making and to spare them the tedious work of browsing or reading through an entire collection when looking for objects similar to a query object; our work therefore analyzes the content of pages to retrieve information about similar objects.

This thesis looks into the concept of web searching, focusing on similarity search technology. Similarity search refers to searching for objects similar to a query object. Given a user query, which is an object, the system searches the web to find similar objects that are relevant, meaning objects that share attributes or properties with the query object. Starting from the scenario of a user seeking information about similar places, a new approach is modeled and analyzed; the challenge is to determine the properties of places from a large collection of documents whose information is not well structured.

This thesis evaluates techniques suitable for finding results about places similar to the initial query. We propose an approach based on term extraction, in which we link the initial query place to its similar places through the terms that occur frequently in its search results. The extracted terms (the top-k terms) are deemed to be the common properties and are used as the subsequent query that the system performs to get the final results.

However, weighting the terms is not enough on its own to get results for similar places; we also need to check the results returned by the top-k terms query and eliminate documents that are more relevant to the initial query, since we are looking for results about similar places rather than results for the initial query. The evaluation shows that the approach responds to users' information needs. The method retrieves relevant properties and yields good precision. The analysis also revealed the importance of filtering out documents relevant to the initial query to improve relevancy. We also identify factors that affect performance: the nature of the query and the number of terms selected as properties of the initial query play an important role in the relevancy of the final results.
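
The pipeline described in the abstract (extract the top-k terms from the initial query's results, reissue them as a second query, then filter out documents that are about the initial query itself) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract does not state the term-weighting scheme or the filtering rule, so the sketch assumes plain document frequency as the weight and drops any document that contains every query term. The names `tokenize`, `top_k_terms`, `filter_similar`, `similar_places`, `toy_search`, `STOPWORDS`, and `CORPUS` are all hypothetical, and the in-memory corpus stands in for a real search engine.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not the one used in the thesis.
STOPWORDS = {"a", "an", "and", "are", "for", "has", "in", "is", "of", "the", "to"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def top_k_terms(query, documents, k):
    """Return the k terms that occur in the most documents, excluding query terms.

    Assumption: document frequency is the term weight; ties are broken
    alphabetically so the result is deterministic.
    """
    query_terms = set(tokenize(query))
    counts = Counter()
    for doc in documents:
        counts.update(set(tokenize(doc)) - query_terms)  # count once per document
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def filter_similar(query, candidates):
    """Drop candidates that contain every query term (assumed relevance check)."""
    query_terms = set(tokenize(query))
    if not query_terms:
        return list(candidates)
    return [doc for doc in candidates if not query_terms <= set(tokenize(doc))]


def similar_places(query, search, k=3):
    """Run the two-step search; `search` is any callable mapping a query to documents."""
    initial_results = search(query)
    properties = top_k_terms(query, initial_results, k)
    candidates = search(" ".join(properties))
    return properties, filter_similar(query, candidates)


# Hypothetical in-memory corpus standing in for web search results.
CORPUS = [
    "Venice is famous for canals and gondolas",
    "Canals and bridges define Venice",
    "Amsterdam has canals and bridges everywhere",
    "Bruges is known for canals and medieval bridges",
    "Tokyo is a city of skyscrapers",
]


def toy_search(query):
    """Return corpus documents that share at least one term with the query."""
    terms = set(tokenize(query))
    return [doc for doc in CORPUS if terms & set(tokenize(doc))]


if __name__ == "__main__":
    properties, results = similar_places("Venice", toy_search, k=2)
    print("properties:", properties)
    for doc in results:
        print("similar:", doc)
```

In this toy setting, the second query also matches the documents about the initial place, which is why the filtering step matters: without it, pages about the query place would compete with pages about similar places. The value of `k` is exposed as a parameter because the abstract reports that the number of selected terms affects the relevancy of the final results.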

Keywords/Search Tags:

Information search and retrieval, web searching, similarity search, textual content

Related items

1	Research And Implementation Of Spatial Text Similarity Search
2	The Research On Web Searching And Commending For The Topic-specific Search Engine
3	Study On Similarity Search For Textual And Spatial Data
4	Research On XML Information Retrieval
5	Towards Practical Schemes For Searching The Encrypted Cloud Data
6	Multi-Modal Video Information Retrieval
7	"Luder" Content Based Document Search Engine
8	The Design Of The Auto Industry Vertical Searching System Prototype And The Implementation Of The Primary Module
9	Research On Hashing Methods Of Approximate Nearest Neighbor Searching On Big Data