Research On Streamlining Snippet Cache Of Search Engine

Posted on:2016-03-31

Degree:Master

Type:Thesis

Country:China

Candidate:P Y Sun

Full Text:PDF

GTID:2308330461480524

Subject:Computer software and theory

Abstract/Summary:

In response to a user query, a search engine returns the top-k relevant results, each of which contains a small piece of text, called a snippet, extracted from the corresponding document. Generating a snippet is time-consuming because it requires both document retrieval (disk access) and string matching (CPU computation), so snippets are cached to reduce latency. As search engines replace hard disk drives with flash-based solid-state drives (SSDs) for storage, the bottleneck of snippet generation shifts from I/O to computation. We propose a simple but effective method for exploiting this trend, which we call fragment caching: instead of caching the whole snippet, we cache only snippet metadata that describe how to retrieve the snippet from the document. Although this approach increases I/O time, the extra cost is insignificant on SSDs. The major benefit of fragment caching is that it can cache the same snippets (without loss of quality) while using only a fraction of the memory that the traditional method requires. In our experiments, around 10 times less memory is required to achieve comparable snippet generation times with dynamic caching, and fragment caching consistently achieves a much higher hit ratio with static caching.

Recovering snippets from fragments may introduce a considerable amount of I/O. To reduce this I/O and keep more text in memory, we introduce a short document cache to replace the document cache. A short document contains only the sentences that may be used to recover snippets, together with the position information of these sentences. With the same amount of memory, a short document cache can hold more items. Although short document caching may still involve additional document retrieval, high-frequency documents are kept in the cache.

The contributions of this thesis are:

1. To reduce the latency of snippet generation, we introduce the fragment. A fragment stores all the position information needed to recover a snippet from its document and is much smaller than the snippet itself. With the same memory, a fragment cache achieves a higher hit ratio, which reduces duplicate snippet computation. Recovering a snippet from a fragment adds a small amount of computation, which is far outweighed by the benefit of fragment caching.

2. Recovering snippets from fragments can require some document retrieval. To reduce this I/O, we introduce a short document cache to replace the document cache. A short document stores only the sentences that may be used to recover snippets, so it is smaller than the full document. With the same amount of memory, a short document cache achieves a higher hit ratio than a document cache. Together with the fragment cache, the short document cache makes snippet generation more efficient.
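The interplay of the fragment cache and the short document cache can be sketched in a few dozen lines. This is a minimal Python illustration under stated assumptions, not the thesis's implementation: the fragment layout (a sentence id plus start/end offsets per snippet piece), the sentence-selection rule in `_compute_fragment`, the LRU eviction policy, the byte-size estimates, and all class and method names (`LRUCache`, `SnippetEngine`, `_remember`) are hypothetical. An in-memory dict stands in for SSD-resident documents, and counters record disk reads and snippet computations.

```python
import re
from collections import OrderedDict
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical fragment layout: one (sentence_id, start, end) triple per snippet piece.
Fragment = Tuple[Tuple[int, int, int], ...]


class LRUCache:
    """Size-bounded LRU cache; size_of estimates the bytes a value occupies."""

    def __init__(self, capacity: int, size_of: Callable[[object], int]):
        self.capacity, self.size_of, self.used = capacity, size_of, 0
        self.items: "OrderedDict[object, object]" = OrderedDict()
        self.hits = self.misses = 0

    def get(self, key) -> Optional[object]:
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.items[key]
        self.misses += 1
        return None

    def put(self, key, value) -> None:
        size = self.size_of(value)
        if size > self.capacity:
            return  # too large to cache at all
        if key in self.items:
            self.used -= self.size_of(self.items.pop(key))
        while self.used + size > self.capacity:  # evict least recently used entries
            _, old = self.items.popitem(last=False)
            self.used -= self.size_of(old)
        self.items[key] = value
        self.used += size


class SnippetEngine:
    def __init__(self, docs: Dict[int, List[str]], fragment_capacity: int,
                 short_doc_capacity: int, max_sentences: int = 2):
        self.docs = docs  # stands in for documents stored on an SSD
        self.max_sentences = max_sentences
        self.disk_reads = self.snippet_computations = 0
        # Assumed cost: 12 bytes per triple (three 4-byte integers).
        self.fragments = LRUCache(fragment_capacity, lambda f: 12 * len(f))
        # A short document keeps only the sentences that fragments reference.
        self.short_docs = LRUCache(short_doc_capacity,
                                   lambda sd: sum(len(s) for s in sd.values()))

    def _read_document(self, doc_id: int) -> List[str]:
        self.disk_reads += 1
        return self.docs[doc_id]

    def _compute_fragment(self, terms, sentences: List[str]) -> Fragment:
        # Expensive path (stand-in for string matching): pick sentences containing a query term.
        self.snippet_computations += 1
        chosen = [i for i, s in enumerate(sentences)
                  if set(terms) & set(re.findall(r"\w+", s.lower()))]
        return tuple((i, 0, len(sentences[i])) for i in chosen[: self.max_sentences])

    def _remember(self, doc_id: int, fragment: Fragment, sentences: List[str]) -> Dict[int, str]:
        # Copy before extending so the cache can subtract the old entry's size correctly.
        short_doc = dict(self.short_docs.items.get(doc_id, {}))
        for sid, _, _ in fragment:
            short_doc[sid] = sentences[sid]
        self.short_docs.put(doc_id, short_doc)
        return short_doc

    def snippet(self, query: str, doc_id: int) -> str:
        terms = tuple(sorted(set(query.lower().split())))
        fragment = self.fragments.get((terms, doc_id))
        if fragment is None:
            # Fragment miss: read the document and run the expensive matching step.
            sentences = self._read_document(doc_id)
            fragment = self._compute_fragment(terms, sentences)
            self.fragments.put((terms, doc_id), fragment)
            short_doc = self._remember(doc_id, fragment, sentences)
        else:
            # Fragment hit: only offset slicing is needed, from the short document if cached.
            short_doc = self.short_docs.get(doc_id)
            if short_doc is None or any(sid not in short_doc for sid, _, _ in fragment):
                short_doc = self._remember(doc_id, fragment, self._read_document(doc_id))
        return " ... ".join(short_doc[sid][start:end] for sid, start, end in fragment)
```

On a fragment hit, the snippet is rebuilt by slicing cached sentences, so neither string matching nor a disk read is needed. Shrinking the short document cache (for example, to zero capacity) turns each fragment hit into an extra document read, which is the kind of I/O the abstract argues is cheap on SSDs.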

Keywords/Search Tags:

SSD, search engine, snippet, cache

Related items

1	Fast Snippet Generation Approach Based On CPU-GPU Hybrid System
2	Research On Dynamic Cache Strategy For Flight Search Engine
3	The Research And Application Of Vertical Search Engine In The Field Of Group-purchasing Web
4	Research And Implementation Of Vertical Search Engine Of Blog Oriented
5	Research On Retrieval Technology In Search Engine
6	Research On Code Snippet Recommendation Method Based On Code Statement Granularity Representation
7	Research And Implementation Of The Cache System In Distributed Search Engine
8	Research On Index Security And Cache Policy Of Distributed Search Engine
9	The Design And Implementation Of Cross-language Navigational Search Engine
10	The Design And Implementation Of Cross-Language Navigational Search Engine