Research On Streamlining Snippet Cache Of Search Engine

Posted on: 2016-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: P Y Sun
Full Text: PDF
GTID: 2308330461480524
Subject: Computer software and theory
Abstract/Summary:
In response to a user query, search engines return the top-k relevant results, each of which contains a small piece of text, called a snippet, extracted from the corresponding document. Obtaining a snippet is time consuming because it requires both document retrieval (disk access) and string matching (CPU computation), so snippets are cached to reduce latency. With the trend of using flash-based solid state drives (SSDs) instead of hard disk drives for search engine storage, the bottleneck of snippet generation shifts from I/O to computation. We propose a simple but effective method for exploiting this trend, which we call fragment caching: instead of caching the whole snippet, we cache only snippet metadata that describe how to retrieve the snippet from the document. While this approach increases I/O time, the cost is insignificant on SSDs. The major benefit of fragment caching is the ability to cache the same snippets (without loss of quality) while using only a fraction of the memory the traditional method requires. In our experiments, we find that around 10 times less memory is required to achieve comparable snippet generation times for dynamic caching, and we consistently achieve a vastly greater hit ratio for static caching.

Recovering a snippet from a fragment may introduce considerable I/O. To reduce this I/O and hold more text in memory, we introduce a short document cache to replace the document cache. A short document contains the sentences that may be used to recover a snippet, together with the position information of those sentences. With the same amount of memory, a short document cache can hold more items. Using short documents may cause additional document retrievals, but the high-frequency documents remain stored in the cache.

The contributions of this paper are:

1. To reduce the latency of snippet generation, we introduce the fragment. A fragment stores all the position information needed to recover a snippet from its document, and it is much smaller than the snippet itself. With the same amount of memory, a fragment cache achieves a higher hit ratio, which reduces duplicate snippet computation. Recovering a snippet from a fragment adds a small amount of computation, but this cost is much smaller than the benefit the fragment cache brings.

2. Recovering a snippet from a fragment requires some document retrieval. To reduce this I/O, we introduce a short document cache to replace the document cache. A short document stores only the sentences that may be used to recover a snippet, so it is smaller than the full document. With the same amount of memory, a short document cache can achieve a higher hit ratio than a document cache. Together with the fragment cache, the short document cache makes snippet generation more efficient.
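
The fragment caching idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the names FragmentCache, make_fragment, recover_snippet and get_snippet are hypothetical, a fragment is assumed to be a list of (offset, length) spans into the document text, the cache is assumed to use LRU replacement, and the snippet generator is a deliberately naive stand-in (case-sensitive first match per query term). The load_document callable stands in for document retrieval, which is the step a short document cache would serve.

```python
from collections import OrderedDict
from typing import Callable, List, Optional, Tuple

# Assumption: a fragment is a list of (offset, length) spans into the document.
Fragment = List[Tuple[int, int]]
Key = Tuple[str, int]  # (query, doc_id)


def make_fragment(document: str, query_terms: List[str], window: int = 30) -> Fragment:
    """Naive stand-in for snippet generation (the CPU-heavy string matching).

    Records one span around the first case-sensitive match of each term.
    """
    spans: Fragment = []
    for term in query_terms:
        pos = document.find(term)
        if pos >= 0:
            start = max(0, pos - window)
            end = min(len(document), pos + len(term) + window)
            spans.append((start, end - start))
    return spans


def recover_snippet(document: str, fragment: Fragment) -> str:
    """Rebuild the snippet text from the document using only stored positions."""
    return " ... ".join(document[off:off + length] for off, length in fragment)


class FragmentCache:
    """LRU cache (an assumption) that stores fragments instead of snippet text."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._entries: "OrderedDict[Key, Fragment]" = OrderedDict()

    def get(self, key: Key) -> Optional[Fragment]:
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key: Key, fragment: Fragment) -> None:
        self._entries[key] = fragment
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used


def get_snippet(cache: FragmentCache, query: str, doc_id: int,
                load_document: Callable[[int], str]) -> str:
    """Return the snippet for (query, doc_id), matching terms only on a miss."""
    document = load_document(doc_id)  # I/O happens on hits too; cheap on SSDs
    key = (query, doc_id)
    fragment = cache.get(key)
    if fragment is None:
        fragment = make_fragment(document, query.split())
        cache.put(key, fragment)
    return recover_snippet(document, fragment)
```

In this sketch a cache hit still reads the document but skips the string matching, which is the trade-off the abstract describes: slightly more I/O in exchange for entries that hold a few integers rather than the snippet text.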
Keywords/Search Tags:SSD, search engine, snippet, cache