
Research on SSD-based Hybrid Storage Architecture for Large-scale Search Engines

Posted on: 2013-05-21
Degree: Master
Type: Thesis
Country: China
Candidate: C Z Li
GTID: 2248330392957853
Subject: Computer application technology
Abstract/Summary:
Large search engines must process hundreds of queries per second over collections of hundreds of millions of documents. Today, large-scale search engines store their massive index data on hard disk drives (HDDs), and the low I/O performance of HDDs has become the major bottleneck. Compared with HDDs, emerging solid state disk (SSD) technology offers many desirable technical merits, most importantly very high performance for random data access. However, three issues complicate full adoption of SSDs: high hardware cost, asymmetric I/O performance, and a limited block erasure count. Large-scale search engines therefore cannot completely replace HDDs with SSDs for the time being.

Search engines are typical I/O-intensive applications whose I/O patterns show four clear characteristics: they are read-dominant, they exhibit locality, and they involve skipped reads and random reads. Given the characteristics of SSDs and the I/O patterns of search engines, an SSD-based hybrid storage architecture strikes a balance among performance, cost, and reliability. It caches hot data in memory or on SSD, which reduces disk accesses and improves I/O performance.

The corresponding data management policy adopts an improved log-based method to organize the cached data on SSD, with the goal of improving search engine performance while reducing block erasure operations on the SSD. It comprises three policies. First, “data selection” is designed to choose the data to be cached in memory or on SSD according to the given characteristics. Second, “data placement” is an improved log-based data management policy that manages the cached data on SSD so as to ensure the performance of write and read operations. Third, “data replacement” distinguishes between overwrites of results and overwrites of inverted lists on SSD, so as to avoid expensive random writes and reduce block erasure operations. The experimental results demonstrate that our design improves the hit ratio by 13.31%, the performance by 41.05%, and the average access time inside the SSD by 43.83%, and reduces block erase operations by 71.52%.
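
The general idea of a memory cache in front of a log-organized SSD cache can be illustrated with a small sketch. It is not the thesis's actual design: the class name `HybridCache`, its parameters, the LRU policy in memory, and the oldest-block-first erasure on the SSD tier are all assumptions chosen for illustration, and the "SSD" and "HDD" are simulated with in-memory structures. The sketch only shows how appending evicted entries sequentially to a log, and erasing whole blocks at a time, avoids in-place random writes.

```python
from collections import OrderedDict, deque


class HybridCache:
    """Toy two-tier cache: an LRU in memory in front of a log-structured SSD tier.

    All names and policies here are hypothetical; real devices are simulated
    with Python containers.
    """

    def __init__(self, mem_capacity, ssd_blocks, entries_per_block, backing_store):
        self.mem = OrderedDict()            # key -> value, least recently used first
        self.mem_capacity = mem_capacity
        self.ssd_log = deque([[]])          # blocks of keys, oldest first; last block is open
        self.ssd_index = {}                 # key -> value for entries currently on the "SSD"
        self.ssd_blocks = ssd_blocks
        self.entries_per_block = entries_per_block
        self.backing_store = backing_store  # stands in for the HDD-resident index
        self.erase_count = 0                # number of whole-block erasures so far
        self.hits = {"mem": 0, "ssd": 0, "hdd": 0}

    def get(self, key):
        """Look up key in memory, then on the SSD tier, then in the backing store."""
        if key in self.mem:
            self.mem.move_to_end(key)
            self.hits["mem"] += 1
            return self.mem[key]
        if key in self.ssd_index:
            self.hits["ssd"] += 1
            value = self.ssd_index[key]
        else:
            self.hits["hdd"] += 1
            value = self.backing_store[key]
        self._put_mem(key, value)
        return value

    def _put_mem(self, key, value):
        """Insert into memory; demote the least recently used entry to the SSD log."""
        self.mem[key] = value
        self.mem.move_to_end(key)
        if len(self.mem) > self.mem_capacity:
            old_key, old_value = self.mem.popitem(last=False)
            self._append_to_ssd(old_key, old_value)

    def _append_to_ssd(self, key, value):
        """Append sequentially to the open block; never overwrite in place."""
        if key in self.ssd_index:
            return  # already cached on the SSD tier, so no rewrite is needed
        if len(self.ssd_log[-1]) == self.entries_per_block:
            if len(self.ssd_log) == self.ssd_blocks:
                # Log is full: erase the oldest block as a whole.
                victim = self.ssd_log.popleft()
                for victim_key in victim:
                    del self.ssd_index[victim_key]
                self.erase_count += 1
            self.ssd_log.append([])
        self.ssd_log[-1].append(key)
        self.ssd_index[key] = value
```

A caller would construct the cache with a dictionary standing in for the on-disk index, call `get` for each requested key, and then inspect `hits` and `erase_count` to see how the tier sizes affect hit distribution and block erasures.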
Keywords/Search Tags: full-text retrieval, search engine, solid state disk, hybrid storage architecture, caching