Font Size: a A A

Research On SSD-based Inverted Index Construction And Maintenance Strategies

Posted on:2013-08-26Degree:MasterType:Thesis
Country:ChinaCandidate:X F ChenFull Text:PDF
GTID:2248330392957875Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Inverted index is the core data structure for Information Retrieval (IR) systems, andwith the rapid growth of digitalization level, a massive size of inverted indexes have to bemaintained, which are usually stored on hard disks. Although hard disks have the merits ofhigh capacity and low price, comparing to the speed of CPU and memory, the performanceof hard disks is much slower and considering to their mechanic nature this trend is unlikelyto change in the nearest future, which further causes the I/O bottleneck of IR systems. On theother hand, the flash based Solid State Drivers (SSD) become a hot research object in thefield of data storage. Comparing to the conventional hard disks the most outstandingadvantage of SSDs is their much higher I/O performance then hard disks. Therefore, ifinverted indexes can be stored on SSDs instead of hard disks, the overall performance of IRsystems will definitely improve. However, existing index management strategies are alldesigned toward the hard disks, and considering the unique characteristics of SSDs, theseapproaches not only cannot make full use of the SSD but also can be harmful to it.First, the present thesis analyzes existing index construction and maintenanceapproaches on SSD. As these conventional strategies are all based on the hard disks, fromthe experimental results we observe that the in-place method is low efficiency and produceslarge number of random writes. Meanwhile, the merge-based method results in heavy writetraffic on SSD which could further reduce its lifetime. Therefore, considering these analysis,both the proposed index construction and maintenance strategies still follow the basic idea ofmerge-based methods, however the extra write traffic which triggered by the merge eventshould be eliminated.Second, a new index construction and a new index maintenance approaches are givenrespectively. The proposed index construction method store the temporary index files whichare produced by in-memory index in different blocks based on the terms and then mergethem partially instead of completely. On the other hand, the proposed index maintenanceapproach which not only updates index files with sequential disk accesses with mergeoperation, but also applies different merge-based management methods for two type of termswhich are categorized by their size of postings. Finally, a performance evaluation is conducted and experimental results shows that bothproposed strategies not only maintain a good performance of both indexing and query, butalso relieve the total write traffic on SSDs. Therefore, the newly designed methods are moresuitable for the SSD than the existing ways.
Keywords/Search Tags:Information Retrieval, Solid State Drivers, Inverted Index, IndexConstruction, Index Maintenance
PDF Full Text Request
Related items