Font Size: a A A

Research On SSD-based Online Inverted Index Maintenance Strategies And The Optimization

Posted on:2014-05-24Degree:MasterType:Thesis
Country:ChinaCandidate:H M WangFull Text:PDF
GTID:2268330422463537Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Index maintenance strategy plays a crucial role in a full-text retrieval system whichhas to deal with dynamic text collections and satisfy users’ real-time query requirement.Existing index maintenance methods are mainly designed based on the features of harddisk drive (HDD), and the performance is limited by the relatively low I/O performance ofHDD. The emerging solid state disk (SSD) has many desired merits, and compared withHDD, the most prominent one is its high performance of random data access. If we prop-erly use SSD instead of HDD to store the inverted index, system’s overall performancewill be greatly improved. However, SSD has some characteristics that are totally differentwith HDD, and they are not considered in the existing index maintenance methods.Therefore, directly adopting existing methods to maintain index on SSD will not only failto make full use of SSD’s advantages, but also do harm to SSD.First, the existing index maintenance methods are analyzed on SSD through experi-ments and they are found no longer suitable for SSD: The pure in-place method producesovermuch random writes, while the merge-based method generates massive size of writesseeming unnecessary for SSD, which impose heavy traffic and harm to SSD. Based on theexperiment result, we propose some principles for designing SSD-based index mainten-ance strategy.Second, a new hybrid index update strategy is proposed. The strategy classifies allterms into short and long according to the length of their posting lists, and their indexesare separately maintained by no merge and in-place, based on SSD’s fast random read andrelatively efficient semi-random write characteristics. Through this way, inefficient smallrandom writes are avoided and extra write operations caused by merge are also prevented.Compared to existing methods, experimental results demonstrate our design improves thecomprehensive index maintenance and query performance; meanwhile, it is friendlier toSSD, especially when the document collection grows continually.Finally, optimized strategy is proposed to further improve the effectiveness of ourstrategy, that is, after executing several times of primitive strategy, the following optimi-zation measures are performed alternately: long terms’ indexes are still updated byin-palce, while short terms’ indexes will be merged, meanwhile, a specified volume of so-called medium terms’ index be kept residing in memory. Experimental results showthat the optimized strategy can improve query performance by32.1%without degradingindex maintenance performance; meanwhile, it doesn’t do undue harm to SSD.
Keywords/Search Tags:Full-text retrieval, online index maintenance, solid state disk, hybrid update, semi-random write
PDF Full Text Request
Related items