Font Size: a A A

Research On Parallel Algorithms And Performance Optimization Of Top-к Querying Problem

Posted on:2012-05-22Degree:DoctorType:Dissertation
Country:ChinaCandidate:C WuFull Text:PDF
GTID:1118330335962393Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Along with the rapid development of World Wide Web, billions of documents thatcompose the largest human repository of knowledge in history have been created on theweb. However it becomes more and more difficult for people to find useful informationon web, because of the large number of documents accumulated. Thus search engine,whichhelpspeopletacklethelargeamountofinformationontheweb,hasbecomemoreand more important in information retrieval. When a user poses a query, he (she) maybe only interested in the top-k answers returned by the search engine. However thedocuments hit by this query might range from billion to a hundred billion, because ofthe large amount of data the search engine queries on. So how to get the top-k set outof large amount of results fast and accurately has become a key problem in nowadayssearch engine research.This dissertation focuses on designing parallel top-k querying algorithm to dealwithvastamountsofdataandacceleratingthetop-k queryingprocedure. Themaincon-tents include parallel algorithm designing and performance optimization on distributedmemory architecture and shared memory architecture. First, message passing parallelalgorithms are proposed to tackle the querying over vast amounts of data. Then, thescanning method of top-k querying algorithm is studied, and new parallel algorithm areproposed to reduce the data accesses in top-k query. Finally, program performance op-timization skills are studied in order to improve top-k querying program performanceon multicore architecture. The main contributions of this dissertation are as follows.1. Parallel top-k querying algorithm on distributed memory architecture Thememory amount of single PC has become not enough to process modern top-kquery, because of the vast amounts of data to be stored. This dissertation pro-poses a new data partitioning method and designs new parallel top-k queryingalgorithms according to this partitioning. Then the message passing cost of par-allel algorithm is studied, and optimized version of parallel algorithm is designedby reducing the length of messages.2. Reducing data accesses of top-k querying algorithm The cost of data accesseshas weighted a large portion of total cost of top-k querying algorithm. This dis-sertation studies the data source scanning method, and proposes parallel queryingalgorithm using symmetry breaking to reduce total data accesses. 3. Optimization of NRA program on multicore platform Top-k querying algo-rithms are designed more and more complicated, a lot of computation is insertedto reduce data accesses. Thus the portion of computing cost can not be ignored inperformance optimization. This dissertation studies the procedure of NRA algo-rithm,andproposessomeoptimizationmethodsincludingchangingdatastructureandusingOpenMPtoparallelizeprogram. TheproposedoptimizationtechniquessignificantlyimprovetheDLPandTLPofNRAprogram,andacceleratetheNRAprogram on multicore architecture.
Keywords/Search Tags:Top-k Querying, Parallel Computing, Distributed Memory Architecture, Shared Memory Architecture, Multicore, Performance Optimizaiton
PDF Full Text Request
Related items