Font Size: a A A

Research On Set T Coverage Query Algorithm Based On Inverted Index

Posted on:2018-03-16Degree:MasterType:Thesis
Country:ChinaCandidate:Y B ZhangFull Text:PDF
GTID:2358330518461969Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
T-overlap is the basis of T-overlap query had been applied in many important fields,such as recommendation system,document approximate duplicate detection and deletion,web search,automatic completion of entity resolution and more.T-overlap queries' efficiency based traditional index is low,but T-overlap based bitmap index can be represent by AND,OR and other simple fast logic operations.Usually,T-overlaybased on bitmap index use compressed bitmap,but the update operation of the compressed bitmap operation is complex,so compressed bitmap is not suitable for frequent data updates.so as to improve the efficiency of T-overlap queries,and we research T-overlap queries based bitmap indexes.First of all,we designed an ef-ficient bitmap inverted indexes structure,segmented bitmap inverted indexes(SBII).Not only SBII consider the problem of storage space overhead,but also make full use of the characteristics of fast bitmap computing and avoid the problem of compressing bitmap updates.Secondly,based on SBII,we realize T-overlap query,SBII Looped.Finally,Experiments are carried out on two real datasets and the results show that SBII_Looped outperforms the state-of-the-art Boolean query algorithms.The efficiency of CPU-based T-overlap query is low.In order to solve this problem,this paper study GPU-based T-overlap algorithm.First,this paper designed an efficient GPU inverted index GS? in view of the characteristic of GPU shared memory.GS? divided inverted index into several segments by hash segmentation method,so that each segment can be accessed in the shared memory.Then based on the GS?,this paper achieved efficient parallel segmentation T-overlap algorithm GSPSthat group query,support parallel processing in queries and among queries,so this algorithm is efficient.Finally,the results of experience on real data sets show that the performance of GSPS has increased by 30%compared with the state-of-the-art GPU parallel T-overlap algorithm.
Keywords/Search Tags:T-overlap query, hash segmentation, shared memory, GS?, GSPS-TLLO, SB?, SB?_Looped
PDF Full Text Request
Related items