Font Size: a A A

Research On Efficient Index Structure And Parallelization Based On Double Array Trie

Posted on:2019-02-05Degree:MasterType:Thesis
Country:ChinaCandidate:W Y ChenFull Text:PDF
GTID:2438330566483712Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
String search is widely used in search engines,network intrusion detection,computer virus signature matching,and DNA sequence matching.In order to effectively perform a string search,you need a suitable data structure for storing and managing strings.Trie is a classic data structure for efficient string processing,especially for string lookups and prefix lookups.The common Trie storage method includes matrix storage and chain storage.The matrix storage has high query speed but has serious space waste problems.The utilization of chain storage space is high but the query efficiency is low.The double-array Trie can effectively combine the advantages of the above two expressions of the Trie tree,and has a high storage efficiency while having a low storage space.Although the dual array Trie combines some advantages of the two expressions of the Trie tree described above,the efficiency of its creation drops sharply as the amount of data increases.To solve this problem,this paper deeply analyzes and studies the creation process of the double-array Trie index.It is found that the main reason for the decrease of index creation efficiency lies in the conflict caused by the index creation process and the overhead of conflict processing.For this reason,this paper proposes a partitioned double-array Trie index structure to improve the efficiency of index creation.This structure mainly improves the efficiency of index creation from the following two perspectives: 1)Development of lexicographical order that the string dataset is sorted in lexicographic order Sorting,which reduces the amount of data that moves when conflicts occur during index creation.2)Create a partitioned double array structure for the first character partition to reduce the number of conflicts and reduce the overhead of resolving conflicts.Experimental results show that the proposed partitioned double array Trie index structure can reduce the number of conflicts and the overhead of processing conflicts at the same time.While greatly improving the efficiency of index creation,it can further improve the index query efficiency.In order to further improve the efficiency of the creation of double-array Trie index,this paper studies the parallel double array structure based on Open MP on multi-core platform.In order to achieve efficient parallelism,this paper further proposes an efficient partitioning strategy with equal number of strings,thus making the load among the threads as balanced as possible.The performance of index creation and query algorithm under different partition strategies was compared through extended experiments.The experimental results show that compared to the serial index creation,the parallel creation of double array Trie index has a considerable improvement in performance.
Keywords/Search Tags:double array trie, partition, Conflict, dictionary order, OpenMP
PDF Full Text Request
Related items