Research On Efficient Index Structure And Parallelization Based On Double Array Trie

Posted on:2019-02-05

Degree:Master

Type:Thesis

Country:China

Candidate:W Y Chen

Full Text:PDF

GTID:2438330566483712

Subject:Computer system architecture

Abstract/Summary:

PDF Full Text Request

String search is widely used in search engines,network intrusion detection,computer virus signature matching,and DNA sequence matching.In order to effectively perform a string search,you need a suitable data structure for storing and managing strings.Trie is a classic data structure for efficient string processing,especially for string lookups and prefix lookups.The common Trie storage method includes matrix storage and chain storage.The matrix storage has high query speed but has serious space waste problems.The utilization of chain storage space is high but the query efficiency is low.The double-array Trie can effectively combine the advantages of the above two expressions of the Trie tree,and has a high storage efficiency while having a low storage space.Although the dual array Trie combines some advantages of the two expressions of the Trie tree described above,the efficiency of its creation drops sharply as the amount of data increases.To solve this problem,this paper deeply analyzes and studies the creation process of the double-array Trie index.It is found that the main reason for the decrease of index creation efficiency lies in the conflict caused by the index creation process and the overhead of conflict processing.For this reason,this paper proposes a partitioned double-array Trie index structure to improve the efficiency of index creation.This structure mainly improves the efficiency of index creation from the following two perspectives: 1)Development of lexicographical order that the string dataset is sorted in lexicographic order Sorting,which reduces the amount of data that moves when conflicts occur during index creation.2)Create a partitioned double array structure for the first character partition to reduce the number of conflicts and reduce the overhead of resolving conflicts.Experimental results show that the proposed partitioned double array Trie index structure can reduce the number of conflicts and the overhead of processing conflicts at the same time.While greatly improving the efficiency of index creation,it can further improve the index query efficiency.In order to further improve the efficiency of the creation of double-array Trie index,this paper studies the parallel double array structure based on Open MP on multi-core platform.In order to achieve efficient parallelism,this paper further proposes an efficient partitioning strategy with equal number of strings,thus making the load among the threads as balanced as possible.The performance of index creation and query algorithm under different partition strategies was compared through extended experiments.The experimental results show that compared to the serial index creation,the parallel creation of double array Trie index has a considerable improvement in performance.

Keywords/Search Tags:

double array trie, partition, Conflict, dictionary order, OpenMP

PDF Full Text Request

Related items

1	Research On Cloud Data Security Deduplication Technology Based On Double Array Trie Tree
2	A Study And Realization Of Double-Array Based Segmentation Dictionary
3	Research And Application Of The Key Techniques In Chinese Query Answering System Of Networking Education
4	Research And Implementation Of Sdn Security Policy Conflict Handling Technology
5	Research On Compilation And Optimization For OpenMP Programs
6	Study On Efficient Indexing For Large Scale Chinese Text Retrieval Systems
7	Research On Chinese Word Segmentation For Large Scale Information Retrieval
8	Research On The Algorithms For String Similarity Search
9	Categories Based E-commerce Navigation System Design And Implementation
10	Rough Set Theory In Knowledge Discovery Applications