Research and Implementation of String-Type Data Operations in Column Databases

Posted on: 2014-03-05 | Degree: Master | Type: Thesis
Country: China | Candidate: X M Huang | Full Text: PDF
GTID: 2268330425975858 | Subject: Software engineering
Abstract/Summary:
Column database systems, which store data by column, are an important direction in current database research and have significant application value in large-scale data processing. In a column database system, database operations are compiled into a series of primitive operations that operate on vectors. These primitive operations are managed by a dynamic scheduler and dispatched to multi-core processors to further improve query performance.

This thesis focuses on the design and implementation of primitive operations on string-type data in database query processing. Query performance is improved by optimizing the storage format of string data in main memory and by parallelizing the primitive operations. The main contributions are as follows:

1) A method for reading string data files and a format for storing the data in main memory. Using the header information of the data file, the data in the file is read into a contiguous region of memory, and a corresponding data structure is defined to hold the starting address of the string data and other information. The strings are stored in main memory in the order in which they are read, each aligned to a 4-byte boundary.

2) The design and implementation of the string-type primitive operations. There are five basic primitives: STRING_Like (search with wildcards), STRING_In (whether the query string matches an entry in a list), STRING_Between (whether the query string falls within a specified range), STRING_Equal (exact match), and SUB_String (substring extraction). The five primitives are studied and implemented separately for fixed-length and variable-length string data.

3) The application and implementation of the AC (Aho-Corasick) algorithm for string search and matching. The AC algorithm is a multi-pattern matching algorithm based on a finite state machine. Before matching, a finite state machine, called the AC automaton, is constructed from all of the pattern strings. During matching, the AC automaton needs to scan the string list only once to find all of the strings that match. The complexity of AC matching is related only to the size of the string list, not to the size of the pattern strings.

4) A parallel implementation of the string-type primitive operations. String data is processed in parallel at both the file level and the string level, and a divide-and-conquer approach improves the performance of the string primitives on multi-core processors.
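
The AC matching described in item 3 can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names `build_automaton` and `find_matches`, the dictionary-based state tables, and the sample patterns are all assumptions made for illustration. It shows the two phases named in the abstract: building a finite state machine from all pattern strings, then scanning the input once to report every match.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto/fail/output tables for the given pattern strings (hypothetical helper)."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state for the longest proper suffix that is also a prefix
    out = [[]]    # out[state] -> patterns that end when this state is reached

    # Phase 1: insert every pattern into a trie.
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)

    # Phase 2: breadth-first pass to fill in failure links.
    queue = deque(goto[0].values())  # depth-1 states keep fail = 0 (the root)
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_matches(text, automaton):
    """Scan text once and return (start_index, pattern) for every occurrence."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, ch in enumerate(text):
        # Follow failure links until a transition on ch exists or the root is reached.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits
```

For example, with the automaton built from the patterns "he", "she", "his" and "hers", `find_matches("ushers", automaton)` reports "she" at index 1 and both "he" and "hers" at index 2 in a single left-to-right pass. The automaton is built once and can then be reused for every string in a column.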
Keywords/Search Tags: Column database, string data, parallel, Aho-Corasick