Font Size: a A A

Study On Parallel Algorithms For Approximate Multi-Object And Multi-Pattern Matching On Heterogeneous Cluster Systems

Posted on:2009-11-20Degree:MasterType:Thesis
Country:ChinaCandidate:C FanFull Text:PDF
GTID:2178360245467732Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
String matching is one of the basic and important research problems in computer science. Multi-object matching and multi-pattern matching are important issues in string matching field. Exact multi-object matching and multi-pattern matching require that the object strings or the substrings of text match completely with the query string or patterns and don't allow any error. But in many applications, the object strings or the substrings of text can match with the query string or patterns allowing some errors. Since the size of text or the mumber of object strings is often very large, it is necessary to design highly efficient parallel algorithms for approximate multi-object and multi-pattern matching. The cluster computing systems have the characteristics of high performance, low cost and scalability. This paper studies to design and implement highly efficient parallel algorithms for exact and approximate multi-object and multi-pattern matching on heterogeneous cluster systems that processors have different computing speeds and communication capabilities and memory sizes.Based on the perfect hash function constructed by Chinese Remainder Theorem, a parallel algorithm for multi-object string matching on the cluster computing systems is presented by applying the Karp-Rabin pattern matching idea and the filtering technique. This algorithm maps a given string into a unique pair of integer values and implements the parallel matching process by comparing the pair of integer values for the query string and object string instead of comparing directly the query string and object string. The theoretical analysis and experimental results show that the presented parallel algorithm obtains the determinate match results and it is concise, efficient and scalable.By taking into account computation and communication loads and by applying the assigned processor distribution order and the divisible load principle, two linear programming models for optimal object string distribution are presented on the single-tier and double-tier tree heterogeneous cluster systems that processors have different computing speeds and communication capabilities and memory sizes. The experimental results on the cluster of heterogeneous personal computers show that the parallel algorithm for approximate multiple object string matching with the presented optimal distribution method is superior to one withthe even distribution strategy, and obtains a nearly linear speedup and good scalability.For the given multiple patterns and a text string, firstly, every pattern is transformed into a unique pair of integer values by the perfect hash function, these integer values are stored in a hash table, and a recursion expression for computing hash function value of signatures of each window of sub-string of text is also proposed. Secondly, based on divisible load principle, a linear programming model for the optimal text distribution strategy is created and a parallel approximate multi-pattern matching algorithm allowing one error is presented on the heterogeneous cluster system which processors have different computing speeds and communication capabilities and memory sizes by taking into account computation and communication startup time and by the assigned processor distribution order. The experimental results on the cluster of heterogeneous personal computers show that the presented parallel multi-pattern matching algorithm obtains a nearly linear speedup and good scalability, and it is averagely 25% faster than the one using the even text distribution strategy.
Keywords/Search Tags:Approximate Multi-Object Matching, Approximate Multi-Pattern Matching, Heterogeneous Cluster Systems, Parallel Algorithms, Hashing, Divisible Loads, Limited Memory
PDF Full Text Request
Related items