Font Size: a A A

Study On Sequential And Parallel String Matching Processing For Urdu Language Texts

Posted on:2019-12-28Degree:MasterType:Thesis
Country:ChinaCandidate:Mirza Baber BaigBKFull Text:PDF
GTID:2428330542994695Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
String matching is one of the essential problems in computer science.The language used in Pakistan is Urdu.Urdu language text is completely different from English language text.Urdu text has its own text(?)features.For Urdu language,letters{(?)}are all relevant characters for collation.For Urdu language texts,its characters are encoded by utf-8,and the utf-8 is a length-variable encoding.If we implement string matching algorithms for Urdu language texts by ASCII encoding,the correct matched positions may not be obtained.To our best knowledge,at present there are no reports about studying string matching algorithms for Urdu language texts.It is necessary to study implementing the effect sequential and parallel pattern matching algorithms for processing Urdu language texts.This paper analyzes the characteristics of Urdu language texts and studies the character encoding presentation for Urdu language texts,and knows that the correct matched positions can be obtained when the wchar_t type and unicode encoding is used to process Urdu language texts.And then,this paper studies and implements the sequential algorithms for BM string matching,KMP string matching,BF string matching and Sunday string matching for Urdu language texts,and evaluate the execution performance of these four string matching algorithms on a large number of Urdu language patterns and text strings via experimental testing.Experimental comparative results show that for the Urdu language text strings and patterns of different size,totally,the sequential BM string matching algorithm is the fastest one of four contest algorithms,the sequential Sunday string matching algorithm is the second fast algorithm among four contest algorithms;in addition,with enlargement of Urdu language text,the required time of KMP and BF algorithms is very quickly increased but the required time of BM and Sunday algorithms is slowly increased,the sequential BM and Sunday string matching algorithm have stable performance.Compared to KMP,Sunday and BF string matching algorithms,the BM string matching algorithms by scanning the characters of text and pattern from right to left is more suitable for processing large-scale Urdu language texts.Based on the partitioning principle and the strategy overlapping partial characters in text,the txt[0..n-1]of length n in Urdu text is partitioned in to num_threads sub-texts txt[(i*n/num_threads..{i+1)nlnum_threads+m-1],where m is the length of the pattern pat[0..m-1],i=0-num_threads-1,and num_threads is the number of parallel threads,this paper further studied and implemented parallel multi-thread BM,BF,Sunday and KMP string matching algorithms on the multi-core computers by using multi-core parallel computing and Pthread programming technique.The experimental comparative results on multi-core computer with the Urdu language texts of different size and running parallel threads of different number show that the number of running threads impacts the required time of parallelized BM,BF,Sunday and KMP string matching algorithms,the required time of parallelized BM and Sunday algorithms is much less than that of the parallelized BF and KMP algorithm.Totally,the parallelized BM string matching algorithm is the fastest one among the four parallel string matching algorithms,the parallelized Sunday string matching algorithm is the second fast one among the four parallel string matching algorithms;the parallelized BM and Sunday string matching algorithms running 10 or 8 threads requires the least time to complete the matching work.From the aspect of speed up,the parallel multi-thread BM and Sunday string matching algorithms obtain the highest and second highly speedup among 4 parallelized string matching algorithms.Compared to other three multi-thread parallel string matching algorithms,the multi-thread parallel BM string matching algorithm is more suitable for parallel processing the large-scale Urdu language text and pattern matching.
Keywords/Search Tags:Urdu language texts, utf-8 encoding, String matching, Sequential algorithms, Multi-thread parallel algorithms, Multi-core computing
PDF Full Text Request
Related items