Font Size: a A A

Algorithm Research On Minimum Common String Partition Problem

Posted on:2011-01-21Degree:MasterType:Thesis
Country:ChinaCandidate:B LiFull Text:PDF
GTID:2178360305451570Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
String comparison is a basic problem in the computer science. It was widely used in the area of genome comparison, text processing and compression et al. In recent years, the minimum common string partition (MCSP) problem has been payed more and more attention by algorithm and complexity researchers. The MCSP problem aims at finding a common partition about the input string A and B with the minimum number of blocks, and each block of A has one correspondence block in the string B after partition. In the research of genome comparison, it must be looked as two strings and compute their minimum common string partition to detemine the relationship between the two genes.To divide string A and B into common substrings, the number of occurrences of each character in the string A must be same as the number in the string B. The problem is defined as k-MCSP problem, when each character occurs at most k times in each input string. 2-MCSP problem is not only NP-hard but also APX-hard. Presently, the approximation ratio of the best approximation algorithm for 2-MCSP problem is 1.1037, and the ratio of 3-MCSP problem is 4, at the same time the ratio of k-MCSP problem is(?)(k).This paper describes the previous findings of MCSP problem, and does a lot of experiment about greedy algorithm, novel greedy algorithm, educated greedy algorithm and hitting set algorithm that have good performance. On this basis, there are detailed description and evaluation for every algorithm.Then a new algorithm-delimitation greedy algorithm is designed for k-minimum common string partition problem. This algorithm is based on the hitting set algorithm and is done to improve. The approximation ratio of the algorithm is O(k), and it can accomplish inO(n). At the same time, a large number of experimental data shows that the delimitation greedy algorithm has better results than the hitting set algorithm. In our actual test experiments, each instance can be completed better by delimitation greedy algorithm than hitting set algorithm.Then there is an optimal algorithm and random backtracking algorithm was designed. The optimal algorithm can compute one or all optimal solution of k-minimum common string partition problem in O((k!)mn), when m is defined as the number of different elements in the input string, and n is defined as the length of the input string. The optimal algorithm is very important for assessment of other approximation algorithm. Random backtracking algorithm has a high flexibility. It can compute the optimal solution, and can get a good solution in a short time according to the actual demands. Both optimal algorithm and random backtracking algorithm have a high practical value. Finally, there have a conclusion and prospects about minimum common string partition problem. The main results of this paper are below.(1) Design and implement a new approximation algorithm to answer the minimum common string partition problem, which is better than original approximation algorithm.(2) Design and implement an optimal algorithm and random backtracking algorithm to answer the minimum common string partition problem, which have a great value to compute optimal solution and evaluate other approximation agorithm.
Keywords/Search Tags:Minimum Common String Partition, Sorting By Reversal, Greedy Algorithm
PDF Full Text Request
Related items