Font Size: a A A

Study On Models To Solvepattern Matching With Wildcards And The Variable Length Constraint

Posted on:2017-10-30Degree:DoctorType:Dissertation
Country:ChinaCandidate:H WangFull Text:PDF
GTID:1318330512468664Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
The pattern matching with wildcards is one of the important research in the pattern recognition and is received wide attention in the field of computational biology, information retrieval, network security and so on. The special characters and wildcards to match any character in the alphabet, can bring some flexibility has also brought a lot of complexity, thus forming a wildcard pattern matching is a new research direction, a large number of studies in the above field show that frequent co current in a text area is represented by a certain pattern. For example, in the DNA sequence, TATA sequence promoter often appear in the CAATCT sequence downstream of 30-50 interval a wildcard, which is not a simple repetition.The patterns formed by the two sub sequences can improve the specificity of the sequence, which can be labeled as"...CAACT[30,50]TATA...”. The situation is further extended to the sub pattern sequence set, in which two adjacent sub pattern intervals within a certain length, for this kind of flexible interval position, will refer to a single character from the wildcard expansion to refer to a certain length of the string, called the wildcard limit (Bounded length gaps) a sky. In addition, in order to ensure the stability of the sub pattern sequence set by introducing one-off constraints, to avoid the abnormal individual pattern matching times affect the pattern set.It has been called a wildcard variable length constraint pattern matching problem The solution of the complexity of the structure, characteristics and requirements of the accuracy of the PMWL problem, mainly from three aspects to highlight three combination of completeness, its associated algorithm of approximate and similarity measure and Application and other issues, a series of related research work, the main research results are summarized as the following three aspects:(1) According to the complexity, accuracy and completeness of the solution of PMWL problem, there is a lack of targeted model solving strategy for the existing research results. To this end, draw lessons from the constraint satisfaction problem framework (CSPs), first of all, to build the PMWL problem three tuple solving model. The constraints of the model of the problem and solution space and other basic concepts are described, and 8 special cases of the problem known as the unified basic nature of the problem, including in the special condition of completeness and adjacent matching solutions position in the text. In addition, in order to solve the problem of limited length vacancy, the solution structure of multi branch is formed by the matching problem,and the optimal solution of the variable length constraint is searched for the combinatorial optimization problem. At the same time, a FIN algorithm is proposed to solve the PMWL problem.In the first definition of understanding partition boundary space, proposes the solution space partitioning algorithm using a divide and conquer strategy, the problem of PMWL equivalence is divided into several sub problems, and illustrates the division before and after the solution structure equivalence in theory. The experimental results show that the FIN algorithm can not only get a matching number, but also can obtain a complete matching solution. The algorithm can form complementary with the existing heuristic algorithm, which can effectively reduce the loss ratio of the matching solution.(2) According to the research progress and achievements in the second and third chapter pattern matching algorithm, matching on the basis of the traditional approximate string, and the DP and SAIL-APPROX dynamic programming algorithm in the main features of carding and Analysis on the existing do not have complete flexibility and the solution quality is not high,easy to lose solution problems with the traditional algorithm, proposed a heuristic algorithm W-DPBI.The algorithm of text search inversion optimization strategy and process to take, finally compared with the similar DP and SAIL-APPROX algorithm, the results show that the algorithm gets an average growth rate of solution is 21.9%, up to 57%, In certain circumstances, the quality and the ability of the result can be improved obviously, and it has good flexibility and inspiration.(3) Used the pattern matching algorithm and its application in computational biology and related experimental methods for drug research, gene and disease sequence similarity in structure characteristics, collecting data on the information source, using the collaborative filtering algorithm and approximate matching algorithm for solving combinatorial search and related strategies used in drug repositioning model research and construction. The results show that the method can significantly improve the concentration degree of the drug - disease. Compared to the existing classification model and random sampling results, can effectively reduce the false positive rate prediction, the model parameters can be used as a drug test reference, at the same time show that the algorithm in DNA gene pattern recognition, information search and field research, but also has good operability and practical application of elasticity.
Keywords/Search Tags:pattern matching, wildcard, length constraint, similarity measure
PDF Full Text Request
Related items