Font Size: a A A

K-Prefix Tree Full-Text Search Method And Application

Posted on:2010-07-07Degree:MasterType:Thesis
Country:ChinaCandidate:J Q ChenFull Text:PDF
GTID:2178360275994353Subject:Systems Engineering
Abstract/Summary:PDF Full Text Request
With the rapid development of science and technology,how to fast drive most useful information from the flood of information is the most important research subject of message retrieval.The full text retrieval which is one of the major search content of message retrieval,is to resolve the issue that how to drive information apace frequently and changefully from massive information.The current widely used suffix tree and suffix array full-text search method have their own characteristics and limitations in search space and computing speed.As to resolve this problem,this paper describes an full-text search method called K-Prefix Tree full-text search method.K-Prefix Tree full-text search method is a full-text search method which is based on prefix tree and can search the text with length not greater than k.The main characteristic of K-Prefix Tree is creating prefix tree with k-substring.So,it's maximum space complexity is O(Zk+1) and it compromise the search space and computing time advantages of suffix tree and suffix array.To avoid large occupation space like Suffix Tree,this method imports the Suffix Array's practice of reducing the inside node of Suffix Tree,and use the K-substring to construct tree instead of suffix, so compare to Suffix,it overwhelmingly saves large space of Tree..By comparison with the widespread use full-text methods of suffix tree and suffix array,it is verified that K-Prefix Tree full-text search method has well combination property in search and computing time.As one of the basic and important mission of bioinformatics,vector recognition plays an important part in sequences contaminate removing and finding the cDNA inset.Applying K-Prefix Tree to vector recognition is very important and meaningful to biologist.This paper describes the vector structure of EST sequence which based on the EST expected structure.Combined the vector structure of EST sequence and K-Prefix Tree full-text search method,a vector identification method of EST sequence which based on K-Prefix Tree is proposed.The main characteristics of this method are creating K-Prefix Tree based on vector structure of EST sequence and matching,extending,merging k-substring using K-Prefix Tree.It was approved that the K-Prefix Tree vector recognition method is feasible and effective.By the experiment and analysis of the representation,K-Prefix Tree is not only doing with less time but also doing with less space,and will be helpful in study of message retrieval especially full text retrieval.
Keywords/Search Tags:K-Prefix Tree, Suffix Tree, Vector Recognition
PDF Full Text Request
Related items