The Research And Implementation Of Search Engine Prototype System With Gap Constraint

Posted on: 2015-08-27
Degree: Master
Type: Thesis
Country: China
Candidate: X H Ge
Full Text: PDF
GTID: 2298330452994185
Subject: Software engineering
Abstract/Summary:
Search engines exist for people who need to find information quickly and accurately in the vast resources of the Internet, and they are a product of the Internet's rapid development. Although current search engines have mature retrieval mechanisms, their search results still have shortcomings. In particular, search engines do not support search with gap constraints, even though gap-constrained search is of great practical significance.

The pattern matching problem, also called the string matching problem, is one of the fundamental problems in computer science and has important applications in core areas of computing. Pattern mining with gap constraints is an important research topic within pattern matching. The patterns that researchers have studied take the form P = p_0[min_0, max_0]p_1 ... [min_{j-1}, max_{j-1}]p_j ... [min_{m-1}, max_{m-1}]p_m, where min_{j-1} and max_{j-1} are the minimum and maximum gap allowed between p_{j-1} and p_j.

To address this shortcoming of search engines, this thesis combines the inverted index with the span query (SpanQuery). Because the SpanQuery algorithm (referred to here as ABSQ) has limitations, two algorithms that improve on it are designed using two different storage modes: the ABAS algorithm (the improved algorithm based on array storage) and the ABKS algorithm (the improved algorithm based on key-value storage). In addition, RRSA (Recently the Right Scan Algorithm), which serves the same purpose as ABSQ, is implemented for comparison.

The experimental results show that when there are few indexed files and the files are small, RRSA has a shorter run time than ABSQ, but as the number of indexed files and the size of their contents increase, ABSQ has a shorter run time than RRSA. Compared with ABSQ, the ABAS and ABKS algorithms differ little in run time, but both produce better results. Compared with each other, ABAS and ABKS show no obvious difference in run time or results, although the experiments indicate that ABKS is the relatively better algorithm. Finally, the algorithm is applied in the search engine system and achieves good results, which demonstrates the soundness and feasibility of the algorithm and the rationality of the system.
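As an illustration of what a gap-constrained query over an inverted index involves, the following Python sketch builds a positional inverted index and matches a pattern of terms with per-pair gap bounds. It is not the thesis's ABSQ, ABAS, ABKS, or RRSA algorithm, and it does not use Lucene's SpanQuery. The function names `build_positional_index` and `gap_search`, whitespace tokenization, and the convention that a gap counts the tokens lying strictly between two matched terms are all assumptions made for this example.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each token to {doc_id: [positions]} (a positional inverted index).

    Assumption: documents are tokenized by lowercasing and splitting on
    whitespace; a real engine would use a proper analyzer.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.lower().split()):
            index[token][doc_id].append(pos)
    return index


def gap_search(index, terms, gaps):
    """Find occurrences of terms[0] [lo0,hi0] terms[1] ... in each document.

    gaps[j] = (lo, hi) bounds the number of tokens strictly between
    terms[j] and terms[j + 1] (an assumed convention for this sketch).
    Returns {doc_id: [tuple of matched positions, ...]}.
    """
    if len(gaps) != len(terms) - 1:
        raise ValueError("need exactly one (min, max) gap per adjacent term pair")

    # Use .get so that looking up an unknown term does not grow the index.
    postings = [index.get(t.lower(), {}) for t in terms]

    # Only documents containing every term can match.
    candidates = set(postings[0])
    for posting in postings[1:]:
        candidates &= set(posting)

    results = {}
    for doc_id in sorted(candidates):
        # Each chain is a tuple of positions matched so far.
        chains = [(p,) for p in postings[0][doc_id]]
        for (lo, hi), posting in zip(gaps, postings[1:]):
            chains = [
                chain + (q,)
                for chain in chains
                for q in posting[doc_id]
                if lo <= q - chain[-1] - 1 <= hi
            ]
            if not chains:
                break
        if chains:
            results[doc_id] = chains
    return results


docs = {
    1: "fast search engine for gap search",
    2: "search the very large web engine",
    3: "engine search",
}
index = build_positional_index(docs)
tight = gap_search(index, ["search", "engine"], [(0, 2)])
loose = gap_search(index, ["search", "engine"], [(0, 4)])
```

With the three sample documents, the query with gap bounds (0, 2) matches only document 1, while widening the bounds to (0, 4) also matches document 2, where four tokens separate the two terms. The sketch enumerates every valid chain of positions, which is simple but can grow quickly on long documents; that cost is one reason specialized algorithms are studied for this problem.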
Keywords/Search Tags:search engine, space constraints, gap algorithm, Lucene