
Study On Keyword Query Performance Optimization

Posted on: 2013-05-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhou
GTID: 2268330392970768
Subject: Software engineering
Abstract/Summary:
As Web search engines have been adopted by more and more people, keyword search has become the most popular and easiest information retrieval technique for searching documents and web pages. Driven by application requirements, Keyword Search Over Relational Databases (KSORD) has become an active research field in recent years. KSORD enables casual users to search relational databases with keyword queries (sets of keywords) without knowing the database schema, writing SQL queries, or learning a relational database query interface. Keyword-Driven Analytical Processing (KDAP), the subject of this thesis, is one of the important research areas of KSORD. KDAP combines intuitive keyword-based search with the aggregation power of OLAP (Online Analytical Processing), providing a navigational search paradigm that organizes the data so that users can discover the facts and data that interest them. Finally, KDAP presents the related data to the user in report form.

Although much research on KDAP has been done and many KDAP prototypes have been developed, the query efficiency of KDAP has received little attention. As the number of query keywords increases, the data warehouse schema becomes more complicated, or the data grows larger, KDAP queries become inefficient. In this thesis, we analyze the KDAP query procedure in detail, based on schema-graph-based KSORD. The procedure includes two phases: generating the Candidate Subspace and constructing the Candidate Facet. For any user query, the system generates the Candidate Subspace on the fly through a breadth-first traversal of the schema graph. As the number of query keywords increases or the data warehouse schema becomes more complicated, the time spent generating the Candidate Subspace may increase rapidly.
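
The on-the-fly generation step described above can be illustrated with a small sketch. It is not the thesis's implementation: it assumes that the schema graph is an undirected graph of tables joined by foreign keys, and that a candidate subspace can be described by one join path per keyword from a keyword-matching table to the fact table. The names `bfs_path`, `candidate_subspaces`, `SCHEMA`, and `HITS` are hypothetical.

```python
from collections import deque
from itertools import product

def bfs_path(graph, start, goal):
    """Return the shortest join path from start to goal, or None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sort neighbours so that the traversal order is deterministic.
        for nxt in sorted(graph.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

def candidate_subspaces(graph, fact_table, keyword_hits):
    """Combine one join path per keyword into each candidate subspace.

    keyword_hits maps a keyword to the tables whose values match it
    (an assumed input; the matching step itself is not shown).
    """
    per_keyword = []
    for keyword, tables in keyword_hits.items():
        options = []
        for table in tables:
            path = bfs_path(graph, table, fact_table)
            if path is not None:
                options.append((keyword, tuple(path)))
        per_keyword.append(options)
    # The Cartesian product is why cost grows quickly with more keywords.
    return [dict(combo) for combo in product(*per_keyword)]

# Hypothetical snowflake schema: Sales is the fact table.
SCHEMA = {
    "Sales": {"Product", "Store"},
    "Product": {"Sales", "Category"},
    "Category": {"Product"},
    "Store": {"Sales", "City"},
    "City": {"Store"},
}
HITS = {"laptop": ["Product", "Category"], "boston": ["City"]}

for subspace in candidate_subspaces(SCHEMA, "Sales", HITS):
    print(subspace)
```

Because every keyword contributes its own set of paths and the subspaces are their combinations, both the number of traversals and the number of results grow with the number of keywords and the size of the schema graph, which matches the cost concern raised in the abstract.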
In addition, the current facet construction process must sort every dimension attribute and attribute instance and aggregate over the subspace associated with a given keyword query. This can be quite expensive on sizable data warehouses with many attribute instances.

In this thesis, we focus on the performance of KDAP. First, we extract the system architecture and its query model. Second, we analyze the efficiency problems in the query process. Finally, we propose methods to improve the efficiency and effectiveness of KDAP systems. The main contributions of this thesis are as follows:

First, a novel pre-processing technique based on the Candidate-Subspace-Schema (Gcs) is proposed to improve the efficiency of Candidate Subspace generation. It pre-processes the database schema graph to generate the set of Candidate-Subspace-Schemas and stores these Gcs in the database. When a user issues a keyword query, the appropriate Gcs are retrieved directly from the database. This dramatically improves the efficiency of Candidate Subspace generation, and thus the efficiency of KDAP.

Second, a novel optimization technique for facet construction is proposed, which selects only the most promising attribute instance facets (AIFs) for execution. It first defines a way to partition a Candidate Subspace. The result of the partition is a group-by set that can be viewed as a document collection, in which each attribute instance facet can be regarded as a document. The similarity between the query and each AIF is then computed with the Vector Space Model (VSM), and only the AIFs most likely to produce the top-k results are selected and executed. This technique markedly reduces the number of AIFs to aggregate in the subspace and so improves the efficiency of KDAP.

Finally, the techniques from this study are applied in the search server of a user traffic query and settlement system. The optimized query method greatly improves the system's efficiency and therefore has practical significance.
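
The VSM-based selection of attribute instance facets can be sketched as follows. This is a minimal illustration and not the thesis's method in detail: the abstract only states that the Vector Space Model is used, so the TF-IDF weighting, the cosine measure, the way each AIF is turned into a bag of terms, and the names `tfidf_vectors`, `cosine`, `top_k_facets`, and `FACETS` are all assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build a TF-IDF vector per document; docs maps a name to a list of terms."""
    n = len(docs)
    df = Counter(term for terms in docs.values() for term in set(terms))
    # Smoothed IDF (an assumed weighting choice).
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    vectors = {
        name: {term: tf * idf[term] for term, tf in Counter(terms).items()}
        for name, terms in docs.items()
    }
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def top_k_facets(query_terms, docs, k):
    """Return the names of the k facets most similar to the query."""
    vectors, idf = tfidf_vectors(docs)
    query = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query_terms).items()}
    scored = sorted(
        ((cosine(query, vec), name) for name, vec in vectors.items()),
        key=lambda pair: (-pair[0], pair[1]),
    )
    # Facets with zero similarity are never worth aggregating.
    return [name for score, name in scored[:k] if score > 0]

# Hypothetical AIFs, each described by a bag of terms.
FACETS = {
    "Product.Category=Laptop": ["laptop", "computer", "notebook"],
    "Product.Brand=Lenovo": ["lenovo", "laptop", "thinkpad"],
    "Store.City=Boston": ["boston", "massachusetts"],
}

print(top_k_facets(["laptop", "thinkpad"], FACETS, 2))
```

Only the facets returned by `top_k_facets` would then be passed to the expensive aggregation step, which is how ranking before execution reduces the work done in the subspace.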
Keywords/Search Tags: OLAP (Online Analytical Processing), Keyword Search, Preprocessing, Query Efficiency, Vector Space Model