Research On Performance Optimization Of Keyword-Driven Analytical Processing

Posted on: 2012-05-01
Degree: Master
Type: Thesis
Country: China
Candidate: J Li
GTID: 2218330338462750
Subject: Computer software and theory
Abstract/Summary:
As Web search engines have been accepted and used by more and more people, keyword search has become the most popular and easiest information retrieval technique for searching documents and web pages. Driven by application requirements, Keyword Search Over Relational Databases (KSORD) has become a hot research field in recent years. KSORD enables casual users to search relational databases with keyword queries (a set of keywords), without knowing the database schema, writing SQL queries, or learning a relational database query interface. Keyword-Driven Analytical Processing (KDAP), which is studied in this thesis, is one of the important research areas of KSORD. KDAP combines intuitive keyword-based search with the aggregation power of OLAP (Online Analytical Processing), providing a navigational search paradigm that organizes the data so that users can discover the facts and data that interest them. Finally, KDAP presents the related data to the user in report form.

Although much research on KDAP has been done and many KDAP prototypes have been developed, the query efficiency of KDAP has received little attention. When the number of query keywords increases, the data warehouse schema becomes complicated, or the data grows large, KDAP queries become inefficient. In this thesis, we analyze in detail the KDAP query procedure, based on schema-graph-based KSORD, which includes two phases: generating Candidate Subspaces and constructing Candidate Facets. For any user query, the system generates Candidate Subspaces at query time through a breadth-first traversal of the schema graph. As the number of query keywords increases or the data warehouse schema becomes complicated, the time spent on generating Candidate Subspaces may increase rapidly.
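The breadth-first traversal mentioned above can be illustrated with a minimal sketch. It is not the thesis's algorithm: the schema graph, the table names, and the helper names `join_path` and `candidate_subspace` are all hypothetical, and the sketch assumes that a candidate subspace can be approximated by one shortest join path from each keyword-hit table to the fact table.

```python
from collections import deque

# Hypothetical snowflake schema: each table maps to the tables it joins with.
SCHEMA_GRAPH = {
    "Sales": ["Product", "Store", "Customer"],
    "Product": ["Sales", "Category"],
    "Category": ["Product"],
    "Store": ["Sales", "City"],
    "City": ["Store"],
    "Customer": ["Sales"],
}


def join_path(graph, start, target):
    """Breadth-first search for a shortest join path from start to target."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        table = queue.popleft()
        if table == target:
            # Walk the parent links back to the start, then reverse.
            path = []
            while table is not None:
                path.append(table)
                table = parent[table]
            return path[::-1]
        for neighbor in graph[table]:
            if neighbor not in parent:
                parent[neighbor] = table
                queue.append(neighbor)
    return None  # target is not reachable from start


def candidate_subspace(graph, hit_tables, fact_table):
    """Map each keyword-hit table to its join path ending at the fact table."""
    return {t: join_path(graph, t, fact_table) for t in hit_tables}


subspace = candidate_subspace(SCHEMA_GRAPH, ["City", "Customer"], "Sales")
```

Each query repeats one traversal per hit table, so under this simplified model the work grows with both the number of keywords and the size of the schema graph, which is the cost the thesis sets out to reduce.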
Facet construction is also costly: the current process must sort every dimension attribute and attribute instance and aggregate over the subspace associated with a given keyword query. This can be quite expensive on sizable data warehouses with many attribute instances.

In this thesis, we focus on the performance of KDAP. First, we abstract the system architecture and its query model. Second, we analyze the efficiency problems in the query process. Finally, we propose methods to improve the efficiency and effectiveness of KDAP systems. The main contributions of this thesis are as follows:

1. A novel pre-processing technique based on the Candidate-Subspace-Schema (Gcs) is proposed to improve the efficiency of generating Candidate Subspaces. It first pre-processes the database schema graph to generate the set of Candidate-Subspace-Schemas and stores these Gcs in the database. When a user issues a keyword query, the proper Gcs are retrieved directly from the database. In this way, Candidate Subspace generation becomes much faster, which in turn improves the efficiency of KDAP.

2. A novel optimization technique for facet construction is proposed, which selects the most promising attribute instance facets (AIFs) to execute. It first defines a way to partition a Candidate Subspace. The result of the partition is a group-by set that can be viewed as a document collection, in which each attribute instance facet can be regarded as a document. The similarities between the query and the AIFs are then computed with the Vector Space Model (VSM), and only the AIFs most likely to produce top-k results are selected and executed. This technique markedly reduces the number of AIFs to aggregate in the subspace and thereby improves the efficiency of KDAP.
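The VSM-based selection of attribute instance facets can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation: the abstract does not specify the term weighting, so smoothed TF-IDF with cosine similarity is assumed here, and the sample AIFs, their term lists, and the function names `tfidf_vectors`, `cosine`, and `top_k_aifs` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build TF-IDF vectors for docs (name -> list of terms); return (vectors, idf)."""
    n = len(docs)
    df = Counter(term for terms in docs.values() for term in set(terms))
    # Smoothed IDF (an assumption): keeps every weight strictly positive.
    idf = {t: math.log((n + 1) / (count + 1)) + 1.0 for t, count in df.items()}
    vectors = {
        name: {t: tf * idf[t] for t, tf in Counter(terms).items()}
        for name, terms in docs.items()
    }
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_aifs(query_terms, aifs, k):
    """Return up to k AIF names with nonzero similarity, best match first."""
    vectors, idf = tfidf_vectors(aifs)
    query = {t: tf * idf[t] for t, tf in Counter(query_terms).items() if t in idf}
    scored = [(cosine(query, vec), name) for name, vec in vectors.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:k] if score > 0.0]


# Hypothetical group-by set: each AIF is treated as a small "document".
AIFS = {
    "Product.Name=iPod nano": ["ipod", "nano", "apple", "player"],
    "Product.Name=iPod shuffle": ["ipod", "shuffle", "apple", "player"],
    "Store.City=Columbus": ["columbus", "ohio", "store"],
}

selected = top_k_aifs(["ipod", "nano"], AIFS, 2)
```

Only the AIFs returned by `top_k_aifs` would then be aggregated, so AIFs with no similarity to the query are never executed.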
Keywords/Search Tags: OLAP, Keyword Search, Preprocessing, Query Efficiency, Vector Space Model