Research On Keyword Search In Relational Database

Posted on: 2019-02-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Q Wang
GTID: 1368330575961956
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet and information technology, the amount of information has grown explosively. Databases are the primary means of organizing, storing and managing structured data. They are widely used in many fields, such as business intelligence, enterprise resource management, and even personal daily life, and the demand for query access to databases is growing rapidly. Traditional structured query methods (such as SQL statements) require users to master a professional query language and to be familiar with the complex underlying schema of the database, which makes querying a database difficult and inconvenient. Against this background, keyword query technology for relational data has emerged and attracted widespread attention from the database community. With this technology, users only need to supply a few keywords to complete a query. This greatly lowers the barrier to using a database and gives the approach good application prospects. In recent years, a large number of keyword query methods have been proposed for relational databases. However, relational databases have structural characteristics different from those of the Web and traditional IR systems, so research in this area still faces many difficulties and challenges. For example, existing methods lack a query preprocessing mechanism that can effectively bridge the information gap between unstructured queries and structured data. The frequent multi-table joins in query algorithms mean that query efficiency cannot be guaranteed. There is also no efficient mechanism for ranking query results automatically. Therefore, to address these shortcomings of keyword query methods for relational databases and to improve on existing methods, this dissertation tackles several problems in keyword query methods and their application systems from the perspectives of schema summarization, query expansion, query optimization, and result ranking. Overall, the research covers the following four aspects.

Firstly, current enterprise databases are complex in structure and large in scale, while existing schema summarization methods consider only a single factor and produce summaries of low accuracy. To address this, we study schema summarization for relational databases and propose GP-RDSS (Relational Database Schema Summarization based on Graph Partition). This schema summarization method, based on a graph partition strategy, helps users quickly and accurately grasp the relevant information in large databases. Specifically, (1) we construct a similarity matrix between tables from the aspects of structural compactness and content similarity, and adjust it by mining query log information, so that the measure is more comprehensive and reasonable; (2) we put forward the concepts of intrinsic importance and dependence importance and formally define a metric scheme that accurately measures the importance of tables; (3) we devise a schema summarization algorithm that combines the graph partition mechanism with the characteristics of relational databases and, for the first time, considers the influence of user query preferences on the summarization process, which further improves the summarization result; (4) experiments on the TPC-E data set, comparing the method against existing schema summarization methods, verify its validity and feasibility.

Secondly, keyword queries are semantically fuzzy, and because they lack structure their expressiveness is limited. To solve these problems, we propose ReInterpretQE (Query Expansion Based on Recommendation and Interpretation), a query expansion method based on keyword recommendation and query interpretation. Specifically, (1) in the query recommendation phase, a query recommendation model is constructed from the term similarity matrix using dynamic programming, so that the original query can be expanded into a list of keyword queries; (2) in the query interpretation phase, keywords are translated into query subgraphs based on database statistics and the schema graph; these subgraphs contain not only content related to the semantics of the original query but also the latent structural information between keywords; (3) experiments on the public DBLP data set confirm the rationality and effectiveness of the proposed query expansion method.

Thirdly, existing keyword query methods for relational databases must perform table joins online during query processing. This leads to low query efficiency and prevents them from being applied to large-scale databases. To handle this problem, we propose TCU-Based query (Query Based on Topic Cluster Units), an offline method suited to large-scale relational databases with complex schemas and enormous amounts of data. Specifically, (1) we formally define topic cluster units by grouping data tables vertically and tuples horizontally, then construct the topic cluster units offline and return them to users; (2) we design a table join optimization scheme based on a genetic algorithm, which reduces preprocessing time, and an index optimization mechanism based on an association rule algorithm, which improves query efficiency; (3) experiments on the public Freebase data set show that the proposed method significantly outperforms traditional keyword query methods in both efficiency and effectiveness.

Finally, traditional ranking methods require the weights of influence factors to be set manually, which leads to low accuracy. To solve this problem, we introduce learning-to-rank models into relational databases and propose PARR-H (Parallel AdaRdbRank-Hierarchy), a parallel learning-to-rank method. Specifically, (1) we construct a global feature-relationship graph and, based on it, propose a hierarchical strategy for constructing weak rankers; (2) we present ARR-H (AdaRdbRank-Hierarchy), a listwise learning-to-rank algorithm, and add a parallel framework on top of it to obtain PARR-H, which balances ranking accuracy and training efficiency; (3) extensive experiments on the OHSUMED, WSJ and AP data sets show that PARR-H substantially improves both ranking effectiveness and training efficiency.
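The TCU-Based query method uses a genetic algorithm to optimize table joins. The abstract does not give its encoding, fitness function or operators. The following is therefore only a minimal sketch under assumed choices, not the dissertation's scheme. A chromosome is a left-deep join order, and fitness is a toy cost, the sum of estimated intermediate result sizes computed from invented cardinalities and join selectivities. The operators are tournament selection, order crossover and swap mutation, with elitism. The table names, the statistics, and the functions `plan_cost`, `order_crossover`, `swap_mutation` and `genetic_join_order` are hypothetical.

```python
import random

# Hypothetical catalog statistics (invented for illustration): table
# cardinalities and selectivities of the join predicates between table pairs.
# Pairs without a predicate are treated as cross products (selectivity 1.0).
CARD = {"author": 1_000, "paper": 5_000, "writes": 12_000, "venue": 50}
SELECTIVITY = {
    frozenset(("author", "writes")): 1 / 1_000,
    frozenset(("paper", "writes")): 1 / 5_000,
    frozenset(("paper", "venue")): 1 / 50,
}


def plan_cost(order):
    """Toy cost of a left-deep join order: sum of estimated intermediate sizes."""
    size, cost = CARD[order[0]], 0.0
    for pos in range(1, len(order)):
        table = order[pos]
        sel = 1.0
        for prev in order[:pos]:  # predicates linking `table` to tables already joined
            sel *= SELECTIVITY.get(frozenset((prev, table)), 1.0)
        size = size * CARD[table] * sel
        cost += size
    return cost


def order_crossover(p1, p2, rng):
    """OX crossover: keep a random slice of p1, fill the rest in p2's order."""
    i, j = sorted(rng.sample(range(len(p1)), 2))
    child = [None] * len(p1)
    child[i:j + 1] = p1[i:j + 1]
    rest = [t for t in p2 if t not in child]
    for k in range(len(child)):
        if child[k] is None:
            child[k] = rest.pop(0)
    return child


def swap_mutation(order, rng, rate=0.2):
    """With probability `rate`, swap two positions of the join order."""
    order = list(order)
    if rng.random() < rate:
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
    return order


def genetic_join_order(tables, pop_size=20, generations=50, seed=0):
    """Evolve join orders; return the cheapest order found under plan_cost."""
    rng = random.Random(seed)
    population = [rng.sample(tables, len(tables)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=plan_cost)
        next_gen = population[:2]  # elitism: the two cheapest orders survive
        while len(next_gen) < pop_size:
            # Tournament selection: the cheapest of 3 random individuals.
            p1 = min(rng.sample(population, 3), key=plan_cost)
            p2 = min(rng.sample(population, 3), key=plan_cost)
            next_gen.append(swap_mutation(order_crossover(p1, p2, rng), rng))
        population = next_gen
    return min(population, key=plan_cost)


if __name__ == "__main__":
    best = genetic_join_order(list(CARD))
    print(best, plan_cost(best))
```

Under this toy cost model, the cheapest left-deep orders begin by joining `venue` with `paper`, for a total cost of 29,000. Any order that starts with a cross product costs orders of magnitude more. With only four tables, exhaustive search over the 24 orders is trivial. A genetic search of this kind only pays off as the number of joined tables grows.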
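The name ARR-H (AdaRdbRank-Hierarchy) suggests that the ranking method builds on AdaRank-style listwise boosting. This is an assumption; the abstract does not state it. The sketch below implements plain AdaRank with single-feature weak rankers and NDCG@k as the listwise measure. It does not reproduce the global feature-relationship graph, the hierarchical weak-ranker construction, or the parallel framework of PARR-H. The functions `ndcg_at_k`, `linear_score` and `adarank` and the toy data are hypothetical. Because NDCG lies in [0, 1], each weak ranker's weight alpha is non-negative.

```python
import math


def ndcg_at_k(scores, labels, k=10):
    """NDCG@k of the ranking induced by `scores` against graded relevance `labels`."""
    def dcg(ranking):
        return sum((2 ** labels[d] - 1) / math.log2(r + 2)
                   for r, d in enumerate(ranking[:k]))

    ranked = sorted(range(len(scores)), key=lambda d: -scores[d])
    ideal = dcg(sorted(range(len(labels)), key=lambda d: -labels[d]))
    return dcg(ranked) / ideal if ideal > 0 else 0.0


def linear_score(alphas, x):
    """Combined ranker f(x) = sum_j alphas[j] * x[j]."""
    return sum(a * v for a, v in zip(alphas, x))


def adarank(queries, n_features, rounds=10, k=10):
    """AdaRank-style listwise boosting with single-feature weak rankers.

    `queries` is a list of (feature_vectors, labels) pairs, one per query.
    Returns one weight per feature for the linear combined ranker.
    """
    m = len(queries)
    weights = [1.0 / m] * m          # P_t(i): distribution over training queries
    alphas = [0.0] * n_features
    for _ in range(rounds):
        # Per-query NDCG of each candidate weak ranker (rank by feature j alone).
        perf = [[ndcg_at_k([x[j] for x in feats], labels, k) for feats, labels in queries]
                for j in range(n_features)]
        # Choose the feature with the best query-weighted performance.
        best = max(range(n_features),
                   key=lambda j: sum(w * e for w, e in zip(weights, perf[j])))
        num = sum(w * (1 + e) for w, e in zip(weights, perf[best]))
        den = sum(w * (1 - e) for w, e in zip(weights, perf[best]))
        # den is 0 when the weak ranker is perfect on every query; floor it
        # so alpha stays finite (a practical guard, not part of the formula).
        alphas[best] += 0.5 * math.log(num / max(den, 1e-12))
        # Re-weight queries so those the combined ranker handles badly count more.
        combined = [ndcg_at_k([linear_score(alphas, x) for x in feats], labels, k)
                    for feats, labels in queries]
        z = sum(math.exp(-e) for e in combined)
        weights = [math.exp(-e) / z for e in combined]
    return alphas


if __name__ == "__main__":
    train = [
        ([[3.0, 0.1], [2.0, 0.9], [1.0, 0.5]], [2, 1, 0]),
        ([[0.2, 1.0], [0.9, 0.0], [0.5, 0.3]], [0, 2, 1]),
    ]
    print(adarank(train, n_features=2))
```

In this toy training set, feature 0 orders every query perfectly, so it is selected in every round and feature 1 keeps weight 0. In AdaRank the selected weak ranker is repeatedly refit to the re-weighted queries. A hierarchical scheme such as the one described for ARR-H would change how candidate weak rankers are built, not this outer boosting loop.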
Keywords/Search Tags: Relational database, Schema summarization, Query expansion, Keyword query, Learning-to-rank