A Study On Cleaning Of Keyword Query Over Databases

Posted on: 2012-05-28
Degree: Master
Type: Thesis
Country: China
Candidate: G Y Li
GTID: 2218330338965394
Subject: E-commerce and information technology
Abstract/Summary:
With the continuous development of computer technology, databases are used ever more widely in work, study and daily life, for example in banking systems, e-government systems, and any setting that must handle large volumes of information. In the relational database, the most widely used kind, data is stored in a structured model; to query it, a user needs a clear understanding of the database schema and a working command of a structured query language such as SQL. Queries written this way are efficient and their results are precise. As Internet and database technologies continue to develop and converge, more and more ordinary users need to access online databases to obtain the information they need. Traditionally, however, database access requires users to master a query language (such as SQL) and to understand the database schema, and in practice most users can do neither. Most users are accustomed to keyword-based information retrieval, so a natural demand arises: databases should support keyword queries. Keyword query over databases has become an active research field with notable results, such as BANKS and DBXplorer. These methods have their respective strengths and weaknesses, but all of them fall into two categories: data-graph-based and schema-graph-based methods. The two categories share the same core idea, and for a keyword query of length n, both have time complexity O(2^n), i.e., the cost grows exponentially with n. Therefore, if n can be reduced without sacrificing accuracy, query time can be improved considerably.
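The exponential growth described above can be made concrete with a simplified illustration. The Python sketch below is not BANKS or DBXplorer; it only counts the non-empty keyword combinations that a worst-case search over n keywords may have to consider, of which there are 2^n - 1. The raw and cleaned queries are hypothetical examples, and the cleaned query assumes that multi-word terms have already been grouped into single units.

```python
from itertools import combinations


def keyword_subsets(keywords):
    """Return every non-empty subset of the query keywords.

    A search that must, in the worst case, examine each combination of
    keywords does work proportional to 2**n - 1 for n keywords.
    """
    subsets = []
    for size in range(1, len(keywords) + 1):
        subsets.extend(combinations(keywords, size))
    return subsets


# Hypothetical dirty query: a misspelling, a noise word, an abbreviation.
raw_query = ["jim", "grey", "transaction", "procesing", "the", "db"]
# Hypothetical cleaned query: corrected, grouped, noise removed.
cleaned_query = ["jim gray", "transaction processing", "database"]

for label, q in [("raw", raw_query), ("cleaned", cleaned_query)]:
    print(label, len(q), len(keyword_subsets(q)))
# raw 6 63
# cleaned 3 7
```

Halving the query length from 6 to 3 units cuts the number of combinations from 63 to 7, which is why reducing n before the search matters.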
In addition, the choice of keywords directly affects query quality. On the one hand, users may not know exactly which keywords to choose; on the other hand, queries may contain spelling mistakes. Cleaning the keywords before the query is executed can therefore improve both accuracy and time efficiency. This thesis studies keyword query cleaning and proposes an improved cleaning method. Its main contributions are as follows:
1. An improved method for generating the semantic matrix, together with a description of its working principle and significance. After the query sequence is filtered, the semantic matrix is generated by comparing the query sequence with the data in the database. The original semantic matrix supports only correction and grouping; the improved matrix also takes synonyms into account and ties the matrix height to the length of the query sequence.
2. An improved backtracking-based algorithm that runs on the improved semantic matrix once it has been built. The algorithm cleans the query sequence and groups its keywords, improving both the accuracy and the time performance of querying.
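The semantic matrix and backtracking steps described above can be sketched in Python, though only under several assumptions, since the abstract does not give the exact construction. Here the database content is reduced to a hypothetical VOCABULARY of terms, synonyms come from a hypothetical SYNONYMS table, spelling corrections come from the standard-library difflib.get_close_matches, and the matrix has one row of (candidate, cost) pairs per query keyword, which is only one possible reading of tying the matrix height to the query length. The clean function then backtracks over ways to group consecutive keywords into vocabulary terms or drop them as noise, pruning any branch whose cost already reaches that of the best complete cleaning. The weights SEGMENT_COST and DROP_COST are illustrative, not taken from the thesis.

```python
import difflib
from itertools import product

# Hypothetical database vocabulary: terms (some multi-word) that occur in the data.
VOCABULARY = {"jim gray", "transaction processing", "database", "book"}
# Hypothetical synonym table; the abstract does not say where synonyms come from.
SYNONYMS = {"db": ["database"], "volume": ["book"]}
SEGMENT_COST = 1  # hypothetical cost of each group in the cleaned query
DROP_COST = 3     # hypothetical penalty for discarding a keyword as noise


def semantic_matrix(query, vocabulary, synonyms, max_candidates=3):
    """Build one row of (candidate, cost) pairs per query keyword.

    cost 0: the keyword itself occurs in the data
    cost 1: a synonym of the keyword occurs in the data
    cost 2: a spelling correction suggested by difflib
    """
    tokens = sorted({w for term in vocabulary for w in term.split()})
    matrix = []
    for word in query:
        row = [(word, 0)] if word in tokens else []
        row += [(s, 1) for s in synonyms.get(word, []) if s in tokens]
        for fix in difflib.get_close_matches(word, tokens, n=max_candidates, cutoff=0.7):
            if fix != word:
                row.append((fix, 2))
        matrix.append(row[:max_candidates])  # cap the alternatives per keyword
    return matrix


def phrases(rows):
    """Yield every phrase formed by picking one candidate per row, with its cost."""
    for combo in product(*rows):
        yield " ".join(w for w, _ in combo), sum(c for _, c in combo)


def clean(query, vocabulary, synonyms):
    """Backtrack over groupings of the query, keeping the cheapest cleaning."""
    matrix = semantic_matrix(query, vocabulary, synonyms)
    best_cost, best_segments = float("inf"), None

    def backtrack(i, segments, cost):
        nonlocal best_cost, best_segments
        if cost >= best_cost:
            return  # prune: this branch cannot beat the best cleaning found so far
        if i == len(query):
            best_cost, best_segments = cost, list(segments)
            return
        # Option 1: group keywords i..j-1 into one vocabulary term.
        for j in range(i + 1, len(query) + 1):
            for phrase, phrase_cost in phrases(matrix[i:j]):
                if phrase in vocabulary:
                    segments.append(phrase)
                    backtrack(j, segments, cost + SEGMENT_COST + phrase_cost)
                    segments.pop()
        # Option 2: drop keyword i as noise.
        backtrack(i + 1, segments, cost + DROP_COST)

    backtrack(0, [], 0)
    return best_segments


query = ["jim", "grey", "transaction", "procesing", "the", "db"]
print(clean(query, VOCABULARY, SYNONYMS))
# ['jim gray', 'transaction processing', 'database']
```

In this sketch the misspellings "grey" and "procesing" are corrected and grouped with their neighbours, "db" is replaced by its synonym, and "the", which matches nothing in the data, is dropped. Because each row holds at most max_candidates alternatives, the matrix bounds the branching factor of the search, and the cost-based pruning is a standard branch-and-bound refinement of plain backtracking.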
Keywords/Search Tags: query cleaning, semantic matrix, backtracking