Font Size: a A A

Research On Query Expansion In Database Keyword Search

Posted on:2015-05-08Degree:MasterType:Thesis
Country:ChinaCandidate:P H LiFull Text:PDF
GTID:2308330464963248Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
As an important research direction in database keyword search, semantic search in database has been widely concerned in recent years. Compared with precise search, semantic search can discover potential results and return them to users, on the other hand, users can also provide more flexible queries. Latent Semantic Analysis (LSA) is a widely used approximate query processing techniques. This method uses singular value decomposition (SVD) to decompose the associated matrix of words and documents, and it uses the lower order approximation matrix to dig potential relationships between words and words, at last, it uses the cosine similarity to measure the similarity between the query and documents. LSA not only meets the needs of semantic search, but also solves the problem of synonymy. However, the offline singular value decomposition and online query processing of LSA is very high, and it will increase significantly with the increase in the size of the document sets so it is difficult to meet the needs of large data sets.This paper focuses on the efficiency of LSA offline processing as well as online query processing efficiency and proposes appropriate solutions, the main works are as follows:1. Analyzed the main factors that affect the online query processing efficiency, and based on the sparse characteristics of the decomposition space, we proposed an index structure to store the result matrix of SVD.Based on the index structure, we proposed a fast online processing algorithm which is IndexLSA. Theoretical analysis shows that, for the same query, IndexLSA and LSA can return the same sequence of results.2. Generated a Pruning Index by setting threshold. And we use the Pruning Index in IndexLSA. IndexLSA’s online query time is affected by the size of the index and by setting a threshold, pruning index can shrink the size of items in index so that IndexLSA cam improve the efficiency of online processing. Theoretical analysis shows that, though Pruning Index has brought some errors, it is in the foreseeable range. Experimental results on real data sets show that, IndexLSA can not only greatly reduce the LSA online query time, but also provide acceptable results.3. Combined linkage clustering algorithm, we proposed Link-LSA algorithm and by using pruning index, we finally proposed the Link-IndexLSA algorithm. Link-LSA examines the relationship between different data links between entities in heterogeneous networks and measure the entity similarities through linkages. The experimental results on real data sets show, Link-LSA can not only greatly reduce the singular value decomposition of the time, it can also effectively improve recall rate of LSA. On the other hand, Link-Index LSA improved the online query processing time of Linked-LSA.
Keywords/Search Tags:Latent Semantic Analysis, Pruning Index, Threshold, IndexLSA, Linkage-based clustering algorithm
PDF Full Text Request
Related items