The Optimization Research Of Keyword Search On Relational Database

Posted on:2020-03-14
Degree:Master
Type:Thesis
Country:China
Candidate:J Li
GTID:2428330578467006
Subject:Computer application technology
Abstract/Summary:
With the advent of the big data era, the amount of data stored in databases continues to grow, and keyword search over relational databases is increasingly widely used. Traditional database access requires the user to master a specialized query language and the underlying database schema. Moreover, users often need results that join multiple tables, which makes queries complex and difficult to write. Keyword search over relational databases has therefore become an active research topic. Traditional relational database keyword search systems match the user's keywords exactly, so results the user expects may be lost during the query, which lowers the accuracy of what is returned. At the same time, the retrieval process generates a large number of candidate networks with redundant structures, which lowers retrieval efficiency. To address the low precision and low efficiency of traditional relational database keyword search systems, this thesis proposes a candidate network scoring method that incorporates part-of-speech (POS) tagging. The specific work and contributions of this thesis are as follows:

(1) Part-of-speech acquisition for keywords. To address the low precision associated with the user's query focus during search, this thesis uses a part-of-speech tagging tool to obtain the part of speech of each keyword, and divides the generated tuple sets containing the keywords according to part of speech.

(2) Weight assignment for keyword parts of speech. To analyze the impact of keywords with different parts of speech on search results, this thesis uses logistic regression to assign weights to the different parts of speech and evaluates the validity of the logistic regression model. Finally, it generates keyword tuple sets with part-of-speech tags, which serve as the basis for scoring candidate networks.

(3) Candidate network scoring algorithm combined with keyword part of speech. First, to address the low query efficiency caused by the large number of redundant structures in traditional relational database keyword search systems, this thesis proposes a candidate network screening algorithm based on the combined network query method, which removes duplicate structures from the candidate networks and scores the candidate networks using the tagged parts of speech. Second, since the results users expect are often produced by only a few candidate networks, this thesis proposes a candidate network scoring algorithm based on a Bayesian network probability model, which scores candidate networks and eliminates redundant processing. The resulting score is the final score of the candidate network. A tuple connection tree is then generated and executed as a query against the database, and the results are returned to the user.

Extensive experiments on real data sets show that the proposed algorithm is more efficient and effective than traditional relational database keyword search systems.
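
The screening and scoring steps described above can be illustrated with a minimal Python sketch. It is not the thesis's algorithm. The `CandidateNetwork` representation, the `POS_WEIGHTS` values, and the size-penalised scoring rule are all assumptions made for illustration. In the thesis the weights are learned by logistic regression and the final score comes from a Bayesian network probability model, neither of which is reproduced here.

```python
from dataclasses import dataclass

# Hypothetical part-of-speech weights. The thesis learns such weights with
# logistic regression; these numbers are invented for illustration only.
POS_WEIGHTS = {"NOUN": 0.6, "VERB": 0.25, "ADJ": 0.15}


@dataclass(frozen=True)
class CandidateNetwork:
    """Assumed representation: a join tree plus the keywords each table matches."""
    joins: frozenset    # frozenset of frozenset({table_a, table_b})
    matches: frozenset  # frozenset of (table, keyword) pairs


def make_network(joins, matches):
    """Build a network whose identity ignores join order and join direction."""
    return CandidateNetwork(
        joins=frozenset(frozenset(pair) for pair in joins),
        matches=frozenset(matches),
    )


def deduplicate(networks):
    """Drop networks with an identical structure, keeping the first occurrence."""
    seen = set()
    unique = []
    for net in networks:
        if net not in seen:
            seen.add(net)
            unique.append(net)
    return unique


def score(network, keyword_pos):
    """Sum the POS weights of the matched keywords, divided by the table count."""
    keywords = {kw for _, kw in network.matches}
    weight = sum(POS_WEIGHTS.get(keyword_pos[kw], 0.0) for kw in keywords)
    tables = {t for pair in network.joins for t in pair}
    tables |= {t for t, _ in network.matches}
    return weight / len(tables) if tables else 0.0


def rank(networks, keyword_pos):
    """Remove duplicate structures, then sort by descending score."""
    unique = deduplicate(networks)
    return sorted(unique, key=lambda n: score(n, keyword_pos), reverse=True)


# Hand-assigned tags stand in for the output of a POS tagging tool.
keyword_pos = {"database": "NOUN", "search": "VERB"}
matches = [("paper", "database"), ("author", "search")]
a = make_network([("paper", "writes"), ("writes", "author")], matches)
b = make_network([("author", "writes"), ("writes", "paper")], matches)  # same tree as a
c = make_network([], [("paper", "database")])
ranked = rank([a, b, c], keyword_pos)
```

Here `a` and `b` describe the same join tree written in a different order, so only one of them survives screening. Dividing by the number of tables is one simple way to prefer compact networks; other normalisations would serve equally well in a sketch like this.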
Keywords/Search Tags:Relational Database, POS Tagging, Keyword Search, Database Schema