Research On Hybrid Query Expansion Technology For Information Search

Posted on: 2018-11-20 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: S B Zhang
GTID: 1368330572464545 | Subject: Computer application technology
Abstract/Summary:
With the rapid development of communication and information technology, the Internet has become an important source of information, and search engines are among the most important tools for retrieving information from it. In many cases, existing search engines already provide satisfactory results. However, because queries are short and retrieval relies on literal word matching, the quality of search results still has room for improvement. Query expansion is an effective way to address both the short-query problem and the word-matching problem. Existing query expansion methods select expansion terms from search logs, documents, or formal semantic resources. Because search logs directly record the keywords that users have used to describe their information needs, log-based methods can obtain expansion terms that describe those needs fully, and they have therefore attracted a lot of attention.

In practical applications, however, existing query expansion methods have several problems. First, log-based methods are easily affected by the sparsity and timeliness of the data in search logs, so they cannot effectively provide expansion terms that reflect information needs. Second, log-based methods are easily affected by information popularity in search logs, so they cannot effectively provide expansion terms that are similar to the user's information needs. Finally, when search logs are not available, methods based on local context are limited by the topic coverage of the documents and may fail to expand important and related concepts of the current domain. These problems restrict the quality of query expansion.

To address these problems and achieve high-quality query expansion, this thesis studies hybrid query expansion based on multiple data sources. Because log-based methods are affected by data sparsity and the timeliness of log data, which leads to too few expansion terms or to outdated ones, the thesis selects sufficiently many timely expansion terms from the local context to compensate for the shortcomings of terms selected from search logs. Because log-based methods are affected by information popularity, which leads to incomplete semantic coverage, the thesis selects expansion terms with complete semantic coverage from an ontology, refines the semantic coverage of the expansion results, and provides expansion terms that are semantically similar to the user's information needs. Because methods based on local context are limited by document topic coverage and cannot effectively expand important and related concepts in the domain, the thesis studies a query expansion method that combines ontology and local context on the basis of Copula theory. This method integrates the expansion term sets from the ontology and the local context in a unified probabilistic framework, selects high-quality expansion terms, improves the retrieval performance of the search engine, and better meets users' information needs. In detail:

(1) Query expansion method based on search logs and local context. To address the problem that search logs are easily affected by data sparsity and the timeliness of log data, the method selects high-quality expansion terms from the local context to remedy the small number and low timeliness of the candidate expansion terms extracted from user search logs. The thesis defines data sparsity and timeliness in search logs and gives decision rules for them, gives the weight calculation method for query expansion terms and the rules for fusing the two kinds of expansion terms, and analyzes the retrieval performance of the method experimentally. The method effectively avoids the influence of log data sparsity and timeliness on the query expansion results.

(2) Hybrid query expansion method that balances popularity and similarity. To address the incomplete semantic coverage caused by data popularity in log-based query expansion, the method selects expansion terms with complete semantic coverage from an ontology to make up for the lack of similarity information that meets users' needs. The thesis studies the generation of expansion term sets by clustering search logs, gives methods for generating the semantic expansion term set and the semantic category set from the ontology, and on this basis proposes a method for calculating the coverage degree of the log expansion term set, using evidence theory to integrate the two expansion term sets. Experimental analysis shows that the method effectively provides a set of expansion terms with complete semantic coverage and achieves query expansion that balances popularity and similarity.

(3) Query expansion method combining ontology and local context based on Copula theory. To address the problem that local context cannot expand important and relevant concepts in the domain because of limited document topic coverage, the thesis proposes methods for calculating the semantic similarity probability and the statistical correlation probability and, on the basis of Copula theory, combines the ontology and local context similarity measures in a probabilistic framework. Experimental results show that the search performance of this method is better than that of other hybrid ontology and local context expansion methods, and that it achieves higher-quality query expansion.

(4) Application of the proposed hybrid query expansion technology to a geological data search system. Based on the above research results, and to address the lack of support for regional mineral prospecting, regional mineral resources survey, and mineral resources planning tasks in existing geological data search systems, the hybrid query expansion technology proposed in this thesis is applied to a geological data search system. A method based on a BP neural network and mapping rules is proposed to map borrowers' queries to the different hybrid query expansion methods, which solves the problem of setting model parameters. The effectiveness of the methods described in this thesis is verified on actual work tasks.
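
As an illustration of the evidence-theory integration mentioned in contribution (2), the following is a minimal sketch of Dempster's rule of combination, the standard combination rule of Dempster-Shafer evidence theory. The abstract does not say how the thesis builds its mass functions, so the frame of discernment (`RELEVANT`, `IRRELEVANT`), the two sources (`m_log`, `m_onto`), and all mass values below are hypothetical and serve only to show the mechanics.

```python
from itertools import product

# Hypothetical frame of discernment for one candidate expansion term.
RELEVANT = frozenset({"relevant"})
IRRELEVANT = frozenset({"irrelevant"})
THETA = RELEVANT | IRRELEVANT  # the whole frame, i.e. "unknown"


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + wa * wb
        else:
            # Mass assigned to contradictory hypotheses.
            conflict += wa * wb
    norm = 1.0 - conflict
    if norm <= 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalise so the combined masses sum to 1.
    return {focal: mass / norm for focal, mass in combined.items()}


# Assumed evidence from the search log (popularity) and the ontology (similarity).
m_log = {RELEVANT: 0.6, THETA: 0.4}
m_onto = {RELEVANT: 0.5, IRRELEVANT: 0.2, THETA: 0.3}

fused = dempster_combine(m_log, m_onto)
for focal, mass in sorted(fused.items(), key=lambda kv: -kv[1]):
    print(sorted(focal), round(mass, 4))
```

With these assumed inputs, the mass supporting `RELEVANT` after fusion is higher than either source assigns on its own, because the two sources largely agree and only a small share of the mass is in conflict.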
Keywords/Search Tags:information search, query expansion, data sparsity, data timeliness, information popularity, hybrid query expansion
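
As an illustration of the Copula-based combination described in contribution (3), the sketch below joins a semantic similarity probability and a statistical correlation probability into a single score with a Clayton copula. The abstract names neither the copula family nor the way the joint value is used for ranking, so the Clayton form, the parameter `theta`, the candidate terms, and their probabilities are all assumptions made for illustration.

```python
def clayton_copula(u, v, theta=2.0):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0."""
    if theta <= 0:
        raise ValueError("theta must be positive for this form of the Clayton copula")
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        raise ValueError("u and v must be probabilities in [0, 1]")
    if u == 0.0 or v == 0.0:
        return 0.0  # boundary condition of any copula
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


# Hypothetical candidates: term -> (semantic similarity probability from the
# ontology, statistical correlation probability from the local context).
candidates = {
    "ore deposit": (0.9, 0.7),
    "mineralization": (0.8, 0.8),
    "weather": (0.2, 0.9),
}

# Score each candidate by the joint value and rank in descending order.
ranked = sorted(
    ((term, clayton_copula(u, v)) for term, (u, v) in candidates.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for term, score in ranked:
    print(f"{term}: {score:.4f}")
```

A copula-based score never exceeds the smaller of its two inputs, so a term that is strongly supported by only one source (here the hypothetical "weather") ranks below terms that both sources support.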