A Search Engine Based On Search Term Extension And Text Representation

Posted on: 2018-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Yang
GTID: 2348330536484897
Subject: Traffic Information Engineering & Control
Abstract/Summary:
Information retrieval is a content-driven application: the quality of the search results directly determines whether users can quickly reach the information they need. At present, vertical search engines oriented to specific domains satisfy, to a certain extent, users' need for domain-specific information. However, a purely text-based search engine cannot search from a semantic point of view, so its results depend too heavily on the choice of search terms. To address this issue, this thesis draws on the relationships in knowledge ontologies and on word-level representations of text, and puts forward a ranking algorithm based on search term expansion. The main work is as follows:

(1) Research on an ontology-based method for extending retrieval words. With Chinese Wikipedia as the data source, data extracted from its fixed page structure is used to build the knowledge ontology automatically on a regular basis. Once the ontology data is persisted to the ontology storage engine, a query service is provided that returns the description data of an ontology together with the set of ontologies associated with it. The description is used to display and supplement the search results, and the hyponyms and related ontologies serve as the basis for expanding the search term.

(2) Text representation based on word embeddings, and similarity computation. Word2vec is trained on Chinese data, and the resulting vectors are used to judge the similarity between texts and to find the set of words similar to the search term. Meanwhile, the titles of the documents in the library are vectorized and weighted according to a certain principle. Based on users' behavior, personalized document recommendation is then achieved through linear operations on the vectors.

(3) The Dscore ranking algorithm based on Lucene retrieval results. Targeting the application scenario, and combining word-vector-based search term expansion with semantic similarity computation, the Dscore sorting algorithm is proposed on the basis of the expanded search terms and the retrieval results. The retrieval system is then designed and implemented, and the system is tested and its retrieval results evaluated.

As a concrete application, the research results are used in the "dayinyun.cn" project, in which the retrieval system undertakes the task of shared document retrieval.
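The abstract does not give the Dscore formula, so the following is only a minimal sketch of the general idea it describes: expand a search term with its nearest neighbours in word-vector space, then re-rank retrieved documents using both the original and the expanded terms. The weighting rule (cosine similarity used as the weight of each expanded term), the function names, and the toy vectors and scores are all assumptions made for illustration; they are not taken from the thesis, and the base scores merely stand in for whatever a Lucene query would return.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_term(term, vectors, top_k=2):
    """Return the top_k (word, similarity) pairs closest to `term`."""
    query_vec = vectors[term]
    scored = [
        (other, cosine(query_vec, vec))
        for other, vec in vectors.items()
        if other != term
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def rerank(term, vectors, hits, top_k=2):
    """Combine per-term retrieval scores into one ranking.

    `hits` maps a term to a list of (doc_id, base_score) pairs, standing in
    for the results a text index would return for that term.
    Assumed rule: the original term has weight 1.0 and each expanded term
    is weighted by its cosine similarity to the original term.
    """
    weights = {term: 1.0}
    weights.update(expand_term(term, vectors, top_k))
    totals = {}
    for word, weight in weights.items():
        for doc_id, base_score in hits.get(word, []):
            totals[doc_id] = totals.get(doc_id, 0.0) + weight * base_score
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: 2-dimensional "word vectors" and per-term hits.
TOY_VECTORS = {
    "car": [1.0, 0.0],
    "automobile": [0.9, 0.1],
    "banana": [0.0, 1.0],
}
TOY_HITS = {
    "car": [("doc1", 2.0)],
    "automobile": [("doc2", 2.0), ("doc1", 1.0)],
    "banana": [("doc3", 5.0)],
}

if __name__ == "__main__":
    for doc_id, score in rerank("car", TOY_VECTORS, TOY_HITS, top_k=1):
        print(doc_id, round(score, 3))
```

In this toy setup, a document that matches only the expanded term ("automobile") still appears in the ranking for "car", while a document that matches only an unrelated term does not. In practice the vectors would come from a trained Word2vec model and the base scores from the underlying index.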
Keywords/Search Tags: search engine, ontology, word vector, Lucene, Dscore sorting algorithm