The Research Of Enterprise Search Engine Sorting

Posted on: 2017-01-31
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Wang
GTID: 2308330482979877
Subject: Computer Science and Technology
Abstract/Summary:
As enterprises continue to build out their information systems, enterprise information resources have become increasingly diverse and scattered, which makes it harder for people to find the information they need. In addition, enterprise information often involves business secrets, so relying on commercial search engines exposes an enterprise to economic risk. We therefore conducted an in-depth study of search engine ranking algorithms and propose two improvements over traditional search ranking. First, we improve the traditional PageRank algorithm by estimating the initial vector and by introducing ranking volatility as the iteration stop criterion. Second, we propose the concept of contribution rate, which captures the relationship between historical query words and the pages users viewed; it is shown to improve retrieval efficiency and user satisfaction.

The thesis first introduces the workflow of a search engine and the technical background on how search engines work. It then surveys popular search engine ranking algorithms, focusing on the classical PageRank algorithm. It studies user behavior on the Internet, focusing on judging the reliability of user clicks based on multiple features of the clicks associated with a query. It also analyzes the scoring mechanism of the open-source Lucene library, whose basic idea is to score a page by the relevance of its content to the query terms. The thesis then presents an improved PageRank algorithm and a ranking algorithm based on a model of user click behavior.
First, we study how the PageRank algorithm works, then put forward a pre-estimated initial vector and introduce ranking volatility as the criterion for stopping the PageRank iteration, which reduces the number of iterations and accelerates the iterative process. Second, by mining user behavior data and analyzing the reliability of user clicks, we present the contribution rate of historical query-term clicks to a document, which lets user behavior influence the ranking result.

Finally, we conduct experiments and analyze the results. Numerical experiments show that the improved PageRank algorithm uses less computing time than the traditional PageRank algorithm, and a comparison of search result precision verifies that the new algorithm improves retrieval efficiency.
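
The abstract does not give the formulas behind the improved PageRank, so the following Python sketch only illustrates the general idea of stopping the iteration once the ranking order settles, not the thesis's actual method. Two parts are assumptions made for illustration: the initial vector is estimated from smoothed in-degree, and "ranking volatility" is taken to be the fraction of rank positions whose page changed between two consecutive iterations. The function name `pagerank_rank_stop` and its parameters are hypothetical.

```python
from typing import Dict, List


def pagerank_rank_stop(
    links: Dict[str, List[str]],
    damping: float = 0.85,
    volatility_tol: float = 0.0,
    max_iter: int = 100,
) -> Dict[str, float]:
    """Power-iteration PageRank that stops once the ranking order settles."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)

    # ASSUMPTION: initial vector estimated from in-degree, smoothed by +1
    # so that no page starts at zero. The vector sums to 1.
    in_deg = {v: 0 for v in nodes}
    for targets in links.values():
        for t in targets:
            in_deg[t] += 1
    total = sum(in_deg.values()) + n
    rank = {v: (in_deg[v] + 1) / total for v in nodes}

    def order(scores: Dict[str, float]) -> List[str]:
        # Pages sorted by descending score; ties broken by name.
        return sorted(nodes, key=lambda v: (-scores[v], v))

    prev_order = order(rank)
    for _ in range(max_iter):
        # Mass held by pages without out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new_rank[t] += share
        rank = new_rank

        cur_order = order(rank)
        # ASSUMPTION: volatility = fraction of positions whose page changed.
        volatility = sum(a != b for a, b in zip(prev_order, cur_order)) / n
        prev_order = cur_order
        if volatility <= volatility_tol:
            break  # ranking order is stable enough; scores may not be converged
    return rank
```

A call such as `pagerank_rank_stop({"A": ["B", "C"], "B": ["C"], "C": ["A"]})` returns a score per page that sums to 1. Stopping on order stability rather than on a small change in scores trades exact score convergence for fewer iterations, which is acceptable when only the final ordering is shown to users.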
Keywords/Search Tags:Enterprise search engine, user behavior, PageRank, resort, Lucene