Analysis On The Improved Content Ranking Algorithm In The Search Engine

Posted on: 2014-07-09
Degree: Master
Type: Thesis
Country: China
Candidate: J Lian
Full Text: PDF
GTID: 2298330467466763
Subject: Software engineering
Abstract/Summary:
With the rapid development and growing popularity of the Internet, the amount of information on the web is growing explosively. How to find the needed information within this mass of data has become a pressing problem, and information retrieval has consequently become one of the most active research fields. A search engine locates the required information using specific retrieval techniques. Although search engines have existed for only a dozen or so years, they have become an indispensable part of the web.

The main research work of this thesis is as follows:

1. The definition and development of search engines are introduced, followed by a description of several retrieval models and of evaluation indicators such as recall and precision.

2. Based on the vector space model proposed by Salton, the thesis improves the traditional TF-IDF algorithm by introducing a variant based on the accumulated sum of TF-IDF as well as a normalized TF-IDF, and then compares these with the TF, IDF, and TF-IDF algorithms. For the probabilistic model, the BM25 algorithm is adopted, which combines information about the searched documents with term frequency. For the language model, building on a detailed study of the Jelinek-Mercer and Absolute Discounting methods, the thesis improves the Dirichlet Smoothing method.

3. After analyzing the existing ranking algorithms (the TF-IDF algorithms, the BM25 algorithm, and the language-model-based algorithms), the thesis applies all of them to result ranking in Lucene, so that more precisely matching information can be returned to users and search efficiency improves. Tests of the different ranking strategies show that the ranking accuracy of the documents retrieved for a query varies with the algorithm used. Overall, the language model with the improved Dirichlet Smoothing method has the best ranking performance, while within the vector space model the normalized TF-IDF algorithm performs better than TF, IDF, and the other variants.
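
To make the scoring functions named in the abstract concrete, the following is a minimal sketch of their standard textbook forms: TF-IDF, Okapi BM25, and the Jelinek-Mercer, Absolute Discounting, and Dirichlet smoothing methods for a query-likelihood language model. It does not reproduce the thesis's improved Dirichlet method or its TF-IDF variants, whose details the abstract does not give, and it does not use Lucene. The toy corpus, the function names, and the parameter defaults (k1, b, lam, delta, mu) are assumptions chosen for illustration.

```python
import math

# Toy corpus (hypothetical); each document is a list of tokens.
DOCS = [
    "lucene ranking search engine".split(),
    "search engine search results".split(),
    "language model smoothing".split(),
]

def tf_idf(term, doc, docs):
    """Textbook TF-IDF weight: raw term frequency times log(N / df)."""
    tf = doc.count(term)
    df = sum(1 for d in docs if term in d)
    return tf * math.log(len(docs) / df) if df else 0.0

def bm25(query, doc, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of doc for a list of query terms."""
    avgdl = sum(len(d) for d in docs) / len(docs)
    score = 0.0
    for term in query:
        tf = doc.count(term)
        df = sum(1 for d in docs if term in d)
        idf = math.log(1 + (len(docs) - df + 0.5) / (df + 0.5))
        norm = tf + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * tf * (k1 + 1) / norm
    return score

def p_collection(term, docs):
    """Relative frequency of term in the whole collection."""
    total = sum(len(d) for d in docs)
    return sum(d.count(term) for d in docs) / total

def jelinek_mercer(term, doc, docs, lam=0.5):
    """Linear interpolation of document and collection models."""
    return (1 - lam) * doc.count(term) / len(doc) + lam * p_collection(term, docs)

def absolute_discounting(term, doc, docs, delta=0.7):
    """Subtract delta from each seen count and give the mass to the collection model."""
    seen = max(doc.count(term) - delta, 0)
    return (seen + delta * len(set(doc)) * p_collection(term, docs)) / len(doc)

def dirichlet(term, doc, docs, mu=2000):
    """Standard Dirichlet prior smoothing (not the thesis's improved variant)."""
    return (doc.count(term) + mu * p_collection(term, docs)) / (len(doc) + mu)

def query_log_likelihood(query, doc, docs, smooth=dirichlet):
    """Log-probability of the query under the smoothed document model."""
    return sum(math.log(smooth(t, doc, docs)) for t in query)
```

A ranking is then obtained by sorting documents by score, for example sorted(DOCS, key=lambda d: bm25(["search", "engine"], d, DOCS), reverse=True). Note that query_log_likelihood assumes every query term occurs somewhere in the collection; a term absent from the whole corpus gives a zero probability and the logarithm fails.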
Keywords/Search Tags: Search Engine, Ranking Algorithm, Lucene, Improved Dirichlet Smoothing Method