Application Of The Entropy Theory In Search Engine Quality Evaluation

Posted on: 2013-02-19  Degree: Master  Type: Thesis
Country: China  Candidate: D H Wang  Full Text: PDF
GTID: 2218330362458829  Subject: Software engineering
Abstract/Summary:
Search Engines are a hot topic in today's Internet world. Search Engine evaluation can be traced back to the Information Retrieval framework built on the concepts of "Precision and Recall", which Kent introduced in 1955, and to the evaluation scheme later put forward by the British Cranfield Project. Today, Search Engines are mature in their third generation, based on semantic analysis, and are developing rapidly in their fourth generation, centered on personalization and socialization. Their evaluation framework, however, has not changed in any essential way for decades. Based on a broad survey of Search Engine technologies and Quality Evaluation Frameworks, this thesis identifies three issues: (1) the current evaluation process is manual, labor-intensive, and costly, so it cannot effectively direct Search Engines to improve; (2) the representativeness of the Query sample is hard to control; (3) every Search Engine has been optimizing against the same Framework, and diminishing returns have been observed, e.g. improvements to relevance yield less and less ROI.
After a broad review of the literature and careful analysis of the problem, this thesis puts forward a method based on negative feedback theory. The method applies the concept of Entropy to the analysis of click information and quantifies the importance of DSATs (cases of user dissatisfaction) to a Search Engine. Experimental results show that the method is an effective supplement to the present Quality Evaluation Framework for Search Engines. The thesis also classifies DSATs into mDSAT and 1DSAT, so that different measures can be taken according to the characteristics of each class.
Because the method automatically mines DSATs from the Click Log and prioritizes them by Click Entropy value, it mitigates issue 1 above. Because Click Logs collect all click information, the sample set of the method theoretically includes all queries from all end users, which greatly reduces the uncertainty arising from issue 2. The method supplements the present Quality Evaluation Framework from a new angle, automatic DSAT mining and prioritization, so it should improve Search Engine quality in the short term; any improvement made against the method is a direct improvement to quality itself, and diminishing returns should take some time to reappear.
The innovative part of the method is the application of the concept of Entropy to the analysis of click information. By extending the concept of Click Distribution, the thesis derives the definitions and formulas of the Click Event and Click Entropy. It develops a new yet practical way to interpret click information, which makes it possible to quantify the importance of DSATs to Search Engine quality. The thesis also addresses the invalidity of no-click and low-click results. Then, based on the Negative Feedback theory of automatic control systems, it puts forward a framework that uses DSATs as negative feedback to the Search Engine in order to improve its quality directly and effectively.
Based on these ideas, the thesis discusses a framework that calculates Click Entropy automatically in real time and mines DSATs automatically in both real-time and batch modes. The requirements and design are detailed with data flow charts of the Click Data in the Click Logs and a detailed description of the aggregation process for Query/URL pairs. The implementation of the framework is introduced with emphasis on the Index Builder and the RealTime Report Generator.
For ease of comparison and understanding, the method normalizes the Click Entropy, which helps prioritize DSATs automatically. Finally, testing and experiments show that this application of the Entropy concept can effectively improve Search Engine quality.
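The abstract does not reproduce the thesis's formulas for Click Entropy, so the following is only a minimal sketch of the general idea. It assumes the standard Shannon entropy over a query's click distribution across URLs, normalized by the logarithm of the number of distinct clicked URLs so that values fall in [0, 1]. The function names, the `(query, url)` log format, and the choice to rank higher entropy first are all illustrative assumptions, not the author's definitions. The thesis's handling of no-click and low-click results is not modeled here.

```python
import math
from collections import Counter, defaultdict


def aggregate_clicks(click_log):
    """Aggregate (query, url) click records into per-query URL click counts.

    The (query, url) tuple format is a hypothetical stand-in for a real Click Log.
    """
    counts = defaultdict(Counter)
    for query, url in click_log:
        counts[query][url] += 1
    return counts


def normalized_click_entropy(url_counts):
    """Shannon entropy of a click distribution, scaled to [0, 1].

    Assumption: normalization divides by log2 of the number of distinct
    clicked URLs. Queries with fewer than two clicked URLs score 0.0.
    """
    total = sum(url_counts.values())
    n = len(url_counts)
    if total == 0 or n < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in url_counts.values())
    return entropy / math.log2(n)


def rank_queries(click_log):
    """Return (query, score) pairs sorted by normalized entropy, highest first.

    Treating higher entropy as higher priority is an assumption of this sketch.
    """
    counts = aggregate_clicks(click_log)
    scores = {q: normalized_click_entropy(c) for q, c in counts.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    log = [("a", "u1"), ("a", "u2"), ("b", "u1"), ("b", "u1"), ("b", "u1")]
    for query, score in rank_queries(log):
        print(query, round(score, 3))
```

In this sketch, a query whose clicks are spread evenly over several URLs scores close to 1, while a query whose clicks all land on one URL scores 0. Normalizing makes queries with different numbers of clicked URLs comparable on one scale.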
Keywords/Search Tags: Search engine, Negative feedback, Information entropy, Click distribution, Click event, Click entropy