The Research And Implementation Of Search Engine Based On The Temporal Information

Posted on: 2014-01-21  Degree: Master  Type: Thesis
Country: China  Candidate: Z C Zhang  Full Text: PDF
GTID: 2248330398457389  Subject: Software engineering
Abstract/Summary:
The Internet has become an indispensable part of modern life and one of the most important sources of information. At the same time, the massive growth of data on the Internet brings new challenges: faced with such a vast amount of information resources, users may find it difficult to quickly locate the information they need. Search engines were born to solve this problem. Following certain strategies and specific procedures, a search engine collects and processes information from the Internet and provides search services to users, so that the relevant information is presented to them in a straightforward way.
The rapid development of search engines has brought great convenience to users, but it has not solved the problem in essence. Existing search engines retrieve web pages by keyword matching and often return too many results, a large number of which are useless to the user, so it is still difficult for users to get the pages they need quickly and accurately.
Time is one of the essential features of information and also one of the essential attributes of web pages. When people read a news report, they always consider content and time together, and some information is meaningful only at a specific time. Adding temporal elements to a query expresses the user's intention more accurately, so that the search engine can find the needed information more quickly and accurately. Temporal information is therefore an important research issue for search engine systems. More and more search engines have introduced temporal information search; for example, it is available in the advanced search of Google and Baidu. However, they support temporal search only on the time a page was updated or fetched, and do not take the time expressed in the page content into account.
This paper first introduces the basic knowledge of search engines. A temporal information extraction function is then added to the Nutch search engine in three steps: first, temporal information is extracted from the text content of web pages; second, the extracted temporal information is standardized and reasoned over; third, the results of the second step are stored in the index files. An algorithm named CTRR, which calculates the relevance score of temporal information, is proposed in the third chapter of this paper. CTRR is used in the search ranking stage: the returned search results are processed a second time, their ranking scores are recalculated, and finally the results are ranked by the sum of the relevance score of temporal information and the relevance score of text content.
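The extract, standardize and re-rank flow described in the abstract can be illustrated with a minimal Python sketch. It is not the CTRR algorithm itself, whose details the abstract does not give, and it does not use Nutch. The date pattern, the function names (extract_dates, temporal_score, rerank) and the temporal scoring rule (the fraction of a page's dates that fall inside the queried interval) are all assumptions made for illustration. Only the final step, ranking by the sum of the temporal score and the text score, follows the abstract.

```python
import re
from datetime import date

# Assumed pattern: only numeric dates such as 2013-05-01 or 2013/5/1.
# Real page text would need far richer temporal expressions.
DATE_PATTERN = re.compile(r"\b(\d{4})[-/](\d{1,2})[-/](\d{1,2})\b")


def extract_dates(text):
    """Extract date expressions from page text and standardize them to date objects."""
    found = []
    for year, month, day in DATE_PATTERN.findall(text):
        try:
            found.append(date(int(year), int(month), int(day)))
        except ValueError:
            continue  # skip impossible dates such as 2013-13-40
    return found


def temporal_score(page_dates, query_start, query_end):
    """Hypothetical score: share of the page's dates inside the query interval."""
    if not page_dates:
        return 0.0
    hits = sum(1 for d in page_dates if query_start <= d <= query_end)
    return hits / len(page_dates)


def rerank(results, query_start, query_end):
    """Second pass over (doc_id, text_score, text) results.

    Each result is rescored as text_score + temporal_score and the list
    is returned sorted by that sum, highest first.
    """
    scored = []
    for doc_id, text_score, text in results:
        t_score = temporal_score(extract_dates(text), query_start, query_end)
        scored.append((doc_id, text_score + t_score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    results = [
        ("a", 0.9, "Updated report with no dates."),
        ("b", 0.5, "The flood happened on 2013-05-01 and 2013/5/3."),
    ]
    for doc_id, score in rerank(results, date(2013, 5, 1), date(2013, 5, 31)):
        print(doc_id, score)
```

In this toy example page "b" has the lower text score, but both of its dates fall inside the queried interval, so it moves ahead of page "a" after re-ranking. In a real system the temporal information would be read from the index rather than re-extracted at query time, as the abstract describes.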
Keywords/Search Tags:Temporal Information, Search engine, Information Extraction, Sort