
Research And Implementation Of A Time-based Focused Search Engine

Posted on: 2010-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Sun
GTID: 2178360302959582
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, search engines have become the main portal through which people access the vast information resources of the Web, and they are widely used. However, with the exponential growth and diversification of Web information and user needs, traditional search engines can no longer meet the demand for personalized information retrieval. Focused search engines have emerged in response. Unlike a traditional search engine, a focused search engine covers only certain domains, so that it can provide more accurate, comprehensive and timely service to users in those domains. Some of its techniques are shared with traditional search engines, but it also involves unique techniques and new issues that remain to be resolved. Focused search engines have therefore become a research hotspot in recent years.

Traditional focused search engines support only keyword-based search, which in many cases makes it difficult to express the user's query effectively. Temporal information, such as the modification time of a page or the time of an event reported in a news page, is an essential attribute of a web page. If a focused search engine can exploit the temporal information in pages and allow users to express time-related needs, its performance can be improved.

In this thesis, we study key technologies for a focused search engine based on temporal information. Our work emphasizes the design and implementation of the temporal focused search engine, the focused crawler, and the temporal ranking of search results. The main contributions can be summarized as follows:

(1) We propose and implement a hybrid focused crawler based on an analysis of Web structure and the characteristics of Web pages. First, for each crawled page, the relevance between the page and the topic is computed using a page analysis algorithm based on VIPS, and the relevant links are selected. Second, we combine the crawler with meta-search technology to improve its ability to cross Web communities, so that it is accurate and achieves good recall.

(2) We study search result ranking algorithms that take different kinds of page time into account. We propose three ranking algorithms suited to different kinds of temporal search. They modify the transition probabilities and the random jump probabilities of the original PageRank algorithm according to whether the user's requirement concerns the content time, the modification time, or both, in order to improve the accuracy of the ranking results.

(3) We design and implement a focused search engine that can retrieve Web pages according to content time and modification time. The system supports text retrieval and temporal retrieval at the same time. Experimental results show that the focused search engine based on temporal information offers better query expression and query processing capabilities than one based on text keywords alone.
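The abstract does not give the formulas of the three ranking algorithms, so the following is only a minimal sketch of the general idea in contribution (2): biasing both the transition probabilities and the random jump probabilities of PageRank with temporal information. The function name `temporal_pagerank`, the per-page `time_score` weights (imagined here as how well a page's content time or modification time matches the query), and the damping value are all assumptions made for illustration, not the thesis's method.

```python
def temporal_pagerank(links, time_score, damping=0.85, iters=50):
    """Illustrative time-biased PageRank (hypothetical, not the thesis's algorithm).

    links:      dict mapping each page to the list of pages it links to;
                every linked page must also appear as a key.
    time_score: dict mapping each page to a positive temporal weight (assumed).
    """
    pages = list(links)
    total = sum(time_score[p] for p in pages)
    # Random-jump distribution: proportional to the temporal weight
    # instead of uniform as in the original PageRank.
    jump = {p: time_score[p] / total for p in pages}
    rank = dict(jump)

    for _ in range(iters):
        new_rank = {p: (1.0 - damping) * jump[p] for p in pages}
        for p in pages:
            out = links[p]
            if not out:
                # Dangling page: spread its rank according to the jump distribution.
                for q in pages:
                    new_rank[q] += damping * rank[p] * jump[q]
                continue
            # Transition probabilities: proportional to the temporal weight
            # of the target page instead of 1 / out-degree.
            weight_sum = sum(time_score[q] for q in out)
            for q in out:
                new_rank[q] += damping * rank[p] * time_score[q] / weight_sum
        rank = new_rank
    return rank


# Tiny hypothetical graph: page "b" matches the requested time poorly.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
uniform = temporal_pagerank(graph, {"a": 1.0, "b": 1.0, "c": 1.0})
biased = temporal_pagerank(graph, {"a": 1.0, "b": 0.1, "c": 1.0})
```

With equal weights the sketch reduces to ordinary PageRank. Lowering the weight of one page reduces both the chance of jumping to it and the share of rank it receives over incoming links, which is the kind of effect the abstract attributes to the modified transition and jump probabilities.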
Keywords/Search Tags: focused crawler, focused search engine, temporal ranking, temporal information retrieval