
Research On Blog Search Engine Based On RSS And Realized By LUCENE

Posted on: 2010-06-04
Degree: Master
Type: Thesis
Country: China
Candidate: S L Liu
GTID: 2178360272980204
Subject: Computer application technology
Abstract/Summary:
XML is an extensible markup language that lets different applications and platforms exchange data. With the extensive use of XML in web applications, RSS has become one of the most widely used XML applications. So far, RSS has been widely adopted by information service sites such as news websites and weblogs.

With the rapid development of the internet, the search engine has become a necessary tool for finding useful information. Users expect a search engine to provide good service while covering a full range of information. A blog search engine differs considerably from a full-text search engine in the content it retrieves, its operating principle, and its search technology. As a result, searching content published in RSS format with a full-text search engine has shortcomings, such as low efficiency and slow updates, which a blog search engine based on RSS overcomes.

This thesis studies the operating principle of a blog search engine based on RSS, focusing on the blog web crawler and the user interest model. The web crawler is an important part of a search engine, and the quality of the content it collects directly affects the search results. Because RSS pages differ from ordinary HTML pages, a blog web crawler based on RSS crawls the RSS link of each blog post. The thesis examines how to collect RSS feeds, extract their content, and create indexes. On this basis, a blog web crawler that collects RSS feeds, extracts their content, and creates indexes was designed and implemented.

A traditional search engine serves users in general and cannot tailor its results to an individual user's interests, yet users often want the results most relevant to their own interests.
On this basis, this thesis introduces the concepts and applications of the user interest model and implements a user interest model based on the labels and categories of blog posts, including its initialization, its updating, and its matching against search results.

Finally, a blog search engine based on RSS was designed and implemented, with Ajax used to improve the search experience.
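
The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it describes: extracting posts and their category labels from an RSS 2.0 feed, and matching them against a user interest model. All names here (`extract_items`, `update_profile`, `interest_score`, `rerank`, the sample feed) are hypothetical, the scoring rule is an assumption rather than the thesis's own method, and the indexing step is omitted.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical RSS 2.0 feed used only for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>Indexing feeds</title>
      <link>http://example.com/indexing</link>
      <category>search</category>
      <category>lucene</category>
    </item>
    <item>
      <title>Holiday photos</title>
      <link>http://example.com/photos</link>
      <category>travel</category>
    </item>
  </channel>
</rss>"""


def extract_items(rss_text):
    """Return one dict per <item> with its title, link, and category labels."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "categories": [c.text.strip().lower()
                           for c in item.findall("category") if c.text],
        })
    return items


def update_profile(profile, categories, weight=1.0):
    """Update step (assumed): add weight for each label of a post the user liked."""
    for category in categories:
        profile[category] += weight


def interest_score(profile, categories):
    """Match step (assumed): share of the profile's weight covered by the post's labels."""
    total = sum(profile.values())
    if total == 0:
        return 0.0
    return sum(profile[c] for c in categories) / total


def rerank(items, profile):
    """Order items by descending interest score; ties keep their input order."""
    return sorted(items,
                  key=lambda it: interest_score(profile, it["categories"]),
                  reverse=True)


if __name__ == "__main__":
    profile = Counter()                  # initialization: empty interest model
    update_profile(profile, ["search"])  # the user showed interest in "search"
    for post in rerank(extract_items(SAMPLE_FEED), profile):
        print(post["title"], interest_score(profile, post["categories"]))
```

In a full system, the extracted items would be written to a search index and the re-ranking would be applied to the hits returned for a query; here the whole feed stands in for the result list.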
Keywords/Search Tags: RSS, search engine, web crawler, Ajax, users' interests model