
Research Of Automotive Information Search Engine Based On Personalized Service

Posted on: 2012-11-28
Degree: Master
Type: Thesis
Country: China
Candidate: X S Fan
GTID: 2178330335452255
Subject: Computer application technology

Abstract/Summary:
The rapid spread of Web 2.0 has brought people an abundance of information, but it has also greatly reduced their ability to take that information in: its volume now far exceeds what people expected. Traditional information retrieval systems cannot meet users' needs. General-purpose search engines satisfy the basic needs of ordinary users, but they fall short in domain-specific search and in personalized search services. This paper presents an automotive (car-topic) search engine based on personalized services to address these shortcomings.

The paper first introduces the research background and the working principles of search engines. It then analyzes the architecture of the Heritrix web crawler in depth and extends the relevant Heritrix components to design a car-topic web crawler. It introduces the URL hashing algorithm ELFHash to replace the original Key-value allocation strategy of the Heritrix crawler, so that pages under the same domain can be crawled by multiple threads, achieving efficient multi-threaded crawling for the car-topic crawler. The system uses the Lucene full-text search framework as its search framework.
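The Key-value change described above can be sketched as follows. This is a minimal Python illustration, not Heritrix code (Heritrix is written in Java). It assumes that the crawler's default key is the URL's host name, so all pages of one site share a single queue and are fetched by one thread at a time, and that the improved key is the ELFHash of the full URL taken modulo a fixed number of queues. `host_key`, `elf_hash_key`, and the queue count are hypothetical names and values; `elf_hash` follows the widely published C version of ELFHash, applied to the URL's UTF-8 bytes.

```python
from urllib.parse import urlsplit


def elf_hash(text: str) -> int:
    """Classic ELF (PJW-style) string hash over UTF-8 bytes, 32-bit like the C original."""
    h = 0
    for byte in text.encode("utf-8"):
        h = ((h << 4) + byte) & 0xFFFFFFFF  # emulate 32-bit unsigned overflow
        high = h & 0xF0000000               # top nibble about to be shifted out
        if high:
            h ^= high >> 24                 # fold it back into bits 4-7
            h &= ~high                      # and clear it from the top
    return h & 0x7FFFFFFF


def host_key(url: str) -> str:
    """Assumed default: every URL on one host shares one queue."""
    return urlsplit(url).hostname or ""


def elf_hash_key(url: str, num_queues: int) -> str:
    """Hypothetical replacement key: spread URLs over num_queues queues by ELFHash."""
    return str(elf_hash(url) % num_queues)


if __name__ == "__main__":
    urls = [f"http://www.example.com/cars?id={i}" for i in range(10)]
    print({host_key(u) for u in urls})               # {'www.example.com'}
    print(len({elf_hash_key(u, 10) for u in urls}))  # 10
```

With the hash-based key, URLs from one host are spread over several queues, so several threads can fetch from the same site at once (at the cost of heavier load on that server). One caveat of this sketch: the low four bits of the hash equal the low four bits of the URL's last byte, so a queue count of 16 or any divisor of it would send, for example, every URL ending in ".html" to the same queue; an odd queue count avoids this.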
It introduces the basic principles of full-text search and the related Lucene technologies, namely indexing and ranking, and identifies a deficiency in Lucene's original ranking algorithm: it ranks pages only by their textual relevance to the query, which cannot objectively reflect how important a page is. Google's PageRank algorithm is therefore introduced on top of Lucene's original ranking algorithm, and the two are combined to improve ranking.

On the basis of this theory, the paper designs and implements a car-topic search engine based on personalized services. Starting from users' needs, the system is divided into four modules: the car-topic crawler module, the car web-page extraction module, the indexing module, and the user query module; the principles and design methods of each module are described in detail. Finally, the paper tests the system and the related theoretical work separately. A comparative analysis of the results returned for identical query terms shows that the system's search efficiency and result quality are better than those of common general-purpose and topic-specific search engines. A comparison of crawl efficiency before and after the improvement confirms that the improved topic crawler crawls markedly more efficiently. Lastly, the relationship between the maximum number of threads and crawl efficiency is tested.
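The abstract states that PageRank is combined with Lucene's original ranking but not how. The following is a minimal sketch under stated assumptions: PageRank is computed by power iteration with the commonly used damping factor 0.85, and the combination is a hypothetical linear blend, `alpha * relevance + (1 - alpha) * PageRank`, with each score divided by its maximum among the candidate pages. The `relevance` values stand in for Lucene query scores; `blend_scores`, `alpha = 0.7`, and the four-page link graph are invented for illustration and are not the thesis's formula or a Lucene API.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank; pages without out-links spread their rank evenly."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Random-jump share plus the redistributed rank of dangling pages.
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank


def blend_scores(relevance: Dict[str, float], rank: Dict[str, float],
                 alpha: float = 0.7) -> List[str]:
    """Hypothetical re-ranking: mix normalized query relevance with normalized PageRank."""
    if not relevance:
        return []
    top_rel = max(relevance.values()) or 1.0
    top_pr = max(rank.get(doc, 0.0) for doc in relevance) or 1.0
    combined = {doc: alpha * relevance[doc] / top_rel
                + (1 - alpha) * rank.get(doc, 0.0) / top_pr
                for doc in relevance}
    return sorted(combined, key=combined.get, reverse=True)


if __name__ == "__main__":
    links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    ranks = pagerank(links)
    relevance = {"a": 0.8, "c": 0.6, "d": 0.9}  # stand-in for Lucene scores
    print(blend_scores(relevance, ranks))       # ['a', 'c', 'd']
```

In the demo, page d has the highest relevance but no incoming links, so the blend moves it from first place (relevance alone orders the pages d, a, c) to last. One plausible design is to compute PageRank offline over the crawled link graph and store it with each indexed page, so that only the blend runs at query time.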
Keywords/Search Tags: personalized service, subject search engine, Heritrix, Lucene