Search Engine Theory And Technology Research

Posted on: 2017-01-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Zhang
GTID: 2308330491450787
Subject: Electronic and communication engineering
Abstract/Summary:
Search engines have become an indispensable part of today's booming Internet. Many related technologies have developed effectively, and many of the associated problems have been solved. In the domestic industry, companies have mastered most of the basic search engine technologies and employ professional developers, but processing capacity remains a challenge as the volume of valuable information keeps growing. Retrieving the information we need from the Internet accurately and rapidly is still a problem that must be faced during development. Research on search engine technologies is therefore an urgent task for the development of the Internet.

The thesis begins with a detailed analysis of search engine theory, workflow, and architecture, and then discusses the future direction of search engines, which are expected to become more intelligent, personalized, and distinctive. To gain a deeper understanding of the key search engine technologies, we studied each of them in detail and implemented them one by one. Using Heritrix, an open-source web crawler toolkit, we implemented a web crawler program and improved the efficiency of its crawling algorithm. Using Lucene, Solr, IKAnalyzer, and related technologies, the thesis implements Chinese word segmentation, duplicate web page removal, index creation and search, and query result ranking. Finally, all of the above modules are combined into a simple search engine system capable of full-text retrieval.
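The crawler described above is built on Heritrix, whose frontier and processing chain are far richer than anything shown here. As a rough illustration of the core idea only, the following sketch runs a breadth-first crawl over a hypothetical in-memory link graph (`LINKS`, standing in for HTTP fetching and HTML parsing). It resolves relative links, drops `#fragment` parts, and keeps a `seen` set so no URL is queued twice. The names `crawl`, `normalize`, `LINKS`, and the depth and prefix parameters are illustrative assumptions, not Heritrix APIs.

```python
from collections import deque
from urllib.parse import urljoin, urldefrag

# Hypothetical in-memory "web": page URL -> list of (possibly relative) hrefs.
# A real crawler such as Heritrix fetches and parses pages over HTTP instead.
LINKS = {
    "http://example.com/": ["a.html", "b.html#top", "http://example.com/"],
    "http://example.com/a.html": ["b.html", "c.html"],
    "http://example.com/b.html": ["a.html"],
    "http://example.com/c.html": ["http://other.org/"],
}


def normalize(base, href):
    """Resolve a relative link against its page and drop the #fragment."""
    url, _fragment = urldefrag(urljoin(base, href))
    return url


def crawl(seed, max_depth=2, allowed_prefix="http://example.com/"):
    """Breadth-first crawl; returns URLs in the order they were visited."""
    frontier = deque([(seed, 0)])
    seen = {seed}
    visited = []
    while frontier:
        url, depth = frontier.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # do not expand links beyond the depth limit
        for href in LINKS.get(url, []):
            link = normalize(url, href)
            # Stay on the target site and never enqueue the same URL twice.
            if link.startswith(allowed_prefix) and link not in seen:
                seen.add(link)
                frontier.append((link, depth + 1))
    return visited
```

A FIFO queue gives breadth-first order, so pages closer to the seed are fetched first. Politeness delays, robots.txt handling, and parallel fetching, which a production crawler needs, are deliberately left out.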
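The abstract does not describe the segmentation algorithm IKAnalyzer applies internally. For illustration, the sketch below uses forward maximum matching, a classic dictionary-based method that is not claimed to reproduce IKAnalyzer's behavior. `DICTIONARY` is a hypothetical toy word list. At each position the longest matching dictionary word is taken, and a character that starts no dictionary word becomes a single-character token.

```python
# Hypothetical toy dictionary; a real analyzer ships a much larger word list.
DICTIONARY = {"搜索", "搜索引擎", "引擎", "技术", "研究", "中文", "分词"}
MAX_WORD_LEN = max(len(w) for w in DICTIONARY)


def fmm_segment(text, dictionary=DICTIONARY, max_len=MAX_WORD_LEN):
    """Forward maximum matching: at each position take the longest dictionary
    word starting there, falling back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens
```

For example, `fmm_segment("搜索引擎技术研究")` returns `["搜索引擎", "技术", "研究"]`. The longest-match rule prefers `搜索引擎` over `搜索` + `引擎`. Greedy matching can still mis-split ambiguous strings, which is why practical analyzers add ambiguity handling on top.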
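The abstract does not specify how duplicate web pages are detected. One common approach, assumed here purely for illustration, compares pages by the Jaccard similarity of their character k-shingles (overlapping substrings of length k). Character shingles need no word segmentation, so they also work on Chinese text. The functions `shingles`, `jaccard`, and `dedupe`, the threshold of 0.8, and k = 4 are hypothetical example choices.

```python
def shingles(text, k=4):
    """Set of overlapping k-character substrings, ignoring whitespace."""
    s = "".join(text.split())
    if len(s) < k:
        return {s}  # very short texts become a single shingle
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def jaccard(a, b):
    """|A intersect B| / |A union B| for two non-empty sets."""
    return len(a & b) / len(a | b)


def dedupe(pages, threshold=0.8, k=4):
    """Keep the first page of each near-duplicate group.

    pages: list of (url, text). Returns the URLs that are kept.
    """
    kept = []  # (url, shingle set) for every page kept so far
    for url, text in pages:
        sig = shingles(text, k)
        if all(jaccard(sig, other) < threshold for _, other in kept):
            kept.append((url, sig))
    return [url for url, _ in kept]
```

This pairwise comparison grows quadratically with the number of pages. Large collections usually replace exact Jaccard with a compact fingerprint such as MinHash or SimHash so that candidate duplicates can be found without comparing every pair.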
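Lucene and Solr build and query an inverted index. The toy version below maps each term to the documents that contain it, together with its term frequency. It then ranks results with a plain TF-IDF sum, score(d) = Σ tf(t, d) · log(1 + N / df(t)), where N is the number of indexed documents and df(t) is the number of documents containing t. The class name `InvertedIndex` and this scoring formula are illustrative assumptions. Lucene's actual scoring differs; its default similarity has been BM25 since Lucene 6.0.

```python
import math
from collections import Counter, defaultdict


class InvertedIndex:
    """Toy inverted index: term -> {doc_id: term frequency}.

    The tokenizer is pluggable, so a Chinese segmenter can be passed in
    instead of whitespace splitting.
    """

    def __init__(self, tokenize=str.split):
        self.tokenize = tokenize
        self.postings = defaultdict(dict)
        self.num_docs = 0

    def add(self, doc_id, text):
        """Index one document (each doc_id is assumed to be added once)."""
        self.num_docs += 1
        for term, tf in Counter(self.tokenize(text.lower())).items():
            self.postings[term][doc_id] = tf

    def search(self, query, top_k=10):
        """Rank documents by the sum over query terms of tf * idf."""
        scores = defaultdict(float)
        for term in set(self.tokenize(query.lower())):
            docs = self.postings.get(term)  # .get avoids creating empty entries
            if not docs:
                continue
            idf = math.log(1 + self.num_docs / len(docs))
            for doc_id, tf in docs.items():
                scores[doc_id] += tf * idf
        # Highest score first; ties broken by doc_id for a stable order.
        return sorted(scores, key=lambda d: (-scores[d], d))[:top_k]
```

Usage: after `idx.add(1, "lucene full text search")` and `idx.add(2, "solr is built on lucene")`, the query `idx.search("lucene search")` ranks document 1 first. Document 1 matches both terms, and the rarer term `search` carries the higher idf.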
Keywords/Search Tags: Search engine, Web crawler, Lucene, Solr, Chinese Word Segmentation