Search Engine Research And Design

Posted on: 2011-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: K M Yang
Full Text: PDF
GTID: 2208330332477006
Subject: Software engineering
Abstract/Summary:
A search engine is a practical application that combines information retrieval and web data mining techniques. It deals with a large amount of information extracted from the Internet and intranets. Because building an enterprise-level search engine is very difficult, few search engine companies exist.

In this thesis, I first introduce the principles of information retrieval and web data mining. I then specify and analyze the architecture of a general search engine, including the crawler, the text parsing component, the text indexing component and the query processing component.

Chapter one explains why I chose to research and design a search engine, and outlines the structure of the thesis.

Chapter two covers the history of the Internet, the direction in which it is developing, and what has made it so sophisticated. It introduces information retrieval techniques and web data mining, describes what web data mining involves, and presents its classification.

Chapter three, which is more detailed than the preceding chapters, introduces the architecture of a general search engine. The crawler (also called a spider) collects a large number of web pages and passes them to the text parsing and indexing components, which build indexes to improve performance. Later sections introduce the other components of a search engine, including text processing and the query interface. The chapter also covers the categories of search engine (full-text search engines, directory search engines and meta search engines), the three generations of search engines, and how PageRank, Topic-Sensitive PageRank and HITS work. All three are link-analysis algorithms. This chapter contains many equations and formulae, along with many fundamentals of search engines.

Chapter four is devoted entirely to the process of building MySearch, the search engine application I developed. It includes code for the crucial parts of MySearch. Programming the project in Java took me about four or five months.

The final chapter summarizes the work I have done and discusses future work. In my opinion, future search engines will be more intelligent and more user-friendly. They will understand what we say and what we expect and, more radically, they will change our lifestyle.
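The abstract names PageRank as one of the link-analysis algorithms covered in chapter three. As a rough illustration only, the following is a minimal Java sketch of the basic iterative PageRank computation. It is not taken from MySearch: the class name `PageRankSketch`, the method `pageRank`, the adjacency-array representation, the damping factor of 0.85 and the four-page example graph are all assumptions made for this example.

```java
import java.util.Arrays;

// Hypothetical illustration; not code from the thesis or from MySearch.
public class PageRankSketch {

    // links[i] lists the indices of the pages that page i links to.
    // Returns one rank per page; the ranks sum to 1.
    static double[] pageRank(int[][] links, double damping, int iterations) {
        int n = links.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from a uniform distribution

        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double dangling = 0.0; // rank held by pages with no outgoing links

            for (int i = 0; i < n; i++) {
                if (links[i].length == 0) {
                    dangling += rank[i];
                } else {
                    // A page splits its rank evenly among its outgoing links.
                    double share = rank[i] / links[i].length;
                    for (int target : links[i]) {
                        next[target] += share;
                    }
                }
            }

            for (int i = 0; i < n; i++) {
                // Random-jump term plus damped link contributions; the rank of
                // dangling pages is spread evenly over all pages.
                next[i] = (1.0 - damping) / n
                        + damping * (next[i] + dangling / n);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Assumed example graph: 0 -> 1, 2; 1 -> 2; 2 -> 0; 3 -> 2.
        int[][] links = { {1, 2}, {2}, {0}, {2} };
        double[] rank = pageRank(links, 0.85, 50);
        for (int i = 0; i < rank.length; i++) {
            System.out.printf("page %d: %.4f%n", i, rank[i]);
        }
    }
}
```

In this assumed graph, page 2 receives links from three pages and ends up with the highest rank, while page 3, which no page links to, keeps only the random-jump share.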
Keywords/Search Tags: Search Engine, Data Mining, Information Retrieval