
Search Engines Based On Themes Related Fields

Posted on: 2011-09-04
Degree: Master
Type: Thesis
Country: China
Candidate: L C Sun
Full Text: PDF
GTID: 2208360305968073
Subject: Education Technology
Abstract/Summary:
With the development of the Internet, the volume of data held by websites and intranets is increasing sharply. General search engines often fall short in timeliness and accuracy when they are used to search for a specific type of information. Users can turn to Google and Baidu, but even these outstanding general search engines cannot solve the problem. On the one hand, some information that circulates within an intranet cannot be exposed to Baidu or Google; on the other hand, general search engines refresh their pages slowly, so the timeliness and accuracy of the information cannot be guaranteed. Therefore, to raise the efficiency of information retrieval on websites and intranets, this paper studies and develops a small topic search engine. A topic search engine, oriented to searching particular themes, is a typical specialized search engine. The study is intended to help others build a search engine system for a related field and its relevant information. The paper focuses on search algorithms and word segmentation techniques in order to better understand search engine technology, search algorithms, and Chinese word segmentation. Its aims are to study web-crawling algorithms and the principles by which spiders crawl, and to study methods of improving focused crawling through hyperlink analysis, building on an analysis and summary of keyword-based and concept-based algorithms.
Through experiments, this paper compares the results obtained before and after the improvement, discusses the feasibility and operability of the implementation, and lays a good foundation for targeted information collection. It also develops a word segmenter based on a longest-match, finest-granularity algorithm, which improves segmentation accuracy and processes text fast enough to meet the needs of a small search engine. Finally, the system is tested and the test results are analyzed.
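The abstract does not describe the segmentation algorithm in detail, so the following is only a minimal sketch of the general longest-match idea (forward maximum matching), not the thesis's actual segmenter. The function name `segment_longest_match`, the toy dictionary, and the fallback to single characters are all assumptions made for illustration; the finest-granularity part of the thesis's method is not modeled here.

```python
def segment_longest_match(text, dictionary, max_word_len=None):
    """Forward maximum matching: at each position, emit the longest
    dictionary word that starts there (hypothetical helper, not from the thesis)."""
    if max_word_len is None:
        # Longest entry in the dictionary bounds the candidate window.
        max_word_len = max((len(word) for word in dictionary), default=1)

    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a match is found.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            # A single character is always accepted as a fallback token.
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens


# Toy dictionary, assumed for this example only.
toy_dictionary = {"搜索", "引擎", "搜索引擎", "主题", "技术"}
print(segment_longest_match("主题搜索引擎技术", toy_dictionary))
```

With this toy dictionary the longer entry "搜索引擎" is preferred over its parts "搜索" and "引擎". A real segmenter for a Lucene-based engine would normally be wrapped as a Lucene analyzer component, which is outside the scope of this sketch.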
Keywords/Search Tags: Topic Search, Small search engine, Lucene