The Research And Design On Intelligent Vertical Search Engine

Posted on: 2011-12-20
Degree: Master
Type: Thesis
Country: China
Candidate: S G Huang
GTID: 2178360308959080
Subject: Computer system architecture
Abstract/Summary:
With the rapid development of the Internet, web resources keep expanding. Faced with this mass of information, more and more people are concerned with how to reach the resources they need better and faster. Result pages from general search engines contain many "noise" pages, so users must pick out what they need themselves. A vertical search engine collects Internet resources that match a specific topic, and so provides a faster, more professional, and more accurate search service. This thesis designs a vertical search engine prototype system comprising a focused crawling model, an index model, and a retrieval model. The main work is as follows:
① The thesis presents an improved focused crawling model that can solve the "topic drift" problem. The model consists of feedback-based subject knowledge, a topic identification model, and a link analysis model. By receiving continuous feedback from the theme words, the subject knowledge gains a degree of adaptivity. Taking into account the different weights of HTML tags, the thesis presents an improved VSM (vector space model) algorithm to determine the topic similarity of a page. By parsing the HTML document into a DOM tree, the thesis proposes a link context model to correctly determine the topic similarity of a URL.
② The thesis studies the principles of full-text search and the structure of the inverted index in depth. On this basis, it presents a hybrid index model based on subject knowledge to improve the efficiency and accuracy of indexing. It then designs the search workflow based on the hybrid index and analyzes the ranking model for search results in combination with the vector space model.
③ Finally, the thesis implements a hardware-oriented vertical search engine prototype system based on the Nutch framework.
Experiments show that the vertical search engine system achieves higher precision and a degree of self-adaptivity, solves the "topic drift" problem, and basically achieves the goal of the research. It also provides a theoretical and experimental basis for follow-up study.
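
The abstract mentions a VSM variant that weights terms by the HTML tag they appear in, but gives no formula or weight values. The following is only a minimal sketch of that general idea, not the thesis's algorithm: the tag weights, the names `TAG_WEIGHTS`, `WeightedTermCollector`, and `topic_similarity`, and the use of plain cosine similarity against a hand-written topic vector are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical tag weights; the abstract does not state the actual values.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.5, "h2": 2.0, "strong": 1.5, "b": 1.5, "a": 1.2}
DEFAULT_WEIGHT = 1.0
SKIPPED_TAGS = {"script", "style"}
# Void elements never get a closing tag, so they are not tracked on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class WeightedTermCollector(HTMLParser):
    """Accumulates a term -> weighted frequency map for one HTML page."""

    def __init__(self):
        super().__init__()
        self.stack = []           # currently open tags, outermost first
        self.weights = Counter()  # term -> sum of tag weights

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if any(t in SKIPPED_TAGS for t in self.stack):
            return
        # A term takes the largest weight among its enclosing tags.
        weight = max(
            (TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.stack),
            default=DEFAULT_WEIGHT,
        )
        for term in re.findall(r"[a-z0-9]+", data.lower()):
            self.weights[term] += weight


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def topic_similarity(html, topic_terms):
    """Score a page against a topic vector using tag-weighted term counts."""
    collector = WeightedTermCollector()
    collector.feed(html)
    collector.close()
    return cosine(collector.weights, topic_terms)
```

Under these assumed weights, a page whose title contains the topic terms scores higher than a page that mentions the same terms only in body text. A focused crawler could compare such a score against a threshold to decide whether to keep a page, although the threshold and the way the topic vector is built from subject knowledge are not specified in the abstract.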
Keywords/Search Tags: Vertical Search Engine, Focused Crawling, Vector Space Model, Hybrid Index, Nutch