Research And Implementation Of Vertical Search Engine For Network Literature

Posted on: 2016-12-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Qian
GTID: 2208330464463541
Subject: Computer Science and Technology
Abstract/Summary:
With the explosive growth of information on the internet and the continuous accumulation of massive amounts of data, finding literature online has become less and less efficient. At the same time, as literature spreads and develops rapidly on the internet, piracy, infringement, and plagiarism have become increasingly serious. A better search service is therefore badly needed, both to improve search efficiency and to protect network literature. Result quality matters greatly to users and is a key measure of the success of search engine technology, and how to evaluate and rank web pages is one of the most important questions for a search engine's ranking algorithm. The vertical search engine arose in response: it combines a specific professional domain with search engine technology to provide higher-quality, better service to users.

In this thesis, the subject matter is the development of network literature, and the core technology is that of search engines. The thesis examines the background and significance of the topic, the history of search engines, the current state of vertical search engines, and future trends in search engine development such as knowledge-based search, which lays a theoretical basis for the research and implementation.

First, the thesis discusses the concept and working process of the vertical search engine. Building on a study of its principles, it introduces in detail the core technologies of vertical search engines, including the web spider, web spider search strategies, information extraction, and Chinese word segmentation, as well as the open source framework Nutch.

Second, the thesis analyzes two classical web page ranking algorithms in the search engine field: the PageRank algorithm and the HITS algorithm. Considering the current state of Internet research and the research direction of this work, and targeting shortcomings of the traditional PageRank algorithm such as topic drift, unreasonable distribution of web page weights, and a bias toward old web pages, the thesis proposes an improved web page ranking algorithm that takes into account the content similarity between web pages, the structure of web pages, and the time of web pages. The improved algorithm adds a time attenuation factor, reduces topic drift, and improves query accuracy.

Finally, combining the open source search engine framework Nutch with the improved web page ranking algorithm, the thesis designs and implements a prototype vertical search engine themed on network literature. Comparison with major search engines, data tests, and data simulation verify the feasibility and superiority of the system.
Keywords/Search Tags: Vertical Search Engine, PageRank Algorithm, Nutch, Webpage Ranking Algorithm
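
The abstract does not state the formula of the improved ranking algorithm. As a rough illustration of how a time attenuation factor and content similarity could be folded into PageRank, the Python sketch below weights each link by the similarity between the two pages multiplied by an exponential decay in the age of the target page, then runs the usual power iteration. The exponential form, the `decay` rate, the way the two factors are combined, and every name in the code (`time_aware_pagerank`, `similarity`, `age_days`) are assumptions made for illustration, not the method of the thesis.

```python
import math


def time_aware_pagerank(links, similarity, age_days,
                        damping=0.85, decay=0.01, iterations=50):
    """Hypothetical PageRank variant with similarity and time-decay weights.

    links:      dict mapping a page to the list of pages it links to
    similarity: dict mapping (source, target) to a content similarity in [0, 1]
    age_days:   dict mapping a page to its age in days
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    # Assumed edge weight: content similarity times exponential decay in the
    # target's age, normalised so each page's outgoing weights sum to 1.
    weights = {}
    for src in pages:
        raw = {
            dst: similarity.get((src, dst), 0.0)
            * math.exp(-decay * age_days.get(dst, 0.0))
            for dst in links.get(src, [])
        }
        total = sum(raw.values())
        weights[src] = {d: w / total for d, w in raw.items()} if total > 0 else {}

    for _ in range(iterations):
        # Rank held by pages without usable out-links is spread uniformly.
        dangling = sum(rank[p] for p in pages if not weights[p])
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n
                    for p in pages}
        for src in pages:
            for dst, w in weights[src].items():
                new_rank[dst] += damping * rank[src] * w
        rank = new_rank
    return rank


# Toy graph: a hub links to a recent page and an old page of equal similarity.
example_links = {"hub": ["new", "old"], "new": ["hub"], "old": ["hub"]}
example_similarity = {("hub", "new"): 0.8, ("hub", "old"): 0.8,
                      ("new", "hub"): 0.8, ("old", "hub"): 0.8}
example_ages = {"hub": 30.0, "new": 1.0, "old": 365.0}
example_ranks = time_aware_pagerank(example_links, example_similarity,
                                    example_ages)
```

In this toy graph the recent page receives more of the hub's rank than the old page, which is the kind of effect a time attenuation factor is meant to produce. Setting `decay=0` and all similarities equal reduces the sketch to ordinary PageRank with uniform link weights.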