Research And Design On Key Technologies Of Vertical Search Engine Oriented Soybean Theme

Posted on: 2014-02-16
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Liu
GTID: 2248330398453640
Subject: Computer application technology
Abstract/Summary:
With the rapid development of internet technology, network information resources are growing explosively, and quickly finding information that meets users' needs has become an increasingly important issue. The search engine is now one of the most important applications on the internet. A traditional general-purpose search engine offers the same interface to all users; however, as the volume of information keeps growing, it cannot satisfy users in specific fields who need accurate, timely, and in-depth information. The vertical search engine, designed to query a particular subject area or topic, emerged in response and has since developed rapidly and been widely applied.

This work comes from the Spark Plan Program. Grounded in the agricultural reality of the major grain-producing areas, it targets a widespread problem in agricultural informatization: information resources are poorly shared, especially in building information services for the soybean industry. It provides shared data resources for users engaged in soybean production, processing, research, and distribution. This paper used vertical search technology to collect and filter soybean-related agricultural information on the internet and then built a soybean information database for the portal website branded “China’s soybean network”. It also designed the framework of a soybean-themed vertical search engine, researched its key technologies, and implemented a prototype system. The main research contents of this paper are as follows.

(1) First, the purpose and significance of the research were clarified, and the research status and trends of vertical search engines, as well as their application in agriculture, were analyzed. Second, the development, structure, principles, and pros and cons of general and vertical search engines were analyzed and compared.
Finally, the architecture of the soybean-themed vertical search engine was designed around the soybean topic.

(2) The web spider is the core of information collection: it automatically searches and crawls the internet according to a given search strategy and stores the collected information locally. The biggest difference between a topic spider and a general spider is that the former fetches topic-relevant pages selectively, while the latter fetches every page it encounters. This paper studied in depth the structure and search strategies of the web spider and its topic-relevance analysis algorithm. Taking into account the influence of link anchor text and page titles on relevance, as well as link traps, it improved existing link analysis algorithms.

(3) Indexes improve retrieval efficiency; in this work, indexing effectively sped up data loading in the management and audit module. The objects being indexed were page documents after Chinese word segmentation, which is the process of splitting a continuous character sequence into a sequence of words. This paper studied existing segmentation algorithms and inverted index technology, as well as the indexing and search processes of the open-source Lucene indexing framework. Because Lucene's own Chinese word segmentation is not precise enough, the Lucene indexing framework was used together with the IKAnalyzer segmenter.

(4) Based on the above research, this paper implemented a prototype of the soybean-themed vertical search engine following software engineering principles. It mainly includes a web information collection module, an indexing module, and a management and audit module.
In the end, the system supplies soybean-related information to the portal website.

In summary, this paper took major domestic soybean-related websites, such as China Agricultural Trading Network, China Grain and Oil Information Network, Heilongjiang Province Agriculture Information Network, and World Granary Network, as the initial crawl targets, and implemented a prototype soybean-themed vertical search engine based on Java technologies. The work provides data support for the soybean portal website and, at the same time, a theoretical basis for querying soybean-themed information. This study can also serve as a reference for search engines targeting other agricultural themes.
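Item (2) names the ingredients of the improved crawler (anchor text and page title as relevance evidence, plus protection against link traps) but not the actual formulas. The Java sketch below is a hypothetical illustration of how such ingredients are commonly combined in a best-first topic crawler, not the thesis's algorithm: an unfetched link is scored as a weighted keyword coverage of its anchor text and of the linking page's title, links that look like traps (too deep, too long, or with one path segment repeated more than twice) are rejected, and the highest-scoring link is fetched next. The class name TopicFrontier, the weights 0.6 and 0.4, and the depth, length, and repeat limits are all assumptions.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

/** Best-first frontier for a topic crawler (hypothetical sketch, not the thesis's algorithm). */
final class TopicFrontier {

    /** A queued link with its crawl depth and estimated topic relevance. */
    record Link(String url, int depth, double score) {}

    // Assumed weights: anchor text counts more than the linking page's title.
    static final double ANCHOR_WEIGHT = 0.6;
    static final double TITLE_WEIGHT = 0.4;
    // Assumed link-trap limits.
    static final int MAX_DEPTH = 5;
    static final int MAX_URL_LENGTH = 256;
    static final int MAX_SEGMENT_REPEATS = 2;

    private final List<String> topicTerms;
    private final Set<String> seen = new HashSet<>();
    // Highest score first: this ordering is what makes the crawl "best-first".
    private final PriorityQueue<Link> queue =
            new PriorityQueue<>((a, b) -> Double.compare(b.score(), a.score()));

    TopicFrontier(List<String> topicTerms) {
        this.topicTerms = List.copyOf(topicTerms);
    }

    /** Fraction of topic terms occurring in the text (case-insensitive substring match). */
    double termCoverage(String text) {
        if (text == null || text.isEmpty() || topicTerms.isEmpty()) return 0.0;
        String lower = text.toLowerCase(Locale.ROOT);
        long hits = topicTerms.stream()
                .filter(term -> lower.contains(term.toLowerCase(Locale.ROOT)))
                .count();
        return (double) hits / topicTerms.size();
    }

    /** Relevance estimate of an unfetched link from its anchor text and the parent page title. */
    double score(String anchorText, String parentTitle) {
        return ANCHOR_WEIGHT * termCoverage(anchorText) + TITLE_WEIGHT * termCoverage(parentTitle);
    }

    /** Heuristic trap check: too deep, too long, or one path segment repeated too often. */
    static boolean looksLikeTrap(String url, int depth) {
        if (depth > MAX_DEPTH || url.length() > MAX_URL_LENGTH) return true;
        // Strip "scheme://host" so only the path (and any query string) is inspected.
        String path = url.replaceFirst("^[a-zA-Z][a-zA-Z0-9+.-]*://[^/]*", "");
        Map<String, Integer> counts = new HashMap<>();
        for (String segment : path.split("/")) {
            if (!segment.isEmpty() && counts.merge(segment, 1, Integer::sum) > MAX_SEGMENT_REPEATS) {
                return true;
            }
        }
        return false;
    }

    /** Queues a link if it passes the trap check, meets the threshold, and has not been queued before. */
    boolean offer(String url, String anchorText, String parentTitle, int depth, double threshold) {
        if (looksLikeTrap(url, depth)) return false;
        double s = score(anchorText, parentTitle);
        // Low-scoring links are not marked as seen, so a better anchor on another page can still queue them.
        if (s < threshold || !seen.add(url)) return false;
        queue.add(new Link(url, depth, s));
        return true;
    }

    /** Removes and returns the most relevant pending link, or null if none remain. */
    Link next() {
        return queue.poll();
    }
}
```

With topic terms {soybean, price}, a link anchored "Soybean price today" on a page titled "Soybean" scores 0.8 and is fetched before one anchored "corn price" (0.3). A real crawler would also normalize URLs before deduplication and might fold the parent page's own relevance into the score; both are left out of this sketch.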
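Item (3) defines Chinese word segmentation as splitting a continuous character sequence into words. A classic example of a dictionary-based segmentation algorithm is forward maximum matching (FMM): at each position, take the longest dictionary word that starts there, and fall back to a single character when nothing matches. The Java sketch below shows plain FMM only; it is not the IKAnalyzer algorithm the thesis adopted, and the class name ForwardMaxMatch and the toy dictionary in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Forward maximum matching (FMM) segmentation: a classic dictionary-based baseline. */
final class ForwardMaxMatch {

    private final Set<String> dictionary;
    private final int maxWordLength;

    ForwardMaxMatch(Set<String> dictionary) {
        this.dictionary = Set.copyOf(dictionary);
        // The longest dictionary entry bounds how far ahead the matcher needs to look.
        this.maxWordLength = this.dictionary.stream().mapToInt(String::length).max().orElse(1);
    }

    /** Splits a continuous character sequence into words, preferring the longest dictionary match. */
    List<String> segment(String text) {
        List<String> words = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = Math.min(pos + maxWordLength, text.length());
            // Shrink the window from the right until it matches a dictionary word.
            while (end > pos + 1 && !dictionary.contains(text.substring(pos, end))) {
                end--;
            }
            // If nothing matched, end == pos + 1 and the single character becomes its own token.
            words.add(text.substring(pos, end));
            pos = end;
        }
        return words;
    }
}
```

With the dictionary {大豆, 价格, 大豆价格, 上涨}, segment("大豆价格上涨") returns [大豆价格, 上涨], because the four-character entry is tried before the shorter ones. Greedy longest matching is known to mis-split some ambiguous sequences, which is one reason practical segmenters add further rules or statistics on top of a dictionary.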
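The inverted index mentioned in item (3) maps each term to the documents that contain it, so a query only touches the posting lists of its own terms instead of scanning every page. The Java sketch below is a toy in-memory version, assuming documents have already been segmented into terms; it answers Boolean AND queries and does no ranking. Lucene's real index is far more elaborate (segments on disk, term statistics, relevance scoring), and the class InvertedIndex here is hypothetical, not part of Lucene's API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Minimal in-memory inverted index: term -> sorted set of document ids (no ranking). */
final class InvertedIndex {

    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    /** Adds a document whose text has already been segmented into terms. */
    void addDocument(int docId, List<String> terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    /** Returns ids of documents containing every query term (Boolean AND), in ascending order. */
    List<Integer> search(List<String> queryTerms) {
        if (queryTerms.isEmpty()) return Collections.emptyList();
        TreeSet<Integer> result = null;
        for (String term : queryTerms) {
            Set<Integer> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs); // Copy so the stored posting list is never modified.
            } else {
                result.retainAll(docs);       // Intersect with this term's posting list.
            }
            if (result.isEmpty()) break;      // No document can match all terms any more.
        }
        return new ArrayList<>(result);
    }
}
```

For example, after indexing document 1 as [soybean, price], document 2 as [soybean, planting], and document 3 as [corn, price], searching [price] returns [1, 3] and searching [soybean, price] returns [1].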
Keywords/Search Tags: soybean theme, vertical search engine, web spider, Chinese word segmentation, index