| The Internet contains a large amount of information about the supply and demand of agricultural products, market dynamics, agricultural policies and regulations, and so on, which is scattered across many agricultural sites in heterogeneous forms. Because this agricultural information lacks a unified formal representation, it is extremely difficult for users to obtain it accurately and in a timely manner. In addition, governments at all levels, as well as agricultural research institutions and organizations, have invested increasing resources in establishing websites, information databases and expert decision-making systems for agricultural technology, aquatic products, animal husbandry and other fields. These information platforms can only serve professionals with a certain level of knowledge and agricultural expertise. However, the information infrastructure in rural areas is relatively underdeveloped, and most agricultural users lack the ability to analyze and describe their information needs and to acquire the information they require. To address these problems, this paper takes into account the development of agricultural informatization and the characteristics of agricultural information in China, analyzes the core principles and classical algorithms of general search engine systems, and starts from the three main factors affecting retrieval results, i.e. the content relevance of pages, hyperlink analysis and user query behavior. It optimizes and improves the relevant algorithms and establishes a model and technical method for joint ranking in vertical search engines. This model and technical method provide strong technical support for Henan Science and Technology Department’s key scientific and technological project "Research on Key Technologies of Agricultural Information Recommendation Based on Vertical Search Engine". The main research contents and achievements are as follows: (1) A method of constructing a retrieval model based on content relevance is proposed. Since those pages 
of agricultural websites usually contain a large number of advertisements, pictures of agricultural and sideline products, and other worthless information, the content in each area of those pages varies in importance. In addition, agricultural terminology contains many rare words, so the problems of "zero probability" and "data sparsity" arise when the probabilities of the estimation factors are calculated. To solve these problems, this paper builds on the traditional probabilistic retrieval model and proposes a relevance calculation method that assigns different weights to different "domains" of a page. Theme pages are divided into content blocks according to their functions by the Doc View model in order to extract feature elements; keywords, word frequency and other factors are then taken into account, the data regions are segmented, and the feature weights of the different regions are calculated jointly. A regression smoothing strategy based on mutual information is introduced into the statistical language model. The main idea is to reduce the probability of word pairs (bigrams) with low mutual information in order to compensate for zero-probability events. (2) An optimization method for the PageRank algorithm based on non-hanging virtual node reclassification is proposed. Most of the websites and search result pages visited by agricultural users are relatively fixed and are basically related to the type of agricultural products they manage or grow. As a result, agricultural web pages tend to create many mutually pointing links to facilitate user browsing, and over a long period these links form a dense "block structure". Based on this characteristic, a method of classifying web nodes according to their position and characteristics in the link structure graph is proposed. According to their inbound and outbound links, page nodes are usually divided into two types: the hanging 
virtual node (which has inbound links but no outbound links) and the non-hanging virtual node (which has both inbound and outbound links). Building on this, the paper divides page nodes more finely into three kinds: hanging virtual nodes, common nodes and ordinary nodes. A simpler matrix is obtained by permuting the partitioned link matrix. The large, high-dimensional matrix is then decomposed into several submatrices, and parallel computation is used in the iterative process. When the network link graph has a block structure, the more common nodes there are, the more the algorithm speeds up the computation of the page ranking vector. (3) A method of constructing a retrieval recommendation model based on an improved query-click graph is proposed. To mitigate the bias problem of the traditional query-click bipartite graph, this paper introduces an improved query-click graph recommendation model in which click frequency is used instead of click counts. By establishing a formal description of the relationships between elements in the bipartite graph and an optimization goal over them, the paper increases the weight of agricultural users’ search intention in the recommended results and reduces the influence of irrelevant information on agricultural websites. Moreover, using transition probability theory, the paper recomputes all the weights so that the weights of all edges of the bipartite graph become integers, which makes the optimization problem easier to solve and alleviates the "recommendation topic drift" problem that tends to appear in the traditional random walk recommendation model. Then, the stationary distribution of the Markov chain is used to bring the transition probability matrix to convergence, and the accuracy of the algorithm is improved by setting an appropriate number of iterations and by using the self-transition probability to control the range of the random walk on the graph. (4) A method for constructing a joint ranking recommendation 
model based on Markov chains is proposed. The use of a single ranking factor in the traditional retrieval recommendation model gives an unsound basis for ranking agricultural web pages, so the final ranking produced by the retrieval system cannot truly reflect the characteristics of agricultural web pages or the click-through behavior of agriculture-related users. Therefore, this paper proposes a highly extensible supervised learning framework with the Markov chain at its core. The framework combines the rankings produced by the relevance between query words and page content, by hyperlink analysis, and by users’ query click behavior; transforms the problem of jointly ranking the results into a semidefinite programming problem; learns a weight coefficient for each base ranking through supervised learning; and derives the detailed procedure for solving the problem. |
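
As background for the hyperlink-analysis component in (2), the following is a minimal sketch of the textbook PageRank power iteration with explicit handling of hanging nodes (pages with no outbound links). It is not the optimized, block-partitioned method proposed in this paper; the function name `pagerank`, the damping factor of 0.85, the tolerance and the toy graph are all illustrative assumptions.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Textbook PageRank by power iteration (illustrative sketch only).

    links: dict mapping each page to the list of pages it links to.
    Pages with no outbound links are treated as hanging nodes, and
    their rank is spread uniformly over all pages at every step.
    """
    # Collect every page that appears as a source or as a target.
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    hanging = [v for v in nodes if not links.get(v)]

    for _ in range(max_iter):
        # Rank held by hanging nodes is redistributed uniformly.
        hanging_mass = sum(rank[v] for v in hanging)
        base = (1.0 - damping) / n + damping * hanging_mass / n
        new_rank = {v: base for v in nodes}

        # Each non-hanging page splits its rank equally among its targets.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new_rank[t] += share

        # Stop once the L1 change between iterations is small enough.
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy graph: "d" has inbound links but no outbound links.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a", "d"], "d": []}
toy_ranks = pagerank(toy_graph)
```

In this sketch the hanging nodes are handled inside a single iteration over the whole graph. The method summarized in (2) instead reorders and partitions the link matrix by node type so that the submatrices can be iterated in parallel, which this example does not attempt.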