
Research On Agricultural Knowledge Recommendation Model And Features Extracting Algorithm Of Vector

Posted on: 2012-11-12
Degree: Master
Type: Thesis
Country: China
Candidate: J Wang
GTID: 2218330371950414
Subject: Agricultural information technology
Abstract/Summary:
During the 12th Five-Year Plan, agricultural informatization was designated a top priority for national economic and social development, and applying information technology to agriculture has become a general trend. In many developed countries, information services are a major channel for disseminating agricultural science and technology. However, how farmers, the main users of these services, can obtain valuable and relevant information from massive volumes of data has become a key issue in agricultural informatization. This thesis introduces recommendation technology into the Hunan Agriculture Information Services Platform and builds an agricultural recommendation model based on content-based recommendation. A farmer interest model and a document feature model are constructed to provide farmers with personalized recommendations. Unlike traditional recommendation models, the platform accounts for the differing interests of individual users and adapts the interest models as those interests change, so that farmers' needs are better met. To improve traditional feature extraction, we analyze the shortcomings of existing methods and take into account how features are distributed across different table spaces and HTML tags. Supporting-word equalization is used to correct the skew caused by dialect differences. The main contributions of this thesis are as follows. (1) We review the state of research on current recommendation methods and analyze the strengths and weaknesses of each. (2) We present the overall agricultural recommendation framework for the Hunan Agriculture Information Services Platform. (3) For the user interest model and the document feature model, we analyze the traditional weighting algorithms TF-IDF and TF-IDF-IG and identify their shortcomings. First, TF-IDF, which is based on how often a feature occurs in documents, does not consider how features are distributed within the document collection.
Because document counts in different industry categories can differ by orders of magnitude, this imbalance must be eliminated during weight calculation. Second, TF-IDF-IG improves on TF-IDF to some degree, but it considers only the distribution of features over the whole document set; its weight calculation ignores how features are distributed in the industry-specific table spaces and across HTML tags. Third, the platform is built on agricultural data with strong regional differences, so the influence of dialect must be considered in weight calculation; we use supporting-word equalization to correct the skew caused by dialect differences. (4) We improve the traditional feature extraction algorithms by accounting for feature distribution in the industry-specific table spaces and across HTML tags. A classifier is used to categorize user interests, which reduces the computation needed to compare the user interest model with the document feature model. (5) The optimized algorithm is compared with TF-IDF and TF-IDF-IG using Precision, Recall, and F1 as evaluation criteria. First, a large volume of agricultural information was collected as test data by a web crawler developed by our project group. Second, four test groups of users were selected at random, with sizes N = 25, 50, 100, and 200. The Precision, Recall, and F1 of the recommendation model were measured, and the results show that the improved method is effective. The experiments also show that Precision, Recall, and F1 increase as the number of users grows. That is, the system's accuracy converges as the amount of computation increases, indicating that the proposed model scales well.
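For reference, here is a minimal sketch of the baseline TF-IDF weighting criticized in item (3), combined with cosine-similarity matching between a user interest vector and document feature vectors. This pairing is the usual core of content-based recommendation. The sketch assumes documents are already tokenized and uses idf(t) = log(N / df(t)) with no smoothing. The function names, the interest weights, and the sample data are illustrative only. This is not the thesis's improved algorithm.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """TF-IDF weights for tokenized documents.

    tf is the term count divided by document length; idf(t) = log(N / df(t)).
    Other tf/idf variants (e.g. smoothed idf) are also common.
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each term once per doc
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        length = len(tokens)
        vectors.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(user_profile, doc_vectors, top_k=2):
    """Return indices of the top_k documents most similar to the user profile."""
    ranked = sorted(
        range(len(doc_vectors)),
        key=lambda i: cosine(user_profile, doc_vectors[i]),
        reverse=True,  # sorted() stays stable, so ties keep document order
    )
    return ranked[:top_k]


if __name__ == "__main__":
    # Hypothetical toy corpus and interest profile.
    docs = [["rice", "seedling", "pest"],
            ["pig", "feed", "price"],
            ["rice", "fertilizer", "price"]]
    user = {"rice": 1.0, "pest": 0.5}
    print(recommend(user, tf_idf_vectors(docs)))  # [0, 2]
```

A term that occurs in every document gets idf = 0 and therefore contributes nothing to matching. This is one reason plain TF-IDF reacts poorly to how features are spread over categories.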
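TF-IDF-IG is commonly formulated as the TF-IDF weight multiplied by the information gain (IG) of the term with respect to the document categories. The following is a minimal sketch under that assumption. IG is computed from term presence or absence in each document, using the natural logarithm. The exact variant used in the thesis may differ, and the category labels in the test data are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (natural log) of a category count distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


def information_gain(term, docs, labels):
    """IG(t) = H(C) - P(t) H(C | t) - P(not t) H(C | not t)."""
    n = len(docs)
    with_term = Counter(lab for toks, lab in zip(docs, labels) if term in toks)
    without_term = Counter(lab for toks, lab in zip(docs, labels) if term not in toks)
    p_t = sum(with_term.values()) / n
    return (entropy(Counter(labels))
            - p_t * entropy(with_term)
            - (1.0 - p_t) * entropy(without_term))


def tf_idf_ig(docs, labels):
    """Weight each term in each document as tf * idf * IG(term)."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    ig = {term: information_gain(term, docs, labels) for term in df}
    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        weights.append({
            t: (c / len(tokens)) * math.log(n_docs / df[t]) * ig[t]
            for t, c in counts.items()
        })
    return weights
```

Consider a toy corpus `[["rice", "pest"], ["rice", "seed"], ["pig", "feed"], ["pig", "price"]]` with labels `crop, crop, livestock, livestock`. Plain TF-IDF would weight `pest` above `rice` in the first document because `pest` is rarer. The IG factor reverses that, because `rice` perfectly separates the two categories. The IG term still looks only at the whole document set, which is the limitation item (3) points out.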
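The abstract says the improved feature extraction accounts for HTML tags but does not give the weighting scheme. The sketch here therefore only illustrates the general idea of letting a term's enclosing tag scale its frequency. The tag weights, the rule of taking the largest weight among the currently open tags, and the whitespace tokenizer are all hypothetical. Real Chinese text would need a word segmenter instead of `split()`.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical tag weights for illustration only; not values from the thesis.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "h2": 1.5, "b": 1.2, "strong": 1.2}
DEFAULT_WEIGHT = 1.0


class TagWeightedCounter(HTMLParser):
    """Accumulate term counts, scaling each occurrence by its enclosing tags."""

    def __init__(self):
        super().__init__()
        self.stack = []          # currently open tags
        self.counts = Counter()  # term -> weighted frequency

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        # Hypothetical rule: use the largest weight among all open tags.
        weight = max((TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.stack),
                     default=DEFAULT_WEIGHT)
        for token in data.lower().split():
            self.counts[token] += weight


def tag_weighted_tf(html):
    """Return weighted term frequencies for one HTML document."""
    parser = TagWeightedCounter()
    parser.feed(html)
    parser.close()
    return parser.counts
```

For `<title>Rice pest</title>` followed by `<p>rice price</p>`, `rice` accumulates 3.0 from the title plus 1.0 from the paragraph. The weighted counts could then replace raw counts as the tf factor in a TF-IDF-style formula.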
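The evaluation in item (5) uses Precision, Recall, and F1. For one user, given the set of recommended items and the set of items that user actually found relevant, these can be computed as in the sketch that follows. Averaging per-user scores over a test group (macro-averaging) is an assumption, since the abstract does not say how the group-level figures for N = 25, 50, 100, and 200 were aggregated.

```python
def precision_recall_f1(recommended, relevant):
    """Precision, recall and F1 for one user's recommendation list."""
    rec, rel = set(recommended), set(relevant)
    hits = len(rec & rel)  # recommended items that were relevant
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(rel) if rel else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


def macro_average(results):
    """Average a list of per-user (precision, recall, f1) triples."""
    n = len(results)
    return tuple(sum(r[i] for r in results) / n for i in range(3))
```

For example, recommending items `[1, 2, 3, 4]` to a user whose relevant items are `[2, 4, 5]` gives Precision 0.5, Recall 2/3, and F1 4/7.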
Keywords/Search Tags: recommendation system, feature extraction, TF-IDF, TF-IDF-IG, optimized algorithm