Research On Web Information Extraction Middleware For Supply And Demand Of Agricultural Products

Posted on: 2013-10-19
Degree: Master
Type: Thesis
Country: China
Candidate: Z L Wang
GTID: 2268330398974145
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet and the explosive growth of web information, the web has become an important source of supply and demand information on agricultural products for farmers. Extracting this information and storing it in a reasonable form matters for further analysis of the state and characteristics of the agricultural products market, and in turn for planning supply and demand. Farmers find it very difficult to search the Internet for valid information, because the authenticity and effectiveness of what they find cannot be guaranteed. To provide farmers with real-time and reliable supply and demand information, this paper studies a web information extraction middleware for supply and demand of agricultural products. It builds on existing web information extraction technology, takes into account the format features of supply and demand information published on the web, and integrates the advantages of multiple information extraction techniques. The main contributions of this paper are as follows:

First, to keep the agricultural product information authentic, reliable and real-time, this paper proposes a web page screening method that combines PageRank with regular expressions. This method reduces the workload of subsequent processing. In addition, to remove noise from the screened web pages, it proposes a new denoising method that combines the advantages of visual analysis and a simple denoising method, so that information extraction is both comprehensive and efficient.

Second, this paper presents a new web information extraction method based on syntax, semantics and pattern matching, which narrows the scope in which valid information is sought and improves the accuracy of extraction.

Third, a web information extraction middleware is designed and developed. With this middleware, users can extract supply and demand information on agricultural products from the web. The extraction results can be expressed in XML, which has a strict structure.

In summary, this paper proposes a web page pre-processing method and an information extraction method that address the problems encountered in web information extraction for supply and demand of agricultural products. An application example shows that the approach is practicable.
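The abstract does not say how the PageRank score and the regular expressions are combined, or what the XML schema looks like. The following is only a minimal sketch of the general idea, under stated assumptions: pages arrive with a link score computed elsewhere (standing in for PageRank), a page is kept if its score passes a threshold and its text matches a topic pattern, and records are pulled out with a single regular expression and serialized to XML. The sample pages, the threshold, the record layout, and all function and element names (`screen_pages`, `extract_records`, `to_xml`, `record`, `product`) are hypothetical. The denoising step and the syntactic and semantic analysis described above are not modeled.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical input: (url, precomputed link score, page text).
# The scores stand in for PageRank values computed elsewhere.
PAGES = [
    ("http://example.com/a", 0.62, "Supply: apples, 500 kg, price 3.2 yuan/kg"),
    ("http://example.com/b", 0.05, "Supply: pears, 200 kg, price 2.8 yuan/kg"),
    ("http://example.com/c", 0.71, "Weather forecast for the coming week"),
    ("http://example.com/d", 0.55, "Demand: wheat, 1000 kg, price 2.5 yuan/kg"),
]

# Assumed screening rule: the page must mention supply or demand.
TOPIC_RE = re.compile(r"\b(supply|demand)\b", re.IGNORECASE)

# Assumed record layout:
# "<kind>: <product>, <quantity> kg, price <price> yuan/kg"
RECORD_RE = re.compile(
    r"(?P<kind>Supply|Demand):\s*(?P<product>[^,]+),\s*"
    r"(?P<quantity>\d+(?:\.\d+)?)\s*kg,\s*"
    r"price\s*(?P<price>\d+(?:\.\d+)?)\s*yuan/kg",
    re.IGNORECASE,
)


def screen_pages(pages, min_score=0.1):
    """Keep pages whose score passes the threshold and whose text is on topic."""
    return [
        (url, text)
        for url, score, text in pages
        if score >= min_score and TOPIC_RE.search(text)
    ]


def extract_records(pages):
    """Apply the record pattern to each screened page."""
    records = []
    for url, text in pages:
        for match in RECORD_RE.finditer(text):
            record = {k: v.strip() for k, v in match.groupdict().items()}
            record["kind"] = record["kind"].lower()
            record["source"] = url
            records.append(record)
    return records


def to_xml(records):
    """Serialize the records as one <record> element per match."""
    root = ET.Element("records")
    for record in records:
        item = ET.SubElement(
            root, "record", kind=record["kind"], source=record["source"]
        )
        for field in ("product", "quantity", "price"):
            ET.SubElement(item, field).text = record[field]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_xml(extract_records(screen_pages(PAGES))))
```

Screening before extraction means the more expensive pattern matching only runs on pages that are both well linked and on topic, which is one plausible reading of the claim that screening reduces the workload of later processing.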
Keywords/Search Tags: Web information extraction, Web page cleaning, Web information for supply and demand of agricultural products, Middleware