
Research And Implementation Of Injection Molding Information Based On Web Crawler

Posted on: 2020-11-13
Degree: Master
Type: Thesis
Country: China
Candidate: L Yang
GTID: 2428330596495003
Subject: Control Science and Engineering
Abstract/Summary:
Over recent years, the development of the Internet, and especially the spread of smartphones and IoT devices, has led to explosive growth in the amount of online data. This volume of information enriches and facilitates daily life, but it also makes useful information harder to obtain. Using a topical crawler to collect information about the injection molding industry conveniently and quickly, so as to monitor and forecast the industry's development, is therefore of great significance to enterprises. Starting from the "large injection molding intelligent manufacturing factory" project, this thesis designs and implements a topical crawler system that collects targeted network data.

Through a review of domestic and international literature, combined with problems encountered in the actual project, the author studied topical crawlers and their technical frameworks and identified two problems in current research: (1) there are currently no studies on how to select appropriate initial seeds; (2) the performance and recall of topical crawlers still leave room for improvement and need further study. In response, this thesis proposes new solutions based on further practical research, designs and implements a topical crawler system on that basis, and finally presents several experiments that show the effect of the improved algorithms. The innovations of this thesis are as follows:

(1) After introducing the problem of initial seed selection, an improvement based on the HITS algorithm is proposed that makes initial seeds easier and more efficient to select. The authority and hub scores defined by HITS are used to describe the connections between links, and a formula is defined that computes the quality of each candidate seed, so that better initial seeds can be selected and the topical crawler's efficiency improved. The system's collection results, given at the end of the thesis, demonstrate the improvement achieved by the algorithm.

(2) Topical crawlers usually adopt a concept context graph as their crawling strategy. To address the shortcomings of this strategy, this thesis presents an improved method: a crawling strategy based on a comprehensive-value concept context graph. It improves the construction process of the concept context graph and also takes into account often-overlooked factors such as the parent web page and the link context. A formula is defined that predicts the value of a link before it is visited, so that irrelevant links can be eliminated in advance and the crawler runs faster. The experimental data presented at the end indicate that the topical crawler with the improved crawling strategy greatly improves both speed and accuracy.

(3) Combining the two points above, a complete topical crawler system is designed and implemented. The thesis describes the design and implementation of the system's key modules and designs the corresponding database schema. The crawler system is implemented in Java using the WebMagic crawler framework and is reasonably general-purpose. The improved initial seed selection strategy saves a large amount of manual effort, and the improved crawling strategy raises the system's speed and accuracy. The reported results of running the system also show a significant increase in crawling efficiency.
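The abstract does not state the seed-quality formula itself. As a rough illustration only, the following sketch runs the standard HITS iteration (authority and hub scores) on a toy link graph and then ranks candidate seeds by a hypothetical weighted mix of the two scores. The class name `HitsSeedSelector`, the weight `lambda`, the fixed 50 iterations, and the toy graph are all assumptions for illustration, not the thesis's method.

```java
import java.util.*;

/**
 * Sketch only: HITS over a small link graph, then ranking of candidate seeds.
 * The seed-quality formula (lambda * authority + (1 - lambda) * hub) is a
 * hypothetical stand-in; the thesis defines its own formula.
 */
public class HitsSeedSelector {

    /** Runs HITS for a fixed number of iterations; returns {authority, hub}. */
    public static double[][] hits(boolean[][] links, int iterations) {
        int n = links.length;
        double[] auth = new double[n];
        double[] hub = new double[n];
        Arrays.fill(auth, 1.0);
        Arrays.fill(hub, 1.0);
        for (int it = 0; it < iterations; it++) {
            // authority(p) = sum of hub(q) over pages q that link to p
            double[] newAuth = new double[n];
            for (int q = 0; q < n; q++)
                for (int p = 0; p < n; p++)
                    if (links[q][p]) newAuth[p] += hub[q];
            // hub(q) = sum of authority(p) over pages p that q links to
            double[] newHub = new double[n];
            for (int q = 0; q < n; q++)
                for (int p = 0; p < n; p++)
                    if (links[q][p]) newHub[q] += newAuth[p];
            normalize(newAuth);
            normalize(newHub);
            auth = newAuth;
            hub = newHub;
        }
        return new double[][] {auth, hub};
    }

    /** Scales a vector to unit Euclidean length (left unchanged if all zeros). */
    static void normalize(double[] v) {
        double norm = 0;
        for (double x : v) norm += x * x;
        norm = Math.sqrt(norm);
        if (norm == 0) return;
        for (int i = 0; i < v.length; i++) v[i] /= norm;
    }

    /** Hypothetical seed-quality score: weighted mix of authority and hub. */
    public static double seedQuality(double auth, double hub, double lambda) {
        return lambda * auth + (1 - lambda) * hub;
    }

    /** Returns the indices of the top-k candidate pages by seed quality. */
    public static List<Integer> selectSeeds(boolean[][] links, int k, double lambda) {
        double[][] ah = hits(links, 50);
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < links.length; i++) idx.add(i);
        // Sort descending by quality score.
        idx.sort((a, b) -> Double.compare(
            seedQuality(ah[0][b], ah[1][b], lambda),
            seedQuality(ah[0][a], ah[1][a], lambda)));
        return idx.subList(0, Math.min(k, idx.size()));
    }

    public static void main(String[] args) {
        // Toy graph: pages 0 and 1 link to pages 2 and 3; page 2 links to page 3.
        boolean[][] links = new boolean[4][4];
        links[0][2] = links[0][3] = true;
        links[1][2] = links[1][3] = true;
        links[2][3] = true;
        System.out.println(selectSeeds(links, 2, 0.5)); // prints [2, 3]
    }
}
```

With equal weights, page 2 ranks first because it scores well as both an authority and a hub, while page 3 is a pure authority. Shifting `lambda` toward 1 favours authoritative pages; toward 0 it favours link-rich hub pages, which may suit seeds meant to fan out quickly.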
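The comprehensive-value formula for predicting link value is likewise not given in the abstract. A minimal sketch, assuming a simple weighted sum of parent-page relevance and link-context relevance, each measured as the share of topic keywords present in the text, shows how scoring a link before fetching it lets a crawler discard low-value links and visit the rest in priority order. `LinkValueFrontier`, the two weights, the threshold, and the keyword-overlap measure are hypothetical, and the concept context graph component of the thesis's strategy is not modeled here.

```java
import java.util.*;

/**
 * Sketch only: a priority frontier that predicts each link's value before it is
 * fetched. value = parentWeight * rel(parent page) + contextWeight * rel(link context).
 * Weights, threshold and the keyword-overlap relevance are hypothetical placeholders.
 */
public class LinkValueFrontier {

    /** A link waiting in the frontier, with its predicted value. */
    private static final class Candidate {
        final String url;
        final double value;
        Candidate(String url, double value) { this.url = url; this.value = value; }
    }

    private final Set<String> topicTerms;   // expected to be lowercase
    private final double parentWeight;      // hypothetical weight for parent-page relevance
    private final double contextWeight;     // hypothetical weight for anchor/context relevance
    private final double threshold;         // links predicted below this are discarded
    // Highest predicted value is polled first.
    private final PriorityQueue<Candidate> queue =
        new PriorityQueue<>((x, y) -> Double.compare(y.value, x.value));

    public LinkValueFrontier(Set<String> topicTerms, double parentWeight,
                             double contextWeight, double threshold) {
        this.topicTerms = topicTerms;
        this.parentWeight = parentWeight;
        this.contextWeight = contextWeight;
        this.threshold = threshold;
    }

    /** Hypothetical relevance: fraction of topic terms that occur in the text. */
    public double relevance(String text) {
        if (topicTerms.isEmpty()) return 0;
        Set<String> words = new HashSet<>(Arrays.asList(text.toLowerCase().split("\\W+")));
        int hits = 0;
        for (String t : topicTerms) if (words.contains(t)) hits++;
        return (double) hits / topicTerms.size();
    }

    /** Predicts the link's value; enqueues it and returns true unless it is pruned. */
    public boolean offer(String url, String parentText, String linkContext) {
        double v = parentWeight * relevance(parentText) + contextWeight * relevance(linkContext);
        if (v < threshold) return false;
        queue.add(new Candidate(url, v));
        return true;
    }

    /** Returns the most valuable pending URL, or null if the frontier is empty. */
    public String next() {
        Candidate c = queue.poll();
        return c == null ? null : c.url;
    }

    public static void main(String[] args) {
        LinkValueFrontier f = new LinkValueFrontier(
            Set.of("injection", "molding", "machine"), 0.4, 0.6, 0.3);
        String parent = "Injection molding machine market news";
        f.offer("https://example.com/a", parent, "new injection molding machine released");
        f.offer("https://example.com/b", parent, "contact us");
        f.offer("https://example.com/c", "Celebrity gossip", "molding tips");
        System.out.println(f.next()); // prints https://example.com/a
        System.out.println(f.next()); // prints https://example.com/b
    }
}
```

In this toy run, link `c` is pruned (predicted value 0.2, below the 0.3 threshold), and `a` (value 1.0) is visited before `b` (value 0.4). Link `b` survives only because its parent page is on-topic, which illustrates why including the parent page in the score can keep useful navigation links that anchor text alone would reject.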
Keywords/Search Tags: Topical crawler, Crawling strategy, Java