Design And Implementation Of The Theme Crawler For Procurement Clues In The Automotive Field

Posted on: 2020-05-30
Degree: Master
Type: Thesis
Country: China
Candidate: S T Jing
Full Text: PDF
GTID: 2428330575979895
Subject: Software engineering
Abstract/Summary:
With the continuous development of the Internet, the network has become a global information base. In today's era of industrialization and informatization, network data in the automotive field is growing rapidly, and vehicle procurement information directly reflects the market's specific demand for automobiles. Discovering and studying automobile procurement leads therefore plays an important role in automobile sales and in research and development. The Internet holds a large volume of procurement-related information resources for the automotive field, but many of them are wasted because the information is complicated to acquire and cannot be shared. How to crawl automobile procurement clues in a targeted way and integrate these information resources has become a main research direction. Government procurement websites are an important source of automobile procurement data.

Theme crawler technology is the main method for obtaining network data and is the main research content of this paper. A theme crawler starts from preset keywords and initial URLs and crawls the web resources related to the theme. To obtain procurement clues in the automotive field, this paper designs and implements a theme crawler system for procurement clues in the automotive field. The main work is as follows.

Firstly, the paper analyses the page structure of provincial and municipal government procurement websites, crawls the pages related to vehicle procurement and car rental from the announcements on those websites, and extracts the links between the pages and stores them in a database.

Secondly, the PageRank algorithm is improved to make it better suited to discovering procurement clues in the automotive field. The traditional PageRank algorithm considers only the link-in and link-out relationships between web pages and ignores topic relevance, which leads to "theme drift". It also ignores the publishing time of a page, which leads to an "emphasis on old web pages". To address these shortcomings, this paper combines the traditional PageRank algorithm with the theme of vehicle procurement and proposes the APC-PageRank algorithm for procurement clues in the automotive field. The algorithm computes a weight for each text by judging the relevance between the text and the topic of automobile procurement. The weight vector enters the iterative calculation as a parameter of the APC-PageRank algorithm. In addition, a term's importance depends on where it appears in the web document. For example, the title is more important than the body content. Terms are therefore given different weights according to the position in which they occur, and these weights serve as another parameter of the APC-PageRank algorithm. Because procurement announcements show their release time, a time feedback factor is added to compensate newly released pages, so that new pages are raised in the ranking to a certain extent. The PR value is then obtained and the pages are ranked by this score, so that the sorted result is more in line with the theme.

Finally, the theme crawler system designed for procurement clues in the automotive field is implemented. The experimental results show that the improved algorithm works well for representing the theme of procurement clues in the automotive field. Web pages with a clear theme and a high link degree obtain higher rankings, and the topical accuracy of the page ordering improves.
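The abstract does not give the APC-PageRank formula, so the following is only a minimal Python sketch of the three ideas it describes: a topic relevance weight, position-dependent term weights, and a time factor that favours recent pages. It is not the thesis's algorithm. Every name and constant here is an assumption made for illustration: `POSITION_WEIGHTS`, `relevance`, `freshness`, `topic_pagerank`, the title weight of 3.0, the 30-day half-life, and the way the factors are multiplied together.

```python
from typing import Dict, List

# Assumed position weights: a keyword hit in the title counts more than one in the body.
POSITION_WEIGHTS = {"title": 3.0, "body": 1.0}


def relevance(page: Dict[str, str], keywords: List[str]) -> float:
    """Position-weighted count of keyword occurrences in a page."""
    score = 0.0
    for field, weight in POSITION_WEIGHTS.items():
        text = page.get(field, "").lower()
        score += weight * sum(text.count(k.lower()) for k in keywords)
    return score


def freshness(age_days: float, half_life_days: float = 30.0) -> float:
    """Assumed time factor: 1.0 for a page published today, halving every half-life."""
    return 0.5 ** (age_days / half_life_days)


def topic_pagerank(
    links: Dict[str, List[str]],
    weights: Dict[str, float],
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """PageRank whose teleport and link-following steps are biased by page weights.

    Every link target must also appear as a key of `links`.
    """
    pages = list(links)
    total = sum(weights[p] for p in pages)
    if total > 0:
        teleport = {p: weights[p] / total for p in pages}
    else:
        # No page has any weight: fall back to the uniform distribution.
        teleport = {p: 1.0 / len(pages) for p in pages}

    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) * teleport[p] for p in pages}
        for src in pages:
            targets = links[src]
            out_total = sum(weights[t] for t in targets)
            if out_total == 0:
                # Dangling page, or all targets have zero weight:
                # spread this page's rank according to the teleport distribution.
                for p in pages:
                    new_rank[p] += damping * rank[src] * teleport[p]
            else:
                # Heavier (more relevant, fresher) targets receive a larger share.
                for t in targets:
                    new_rank[t] += damping * rank[src] * weights[t] / out_total
        rank = new_rank
    return rank


# Usage with made-up pages: an index page linking to three announcements.
pages = {
    "hub": {"title": "Announcements", "body": "", "age": 0},
    "new_car": {"title": "Vehicle procurement notice", "body": "vehicle procurement of three cars", "age": 0},
    "old_car": {"title": "Vehicle procurement notice", "body": "vehicle procurement of three cars", "age": 60},
    "other": {"title": "Office furniture notice", "body": "desks and chairs", "age": 0},
}
links = {"hub": ["new_car", "old_car", "other"], "new_car": ["hub"], "old_car": ["hub"], "other": ["hub"]}
keywords = ["vehicle procurement", "car rental"]
# The +1 keeps off-topic pages at a small non-zero weight (an assumed smoothing choice).
weights = {name: (1.0 + relevance(p, keywords)) * freshness(p["age"]) for name, p in pages.items()}
ranks = topic_pagerank(links, weights)
print(sorted(ranks, key=ranks.get, reverse=True))
```

In this sketch the on-topic, recent announcement outranks an identical but older one, which in turn outranks the off-topic page. How strongly age should count against relevance is a tuning decision that the abstract leaves open.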
Keywords/Search Tags: Theme crawler, PageRank algorithm, Page sorting, Topic relevance