Research And Implementation Of Data Collection And Processing System For Big Data Precision Investment Promotion

Posted on: 2022-04-03
Degree: Master
Type: Thesis
Country: China
Candidate: M L Dai
GTID: 2518306728980709
Subject: Master of Engineering
Abstract/Summary:
Investment attraction is an effective means of "transforming the development mode, adjusting the structure, and promoting upgrading" and an important driving force for regional economic development. As society enters the era of big data and information explosion, government investment attraction faces unprecedented challenges. The traditional approach, in which staff manually screen a large number of enterprises for information that matches investment attraction needs, suffers from low efficiency, information asymmetry, and incomplete resources. It also makes it difficult to uncover the resources best suited to developing regional industries, which leads to investment mistakes. Big data can integrate the acquisition and collation of traditional investment data into a complete, unified, efficient, and systematic process that provides scientific reference data for precise investment attraction and helps the government extract more value from it. Government investment attraction relies on cutting-edge industry information, basic corporate information, corporate news announcements, and corporate patented technologies as its main decision-making benchmarks, which help investment personnel clarify industry positioning and grasp corporate information and future investment directions.

The thesis takes the above investment data as its research object and focuses on data acquisition and processing methods. According to the static and dynamic characteristics of investment data, two acquisition modes are designed: static data acquisition based on incremental crawlers and dynamic data acquisition based on query-subject crawlers. For web page information processing, the mainstream general-purpose web page text extraction algorithm based on the line block distribution function performs poorly on news text that is short and has discontinuous paragraphs. To address this, web page source code reconstruction and text line block density are proposed and, together with improvements such as statistical weighting, used to design a weighted line block distribution function algorithm for web page text extraction. The algorithm retains the advantages of the original and further improves extraction accuracy. For clustering and topic extraction of the collected news text, a text representation method that fuses LDA and BERT-whitening is designed and combined with a Single-Pass clustering algorithm with dynamic topic centers, which effectively improves the accuracy of both text clustering and topic extraction.

Based on Python and the PyQt toolkit, a data acquisition and processing system for big data precision investment attraction is designed and developed. It consists of four parts: data acquisition, data processing, data storage, and data application. The data acquisition part performs regular incremental collection of basic corporate data and real-time collection of target company and industry media data in response to user queries. The data processing part performs information extraction, data cleaning, and topic extraction on the collected web page data. The data application part provides query functions for investment personnel: it translates the investment conditions set by investment policy into search conditions such as industry, company type, and registered capital, helps quickly locate target companies, queries and displays frontier industry information and dynamic information on target companies, and provides data support for deeper analysis by business personnel.

Experimental results show that the system can effectively complete the acquisition and processing of data related to investment attraction. It helps investment promotion personnel quickly identify target companies and grasp public opinion about companies and frontier industry trends, thereby reducing investment risk.
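
The Single-Pass clustering step named in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the function names (`embed`, `cosine`, `single_pass`) and the `threshold` value are hypothetical, and a toy bag-of-words vector stands in for the fused LDA and BERT-whitening representation, which the abstract does not specify in detail. The sketch only shows the general idea of one pass over the documents with topic centers that are updated as documents arrive.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: a stand-in for the
    fused LDA + BERT-whitening representation in the thesis)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass(texts, threshold=0.5):
    """Assign each text to a topic in a single pass.

    Each topic center is the running sum of its members' vectors.
    Cosine similarity ignores scale, so the sum behaves like the mean.
    """
    centers = []  # one summed vector per topic
    labels = []   # topic index for each input text
    for text in texts:
        vec = embed(text)
        best_idx, best_sim = -1, 0.0
        for idx, center in enumerate(centers):
            sim = cosine(vec, center)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx >= 0 and best_sim >= threshold:
            # Similar enough: join the topic and move its center.
            centers[best_idx].update(vec)
            labels.append(best_idx)
        else:
            # No sufficiently similar topic: start a new one.
            centers.append(Counter(vec))
            labels.append(len(centers) - 1)
    return labels
```

Calling `single_pass` on a list of news headlines returns one topic index per headline, in input order. The result depends on document order and on `threshold`, which is a known property of Single-Pass clustering and one reason the quality of the text representation matters.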
Keywords/Search Tags: Big data, Investment attraction, Web crawler, Data processing, Web page text extraction