
Research On Jiangxi New Generation Information Technology Patent Data Analysis Based On Spark

Posted on: 2021-05-17  Degree: Master  Type: Thesis
Country: China  Candidate: K Dai  Full Text: PDF
GTID: 2428330602478131  Subject: Computer technology
Abstract/Summary:
The new-generation information technology industry was formally established in 2009. As one of the strategic emerging industries, it has been a key target of state support ever since. In recent years the industry has grown strongly in Jiangxi Province, but it still lags well behind the economically developed coastal provinces. Using big data technology and authoritative patent data, this paper analyzes and forecasts the development of new-generation information technology patents in Jiangxi Province and identifies weak points in that development, in order to support more effective countermeasures for the industry.

The main contents of this paper are as follows:

1. Collect the data needed for the experiments and build a Spark development environment. A patent retrieval scheme is developed from the classification of the new-generation information technology industry given by the National Bureau of Statistics and the industrial division catalogue given by the National Intellectual Property Office. A web crawler written in Python collects the data from Baiteng, and the original data set is generated after cleaning. A Spark cluster and development environment are set up, the big data framework is used for data statistics and analysis, and the ECharts chart library is used for data visualization.

2. Propose an improved k-means algorithm to cluster the patent data. To improve the accuracy of the results, the LOF (local outlier factor) algorithm is used before clustering to detect and remove outliers from the experimental data. To avoid converging to a local optimum, the method of selecting the initial cluster centers is improved. The experiment takes the data of Jiangxi Province as an example and selects indicators such as patent applicant and patent year for multi-dimensional cluster analysis.

3. Propose a technology development prediction method based on the logistic model and life cycle theory. With the patent data as sample data, gradient descent is used to adjust the model parameters until the best fit is obtained. The experiment analyzes and forecasts the development of new-generation information technology and some of its core areas, and provides guidance on future development directions.

To present the analysis results systematically, a web system is built to visualize the research results.

The innovations of this paper are as follows:

1. Using authoritative patent data and big data technology, this paper analyzes and forecasts the development trend of the new-generation information technology industry in Jiangxi Province, filling a gap in research on this field in the province.

2. The Spark big data processing framework is introduced to improve the efficiency of processing massive data, and the k-means algorithm is improved: an outlier detection algorithm removes noise, and the selection of the initial cluster centers is improved, so that the algorithm converges faster and clusters better.
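
The logistic-model fitting described in item 3 of the main contents can be illustrated with a minimal sketch. The abstract does not give the thesis's actual parameterization, loss function, or data, so everything below is an assumption made for illustration: the three-parameter curve y(t) = K / (1 + exp(-r (t - t0))), the mean-squared-error loss, the rescaling step, the learning rate and step count, the function names, and the synthetic yearly counts.

```python
import math

def logistic(t, K, r, t0):
    """Logistic growth curve: saturation level K, growth rate r, midpoint t0."""
    return K / (1.0 + math.exp(-r * (t - t0)))

def mse(ts, ys, K, r, t0):
    """Mean squared error of the curve against the observations."""
    return sum((logistic(t, K, r, t0) - y) ** 2 for t, y in zip(ts, ys)) / len(ts)

def fit_logistic(ts, ys, lr=0.5, steps=20000):
    """Fit (K, r, t0) by batch gradient descent on the mean squared error.

    Assumption: years are shifted to start at 0 and counts are divided by
    their maximum, so that one fixed learning rate behaves reasonably.
    """
    t_min, y_max = min(ts), max(ys)
    xs = [t - t_min for t in ts]
    zs = [y / y_max for y in ys]
    n = len(xs)

    # Initial guess: saturation slightly above the largest observation,
    # midpoint in the middle of the observed time range.
    K, r, t0 = 1.2, 0.5, (min(xs) + max(xs)) / 2.0

    for _ in range(steps):
        g_K = g_r = g_t0 = 0.0
        for x, z in zip(xs, zs):
            s = 1.0 / (1.0 + math.exp(-r * (x - t0)))  # sigmoid value in (0, 1)
            err = K * s - z                            # residual of the scaled model
            ds = s * (1.0 - s)                         # derivative of the sigmoid
            g_K += 2.0 * err * s / n
            g_r += 2.0 * err * K * ds * (x - t0) / n
            g_t0 += 2.0 * err * K * ds * (-r) / n
        K -= lr * g_K
        r -= lr * g_r
        t0 -= lr * g_t0

    # Convert the fitted parameters back to the original units.
    return K * y_max, r, t0 + t_min

# Synthetic, noise-free cumulative counts (NOT data from the thesis),
# generated from a known curve so the fit can be checked against it.
years = list(range(2009, 2021))
counts = [logistic(t, 1000.0, 0.6, 2015.0) for t in years]

K_hat, r_hat, t0_hat = fit_logistic(years, counts)
print(f"K={K_hat:.1f}, r={r_hat:.3f}, t0={t0_hat:.2f}")
```

Under life cycle theory, the fitted midpoint t0 marks the transition from accelerating to decelerating growth, and the ratio of the latest cumulative count to K indicates how close a technology is to saturation. Real patent counts are noisy, so the starting values and learning rate would likely need tuning.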
Keywords/Search Tags: New-generation information technology, patent data mining, k-means, logistic model, life cycle theory