
Research On Cluster Analysis Of Biomedical Patent Data In Yunnan Province Based On Spark Cloud Computing Architecture

Posted on: 2019-09-23
Degree: Master
Type: Thesis
Country: China
Candidate: S Yang
GTID: 2438330563458049
Subject: Software engineering
Abstract/Summary:
As technology has developed, the number of patents has increased dramatically. Patent information is the most effective carrier of technical information and contains a great deal of technical intelligence, and patent text is the best source of that intelligence. Although Yunnan is a key province for the biopharmaceutical industry, its collection and application of patent data lag behind and cannot provide decision support for the planning and deployment of the industry. Traditional patent data mining suffers from low efficiency, a single analytical dimension, small sample sizes and limited depth of analysis, so it cannot meet current demands. For these reasons, this thesis uses cloud computing and data mining technology to explore patent data in the field of biomedicine. The main research work is as follows:

(1) A multidimensional cluster analysis method for patent data is proposed. Four important evaluation indexes (patent applications, patent grants, patent growth rate and patent efficiency) are used together as clustering variables, and the annual development of patents, IPC classification numbers and high-output applicants are then analyzed. This method can uncover correlations in the data in greater depth, classify patent data better and make the clustering results more complete, making up for the shortcomings of traditional patent data analysis.

(2) A method is proposed for clustering patent text and mining patent technology topics based on the LDA topic model. LDA represents each patent document as a probability distribution over a number of topics, and each topic as a probability distribution over words. In this way, LDA projects documents and words onto a set of topics and uses the topics to uncover latent relationships between documents and words, between documents, and between words. This method achieves unsupervised, automatic identification of the latent technical topics and topic distributions in a large number of patent texts, effectively reduces the dimensionality of the patent information and improves the efficiency of patent clustering.

(3) A patent analysis method based on technology topic mining is proposed. Using the results of technology topic mining, the evolution of patented technology topics is analyzed along the topic and time dimensions by calculating topic (theme) strength. The patent technology topics of Yunnan Province are then compared with those of other biopharmaceutical provinces in China. This method gives a direct overview of technology development trends in the field, helping the province avoid crowded hot technology areas and identify its own advantages and opportunities for breakthroughs.

This thesis analyzes and studies the above methods using biomedical patent data from Yunnan Province. It performs effective cluster analysis on massive patent data and mines patent topics with the LDA model, addressing the problem that traditional text clustering is poorly suited to analyzing the technical topics of patent text. The resulting topics give good results when used to analyze the direction of biomedical technology in Yunnan Province.
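The abstract does not define how theme (topic) strength is computed. As a minimal sketch, assume one common definition: the strength of a topic in a given year is the mean probability that LDA assigns to that topic across the patents filed that year. The function name `theme_strength` and the sample data below are hypothetical and only illustrate the calculation; they are not taken from the thesis.

```python
from collections import defaultdict


def theme_strength(doc_topics, doc_years):
    """Mean topic probability per year (an assumed definition of theme strength).

    doc_topics: list of per-document topic distributions from LDA.
    doc_years:  list of application years, aligned with doc_topics.
    Returns {year: [strength of topic 0, strength of topic 1, ...]}.
    """
    if len(doc_topics) != len(doc_years):
        raise ValueError("doc_topics and doc_years must have the same length")

    sums = {}                    # year -> running sum of topic probabilities
    counts = defaultdict(int)    # year -> number of documents seen
    for theta, year in zip(doc_topics, doc_years):
        if year not in sums:
            sums[year] = [0.0] * len(theta)
        for k, p in enumerate(theta):
            sums[year][k] += p
        counts[year] += 1

    # Divide each per-year sum by the number of documents in that year.
    return {year: [s / counts[year] for s in totals]
            for year, totals in sums.items()}


# Hypothetical document-topic distributions: four patents, two topics.
doc_topics = [[0.5, 0.5], [1.0, 0.0], [0.25, 0.75], [0.25, 0.75]]
doc_years = [2015, 2015, 2016, 2016]
strength = theme_strength(doc_topics, doc_years)
```

Under this assumed definition, comparing a topic's values year by year shows whether it is gaining or losing weight in the corpus, which is the kind of trend the third method examines.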
Keywords/Search Tags: Biopharmaceutical, data mining, topic model, LDA, Spark