| The 14th Five-Year Plan emphasizes the strategic role of technological development in promoting high-quality social development. As an effective carrier of technology, patents record the current dynamics of technological development in China. Analyzing technological evolution can reveal how key research fields change across stages and help track trends in technological change. It can also inform the research and development strategies of governments, enterprises, and individuals. At present, technology evolution analysis relies mainly on external structured data such as IPC codes and citation relationships. It neglects the deeper technical information contained in unstructured patent data and cannot fully explore the knowledge flow within that data. This study therefore focuses on unstructured patent data and mines technical information from patent text to achieve efficient and accurate technology evolution analysis. It has three main parts. First, unstructured patent text is high-dimensional and sparse. To address this, the feature extraction step uses each word's distribution state to measure how strongly that word represents a text, and weights the words accordingly to obtain a distributed text representation. The algorithm's validity is verified on the Tsinghua University corpus, showing efficient feature extraction from text data. Second, the initial centroids of the K-means clustering algorithm determine its convergence speed and clustering quality, and unstructured data carry no labels. For these reasons, this study designs an improved K-means clustering algorithm based on a variance decision tree. The optimal splitting attribute is selected by a variance-maximization strategy, and splitting stops once the number of leaf nodes reaches k. The mean of each leaf node is then taken as an initial centroid. The algorithm's effectiveness is verified on standard UCI datasets. Third, technology clusters are extracted from the resulting patent clusters. Technical content is obtained from keyword co-occurrence relationships and temporal ordering from a time axis, and together they are used to construct a time-keyword two-mode matrix. Core technologies, peripheral technologies, and their changes over time are then distinguished, and the technology evolution is finally visualized. In summary, this research takes patent text as the object of technology evolution analysis and makes three contributions. The first is an improved word-vector feature extraction algorithm based on word distribution state, which yields efficient text representations. The second is an improved K-means clustering algorithm based on a variance decision tree, which improves clustering quality and convergence speed. The third is a method that extracts technology clusters by clustering patents and derives technology evolution paths through fine-grained analysis of the content and temporal relationships among technologies within each cluster. |
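
The abstract does not give the formula behind the word-distribution weight. The following is therefore only a minimal sketch of the general idea of weighting word vectors into a document vector. It assumes that pre-trained word vectors are already available. It also substitutes a smoothed IDF weight as a stand-in for the thesis's distribution-based weight. The names `idf_weights` and `doc_vector`, and the toy data, are hypothetical.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Smoothed IDF weight per word.

    ASSUMPTION: stand-in for the thesis's word-distribution weight, whose
    exact formula is not stated. Words spread over many documents get lower weight.
    """
    n_docs = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    return {w: math.log((1 + n_docs) / (1 + c)) + 1.0 for w, c in df.items()}


def doc_vector(doc, word_vectors, weights):
    """Weighted average of the vectors of the words in `doc`."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word in doc:
        if word not in word_vectors:
            continue  # skip out-of-vocabulary words
        w = weights.get(word, 0.0)  # words unseen in `docs` contribute nothing
        for i, x in enumerate(word_vectors[word]):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0.0:
        return total  # all-zero vector when no known word has positive weight
    return [x / weight_sum for x in total]


# Hypothetical toy corpus and 2-d word vectors, for illustration only.
docs = [["patent", "battery", "lithium"],
        ["patent", "battery", "solar"],
        ["patent", "solar", "panel"]]
word_vectors = {"patent": [1.0, 0.0], "battery": [0.0, 1.0],
                "lithium": [0.0, 2.0], "solar": [2.0, 0.0], "panel": [1.0, 1.0]}
weights = idf_weights(docs)
print(doc_vector(docs[0], word_vectors, weights))
```

Here a word that appears in every document ("patent") receives the lowest weight. Rarer, more distinctive words therefore pull the document vector toward themselves.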
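
The centroid-initialization step described in the abstract can be sketched as a small variance-based tree followed by ordinary Lloyd-style K-means. This is a minimal sketch under stated assumptions. The leaf split next is the one with the largest within-leaf sum of squared errors, and the attribute with the largest variance is split at its mean. The abstract specifies neither detail, and all function names are hypothetical.

```python
def _mean(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def _sse(points):
    c = _mean(points)
    return sum(sum((p[i] - c[i]) ** 2 for i in range(len(c))) for p in points)


def _dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def variance_tree_centroids(points, k):
    """Grow a binary tree until it has k leaves; return leaf means as centroids."""
    leaves, frozen = [list(points)], []  # frozen: leaves that cannot be split
    while leaves and len(leaves) + len(frozen) < k:
        # ASSUMPTION: split the leaf with the largest within-leaf SSE next.
        idx = max(range(len(leaves)), key=lambda i: _sse(leaves[i]))
        leaf = leaves.pop(idx)
        m = _mean(leaf)
        dims = range(len(leaf[0]))
        # Variance maximisation: pick the attribute with the largest variance.
        var = [sum((p[d] - m[d]) ** 2 for p in leaf) / len(leaf) for d in dims]
        best = max(dims, key=lambda d: var[d])
        # ASSUMPTION: threshold at the attribute mean.
        left = [p for p in leaf if p[best] <= m[best]]
        right = [p for p in leaf if p[best] > m[best]]
        if not left or not right:
            frozen.append(leaf)  # e.g. all points identical; cannot split
            continue
        leaves.extend([left, right])
    return [_mean(leaf) for leaf in leaves + frozen]


def kmeans(points, centroids, max_iter=100):
    """Standard K-means iterations starting from the given centroids."""
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)), key=lambda j: _dist2(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = _mean(members)
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
init = variance_tree_centroids(points, k=2)
centroids, labels = kmeans(points, init)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The initialization is deterministic. It needs no labels and no random restarts, which matches the motivation given for unlabeled patent data. If the data contain fewer than k distinct points, fewer than k centroids are returned.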
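
The time-keyword two-mode matrix from the third part can be sketched as follows. The sketch assumes that each patent in a technology cluster has already been reduced to a (time period, keyword list) pair. Rows are periods, columns are keywords, and each cell counts the patents from that period that contain the keyword. A companion function counts how often keyword pairs co-occur in the same patent. The input format, function names, and sample data are illustrative assumptions, not the thesis's implementation.

```python
from collections import Counter
from itertools import combinations


def time_keyword_matrix(patents, periods):
    """Two-mode matrix: rows = periods, columns = keywords (sorted).

    Each cell counts the patents from that period that contain the keyword.
    Every patent's period must appear in `periods`.
    """
    vocab = sorted({kw for _, kws in patents for kw in kws})
    col = {kw: j for j, kw in enumerate(vocab)}
    row = {t: i for i, t in enumerate(periods)}
    matrix = [[0] * len(vocab) for _ in periods]
    for period, kws in patents:
        for kw in set(kws):  # count each keyword once per patent
            matrix[row[period]][col[kw]] += 1
    return vocab, matrix


def cooccurrence(patents):
    """Count unordered keyword pairs that appear together in one patent."""
    counts = Counter()
    for _, kws in patents:
        for a, b in combinations(sorted(set(kws)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical sample cluster, for illustration only.
patents = [(2019, ["battery", "lithium"]),
           (2019, ["battery", "electrode"]),
           (2020, ["battery", "solid-state"]),
           (2020, ["solid-state", "electrolyte"])]
vocab, matrix = time_keyword_matrix(patents, periods=[2019, 2020])
counts = cooccurrence(patents)
print(vocab)   # ['battery', 'electrode', 'electrolyte', 'lithium', 'solid-state']
print(matrix)  # [[2, 1, 0, 1, 0], [1, 0, 1, 0, 2]]
```

Reading a column down the rows shows when a keyword rises or fades. The abstract does not state its criterion for separating core from peripheral technologies. One possible heuristic, offered purely as an assumption, would rank keywords by their total co-occurrence counts.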