| Big data technology has promoted the development of the financial industry, where it is now widely used. However, the multi-source heterogeneity and lack of correlation in financial data cannot be ignored. In an increasingly complex global financial system, massive volumes of financial data further increase the system's complexity, so using publicly available data to analyze the value behind financial data has become a worthwhile exercise. As one of the main economic pillars of our country, small and medium-sized high-tech enterprises play an important role in promoting economic development. However, limited by their own conditions and by the market, these enterprises still suffer from weak market support and high investment risk, which exposes investors to risk. A single core enterprise that creates risk can trigger a butterfly effect and spread risk over a wider range, because the correlation between small and medium-sized high-tech enterprises leads to the transmission of risk. From the perspective of data analysis, this thesis aims to analyze the correlation between enterprises in risk transmission. The Science and Technology Innovation Board mainly serves small and medium-sized enterprises engaged in scientific and technological innovation. This thesis takes the companies listed on the Science and Technology Innovation Board as its study object. We select the daily operating data disclosed on the Board and analyze the correlation in risk transmission between small and medium-sized high-tech enterprises based on stock price fluctuations. Three indicator data sets are selected for correlation analysis: the logarithmic rate of return of stocks, trading volume, and turnover rate. First, the acquired data sets are preprocessed. Second, Euclidean distance is used to construct a distance matrix; it measures similarity by the spatial distance between two points, which is intuitive. Hierarchical clustering is then used to classify the data sets, so that enterprises with high similarity are grouped into one class. Next, the network structure of the data sets is constructed to analyze the correlation between enterprises, and the optimal path of the network is extracted with Kruskal's minimum spanning tree algorithm. After that, the network centrality of the small and medium-sized high-tech enterprises on the Science and Technology Innovation Board is analyzed. Finally, a correlation index is established to assess the importance of nodes in the whole network structure. Taking into account both a node's degree of influence on its surrounding nodes and its importance in connecting other nodes, the results show that the correlation index comprehensively reflects an enterprise's degree of influence in the entire network: the greater the correlation degree, the greater the influence on other companies. |
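
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the tickers and prices are invented toy data, all function names are hypothetical, and plain degree centrality stands in for the centrality measures and correlation index, whose exact definitions the abstract does not give. The hierarchical clustering step is omitted for brevity. The sketch computes logarithmic returns, builds a pairwise Euclidean distance matrix, extracts a minimum spanning tree with Kruskal's algorithm, and counts each node's connections in that tree.

```python
import math

def log_returns(prices):
    """Daily logarithmic returns: r_t = ln(p_t / p_{t-1})."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]

def euclidean(u, v):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def distance_matrix(series):
    """Pairwise distances, keyed by (name_a, name_b) with name_a < name_b."""
    names = sorted(series)
    return {
        (a, b): euclidean(series[a], series[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

def kruskal_mst(nodes, weights):
    """Kruskal's algorithm: take edges in ascending weight, skip any that close a cycle."""
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for (a, b), w in sorted(weights.items(), key=lambda kv: kv[1]):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            tree.append((a, b, w))
    return tree

def degree_centrality(nodes, tree):
    """Share of the other nodes each node is directly linked to in the tree."""
    degree = {n: 0 for n in nodes}
    for a, b, _ in tree:
        degree[a] += 1
        degree[b] += 1
    return {n: d / (len(nodes) - 1) for n, d in degree.items()}

# Toy closing prices for four hypothetical tickers (not real market data).
prices = {
    "A": [10.0, 10.2, 10.1, 10.4],
    "B": [20.0, 20.4, 20.2, 20.8],
    "C": [5.0, 4.8, 5.1, 4.9],
    "D": [8.0, 8.1, 8.0, 8.3],
}
returns = {name: log_returns(p) for name, p in prices.items()}
dist = distance_matrix(returns)
mst = kruskal_mst(list(returns), dist)
centrality = degree_centrality(list(returns), mst)

for a, b, w in mst:
    print(f"{a} - {b}: {w:.4f}")
print(centrality)
```

A smaller distance means the two return series move more alike, so the tree keeps only the strongest similarity links among the enterprises. In practice a library such as SciPy or NetworkX would likely replace the hand-written helpers.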