Empirical Analysis Of K-means Algorithm For Internet Industry Stocks

Posted on: 2021-10-06
Degree: Master
Type: Thesis
Country: China
Candidate: X L Shi
Full Text: PDF
GTID: 2480306107980019
Subject: Master of Applied Statistics
Abstract/Summary:
High returns come with high risk, so it matters to investors which Internet industry stocks they choose. The goal of this thesis is to cluster Internet industry stocks effectively and so provide a basis for investment decisions. To that end, it collects data on the main factors influencing Internet industry stocks and adopts methods and principles similar to Zhang Wenming's improvement of the k-means clustering algorithm for text clustering, applying them to the analysis and selection of Internet industry stocks in order to help investors choose the most suitable ones.

K-means is a typical distance-based clustering algorithm. It is simple to operate, widely applied, and offers high compressibility and scalability when processing large data sets. It also has limitations: because the initial cluster centers are selected at random, the results can be highly unstable and the algorithm may reach only a local optimum.

This thesis analyzes in depth a k-means clustering algorithm proposed by Zhang Wenming that combines density with nearest neighbors. The improved algorithm increases stability and efficiency and reduces computational overhead. Its basic idea is to first analyze the density of the data set and remove noise and outliers according to a manually specified threshold, and then, starting from the initial cluster centers calculated from the density, use nearest neighbors to determine the initial cluster centers of the improved algorithm. The traditional and improved algorithms were tested on the iris data set, and their clustering quality was evaluated with two indicators, SSE and Sil. The results show that the combined ("comprehensive") algorithm clusters much better than the traditional k-means algorithm.

After cleaning and standardizing the stock data and applying the KMO test, the first five common factors were extracted through factor analysis and named market, potential, profit, evaluation, and risk factors. Using the combined algorithm, Internet industry stocks can be divided into four categories: first-category stocks with a certain investment value, second-category stocks without investment value, third-category stocks with low investment value, and fourth-category stocks with investment value.
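The density-then-seeding idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not Zhang Wenming's exact procedure, which the abstract does not spell out. The helper names (`densities`, `pick_seeds`), the parameters `radius`, `min_neighbors` and `min_sep`, and the synthetic data are all assumptions made for illustration. The sketch counts neighbors within a radius as a density estimate, drops points below a manual threshold, seeds k-means from well-separated dense points, and reports SSE against a randomly seeded run.

```python
import math
import random


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def densities(points, radius):
    # Density of a point = number of other points within `radius` (assumed definition).
    return [sum(1 for j, q in enumerate(points) if j != i and dist(p, q) <= radius)
            for i, p in enumerate(points)]


def pick_seeds(dense_points, k, min_sep):
    # Hypothetical seeding rule: take the densest points that lie at least
    # `min_sep` away from every seed chosen so far.
    seeds = []
    for _, p in sorted(dense_points, key=lambda t: -t[0]):
        if all(dist(p, s) > min_sep for s in seeds):
            seeds.append(p)
        if len(seeds) == k:
            return seeds
    raise ValueError("fewer than k well-separated dense points found")


def assign(points, centers):
    # Put each point in the cluster of its nearest center.
    clusters = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        clusters[j].append(p)
    return clusters


def kmeans(points, centers, iters=100):
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        clusters = assign(points, centers)
        # New center = mean of its cluster; an empty cluster keeps its old center.
        new_centers = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else c
                       for cl, c in zip(clusters, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


def sse(centers, clusters):
    # Sum of squared distances from each point to its cluster center.
    return sum(dist(p, c) ** 2 for c, cl in zip(centers, clusters) for p in cl)


# Synthetic data (an assumption, not the thesis data): three blobs plus one outlier.
random.seed(0)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(cx + random.gauss(0, 0.5), cy + random.gauss(0, 0.5))
          for cx, cy in blob_centers for _ in range(30)]
outlier = (50.0, 50.0)
points.append(outlier)

# Step 1: estimate density and drop points below the manual threshold.
dens = densities(points, radius=1.0)
kept = [(d, p) for d, p in zip(dens, points) if d >= 5]  # min_neighbors = 5
clean = [p for _, p in kept]

# Step 2: seed k-means from dense, well-separated points, then run it.
seeds = pick_seeds(kept, k=3, min_sep=4.0)
centers, clusters = kmeans(clean, seeds)

# Baseline: the same k-means loop, seeded with randomly sampled points.
rand_centers, rand_clusters = kmeans(clean, random.sample(clean, 3))

print("SSE, density-seeded:", round(sse(centers, clusters), 3))
print("SSE, random-seeded: ", round(sse(rand_centers, rand_clusters), 3))
```

On data this well separated, both runs may reach the same SSE. Any difference would be expected to show up on harder data or across many random restarts, where random seeding is more likely to end in a poor local optimum.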
Keywords/Search Tags:Internet industry, stocks, k-means algorithm, empirical analysis