
Improvement And Empirical Study Of K-means Clustering Algorithm On Panel Data Analysis

Posted on: 2016-03-18  Degree: Master  Type: Thesis
Country: China  Candidate: X Gao  Full Text: PDF
GTID: 2308330470451818  Subject: Statistics
Abstract/Summary:
With the rapid development of science and technology and the continuous expansion of industry databases, big data has gradually entered public view in recent years. Big data is complex: it is a collection of multidimensional variables and samples with diverse characteristics. It attracts strong research interest precisely because of the complex and novel information and knowledge it presents. A key point in big data research is the continuous development and refinement of data mining technology. In the final analysis, the value of big data lies in the knowledge it offers rather than in the data itself. The main tasks of big data mining are description and prediction. Because the clustering model serves both descriptive and predictive functions in data mining, it plays an important role in the pre-processing step and the class division step. This thesis therefore takes the data mining clustering model as its main subject and explains the ideas and methods of data mining procedures to the reader.

Panel data are continuous in the time dimension. In view of this, the thesis puts forward an improved clustering method based on traditional k-means clustering. The innovations of the method are as follows: it defines a new similarity index between objects that takes both the time and spatial dimensions into account to form an overall similarity; it splits the clustering procedure along the time dimension of the samples and gives the class attribution for each period; and, according to the membership principle, it calculates weights that reflect the likelihood that an object belongs to a class. These innovations are designed to avoid the degradation of the time dimension found in earlier clustering methods, and thus to obtain a better and more objective result.
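As an illustration only, the three ideas described above could be sketched in Python with NumPy as follows. The abstract gives no formulas, so everything in this sketch is an assumption rather than the thesis's actual algorithm: the function name `panel_kmeans`, the weight `alpha`, the use of period-to-period increments as the time-dimension feature, the per-period centroids, and the membership weight defined as the share of periods in which an object falls into a class.

```python
import numpy as np


def panel_kmeans(X, k, alpha=0.5, n_iter=20, init_idx=None, seed=0):
    """Hypothetical k-means variant for panel data.

    X has shape (n_objects, n_periods, n_features).
    Returns per-period labels (n_objects, n_periods) and
    membership weights (n_objects, k).
    """
    X = np.asarray(X, dtype=float)
    n, T, _ = X.shape

    # Assumed time-dimension feature: increment from the previous period
    # (zero for the first period).
    D = np.diff(X, axis=1, prepend=X[:, :1, :])

    # Stack levels and increments so that squared Euclidean distance equals
    # (1 - alpha) * level distance + alpha * trend distance.
    F = np.concatenate([np.sqrt(1 - alpha) * X, np.sqrt(alpha) * D], axis=2)

    # Initial centroids are the full trajectories of k chosen objects, which
    # keeps class identities aligned across periods.
    if init_idx is None:
        init_idx = np.random.default_rng(seed).choice(n, size=k, replace=False)
    C = F[np.asarray(init_idx)].copy()  # shape (k, T, 2 * n_features)

    def assign(C):
        # dist[i, c, t]: squared distance of object i to class c in period t.
        dist = ((F[:, None] - C[None]) ** 2).sum(axis=3)
        return dist.argmin(axis=1)  # shape (n, T)

    for _ in range(n_iter):
        labels = assign(C)
        # Update each class centroid separately for each period.
        for c in range(k):
            for t in range(T):
                mask = labels[:, t] == c
                if mask.any():
                    C[c, t] = F[mask, t].mean(axis=0)

    labels = assign(C)
    # Assumed membership weight: share of periods spent in each class.
    weights = np.stack([(labels == c).mean(axis=1) for c in range(k)], axis=1)
    return labels, weights
```

In this sketch each row of `weights` sums to 1, so an object that sits in one class for three of four periods would receive a weight of 0.75 for that class. Setting `alpha=0` reduces the distance to the level-only comparison.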
Compared with the traditional clustering method, which considers only the spatial development trend of the samples, the improved method considers both the time and spatial development trends, so it is more suitable for panel data.

Chapter 1 briefly introduces big data and data mining and the research significance of panel data clustering. Chapter 2 briefly describes multivariate cluster analysis, including its ideas, principles, and the steps of the method. Chapter 3, the focus of the thesis, presents the improved multivariate clustering model for panel data. Finally, the method is applied to an empirical analysis of the stock data of listed companies and compared with traditional clustering methods in various respects. After verification, the results obtained by the improved method are superior to those of the traditional method.
Keywords/Search Tags:Big data and data mining, Clustering model, K-means clustering algorithm, Stock