Research On Hierarchical Clustering Algorithm For Block Data

Posted on: 2015-06-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Wei
GTID: 2308330461983935
Subject: Computer application technology
Abstract/Summary:
With the rapid evolution of data storage technology and the arrival of the "era of big data", humanity is collecting dramatically more data than ever before. However, as data accumulates, finding ways to analyze it has become a headache for analysts; given the limits of human cognition and stamina, extracting useful information from such volumes by hand is difficult. Manual data analysis suffers from inaccuracy, overlooked dimensions, and long turnaround times, among other problems. Data mining (DM), supported by computing equipment, offers a way to solve this problem. With the help of computers and DM tools, analysts now have strong support for taking a detailed and precise look into huge data sets or, more accurately, into the knowledge underlying the data. Knowledge from data includes the distribution and trends of the data, as well as interesting associations it contains, such as the famous finding that sales of both beer and diapers improved when the two were placed on the same shelf. Knowledge obtained through DM can provide detailed analysis for analysts and strong backing for decision-makers.

Data mining technology provides a solution for the analysis of complex data. As an important part of DM, data clustering has made remarkable advances since it was first put forward in 1955. Recently, two approaches have commonly been taken when facing huge data sets: parallel computing, or working with sampled data. The former demands high-end computing capability, while the latter loses part of the information the data expresses. To address this problem, this thesis makes the following contributions.

First, this thesis proposes a new data analysis structure: the Data Block.
This structure is intended to fully reflect the characteristics of the data it represents. Second, building on the definition and mathematical meaning of the Data Block, this thesis defines a formula for calculating the distance between Data Blocks. Third, based on this definition of distance between Data Blocks, it designs a new hierarchical clustering algorithm, and experimental results verify the algorithm's effectiveness. This research further broadens the range of data to which clustering algorithms can be applied and offers useful guidance for data mining and machine learning.
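
The abstract does not give the thesis's actual Data Block definition or its distance formula, so the following is only a minimal sketch of the general idea: summarize groups of records as blocks, define a distance between blocks, and merge the closest blocks bottom-up. Everything named here is an assumption made for illustration, not the author's method: the `Block` summary (centroid plus record count), the centroid-based `block_distance`, and the helper names `make_block`, `merge`, and `cluster_blocks`.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Block:
    """Hypothetical block summary: centroid and number of records (an assumption)."""
    centroid: tuple
    count: int


def make_block(points):
    """Summarize a non-empty list of equal-length numeric tuples as one Block."""
    n = len(points)
    dims = len(points[0])
    centroid = tuple(sum(p[d] for p in points) / n for d in range(dims))
    return Block(centroid, n)


def block_distance(a, b):
    """Assumed stand-in distance: Euclidean distance between block centroids."""
    return dist(a.centroid, b.centroid)


def merge(a, b):
    """Combine two blocks; the new centroid is the count-weighted mean."""
    n = a.count + b.count
    centroid = tuple(
        (x * a.count + y * b.count) / n for x, y in zip(a.centroid, b.centroid)
    )
    return Block(centroid, n)


def cluster_blocks(blocks, k):
    """Agglomerative clustering over blocks until k clusters remain.

    Returns a list of clusters, each a sorted list of input block indices.
    """
    clusters = [(b, [i]) for i, b in enumerate(blocks)]
    while len(clusters) > k:
        # Find the closest pair of current clusters under block_distance.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: block_distance(clusters[ij[0]][0], clusters[ij[1]][0]),
        )
        merged = (
            merge(clusters[i][0], clusters[j][0]),
            clusters[i][1] + clusters[j][1],
        )
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return [sorted(members) for _, members in clusters]


blocks = [
    make_block([(0, 0), (0, 2)]),
    make_block([(1, 1)]),
    make_block([(10, 10), (12, 10)]),
    make_block([(11, 11)]),
]
print(cluster_blocks(blocks, 2))  # [[0, 1], [2, 3]]
```

Because each block is reduced to a small summary, the pairwise search runs over blocks instead of individual records. That is one plausible reading of how a block structure could avoid both of the drawbacks the abstract mentions, though the thesis may define its blocks and distance quite differently.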
Keywords/Search Tags: Data Block, Hierarchical Clustering, Clustering Algorithm