The Research Of Clustering Algorithms Based On Data Mass And Potential Entropy

Posted on: 2017-02-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D K Wang
Full Text: PDF
GTID: 1318330485462085
Subject: Management science and engineering, service science
Abstract/Summary:
With the development of computer science, human society has entered the age of big data. Data science is the key technology for exploiting big data resources: it lets people uncover the value in data and take the initiative in using it. As one branch of data science, data mining has broad application prospects in the age of big data. It can dig out the value hidden in data, make full use of data resources, and to some extent ease the problem of having huge amounts of data but little knowledge. Data mining has three main analysis modes: classification, clustering, and association. Classification and association are treated here as supervised learning, whereas clustering is unsupervised. Big data emphasizes mining and learning on full datasets, for which appropriate training sets are difficult to construct, so clustering, as an unsupervised method, is well suited to the big data setting.

This dissertation proposes several new ideas for improving clustering: the theory of the vector data field, the concept of data mass, a data mass clustering algorithm, and clustering by fast search and find of density peaks with potential entropy. Facial expression recognition and automatic face clustering are used to test the new theory and methods.

First, a data field is a model for analyzing data. The classical theory of data fields uses potential energy to describe the distribution of data. Building on this theory, the dissertation proposes the vector data field, which describes not only the distribution of the data but also its movement trend. The Hamilton operator is used to unify the data field and vector data field models.

Second, data in a data field should have mass, just as objects in a physical field have mass. The dissertation proposes a new concept of data mass, in which mass represents the inherent qualities of the data. The qualities it represents change with the mining perspective; in essence, data mass is the weight of a data object under a particular mining perspective. For qualities that do not change with the mining perspective, the dissertation proposes the fundamental matrix of the data field to represent these invariant properties. Through the fundamental matrix, a linear system of equations is established between data mass and data potential energy. Based on the fundamental matrix, a new method for calculating the best data mass, called the internal convex point, is proposed; it solves the problem of choosing the initial point. By combining the linear system of equations with the idea of a learning machine, a second method for calculating the best data mass is proposed, which improves computational efficiency.

Based on data mass, a new algorithm called clustering with data mass is proposed. In this algorithm, data mass represents the density of the data. Clustering with data mass can identify cluster centers accurately, completes clustering in a single pass, and does not require the number of clusters as input. To improve "Clustering by fast search and find of density peaks", the dissertation also proposes clustering by fast search and find of density peaks with potential entropy. This algorithm establishes a functional relationship between potential entropy and the threshold value, and can calculate the best threshold value for each dataset.

Finally, facial expression recognition and automatic face clustering are used to test the new concepts and algorithms. The test results show that the new concepts and algorithms, which are based on the data field, perform well in practice.
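
The relationship between potential entropy and the threshold value can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the abstract gives no formulas, so the Gaussian-like potential, the normalized Shannon form of potential entropy, the choice of the entropy-minimizing impact factor `sigma` as the "best" threshold, the uniform default mass, and all function names below are assumptions borrowed from the commonly used data field and density peaks formulations.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X (shape n x d)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def potentials(dist, sigma, mass=None):
    """Assumed data field potential: phi_i = sum_j m_j * exp(-(d_ij / sigma)^2).

    `mass` stands in for data mass; uniform mass 1/n is an assumption.
    """
    n = dist.shape[0]
    if mass is None:
        mass = np.full(n, 1.0 / n)
    return (np.exp(-(dist / sigma) ** 2) * mass[None, :]).sum(axis=1)


def potential_entropy(phi):
    """Assumed potential entropy: Shannon entropy of the normalized potentials."""
    p = phi / phi.sum()
    return float(-(p * np.log(p)).sum())


def best_sigma(dist, candidates):
    """Pick the candidate threshold with the smallest potential entropy."""
    entropies = [potential_entropy(potentials(dist, s)) for s in candidates]
    return candidates[int(np.argmin(entropies))]


def density_peak_scores(dist, rho):
    """Density peaks 'delta': distance to the nearest point of higher density.

    The densest point gets its maximum distance to any other point.
    """
    n = len(rho)
    order = np.argsort(-rho)  # indices sorted by decreasing density
    delta = np.zeros(n)
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        delta[i] = dist[i, order[:rank]].min()
    return delta


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated 2-D blobs.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.5, (30, 2)), rng.normal(10.0, 0.5, (30, 2))])
    dist = pairwise_distances(X)
    sigma = best_sigma(dist, np.linspace(0.1, 5.0, 50))
    rho = potentials(dist, sigma)
    delta = density_peak_scores(dist, rho)
    centers = np.argsort(-(rho * delta))[:2]  # points with high rho and high delta
    print("sigma:", sigma, "candidate centers:", centers)
```

Under these assumptions the threshold is chosen from the data rather than tuned by hand, and points that combine high potential with high `delta` are the candidate cluster centers.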
Keywords/Search Tags: Vector data field, Data mass, Clustering with data mass, Potential entropy, Automatic face clustering