
Neighborhood Belonging Information Based Rough Clustering And Incremental Algorithm Research And Its Software Design

Posted on: 2022-10-26
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Sun
GTID: 2518306722958849
Subject: Computer application technology
Abstract/Summary:
As an unsupervised learning method in machine learning, clustering can effectively divide data objects into different clusters without knowing the labels of the sample data in advance. It is an important tool for processing and analyzing data. In practice, however, data often have overlapping attributes, which blurs the division between clusters, and the data in the boundary regions of clusters share similar characteristics. As a result, traditional clustering algorithms produce large errors on such complex data. It is therefore of great significance to study clustering algorithms that can handle complex data by drawing on the theory and methods for dealing with uncertain information. On the other hand, when dealing with dynamic incremental data, existing incremental clustering algorithms ignore the belonging information of a new point's neighboring data points, and their changes to the cluster structure do not consider the data distribution between clusters. The problems of partitioning new data points and of changing the cluster structure in incremental clustering therefore remain to be solved. In addition, in an era of explosive data growth, combining algorithm theory with software development to build a data analysis platform with clustering analysis as its core function has practical significance for the promotion and application of the related data analysis methods.

This dissertation follows three lines of research: a rough clustering algorithm based on neighborhood belonging information, an incremental rough clustering algorithm for dynamic data, and the design and implementation of a data processing and analysis platform based on rough granular computing. It explores in depth the membership relationship between data objects and clusters, the partitioning of new data points in dynamic data, and methods for changing the cluster structure. Based on the results of the clustering research, a scalable data analysis platform is designed and developed. The main research contents are as follows:

(1) Rough K-means algorithm based on the mixed measure of neighborhood belonging information

For data sets with overlapping attributes, the data in the boundary region of a cluster lie close to the cluster centers, so these points are difficult to distinguish by distance and density alone. This leads to the low accuracy of rough k-means (RKM) and its derived algorithms on such data sets. To solve this problem, a rough k-means algorithm based on the mixed measure of neighborhood belonging information is proposed. The algorithm combines the local density and the neighborhood belonging information of data objects to measure the similarity between data objects and each cluster. The relationship between boundary data and a cluster is determined by the local spatial distribution, which makes the differences between items of fuzzy, uncertain information more pronounced. The experimental results show that the algorithm is better suited to partitioning data with overlapping attributes.

(2) Rough K-means incremental clustering algorithm considering neighborhood belonging information of boundary samples

As data grow and change dynamically, the key to improving the quality of incremental clustering, given the clustering results of the original data, is how to measure which cluster new data belong to. Existing incremental clustering algorithms mostly consider the location distribution of the new data and ignore the belonging information of its neighboring data points. Building on the rough K-means clustering algorithm, a rough K-means incremental clustering algorithm that considers the neighborhood belonging information of boundary samples is proposed to deal with the uncertain information of new data points in the boundary region. First, when partitioning new data samples in the boundary region, the algorithm comprehensively considers the belonging information of the data points in their neighborhood, which makes the similarity measure between the new data points and each cluster more reasonable. Second, during incremental clustering, the algorithm merges or splits clusters according to the changes in cluster structure caused by the new data points, which makes the clustering more reasonable. The experimental results verify the effectiveness of the algorithm.

(3) Research and development of a data processing and analysis platform based on rough granular computing

Based on the theoretical research results of the project, together with the classical rough clustering algorithms, a data processing and analysis platform based on rough granular computing is designed and developed using a currently popular web development framework. The platform takes clustering analysis as its core and supports data acquisition, data analysis, visualization and other functions. The platform can lower the threshold of data analysis for ordinary users and promote the application of the research results.
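
For readers unfamiliar with the rough k-means (RKM) baseline that the abstract builds on, the following is a minimal Python sketch of one common formulation of classical rough k-means. It is not the mixed-measure or incremental algorithm proposed in the thesis, whose details the abstract does not give. The function names (`rough_assign`, `update_centers`, `rough_kmeans`) and the parameters `ratio` and `w_lower` are illustrative assumptions, as is the choice of a distance-ratio threshold for deciding whether a point falls in a boundary region.

```python
import numpy as np


def rough_assign(X, centers, ratio=1.2):
    """Return (lower, upper) boolean membership matrices of shape (n, k).

    Assumption: a cluster is a candidate for a point when its center is
    within `ratio` times the distance to the nearest center.
    """
    # Euclidean distance from every point to every center, shape (n, k).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d_min = d.min(axis=1, keepdims=True)
    # Upper approximation: every candidate cluster of the point.
    upper = d <= ratio * np.maximum(d_min, 1e-12)
    # Lower approximation: points that have exactly one candidate cluster.
    single = upper.sum(axis=1, keepdims=True) == 1
    lower = upper & single
    return lower, upper


def update_centers(X, lower, upper, centers, w_lower=0.7):
    """Weighted mean of lower-approximation and boundary-region points."""
    new = centers.copy()
    boundary = upper & ~lower
    for j in range(centers.shape[0]):
        lo, bd = X[lower[:, j]], X[boundary[:, j]]
        if len(lo) and len(bd):
            new[j] = w_lower * lo.mean(axis=0) + (1 - w_lower) * bd.mean(axis=0)
        elif len(lo):
            new[j] = lo.mean(axis=0)
        elif len(bd):
            new[j] = bd.mean(axis=0)
        # If a cluster has no points at all, its center is left unchanged.
    return new


def rough_kmeans(X, init_centers, ratio=1.2, w_lower=0.7, n_iter=100, tol=1e-6):
    """Alternate rough assignment and center updates until centers settle."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(init_centers, dtype=float)
    for _ in range(n_iter):
        lower, upper = rough_assign(X, centers, ratio)
        new = update_centers(X, lower, upper, centers, w_lower)
        shift = np.linalg.norm(new - centers)
        centers = new
        if shift < tol:
            break
    lower, upper = rough_assign(X, centers, ratio)
    return centers, lower, upper
```

In this sketch a point equidistant from two centers ends up in the upper approximation of both clusters and in the lower approximation of neither, which is the boundary-region uncertainty that the thesis aims to resolve with neighborhood belonging information.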
Keywords/Search Tags:Rough K-means clustering, neighborhood belonging information, incremental clustering, cluster structure, data processing and analysis platform