Research And Application Of Improving K-means Algorithms

Posted on:2020-08-17

Degree:Master

Type:Thesis

Country:China

Candidate:S Y Yao

GTID:2428330614470685

Subject:Computer technology

Abstract/Summary:

The emergence and development of the Internet has greatly facilitated the collection of information and brought a qualitative leap in its scale, leading to the accumulation of massive amounts of data. How to extract the information people need from this volume and transform it into organized knowledge has become an urgent problem. To address it, data mining technology has attracted growing attention in many fields. Data mining covers many topics, among which cluster analysis is one of the most commonly used and most important; it is widely applied and has significant research value. Of the many cluster analysis methods, partition-based clustering is the most basic and simplest. The K-means algorithm, one of the most popular clustering algorithms, has obvious limitations: it offers no clear way to choose the initial cluster centers, and poorly chosen initial centers degrade the clustering result, so in general it does not reach the global optimum. The later HK algorithm offers advantages such as accuracy and fast convergence, but its time complexity is unacceptable. This thesis applies distributed merging to the HK-means algorithm to obtain the distributed MHK algorithm, which addresses the above shortcomings. The algorithm incorporates the idea of two-way merging and is designed on the MapReduce architecture, which greatly improves its efficiency. In the specific application, it is used to cluster preprocessed data objects and to analyze the data indicators of the information users need. On the one hand, this makes it possible to define the relevant indicators of the information users need. On the other hand, through cluster analysis, the indicators can be analyzed comprehensively and compared against their data management, and the root cause of the gap between indicators and data can be found accurately. To verify the effectiveness of the algorithm, it was tested on different types of data sets, and the experimental results were satisfactory.

In addition, cluster analysis is widely used in bioinformatics. This thesis therefore also studies and presents a clustering algorithm based on weighted feature integration (WFE) for the analysis of protein localization sites. The algorithm operates on yeast protein measurement data with multiple index features; among the many research methods, prediction of yeast protein localization sites has the greatest potential. The WFE process first assigns different weights to the features, then computes and reports the results to find the best one. In doing so, noise is removed by discarding the highest and lowest scores before the average score is calculated, the magnitude of the scores is measured with a histogram, and the index with the maximum average difference is obtained. Experimental comparison of WFE with other clustering algorithms based on feature weighting shows that the new algorithm is superior to them in accuracy and stability.
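
The sensitivity of K-means to its initial cluster centers, noted in the abstract, can be illustrated with a minimal sketch. The Python code below is plain Lloyd-style K-means, not the HK or distributed MHK algorithm of the thesis; the function names `squared_distance` and `kmeans` and the four-point data set are assumptions chosen only for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centers, max_iter=100):
    """Plain K-means from the given initial centers.

    Returns the final centers and the sum of squared errors (SSE).
    Illustrative sketch only; not the thesis's MHK algorithm.
    """
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)

        # Update step: move each center to the mean of its members.
        new_centers = []
        for center, members in zip(centers, clusters):
            if members:
                dim = len(center)
                new_centers.append(tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(dim)))
            else:
                # Keep an empty cluster's center where it was.
                new_centers.append(center)

        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers

    sse = sum(min(squared_distance(p, c) for c in centers) for p in points)
    return centers, sse


# Four corners of a 4 x 1 rectangle; the natural split is left vs. right.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

_, good_sse = kmeans(points, [(0, 0), (4, 0)])  # one center on each side
_, bad_sse = kmeans(points, [(0, 0), (0, 1)])   # both centers on the left

print(good_sse)  # 1.0
print(bad_sse)   # 16.0
```

Both runs converge, but the second settles in a local optimum whose SSE is sixteen times larger. Shortcomings of this kind are what the initialization improvements discussed in the abstract aim to address.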

Keywords/Search Tags:

Data Mining, Cluster Analysis, Distributed MHK Algorithms

Related items

1	Research On Clustering Algorithms For The Data With Multidimensional Mixed Attributes
2	New Methods For Cluster Analysis In Distributed Environments
3	Research And Implementation Of Data Mining Algorithms Based On Distributed Computing
4	Research On Partitioning Clustering Algorithms For Data With Mixed Numerical And Categorical Attributes
5	Based On The Application Of Cluster Analysis Of Water Pollution Monitoring System
6	Study And Application Of CRM Data Mining Based On Clustering Algorithms
7	Algorithms Research And Instance Application Of Cluster Analysis In Customer Relation Management
8	Mining Dynamic Heterogeneous Data With Distributed Algorithms
9	Cluster Analysis In Data Mining And Its Control In Applied Research
10	Web Cluster System Qos Control Mechanism Based On Data Mining