Research On Incremental Clustering Algorithm Based On Feature Selection

Posted on: 2019-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: H R Li
Full Text: PDF
GTID: 2428330548494975
Subject: Software engineering
Abstract/Summary:
Nowadays, with the rapid development of the Internet and information technology, it has become increasingly easy to obtain large amounts of data. Turning these data into useful information that can guide people's life and work is therefore becoming ever more important. However, today's data are typically high-dimensional, arrive incrementally, and carry only a small amount of prior label information. Although traditional clustering algorithms have long been an important means of extracting latent information from data, they are inefficient and inaccurate when faced with these characteristics, and they cannot effectively use the prior information in the data. It is therefore particularly important to be able to mine high-dimensional, incremental data while making effective use of prior information during the mining process.

In this thesis, a new incremental clustering algorithm is proposed to solve these problems. The algorithm consists of two parts: a feature selection algorithm that addresses the high dimensionality of the data, and an incremental clustering algorithm that addresses the clustering of incremental data. In the feature selection part, the Relief weight is first modified so that it can be used to evaluate feature subsets. Second, the QJMI evaluation function is proposed, based on quadratic Renyi entropy and the idea of mutual information; it can distinguish relevance from redundancy among the features of a feature subset at a lower computational cost. Finally, by combining the modified Relief weight with the QJMI evaluation function, FSIRQ, a feature selection algorithm based on a complex-correlation-degree evaluation criterion, is proposed. FSIRQ selects a representative feature subset and is fast to compute.

In the incremental clustering part, representative sample points are selected to stand in for the existing clustering results, and the incremental data are clustered by mixing these representative sample points with the new data. Finally, by combining FSIRQ with this incremental clustering algorithm, the incremental clustering algorithm FS-RDRS-IC is proposed. It handles the incremental clustering of high-dimensional incremental data well and makes reasonable use of prior knowledge about the data, which gives it satisfactory efficiency and accuracy. The experiments use data sets from the UCI Machine Learning Repository. First, the accuracy and efficiency of FSIRQ are demonstrated by comparison with existing feature selection algorithms. Then, comparison with traditional clustering algorithms shows the advantages of FS-RDRS-IC incremental clustering in both computing speed and accuracy.
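The abstract says the Relief weight is modified so that it can score feature subsets, but it does not give the modified formula. As a point of reference only, the minimal sketch below computes the standard two-class Relief weights (Kira and Rendell) that such a modification would start from; it is not the FSIRQ weighting. The function name `relief_weights` and its parameters are hypothetical.

```python
import random
from typing import List, Sequence


def relief_weights(X: Sequence[Sequence[float]], y: Sequence[int],
                   n_iter: int = 50, seed: int = 0) -> List[float]:
    """Standard two-class Relief (not the thesis's modified weight).

    For each sampled instance, a feature gains weight when it differs on the
    nearest miss (other class) and loses weight when it differs on the nearest
    hit (same class). Each class needs at least two samples.
    """
    n, d = len(X), len(X[0])
    # Per-feature value range, used to scale diff() into [0, 1].
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]

    def diff(j: int, a: Sequence[float], b: Sequence[float]) -> float:
        return abs(a[j] - b[j]) / span[j]

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        # Manhattan distance over range-normalized features.
        return sum(diff(j, a, b) for j in range(d))

    rng = random.Random(seed)
    w = [0.0] * d
    for _ in range(n_iter):
        i = rng.randrange(n)
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        h = min(hits, key=lambda k: dist(X[i], X[k]))    # nearest hit
        m = min(misses, key=lambda k: dist(X[i], X[k]))  # nearest miss
        for j in range(d):
            w[j] += (diff(j, X[i], X[m]) - diff(j, X[i], X[h])) / n_iter
    return w


# Feature 0 separates the classes; feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.5], [1.0, 0.4], [0.9, 0.8], [0.8, 0.2]]
y = [0, 0, 0, 1, 1, 1]
print(relief_weights(X, y, n_iter=30))
```

A discriminative feature ends up with a clearly positive weight and a noisy one near zero or below; a subset-level score such as the one the thesis describes would have to aggregate or adapt these per-feature weights in some way the abstract does not specify.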
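The QJMI function is described only as combining quadratic Renyi entropy with mutual information. The sketch below is one plausible reading, stated as an assumption: quadratic Renyi entropy $H_2(X) = -\log_2 \sum_x p(x)^2$, an entropy-difference analogue of mutual information $I_2(X;Y) = H_2(X) + H_2(Y) - H_2(X,Y)$, and a JMI-style greedy criterion that scores a candidate feature $f$ by $\sum_{s \in S} I_2((f,s); C)$ over the already selected set $S$. All function names are hypothetical, the features are assumed to be discrete (continuous features would need discretizing first), and this is not the thesis's exact formula.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def renyi2_entropy(values: Sequence[Hashable]) -> float:
    """Quadratic Renyi entropy H2 = -log2(sum_x p(x)^2) of an empirical distribution."""
    n = len(values)
    return -math.log2(sum((c / n) ** 2 for c in Counter(values).values()))


def renyi2_mi(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """H2(X) + H2(Y) - H2(X, Y). Unlike Shannon MI, this need not be non-negative."""
    return renyi2_entropy(xs) + renyi2_entropy(ys) - renyi2_entropy(list(zip(xs, ys)))


def greedy_jmi_select(columns: List[Sequence[Hashable]],
                      labels: Sequence[Hashable], k: int) -> List[int]:
    """Forward selection with a JMI-style score built on H2.

    The first feature maximizes I2(f; C); each later feature maximizes the sum
    over selected features s of I2((f, s); C), so a candidate is rewarded for
    information it adds jointly with features already chosen.
    """
    selected: List[int] = []
    remaining = list(range(len(columns)))
    while remaining and len(selected) < k:
        def score(f: int) -> float:
            if not selected:
                return renyi2_mi(columns[f], labels)
            return sum(renyi2_mi(list(zip(columns[f], columns[s])), labels)
                       for s in selected)

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


C = [0, 0, 0, 0, 1, 1, 1, 1]
cols = [
    [0, 1, 0, 1, 0, 1, 0, 1],  # f0: independent of C
    [0, 0, 0, 0, 1, 1, 1, 1],  # f1: identical to C
    [0, 0, 0, 1, 1, 1, 1, 0],  # f2: partially informative
]
print(greedy_jmi_select(cols, C, 1))  # f1 has the highest I2 with C
```

Quadratic Renyi entropy avoids the per-term logarithm of Shannon entropy (only one log of a sum of squares is taken), which is one reason it is often chosen when evaluation cost matters.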
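The incremental step replaces earlier clustering results with representative sample points and clusters those points together with the new data. The abstract does not say how representatives are chosen or which base clusterer is used, so the minimal sketch below assumes k-means (Lloyd iterations, warm-started from the previous centers) and keeps, per cluster, the members nearest to the center. The names `kmeans`, `representatives`, and `incremental_step` are hypothetical; this illustrates the general idea, not FS-RDRS-IC itself.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _nearest(p: Point, centers: Sequence[Point]) -> int:
    return min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))


def kmeans(points: Sequence[Point], centers: Sequence[Point], n_iter: int = 20):
    """Plain Lloyd iterations starting from the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(n_iter):
        labels = [_nearest(p, centers) for p in points]
        for c in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    labels = [_nearest(p, centers) for p in points]  # labels for final centers
    return centers, labels


def representatives(points, labels, centers, per_cluster: int) -> List[Point]:
    """Keep, for each cluster, the `per_cluster` members closest to its center."""
    reps: List[Point] = []
    for c, center in enumerate(centers):
        members = [p for p, l in zip(points, labels) if l == c]
        members.sort(key=lambda p: math.dist(p, center))
        reps.extend(members[:per_cluster])
    return reps


def incremental_step(reps, centers, new_batch, per_cluster: int):
    """Cluster old representatives mixed with the new batch, then refresh reps."""
    mixed = list(reps) + list(new_batch)
    centers, labels = kmeans(mixed, centers)
    new_labels = labels[len(reps):]  # cluster ids for the new points only
    return centers, new_labels, representatives(mixed, labels, centers, per_cluster)


initial = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(initial, [initial[0], initial[3]])
reps = representatives(initial, labels, centers, per_cluster=2)

new_batch = [(0.5, 0.5), (10.5, 10.5), (9.5, 10)]
centers, new_labels, reps = incremental_step(reps, centers, new_batch, per_cluster=2)
print(new_labels)  # [0, 1, 1]
```

Only the representatives and the new batch are reclustered, so the cost of each step grows with `per_cluster * k` plus the batch size rather than with all data seen so far; the trade-off is that the quality of the result depends on how well the kept points summarize each earlier cluster.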
Keywords/Search Tags: incremental clustering, feature selection, mutual information, sample selection