Font Size: a A A

Study On Hierarchical Clustering Method Based On Shortest Path Strategy

Posted on:2019-05-30Degree:MasterType:Thesis
Country:ChinaCandidate:F J TianFull Text:PDF
GTID:2348330569488313Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Clustering is a basic operation of data analysis.The clustering algorithm can be divided into two categories:partitioning clustering and hierarchical clustering according to different clustering processes and results.The hierarchical clustering algorithm contains more information than the partitioning clustering because of the use of a tree structure,and does not require any user-supplied parameters.It's indicated the hierarchical clustering algorithm has certain advantages.However,in practical applications,since the hierarchical clustering process is determinate,the completed processing steps can't be revoked.If some merger or splitting operation in the implementation of the algorithm isn't appropriate,it will result in bad clustering results and even affect the accuracy of clustering.In recent years,there has been a new way to solve problems in the computer industry:First some researchers transform the goal problem into the shortest path problem,then solve the goal problem by solved the shortest path problem.Therefore,this thesis draws on the idea of solving the shortest path problem,and conducts in-depth research on hierarchical clustering algorithms.Firstly,we analyze the advantages and disadvantages of hierarchical clustering algorithms by comparing hierarchical clustering algorithms with divided clustering algorithms and study existing path search algorithms and analyze their advantages and disadvantages.Secondly,in order to solve the problem that the hierarchical clustering can't be traced back,the hierarchical clustering algorithm based on shortest path strategy(SPC)is proposed.The basic idea is to first transform the hierarchical clustering problem into a shortest path problem,then to solve the shortest path problem by the search strategy of A-Star(A~*)algorithm,and then to achieve the solution to the hierarchical clustering problem.Through experimental analysis,the SPC algorithm has improved the efficiency and accuracy compared with the DNA Parsimony Program algorithm(DNAPARS),which proves that SPC algorithm has advantages.Thirdly,since the SPC algorithm has high complexity when the data set is extremely large,the shortest path hierarchical clustering algorithm based on Compute Unified Device Architecture(CUDA)accelerated(cudaSPC)is proposed.Its main task is to used GPU hardware parallel to extract-and-extend points.The experiment results showed that compared with the SPC algorithm,the cudaSPC algorithm has the same accuracy and improves the execution efficiency.Finally,this thesis is summarized and the future work is expected.
Keywords/Search Tags:Hierarchical clustering, Shortest path, Maximum parsimony, A~*algorithm, GPU acceleration
PDF Full Text Request
Related items