Font Size: a A A

Research On Hierarchical Clustering Algorithm And Its Application Based On Estimation Theory

Posted on:2017-02-17Degree:DoctorType:Dissertation
Country:ChinaCandidate:D K ZhuFull Text:PDF
GTID:1368330569498413Subject:Information and Communication Engineering
Abstract/Summary:PDF Full Text Request
Distance(Metric)based Hierarchical Clustering(HC)algorithms play an important role in many fields since its ever advent,e.g.bioinformatics,information theory and computer science.As a family member of clustering,HC methods are able to do cluster analysis,and more importantly,they unveil the inner structure of data(clusters)which makes it popular.The clustering results are dendrograms(trees,hierarchies of clusters),providing muti-scale information,especially useful for situations where the trees are physically meaningful.It is often the case in applications that the input distance data carry some level of uncertainty.This uncertainty can be an inherent “noise” in the measurement process or just be a technique for modelling the distance assignment process.Typical distance-based clustering methods take the results of noisy samples as the ground truth,which is obviously not optimal.This dissertation has set up a statistical model parameterized by HC results,analysed the statistical properties of Single Linkage Hierarchical Clustering(SLHC),and proposed likelihood-based estimation methods,followed by applications in Wireless Sensor Networks(WSNs)and radar HRRP angular segmentation.Chapter 1 is the introduction.We have justified the value of this research and summarized the up-to-date achievements in theoretical analysis and applications of HC.The research here is mainly about estimating SLHC,due to the fact that SLHC is the only one satisfying axiomatic scheme,stability and convergence properties.In Chapter 2 we have given a brief preliminary of HC algorithms;set up the likelihood model parameterized by dendrograms(equivalently denoted as ultra-metrics)considering the uncertainty of input distance data;and analyzed the statistical properties of SLHC.In order to obtain the integrated likelihood,we prove the geometry of metric space,as the pre-image of SLHC map,to be a union of polytopes(subspaces)labelled by Minimum Spanning Trees(MSTs).Under some conditions,we have proved SLHC is equivalent to Maximum Partial Profile Likelihood Estimate(MPPLE),indicating that a full MLE will outperform SLHC.Treated as an naive estimator,we have proved SLHC is consistent.In Chapter 3 we have proposed a numerical approximation of the full likelihood for estimating dendrogram structures,since estimating continuous parameters without explicit expressions is generally unfeasible.A Dirichlet distribution based method for sampling height vector a has been proposed.For reducing the search space of dendrogram structures,we have proposed a likelihood based local optimization strategy with minor performance loss.Simulations with small number of points demonstrate that SLHC is significantly outperformed by the MLE estimation method.However,a clear weakness of our current approach is its computational complexity which increases very rapidly with data size.We introduce Pseudo-likelihood approximation for reducing computation complexity,and refer to Monte Carlo Expectation Maximization(MCEM)estimation algorithm in Chapter 4.Pseudo-likelihood is an approximation of full likelihood by multiplying conditional probabilities of each parameter,especially useful when joint probability distributions are unavailable.Our pseudo-likelihood is based on a relaxation of the triangle inequalities,conditional merely on those ‘close' parameters.For sampling a general metric,we propose an incremental method which generate high dimensional random metrics efficiently.We present and analyze simulations on 100 points clustering to demonstrate the validity of our method.Chapter 5 is the research on applications.In WSN design,we propose an energy efficient routing protocol based on estimating SLHC.Our method introduces PEDAP algorithm to the communications between inner cluster sensors in the traditional protocol based on HC methods.Considering noise,our estimation of SLHC improves the energy efficiency as demonstrated by simulations regarding to energy consumption per round and life time of the network.In radar target recognition based on HRRPs,we propose a frame segmentation method based on estimating SLHC.Given the full angular HRRPs,we cluster them by estimating SLHC and take the average HRRP in the same cluster as its template for target recognition.Simulation validates the better performance of estimation method than average segmentation and non-estimation clustering method.We summarise the main achievements and perspectives of this research as the conclusion in Chapter 6.
Keywords/Search Tags:Hierarchical Clustering, Single Linkage Hierarchical Clustering, Dendrogram, Ultra-metric, Minimum Spanning Tree, Monte Carlo, Expectation Maximization, Wireless Sensor Network, HRRP
PDF Full Text Request
Related items