
The Research Of Clustering Algorithm Based On Cumulative Average Density

Posted on: 2013-06-19
Degree: Master
Type: Thesis
Country: China
Candidate: B L Hu
GTID: 2248330395985128
Subject: Control Science and Engineering
Abstract/Summary:
With the development of computer technology and the maturing of database technology, data mining has come into public view. Emerging in the 1990s, data mining has built up a systematic body of theory through more than twenty years of continuous research and improvement. Over the same period, a number of relatively mature data mining tools have been developed, and much successful experience has been gained from applications in different areas.

As a main processing method and an important research topic of data mining, clustering analysis has become well known to many enterprises and research institutions. With the popularization of the Internet in particular, our way of life has changed a great deal: we rely on the Internet frequently when using e-mail, Twitter and third-generation mobile communication technology to share and exchange information. Our daily life and behavior generate large amounts of data, and normal work depends on part of these data as well. Clustering analysis provides a convenient, safe and reliable tool that can help with information retrieval, fraud prevention and objective forecasting. Density-based clustering is an important method of cluster analysis; many scholars have studied it and put forward corresponding clustering algorithms, of which Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is one of the classics.
This thesis discusses this analysis technique in detail, analyzes its advantages and disadvantages in theory and application, introduces the concept of cumulative average density, proposes an improved clustering scheme based on DBSCAN, and finally carries out experiments and application tests to verify its correctness and practical significance.

This thesis presents an in-depth study of density-based clustering. The main contents can be summarized as follows:

(1) Gaining a comprehensive understanding of data mining by retrieving and reading the scientific and technical literature, covering its main concepts, basic principles, processing steps, commonly used techniques and methods, recent research status and application conditions; studying clustering analysis techniques thoroughly, with emphasis on the clustering methods themselves; and analyzing and summarizing their respective effects and performance.

(2) Analyzing the idea and the shortcomings of DBSCAN on the basis of the theory of data mining and clustering analysis, and putting forward an improved algorithm based on cumulative density to address DBSCAN's existing defects, namely its sensitivity to input parameters and its inability to distinguish clusters that have different densities and are adjacent to one another. An accepting factor is used as a reference criterion for merging clusters in order to improve the clustering result.

(3) Applying the clustering algorithm based on cumulative density to webpage main-text extraction: proposing a new extraction model based on density clustering; achieving webpage text extraction through webpage preprocessing, data conversion, cluster analysis and other steps; and implementing the application and running experiments on several kinds of webpages to verify the validity of the model.

The research results show that, in comparison with DBSCAN, the clustering algorithm based on cumulative density reduces sensitivity to input parameters to some extent and achieves the desired effect when clustering datasets in which clusters of varying density are linked together.
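
The two DBSCAN weaknesses named in the abstract can be illustrated with a small sketch. The code below is a plain textbook DBSCAN, not the cumulative-average-density algorithm of the thesis, whose details the abstract does not give. The names `dbscan`, `region_query`, `eps`, `min_pts` and the toy `points` are assumptions chosen for illustration. The data place a dense cluster directly next to a sparse one, so that no single radius separates them.

```python
from math import dist  # Python 3.8+

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Textbook DBSCAN: one cluster label per point, NOISE for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
    return labels

# Hypothetical toy data: a dense run (spacing 1) adjacent to a sparse run (spacing 10).
dense = [(x, 0) for x in range(10)]           # x = 0, 1, ..., 9
sparse = [(x, 0) for x in range(19, 60, 10)]  # x = 19, 29, ..., 59
points = dense + sparse

small_eps = dbscan(points, eps=1, min_pts=3)   # sparse run is labeled as noise
large_eps = dbscan(points, eps=10, min_pts=3)  # both runs merge into one cluster
print(small_eps)
print(large_eps)
```

With the small radius the sparse run is discarded as noise, and with the large radius the two runs collapse into a single cluster. This is the situation the abstract describes as clusters of different density that are adjacent to one another, and it also shows how strongly the result depends on the chosen input parameters.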
Keywords/Search Tags: data mining, clustering algorithm, cumulative average density, clusters linked together, webpage information extraction