Clustering Algorithm Of Position Uncertain Data Based On Connection Number

Posted on: 2018-03-23
Degree: Master
Type: Thesis
Country: China
Candidate: J Wang
Full Text: PDF
GTID: 2348330518476407
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, data mining has been a key focus of the information industry. The main reason is that large volumes of data are widely available and urgently need to be converted into useful information and knowledge. That information and knowledge can be applied in many areas, including financial markets, business, trade, academia, and scientific research. Clustering is one of the most important research topics in data mining. With the rapid development of the internet and information technology, data with different structural characteristics continue to emerge, which poses a new challenge for cluster analysis. In many modern applications, such as environmental monitoring with wireless sensor networks, extremely limited system resources (for example network bandwidth and power supply) mean that data can only be collected at discrete intervals. Because natural phenomena change continuously while sampling is discrete, the data obtained from the outside world are essentially time-varying uncertain data. It is therefore necessary to account for this uncertainty when processing such data in order to obtain correct results, which poses new challenges for traditional data processing methods. Mathematical tools for handling uncertain data include probability density functions, fuzzy numbers, interval numbers, and connection numbers. The connection number is a relatively new mathematical tool for studying uncertain data. It has been widely used in areas such as water resources system evaluation, multi-attribute and multi-objective assessment, and group decision-making, but its application to clustering in data mining is still very rare.

The main content and contributions of this thesis are as follows:

1. The thesis first briefly introduces data mining in the big data environment and clustering as one of its important topics, and explains the background and motivation for studying uncertain data, which is the focus of this work. It then describes in detail the definition of clustering, similarity measures, and common clustering methods. It also introduces the representation of uncertain data and presents connection number theory, the core mathematical tool, in detail as groundwork for the following sections. Finally, it reviews the current state of research on uncertain data, which paves the way for the main work of the thesis.

2. Existing partition-based clustering methods for uncertain data have high computational complexity and neglect the influence of uncertainty on the clustering results. To address these shortcomings, the thesis proposes a partition-based clustering method for uncertain data based on connection numbers. The algorithm greatly reduces computational complexity, and during clustering it considers both the overall position of the uncertain data and the influence of the uncertainty trend on the results. Experimental results show that the proposed algorithm performs well and clusters accurately.

3. Density-based clustering methods for uncertain data are scarce, and partition-based methods cannot find clusters of arbitrary shape or distinguish outliers. The thesis therefore proposes a density-based clustering algorithm for uncertain data using connection numbers. It reduces computational complexity, introduces a new distance measure, takes the trend of uncertainty change into account, and greatly reduces the parameter sensitivity of density-based clustering. Experimental results show that the algorithm achieves high-quality clustering with fewer parameters and is operable, practical, and efficient.
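
As an illustration only, the idea of describing position-uncertain data with connection numbers might be sketched as follows. The sketch assumes the binary connection number form a + b·i with i in [-1, 1], where a is the certain part and b the uncertain part, and it assumes each uncertain coordinate is given as an interval. The class name `ConnectionNumber`, the helper `from_interval`, and the function `distance` are hypothetical and are not taken from the thesis, whose actual distance measure is not stated in this abstract.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionNumber:
    """Binary connection number a + b*i, with i ranging over [-1, 1]."""
    a: float  # certain part (assumed here: interval midpoint)
    b: float  # uncertain part (assumed here: interval half-width)

    @classmethod
    def from_interval(cls, lo: float, hi: float) -> "ConnectionNumber":
        # Assumption: an uncertain coordinate is reported as [lo, hi].
        if lo > hi:
            raise ValueError("lo must not exceed hi")
        return cls((lo + hi) / 2.0, (hi - lo) / 2.0)

    def value(self, i: float) -> float:
        # Evaluate the connection number for one choice of i in [-1, 1].
        if not -1.0 <= i <= 1.0:
            raise ValueError("i must lie in [-1, 1]")
        return self.a + self.b * i


def distance(p, q) -> float:
    """Hypothetical distance between two uncertain points.

    Each point is a sequence of ConnectionNumber objects, one per dimension.
    The distance is plain Euclidean distance over the (a, b) components, so
    it reflects both position and amount of uncertainty. This is a stand-in
    for illustration, not the measure proposed in the thesis.
    """
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return math.sqrt(
        sum((x.a - y.a) ** 2 + (x.b - y.b) ** 2 for x, y in zip(p, q))
    )


# Two uncertain 2-D points given as per-dimension intervals.
p = [ConnectionNumber.from_interval(0.0, 2.0), ConnectionNumber.from_interval(0.0, 0.0)]
q = [ConnectionNumber.from_interval(3.0, 5.0), ConnectionNumber.from_interval(4.0, 4.0)]
d = distance(p, q)
```

Because each uncertain coordinate is reduced to two numbers, a distance of this kind costs about the same as an ordinary Euclidean distance, which is one plausible reading of how a connection number representation could lower computational cost compared with integrating over probability density functions.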
Keywords: data mining, uncertain data, connection number, density, partition