
The Research On Clustering Algorithm Based On Marine Environments

Posted on: 2015-03-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J S Chen
Full Text: PDF
GTID: 1108330479475861
Subject: Computer Science and Technology
Abstract/Summary:
Clustering analysis is an important method in data mining and an important branch of unsupervised classification in pattern recognition. It has been widely applied in image processing, environmental forecasting, weather reporting and other fields, and it is also a new research method for marine hurricanes and red tides.

Because existing research methods for marine hurricanes and red tides are limited and lack variety, this paper applies clustering to both phenomena. Traditional methods study hurricanes through meteorology and aerodynamics; this paper instead abstracts hurricanes as a large set of trajectories, based on the features of hurricane formation. A red tide develops in several phases, and each phase is uncertain because physical and chemical factors play different roles in it. Traditional red tide research treats all physical and chemical factors together, which easily overlooks the effect of any single factor. To address this weakness and the fuzzy boundaries between phases, a new algorithm combining fuzzy clustering with weight values is presented and used to analyze the red tide process and the features of each phase. Furthermore, the physical and chemical factors that cause red tide are high-dimensional, and clustering is better suited to analyzing such high-dimensional data than other methods. To verify the validity of the clustering results, this paper designs a new validity index that overcomes the defects of traditional validity indices based on Euclidean distance.

The research focuses on the proposed clustering theory, with applications in marine environments as the background. The content spans computational intelligence, hurricane research and red tide research; it is interdisciplinary and has both theoretical significance and practical application value. The main work of this paper is as follows:

1. Hurricane trajectory clustering is studied. A new clustering algorithm based on similar sub-trajectories is presented. It introduces the concept of a similar sub-trajectory and uses it to approximately represent the spatial features of a sub-trajectory clustering region, which reduces the search space. The new algorithm lowers time and space complexity and greatly improves execution efficiency.

2. The input parameters of the clustering algorithm are discussed. Because trajectory clustering is sensitive to the input parameters ε and MinLns, a new algorithm that is insensitive to them is proposed. It computes a parameterized cluster ordering from the reachability distance and designated distance of the partitioned sub-trajectories. This cluster ordering represents the inner structure of the trajectory data, and clustering from it is equivalent to clustering over a range of parameter values. The new algorithm therefore avoids the uncertain clustering results caused by a single parameter setting and greatly reduces the sensitivity of the results to the input parameters.

3. Fuzzy clustering of the phases of the red tide process is studied. The red tide process contains four phases, which are influenced by many physical and chemical factors; any two phases overlap fuzzily and are difficult to distinguish. For example, a given physical or chemical factor may keep the same value across the four phases, or its value may change considerably. Different factors therefore have different effects on each phase. To handle the fuzziness of the four phases and the different roles of the factors in each phase, this paper presents a new fuzzy clustering algorithm based on weight factors, which assigns different weight values to the membership function and the typicality function. The new algorithm overcomes the weakness of the constrained typicality values in fuzzy possibilistic c-means (FPCM) and resolves the problem of unreasonable parameters in possibilistic fuzzy c-means (PFCM). It determines its parameters by a prototype learning method, which yields more reasonable values. Experiments show that the method is effective.

4. Fuzzy clustering on interval data sets is studied. As applications demand, more and more objects are described by interval data. Clustering algorithms based on Euclidean distance work well on point data sets but cope poorly with interval data sets. To address this defect, this paper presents three measures based on quadratic distance for interval data sets; the objective function is adapted to each measure, yielding three algorithms. Experiments on UCI and marine fish data sets show that the new algorithms produce good clustering results with low resubstitution error rates.

5. The cluster validity index is studied. Validation research asks how to determine the number of clusters and the true cluster centers, and how to measure intra-cluster compactness and inter-cluster separation. At present most validity indices are based on Euclidean distance; they can find the optimal number of clusters for spherical data sets, but they perform poorly on highly overlapping or irregularly shaped data sets. This paper puts forward a non-Euclidean validity index named VI. The new index uses membership to compute intra-cluster compactness and an anti-closeness degree to measure inter-cluster separation. It overcomes the inaccuracy of Euclidean-distance indices on overlapping and irregular data sets. Because it takes into account both the fuzzy partition and the distribution of the data set, it is more objective and reliable than other indices. Experiments on data sets show that VI can find the optimal number of clusters and the cluster centers.
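For orientation, the fuzzy clustering work summarized above builds on fuzzy c-means. The following is a minimal sketch of the standard fuzzy c-means membership and center updates only; it is not the weighted algorithm, the interval-data variants, or the VI index proposed in the dissertation, whose formulas the abstract does not give. The function names `fcm_memberships` and `fcm_centers`, the fuzzifier default `m=2.0`, and the sample points are illustrative assumptions.

```python
import math

def fcm_memberships(points, centers, m=2.0):
    """Standard FCM membership update: u[k][i] is the degree to which
    point k belongs to cluster i. Each row sums to 1. Assumes m > 1."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [math.dist(x, c) for c in centers]
        # A point lying exactly on one or more centers belongs fully
        # (and equally) to those clusters; this avoids division by zero.
        zeros = [i for i, d in enumerate(dists) if d == 0.0]
        if zeros:
            row = [1.0 / len(zeros) if i in zeros else 0.0
                   for i in range(len(centers))]
        else:
            row = [1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
                   for d_i in dists]
        memberships.append(row)
    return memberships

def fcm_centers(points, memberships, m=2.0):
    """Standard FCM center update: each center is the mean of all points,
    weighted by membership raised to the fuzzifier m."""
    dim = len(points[0])
    centers = []
    for i in range(len(memberships[0])):
        weights = [u[i] ** m for u in memberships]
        total = sum(weights)
        centers.append(tuple(
            sum(w * x[d] for w, x in zip(weights, points)) / total
            for d in range(dim)
        ))
    return centers

# Illustrative data only (hypothetical, not from the dissertation).
sample_points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
sample_centers = [(0.0, 0.5), (5.0, 5.5)]
u = fcm_memberships(sample_points, sample_centers)
print(u)
print(fcm_centers(sample_points, u))
```

Alternating these two updates until the centers stop moving gives the usual fuzzy c-means loop. Possibilistic variants such as FPCM and PFCM add a typicality term alongside the membership, which this sketch omits.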
Keywords/Search Tags: trajectory clustering, Fuzzy C-means, red tide, quadratic distance, validity index