
Research On Relevant Issues Of Event Mining In Meteorological Field

Posted on: 2014-07-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Bai
GTID: 1108330434471195
Subject: Computer software and theory
Abstract/Summary:
Data mining is a discipline that aims to discover potentially useful information from data. It is related to statistics, database systems, machine learning, optimization, and other fields. In recent years, with the rapid development of information technology, large amounts of raw data have been collected. Data mining and its applications across disciplines are therefore becoming increasingly important in both research and industry. Spatial data and time series data are common data types that are widespread in geographic information systems, sensor networks, stock markets, meteorology, and so on. Basic algorithms for spatial and time series data, and their applications in other disciplines, have long been popular topics in data mining.

Meteorology is an emerging field for data mining applications, and meteorological event mining is one of its key research directions. Mining meteorological events provides solid data support for decision making in people's daily lives and is therefore important and necessary. This dissertation focuses on event mining in meteorology and studies common meteorological data such as time series, spatial data, and spatial-temporal data. Specifically, it covers basic algorithm research, namely cluster analysis of spatial data and symbolic representation of time series, and its applications to event mining in meteorology. The main contributions are as follows:

(1) A novel clustering algorithm for spatial data is proposed. Inspired by the dynamic process by which particles aggregate in a potential field, a novel dynamic clustering algorithm, Yupc, based on the Yukawa potential is proposed. Yupc neither relies on any assumption about the data distribution nor requires the number of clusters to be specified in advance. Yupc can detect natural clusters of different shapes, densities, sizes, numbers, and distributions, reflecting the intrinsic structure of the original data set.
In addition, a framework is proposed to automatically find appropriate parameters for Yupc. Experiments on synthetic and real-world data show that this approach outperforms existing algorithms, especially on data sets with arbitrary kinds of clusters.

(2) A symbolic representation method for time series, rSAX, is proposed. Symbolic representation is a popular technique that reduces the dimensionality of time series while preserving their fundamental features for further analysis. As an effective representation technique, Symbolic Aggregate Approximation (SAX) has been widely used in time series analysis. However, SAX always maps time series data to symbols using fixed breakpoints. As a result, similar points that lie close to a breakpoint cannot be represented well, which leads to a poor Tightness of Lower Bounds (TLB). To address this problem, a time series representation method named Random Shifting based SAX (rSAX) is developed. The key idea of rSAX is to generate soft borders by random shifting instead of using hard borders. Points close to each other therefore have a higher probability of being mapped to the same symbol, which significantly improves the TLB of the representation without increasing its granularity. This dissertation also proves theoretically that rSAX achieves better mapping performance and TLB than SAX. Finally, extensive experiments on several real-world data sets validate the effectiveness and efficiency of the rSAX approach.

(3) A co-anomaly event mining framework for meteorology is proposed. Temperature series in meteorology are a kind of time series. Multiple historical temperature series record years of trends and details of temperature changes, as well as information about other important events.
A co-anomaly event is one of the most significant climate phenomena, characterized by similar abnormal patterns that co-occur in different temperature series. These co-anomaly events play an important role in understanding abnormal behaviors and natural disasters in climate research. However, to the best of our knowledge, the problem of automatically detecting co-anomaly events in climate data is still under-addressed because of the unique characteristics of temperature series data. To that end, a novel framework, Sevent, is proposed for the automatic detection of co-anomaly climate events in multiple temperature series. Specifically, the original temperature series are first mapped to symbolic representations. Co-anomaly patterns are then detected by statistical tests, and finally the co-anomaly events that span different sub-dimensions and subsequences of the multiple temperature series are generated. The detection framework is evaluated on a real-world data set that contains rich temperature series. The experimental results demonstrate the effectiveness of Sevent.

(4) A high-temperature event mining algorithm for meteorology, based on spatial-temporal clustering, is proposed. Climate event mining involves many different scenarios and requirements, and high-temperature events are among the most important. Mining the space-time regions of high-temperature events helps climate experts identify their temporal and spatial coverage and further analyze their causes and evolution. Mining space-time regions is also an important task in data mining, with wide applications in disciplines such as epidemiology and meteorology. Existing algorithms for mining space-time event regions are usually based on cluster analysis, which has difficulty detecting irregularly shaped events as they evolve over time. Parameter setting is also a difficult problem for most existing methods.
To detect the exact start and end timestamps and the coverage of event regions, a novel automatic event mining algorithm, Gtem, is proposed. Combined with the Minimum Description Length principle, Gtem can optimize its parameter settings; detect event regions with different evolutions according to the spatial-temporal correlations of objects; determine the start and end timestamps and the irregularly shaped coverage of events; and find outliers as well. Gtem is applied successfully to find high-temperature space-time regions in a real-world data set of daily weather records.
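
As an illustration of the hard-breakpoint issue described in contribution (2), the following is a minimal sketch of standard SAX (z-normalization, piecewise aggregate approximation, equiprobable Gaussian breakpoints), not code from the dissertation. The function names are hypothetical, and `same_symbol_rate` is only a toy experiment assumed here to show the effect of shifting the breakpoints by a random offset; the abstract does not specify how rSAX actually draws or applies its shifts.

```python
from bisect import bisect_right
from random import Random
from statistics import NormalDist, mean, pstdev

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def znormalize(series):
    """Rescale a series to zero mean and unit variance."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: mean of each equal-width segment.

    Assumes len(series) is divisible by n_segments.
    """
    width = len(series) // n_segments
    return [mean(series[i * width:(i + 1) * width]) for i in range(n_segments)]


def gaussian_breakpoints(alphabet_size):
    """Breakpoints that cut N(0, 1) into equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def symbolize(values, breakpoints):
    """Map each value to the letter of the region it falls into."""
    return "".join(ALPHABET[bisect_right(breakpoints, v)] for v in values)


def sax(series, n_segments, alphabet_size):
    """Standard SAX word with fixed (hard) breakpoints."""
    reduced = paa(znormalize(series), n_segments)
    return symbolize(reduced, gaussian_breakpoints(alphabet_size))


def same_symbol_rate(x, y, breakpoints, max_shift, trials=1000, seed=0):
    """Toy experiment (an assumption, not the rSAX procedure): shift all
    breakpoints by one random offset in [-max_shift, max_shift] per trial and
    return the fraction of trials in which x and y receive the same symbol."""
    rng = Random(seed)
    hits = 0
    for _ in range(trials):
        delta = rng.uniform(-max_shift, max_shift)
        shifted = [b + delta for b in breakpoints]
        if symbolize([x], shifted) == symbolize([y], shifted):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4, alphabet_size=4))  # abcd
    bps = gaussian_breakpoints(4)
    # -0.01 and 0.01 straddle the breakpoint at 0 and get different symbols,
    # while 0.01 and 0.6 are much farther apart yet share one symbol.
    print(symbolize([-0.01, 0.01, 0.6], bps))  # bcc
    print(same_symbol_rate(-0.01, 0.01, bps, max_shift=0.3))
```

With fixed breakpoints the two nearly identical values -0.01 and 0.01 never share a symbol. Under the assumed random offset they share one in most trials, which is the intuition behind soft borders.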
Keywords/Search Tags: Climate Data Mining, Event Mining, Time Series Analysis, Cluster Analysis, Spatial-Temporal Clustering