Study Of Data Warehouse And Data Mining Application On Geohazard In Three Gorges Reservoir Area

Posted on: 2011-11-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C H Zhu
GTID: 1118360308475259
Subject: Earth Exploration and Information Technology
Abstract/Summary:
Sudden geological hazards such as collapses, landslides and debris flows, and the assessment of their risk, have become a major issue of common concern. Landslides are one of the main types of geological hazard. Their risk and impact are second only to those of earthquakes and volcanoes, and they are characterized by wide distribution, high frequency, high-speed movement and serious losses. Research on landslide forecasting improves the capacity to respond rapidly to sudden geological hazards and is therefore of great significance for effective hazard prevention and mitigation.
The Three Gorges Reservoir Area is one of the regions of China hit hardest by landslides. The reservoir banks have complex geological and geomorphological conditions and lie in a subtropical climate zone with abundant rainfall and heavy rainstorms, which trigger collapses, landslides and debris flows; many ancient landslides are also distributed there. The new towns built for resettlement are almost all located on slopes, which are affected by reservoir water-level changes and by the resettlement works themselves, so that many ancient landslides may be reactivated and new landslides may occur. Many resettlement construction projects have been threatened by landslides or collapses, and the new sites of some county towns have had to be moved repeatedly because of landslides.
As the Three Gorges Project progressed, the State Council began to pay great attention to geohazard prevention in the reservoir area. In July 2001, the State Council launched a comprehensive project for the prevention and control of geological hazards in the Three Gorges Reservoir Area. As the informatization of these prevention and control works has advanced, abundant geological hazard data have been accumulated.
Because geohazards are urgent and dangerous, the objects managed by a geohazard early-warning and command system are classified by hazard type and warning level, and ranked by grade within each warning level. Such a system therefore needs not only analysis, statistics, and advanced algorithm and model libraries that operate on the data, but also a tool that can extract useful information from very large databases, so that decisions can be made correctly and quickly. Conventional operational databases cannot meet these requirements. With data warehouse, OLAP (On-Line Analytical Processing) and data mining technology, however, the large volume of data can be analyzed automatically and effectively to reveal internal relationships, extract useful rules and knowledge, and serve a decision support system.
The purpose of this study is to use data warehouse technology to integrate the geological hazard data of the Three Gorges Reservoir Area effectively, and to apply data mining techniques to extract useful information from historical data for landslide prediction, in service of the early-warning command system. The dissertation addresses two aspects: construction of the geohazard data warehouse and application of data mining to landslide prediction.
(1) A data-driven method was adopted for constructing the geohazard data warehouse: the data warehouse schema is obtained by analyzing the underlying source systems. The main idea of the data-driven method is that the data warehouse is built on the existing source systems, making full use of existing data and code rather than starting from scratch.
The design of the data warehouse proceeds from the existing database systems and re-examines the relationships among the data according to the requirements of the business domain, in order to organize the data warehouse themes. The design has four stages: requirements specification, conceptual design, logical design and physical design. In the requirements specification stage, the source databases, that is, the existing operational databases, are identified; they include the attribute database and the spatial database of geohazards. The themes of the geohazard data warehouse for the Three Gorges Reservoir Area were determined from the analysis of the source systems. They include regional geohazard forecasting, local geohazard forecasting for resettlement sites, individual geohazard forecasting, water wave forecasting, assessment of prevention and control engineering, monitoring and forecasting, and early-warning decision support and emergency command, among others. Given the actual progress of data collection and the research completed so far, this dissertation focuses on two themes: regional geohazard forecasting, and landslide monitoring and forecasting.
In the conceptual design stage, the data hierarchies of the regional geohazard forecasting theme and the landslide monitoring and forecasting theme were analyzed. From them the landslide susceptibility fact and the landslide displacement monitoring fact were derived, their measures, dimensions and hierarchies were determined, and the landslide susceptibility cube and the landslide displacement monitoring cube were established. The measures of the landslide susceptibility fact include existing landslides, engineering geological rock group, slope structure, conformation, slope angle, elevation, surface rivers, vegetation, land cover, roads, aspect and surface curvature.
The dimensions of the landslide susceptibility fact are landslide type, scale and region, with the corresponding hierarchies "types->kind->model->style->stage->character", "scale->size" and "county->province->the reservoir area", respectively. The measures of the landslide displacement monitoring fact include deformation displacement, rainfall, temperature, water level change, earthquakes, rainstorms and human activities. Its dimensions are landslide type, time, monitoring location and monitoring type, with the corresponding hierarchies "character->stage->style->type->kind", "date->month->quarter->year", "monitoring location->landslide mass->village->town->county" and "monitoring content->monitoring instrument->monitoring method->monitoring type", respectively.
In the logical design stage, the conceptual multidimensional model built above was transformed into a logical model, and the ETL processes for the landslide susceptibility and landslide displacement monitoring multidimensional models were designed. The ETL for the landslide susceptibility model comprises two parts: spatial data ETL and attribute data ETL. In the physical design stage, the loading process from the sources to the target data warehouse was implemented with Oracle Warehouse Builder (OWB), and the geohazard data warehouse was built on an Oracle database. In addition, the performance of the data warehouse was optimized through partitioning, indexing, materialized views and storage structure design.
(2) Based on the cubes of the geohazard data warehouse, data mining for landslide forecasting was implemented with the support vector machine regression algorithm of Oracle Data Mining (ODM), which is embedded in Oracle Database. The support vector machine (SVM), a newer-generation algorithm, is grounded in statistical learning theory rather than in loose analogies with natural learning systems, and in theory can achieve the best predictions.
The SVM also copes better with practical problems involving small samples, nonlinearity, high dimensionality and local minima, and is therefore regarded as a good alternative to neural network algorithms. ODM's support vector machine regression algorithm is convenient to use, easy to deploy and requires little manual intervention in its parameters.
① First, Zhong County was chosen as the study area for landslide susceptibility zoning. Landslide susceptibility analysis uses the statistical relationship between the spatial distribution of existing landslides and the causative factors to evaluate the likelihood that potential landslides will occur within a particular region, which benefits land development and planning and reduces the threat of landslide hazards. This study uses the widely recognized raster GIS model and, based on the landslide susceptibility cube built by dimensional modeling in the data warehouse, analyzes the susceptibility of the study area with ODM's support vector machine regression algorithm. To test the performance of ODM's SVM algorithm, two commonly used quantitative statistical models, weights-of-evidence and logistic regression, were used for comparison; both were built with the same samples and predictor variables as the SVM model. The results indicate that, although it does not predict all of the existing landslides, the SVM model places 88.02% of the existing landslides in the high and very high susceptibility areas, while the proportions for weights-of-evidence and logistic regression are 84.48% and 58.94%, respectively. This shows that the predictive capability of the SVM model is better than that of the weights-of-evidence and logistic regression models.
② Second, the monitoring data of the Baishuihe landslide were used as an example for time series analysis of landslide displacement monitoring data.
Time series analysis can predict the trend of a complex system and is a hot topic in the dynamic forecasting of landslide displacement. A framework for time series analysis based on data warehouse multidimensional modeling is introduced to address the shortcomings of the flat files that most current prediction models use. The Baishuihe displacement time series was preprocessed with reference to the theory of state space reconstruction; ODM's PL/SQL API was then used to build the SVM regression model and carry out the data mining process on the data warehouse. The multi-step prediction results show that the error rate of the values predicted by the SVM regression algorithm stayed within 8% for the first five steps, which indicates good performance. The error rate of the sixth step is larger, which may be due to the combined conditions at the time: precipitation in April and May totalled 355 mm and the water level dropped 4.68 m in May, so the landslide was in a phase of abrupt sliding (about 100,000 m³ of soil collapsed in the central part of the Baishuihe landslide on June 30, 2007). The historical data therefore offered no guidance for this step, but the result still meets engineering requirements, with 84.1% accuracy.
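The state space reconstruction step described above is commonly realized as time-delay embedding, which turns a single displacement series into (input vector, next value) pairs that a regression model can be trained on. The following is a minimal Python sketch under stated assumptions: the abstract does not give the embedding dimension or delay, so `dim` and `tau` are hypothetical parameters, the displacement readings are made-up illustrative values, and the function names are not from the dissertation. The regression itself was done in ODM through its PL/SQL API and is not reproduced here.

```python
from typing import List, Sequence, Tuple


def delay_embed(
    series: Sequence[float], dim: int, tau: int = 1
) -> Tuple[List[List[float]], List[float]]:
    """Build (input vector, next value) pairs by time-delay embedding.

    Each input is [x(t-(dim-1)*tau), ..., x(t-tau), x(t)];
    the matching target is x(t+1).
    """
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive")
    span = (dim - 1) * tau  # how far back the oldest component reaches
    inputs: List[List[float]] = []
    targets: List[float] = []
    for t in range(span, len(series) - 1):
        inputs.append([series[t - span + k * tau] for k in range(dim)])
        targets.append(series[t + 1])
    return inputs, targets


def relative_error(predicted: float, observed: float) -> float:
    """Relative error |predicted - observed| / |observed|, as a fraction."""
    return abs(predicted - observed) / abs(observed)


# Hypothetical cumulative displacement readings (mm), for illustration only.
displacement = [10.0, 12.0, 15.0, 19.0, 24.0, 30.0]
X, y = delay_embed(displacement, dim=3, tau=1)
```

With these assumed values, each row of `X` holds three consecutive readings and the matching entry of `y` is the reading that follows them. A step-wise error rate like the one reported above can be computed with `relative_error` once a model has produced predictions.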
Thus, ODM's support vector machine regression algorithm can be used for short-term prediction in landslide monitoring.
The main innovations and features of this dissertation are:
(1) Based on the data warehouse concept and an in-depth analysis of the landslide susceptibility fact, the landslide susceptibility cube was designed and built on the raster GIS model of spatial data. It integrates spatial data by theme from three perspectives (scale, region and landslide type) and meets the data needs of landslide spatial prediction with rapid response.
(2) Based on an in-depth analysis of the landslide displacement monitoring time series, and considering four dimensions (time, observation location, monitoring type and landslide type), the landslide displacement monitoring cube was designed and built. ODM's PL/SQL API was then used to build the support vector machine regression model and carry out the data mining process on the data warehouse.
The dissertation also has some shortcomings:
(1) Construction of the data warehouse depends on understanding of the business domain and on data preprocessing. The preprocessing of spatial data with continuous attributes, that is, the reclassification of the landslide causative factors, combined expert knowledge with bivariate statistical methods and therefore involves a certain degree of subjectivity.
(2) The landslide displacement monitoring cube was designed and built and its time series data were mined, but cross prediction between displacement and reservoir water level, and between displacement and precipitation, was not implemented. The reliability and accuracy of the applied model need to be validated in further work.
(3) Data mining and GIS mapping functions were not integrated.
The data mining results must be exported, and the landslide susceptibility map is then generated in GIS software.
In summary, integrating geohazard data into a data warehouse and applying data mining tools on top of it is a practicable new approach to landslide forecasting.
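The dimension hierarchies defined for the landslide displacement monitoring cube, such as "date->month->quarter->year", are what make OLAP-style roll-up possible. As a rough illustration only, the Python sketch below aggregates an additive measure (rainfall) up that time hierarchy. The function names and the rainfall figures are hypothetical and are not taken from the dissertation, where the cubes were built in an Oracle database rather than in application code.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple


def time_key(day: date, level: str) -> Tuple[int, ...]:
    """Map a date to its ancestor at the requested level of the time hierarchy."""
    if level == "date":
        return (day.year, day.month, day.day)
    if level == "month":
        return (day.year, day.month)
    if level == "quarter":
        return (day.year, (day.month - 1) // 3 + 1)
    if level == "year":
        return (day.year,)
    raise ValueError(f"unknown level: {level}")


def roll_up(
    facts: Iterable[Tuple[date, float]], level: str
) -> Dict[Tuple[int, ...], float]:
    """Sum an additive measure up to the given level of the time hierarchy."""
    totals: Dict[Tuple[int, ...], float] = defaultdict(float)
    for day, value in facts:
        totals[time_key(day, level)] += value
    return dict(totals)


# Hypothetical daily rainfall facts (mm); not data from the dissertation.
rainfall = [
    (date(2007, 4, 10), 20.0),
    (date(2007, 4, 25), 30.0),
    (date(2007, 5, 3), 40.0),
    (date(2007, 7, 1), 5.0),
]
by_month = roll_up(rainfall, "month")
by_quarter = roll_up(rainfall, "quarter")
```

Summation suits additive measures such as rainfall. A measure such as cumulative displacement or water level would need a different aggregate (for example the last or maximum value in the period), which is a design decision the abstract does not specify.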
Keywords/Search Tags: Three Gorges Reservoir Area, Geohazard, Data Warehouse, Landslide Forecasting, Data Mining