The Research On The Integrated Knowledge Mining System For The Pollution In The Agro-Product Area

Posted on: 2010-05-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Q Zheng
Full Text: PDF
GTID: 1118360302495023
Subject: Computer application technology
Abstract/Summary:
With the development of China's economy, serious pollution in agro-product areas has drawn wide public attention. To improve the utilization efficiency of pollution data, this dissertation studies pollution analysis and evaluation in depth.

Specifically, we built an integrated knowledge mining system for pollution in the agro-product area, in which a knowledge mining process is applied to the pollution monitoring results and the corresponding spatial data of the area. The system consists of four parts: a data cleaning system, a non-spatial predicate mining system, a spatial predicate extraction system, and a spatial and non-spatial association rule mining system.

Data cleaning for the polluted agro-product area covers attribute cleaning and duplicate-record cleaning. For attribute cleaning, statistical, clustering, pattern-based, and association-rule methods were discussed, and the most suitable one was selected. For duplicate records, a new technique was developed: the DBSCAN clustering method extracts similar duplicate records, and an ant colony algorithm then merges and deletes them.

Non-spatial predicate extraction is split into two sub-tasks: extracting the non-spatial background knowledge and extracting the atomic proposition set. First, the non-spatial background knowledge is extracted in the form of a Cartesian product, built as relation(tuple, attribute) after relational analysis. Second, while the atomic proposition set is extracted, the pollution in the agro-product area is predicted at the same time: Principal Component Analysis reduces the dimensionality of the pollution data, an RBF neural network produces the prediction, and the Similar Weight Method then extracts the rules that form the atomic proposition set.

A new technique for extracting spatial predicates is also presented. First, the concept of a spatial object hierarchy is introduced. Rough set theory is then applied on the basis of the 9-intersection model to build a new rough 9-intersection matrix, from which a CART decision tree extracts the spatial predicates. A refined spatial predicate space is obtained after the spatial rules are merged under the constraint bias.

To mine the spatial association rules, the SPADA algorithm is used. The spatial observations are built on the basis of the non-spatial predicate space and the spatial predicate space. The intra-level and inter-level searches are carried out according to θ-subsumption within the hierarchy of the predicate spaces, and the set of spatial association rules is produced at the end. Pattern and rule constraint biases are applied to speed up searching and pruning.

The last part of the dissertation describes a practical example of spatial analysis on pollution data from Daye, Hubei Province. All of the algorithms described above were tested, and the results show that the integrated knowledge mining system works well and that the mining results are satisfactory.
Keywords/Search Tags: Agro-product Area, Data Mining, Predicate Extracting, 9-intersection Model, SPADA, RBF, SWM, Association Rule
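
The abstract builds its spatial predicates on the 9-intersection model, which describes the topological relation between two regions by checking whether the interior, boundary, and exterior of one intersect the interior, boundary, and exterior of the other. The sketch below illustrates only the classical (crisp) matrix, not the rough-set extension or the CART step of the dissertation. It assumes regions are given as sets of raster cells and uses a simple 4-neighbour rule to decide which cells are boundary cells; the abstract does not say how regions are represented, and all function names here (`parts`, `nine_intersection`, `square`) are hypothetical.

```python
from itertools import product


def neighbours(cell):
    """Return the 4-connected neighbours of a grid cell (row, col)."""
    r, c = cell
    return [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]


def parts(region, universe):
    """Split a raster region into (interior, boundary, exterior) cell sets.

    Assumption: a cell is a boundary cell if it belongs to the region
    but at least one of its 4-neighbours does not.
    """
    boundary = {cell for cell in region
                if any(n not in region for n in neighbours(cell))}
    interior = region - boundary
    exterior = universe - region
    return interior, boundary, exterior


def nine_intersection(a, b, universe):
    """Return the 3x3 boolean matrix of non-empty intersections.

    Rows are the interior, boundary, exterior of `a`;
    columns are the interior, boundary, exterior of `b`.
    """
    return [[bool(pa & pb) for pb in parts(b, universe)]
            for pa in parts(a, universe)]


def square(top, left, size):
    """Build a square block of cells (illustrative helper)."""
    return {(top + r, left + c) for r, c in product(range(size), repeat=2)}


universe = square(0, 0, 12)
a = square(1, 1, 6)
b = square(8, 8, 3)   # does not touch a
c = square(4, 4, 6)   # partially overlaps a

disjoint = nine_intersection(a, b, universe)
overlap = nine_intersection(a, c, universe)

for row in disjoint:
    print(row)
```

For the two separated squares, only the entries involving an exterior are non-empty, which is the pattern of the "disjoint" relation. For the partially overlapping squares all nine entries are non-empty, which corresponds to "overlap". Matrices like these could serve as the input features from which spatial predicates are later derived.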