
Research On Clustering Preprocessing Of Data Resource And Its Application

Posted on: 2008-01-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J X Xia
GTID: 1118360218460565
Subject: Control theory and control engineering
Abstract/Summary:
A new wave of technological innovation is allowing us to capture, store, process and display an unprecedented amount of information about our planet and a wide variety of environmental and cultural phenomena. The hard part of taking advantage of this flood of geospatial information will be making sense of it, turning raw data into understandable information. Today, we often find that we have more information than we know what to do with. Now we have an insatiable hunger for knowledge. Yet a great deal of data remains unused. (The Digital Earth: Understanding Our Planet in the 21st Century [Gore1998], by former U.S. Vice President Al Gore, January 31, 1998.)

Without materials, nothing exists. Without energy, nothing happens. Without information, nothing makes sense [~et1965]. As one of the three basic resources (materials, energy and information), information has a growing influence on our lives. Because huge amounts of data are widely available and there is an urgent need to turn such data into useful information and knowledge, Knowledge Discovery in Databases (KDD) and Data Mining (DM) have emerged and attracted a great deal of attention.

As the fundamental object of the information field, a data resource can be understood as data that has been recognized, summarized and described as a resource. To use KDD and DM effectively, improving the quality of data resources and the efficiency of handling data objects has naturally become the main goal. Preprocessing of data resources is a necessary stage of KDD and DM, and clustering analysis is a well-suited technique for both.
Therefore, research on preprocessing data resources with clustering analysis has both practical and theoretical significance. This dissertation studies the clustering preprocessing of data resources, and its main results are as follows.

Firstly, based on divisive hierarchical clustering, a method of Database Cluster Preprocessing on Analytic Hierarchy Process (DCP-AHP) is constructed. Viewing the target from the plane, the section and the space, DCP-AHP emphasizes its hierarchy. With DCP-AHP, data object sets with higher dissimilarity can be ignored, clustering-based cleaning of the data object sets can be achieved, and the error introduced in moving from qualitative to quantitative analysis can be reduced.

Secondly, based on the lowest correlation among data objects, a method of Database Cluster Preprocessing on Principal Component Extraction (DCP-PCE) is proposed, which extracts principal components by clustering through hierarchical analysis. The projection along which the data objects differ most is defined as the principal component, which can be proved to retain all the original information of the data object sets. With DCP-PCE, completeness of information and lower dimensionality of the principal components are achieved together, the dissimilarity and dimension of the data object sets are decreased, and clustering-based reduction of the data object sets is reached.

Thirdly, making use of the "0" and "1" characteristic, which is the physical storage attribute of data objects, an algorithm of Numerical Cluster on Same Entity from Different Sources (NC-SEDS) is put forward to turn every data object into a numerical statement. Without considering any other attribute of the data object, this numerical statement serves as the basis of clustering and improves the clustering of SEDS.
With NC-SEDS, the number of comparisons among data objects is reduced, the execution time drops, and clustering-based integration is achieved.

Fourthly, following the idea of the "complicated problem's solution", a method of Cluster Preprocessing on Ontic Kernel and Histogram (CPOKH) is proposed for the cluster preprocessing of data objects. In this method, Weak Ontic Kernels (WOK) are derived from Object Data Time according to the user's demands and are combined into a Strong Ontic Kernel (SOK). Based on the SOK, a histogram is built to analyze and detect the clustering of data objects by material ascription.

Fifthly, drawing on the notions of "energy" and "hit", a strategy of Clustering Optimized by Energy Hit (COEH) is adopted to set a valid dynamic threshold among clusters by energy. With COEH, energy driven by the user's demands takes effect across the whole data space, and all data objects, including outliers, are planned as a whole on a unified cognition platform. The clustering optimization of data objects is therefore ensured in a unified, global setting.

Finally, an evaluation system for college and university education is taken as the practical application. All the work in the dissertation is verified by real experiments. By introducing clustering analysis into the preprocessing of college and university data resources, it becomes possible to discuss effects in all fields validly, in particular the gains and losses in student training.
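
The abstract does not give the details of NC-SEDS, so the following is only a minimal sketch of the general idea it describes: each data object is read through its stored bits as a single number, and objects from different sources are grouped by that number instead of being compared pairwise. The function names `to_number` and `cluster_by_number`, the UTF-8 encoding, the whitespace and case normalization, the exact-match grouping rule and the sample records are all assumptions made for illustration, not the dissertation's algorithm.

```python
from collections import defaultdict

def to_number(record):
    """Read the stored bytes of a record as one big integer.

    Assumptions (not from the source): text is stored as UTF-8, and
    case and repeated whitespace are normalized away beforehand.
    """
    normalized = " ".join(record.lower().split())
    return int.from_bytes(normalized.encode("utf-8"), byteorder="big")

def cluster_by_number(sources):
    """Group records from several sources by their numerical statement.

    Each record costs one dictionary lookup, so no record is compared
    directly against every other record.
    """
    buckets = defaultdict(list)
    for source_name, records in sources.items():
        for record in records:
            buckets[to_number(record)].append((source_name, record))
    return list(buckets.values())

# Made-up sample data: the same entity appears in two sources.
sources = {
    "registry": ["Dept of Automation", "School of Physics"],
    "library": ["dept  of automation", "Library Office"],
}
clusters = cluster_by_number(sources)
for members in clusters:
    print(members)
```

In this sketch the two spellings of the automation department fall into one cluster because they normalize to the same byte string. A grouping rule that tolerates small numerical differences would be a different design and is not shown here.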
Keywords/Search Tags: Data Resource, Preprocessing, Clustering Analysis, Data Object, Analytic Hierarchy Process, Principal Component Analysis, Same Entity from Different Sources, Numerical Cluster, Ontology, Histogram, Energy Hit, Clustering Optimized