
Research On Some Key Techniques Of XML Data Intelligence Management

Posted on: 2009-07-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Liu
Full Text: PDF
GTID: 1118360278457313
Subject: Computer application technology
Abstract/Summary:
With the emergence and transmission of massive XML data, XML has become an important standard for information representation and data exchange on the Internet. Requirements for XML data management have therefore kept evolving and pose an important challenge in the XML database field. How to represent, query, and mine XML data effectively is valuable in both theory and application.

To address the problems and shortcomings of XML data management in current research, this dissertation draws on existing theories and methods of XML data management, swarm intelligence, pattern recognition, neural networks, data mining, and intelligent computation. It proposes new intelligent methods for data cleansing, querying, and data mining that use XML keys and are built on the prototype system XBASE (XML DataBase), and it also discusses efficient methods for XML refactoring.

This dissertation focuses on the following four aspects of intelligent querying and mining of XML data:

1. Foundation of the XML data management framework: XPDM

The existing XML data model has four problems that hinder effective management of XML data:
(1) Heterogeneous data: different parties often name and describe the same data object differently, which complicates the integration of multiple data sets and reduces the validity of information queries.
(2) Inconsistent data: without data integrity constraints, inconsistent data reduces the accuracy of information queries.
(3) Uncertain data dependencies among data sources: these interfere with merge and query operations across data sources.
(4) Semantic standards: because XML has evolved in such a way that many standards are still imperfect and no unified standard exists so far, query statements become tedious and confusing.

To address these problems, the dissertation proposes XPDM (XML-based Probability Data Model), an object-oriented framework for managing massive XML data that is based on a vector space model built from XML keys and on probability theory. The framework addresses the four problems above by extending the semantic standard of the XQuery 1.0 and XPath 2.0 Data Model (XDM) and by converting XML data into vectors.

2. Intelligent data cleansing and query strategy

To solve the dirty data problem in XML documents, the dissertation proposes a new intelligent data cleansing algorithm. It combines XML keys with the XML vector model, uses Bayesian learning and a Markov chain probabilistic model to build a new metadata model for XML data cleansing, and relies on a similarity check between XML trees, so that cleansing can be carried out with predefined rule warehouses.
Moreover, because earlier XML data cleansing involved cumbersome detection and poor flexibility, the dissertation considers an optimization algorithm for XML data cleansing that combines XML keys with the PSO algorithm and introduces an information extraction strategy based on the hidden Markov model. Intelligent algorithms are also introduced to improve the validity of XML data queries: using a heuristic method and exploiting the semi-structured nature of XML, the dissertation integrates the PSO and ACO algorithms into probabilistic queries over massive XML data and improves them accordingly, which widens the scope of queries and improves convergence efficiency.

3. Intelligent XML data mining strategy

Because massive XML data has already accumulated on the Internet, the dissertation studies effective mining of such data in three directions:
(1) To improve the clustering quality of massive XML document collections, it proposes an XML document clustering algorithm based on an adaptive PSO with chaos, and an auxiliary iterative self-organizing clustering algorithm for XML documents based on the PSO algorithm and matrix iteration over the vector space model.
(2) To improve the parallel processing capability of massive XML document clustering, it proposes a parallel XML document placement algorithm based on the chaos principle and an ant clustering model, which defines a corresponding chaos sufficiency function to weight each ant by its similarity to its neighborhood.
(3) Given the streaming and unbounded nature of XML data and the current lack of quality detection for it, it proposes an algorithm that uses the vector matrix of XML keys as a window, restructures the XML data through multilevel decomposition with a vector product wavelet transform, and combines this with least squares support vector machines to build a double sliding window for querying and monitoring XML data. The method can meet the requirements of XML data quality management during network transfer.

4. Intelligent XML refactoring strategy

To optimize XML semantic consistency and to handle XML structure transformations as user requirements change over time, the dissertation studies intelligent XML refactoring. Considering XML semantic consistency and path levels, and combining the vector machine principle with the characteristics of frequent patterns, it uses an XML frequent pattern algorithm, XFP-tree, to refactor XML structure on top of a document segment refactoring method, which better ensures XML quality.
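
Several of the strategies above depend on converting XML documents into vectors so that documents can be compared and clustered. The abstract does not say how XPDM builds these vectors, so the following is only a minimal sketch under an assumed representation: each document becomes a count of its root-to-element tag paths, and two documents are compared by cosine similarity. The function names `path_vector` and `cosine_similarity` are hypothetical and do not come from the dissertation.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def path_vector(xml_text: str) -> Counter:
    """Count every root-to-element tag path in one XML document.

    Assumption: tag paths stand in for the vector features; the
    dissertation's actual XML-key-based vectors are not specified.
    """
    root = ET.fromstring(xml_text)
    counts: Counter = Counter()

    def walk(node: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{node.tag}"
        counts[path] += 1
        for child in node:
            walk(child, path)

    walk(root, "")
    return counts


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse path-count vectors."""
    dot = sum(a[p] * b[p] for p in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Two documents with the same structure and one unrelated document.
    book_a = "<book><title>A</title><author>X</author></book>"
    book_b = "<book><title>B</title><author>Y</author></book>"
    order = "<order><item/><item/></order>"
    print(cosine_similarity(path_vector(book_a), path_vector(book_b)))
    print(cosine_similarity(path_vector(book_a), path_vector(order)))
```

A similarity matrix built this way could feed a clustering step such as the PSO-based or ant-based methods the abstract mentions, although those algorithms themselves are not sketched here.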
Keywords/Search Tags: XML key, ant colony optimization (ACO), particle swarm optimization (PSO), vector matrix, project frequent pattern tree, refactoring