
Research And Implementation Of The Transformation From Unstructured To Structured Data

Posted on: 2014-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: L P Wan
GTID: 2248330398474593
Subject: Computer application technology
Abstract/Summary:
As basic technologies continue to develop and informatization deepens, large numbers of digital devices generate massive amounts of data, and data resources are growing rapidly. Structured data, held mainly in relational databases, grows slowly, while unstructured data, such as files, email, office documents, audio, video, and images, has become the main source of data growth. The most direct consequence is the problem of storing and managing massive volumes of unstructured data. The key to solving it is to convert unstructured data into structured data along the path "unstructured data → semi-structured data → structured data", so that the result can be managed effectively as structured data. Based on an analysis of the structural characteristics of unstructured files such as text files, Word documents, and Excel documents, the thesis proposes a conversion program for each file type that extracts the file's contents and converts them into a standard XML document according to type-specific conversion rules. It then analyzes how to build the mapping between XML documents and a relational database using a model-driven approach, and converts the semi-structured XML data into structured data according to conversion rules, so that the data can be stored in a traditional relational database. The thesis presents a model for converting unstructured data into structured data and integrates the conversion into a single process by adding a metadata extraction module and a template creation and management module, which obtain and manage the file structures needed during the conversion of unstructured files.
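The abstract does not spell out the conversion rules themselves, so the following is only a minimal Python sketch of the two-stage path for the simplest case, a delimited text file. It assumes a hypothetical layout (a header line of column names followed by one comma-separated record per line) and uses SQLite from the standard library as a stand-in for the Oracle database; the names `text_to_xml` and `xml_to_table` are illustrative, not taken from the thesis. Each record becomes a `<row>` element, and the assumed mapping rule is one relational column per child element name and one tuple per `<row>`.

```python
import sqlite3
import xml.etree.ElementTree as ET


def text_to_xml(text, delimiter=","):
    """Stage 1: unstructured text -> semi-structured XML.

    Assumed (hypothetical) layout: the first non-empty line holds column
    names and each following line holds one record. Column names are
    assumed to be valid XML element names and SQL identifiers.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    columns = [c.strip() for c in lines[0].split(delimiter)]
    root = ET.Element("table")
    for line in lines[1:]:
        values = [v.strip() for v in line.split(delimiter)]
        row = ET.SubElement(root, "row")
        for col, val in zip(columns, values):
            ET.SubElement(row, col).text = val
    return root


def xml_to_table(root, conn, table_name):
    """Stage 2: semi-structured XML -> relational table.

    Mapping rule assumed here: each child tag of the first <row> becomes a
    TEXT column, and each <row> becomes one tuple.
    """
    rows = root.findall("row")
    if not rows:
        raise ValueError("XML document contains no <row> elements")
    columns = [child.tag for child in rows[0]]
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE "{table_name}" ({col_defs})')
    placeholders = ", ".join("?" for _ in columns)
    for row in rows:
        # findtext returns None for a missing element, which is stored as NULL
        values = [row.findtext(c) for c in columns]
        conn.execute(f'INSERT INTO "{table_name}" VALUES ({placeholders})', values)
    conn.commit()


if __name__ == "__main__":
    # Hypothetical sample input, not data from the thesis
    sample = "time,speed,pressure\n0.0,0.0,101.3\n0.5,12.4,101.1\n"
    conn = sqlite3.connect(":memory:")
    xml_to_table(text_to_xml(sample), conn, "simulation_output")
    print(conn.execute("SELECT * FROM simulation_output ORDER BY rowid").fetchall())
    # [('0.0', '0.0', '101.3'), ('0.5', '12.4', '101.1')]
```

Keeping the XML as a separate intermediate form means that, in this sketch, a Word or Excel converter would only need its own version of `text_to_xml`; the XML-to-table step would stay unchanged.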
By adding a file format definition module, the system can convert files of the same type that come in multiple structures, and can convert large numbers of unstructured files. The conversion method and implementation proposed in the thesis have been successfully applied in a digital simulation platform project for high-speed trains. In the platform's actual operation, the simulation output files generated by each specialist subsystem can be converted into the Oracle database tables the system requires, which makes these unstructured documents easier to manage and ensures the smooth operation of the high-speed train digital simulation platform.
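The abstract does not describe how the file format definition module works internally. One plausible reading, sketched here in Python purely as an assumption, is a registry of layout definitions: each definition describes one structural variant of the same file type (how to recognize it, its field separator, and how many preamble lines precede the header row), and the converter selects a definition before parsing. `FormatDefinition`, `detect_format`, `extract_metadata`, `parse`, the `# FORMAT ...` signature lines, and the `# key: value` metadata convention are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FormatDefinition:
    """Hypothetical description of one layout of a given file type."""
    name: str
    signature: str            # regex matched against the first line to identify the layout
    delimiter: Optional[str]  # field separator; None means split on runs of whitespace
    skip_lines: int           # preamble lines before the header row


# Example registry: two layouts of the same kind of output file (illustrative only).
DEFINITIONS = [
    FormatDefinition("csv_v1", r"^#\s*FORMAT\s+CSV", ",", 1),
    FormatDefinition("table_v2", r"^#\s*FORMAT\s+TABLE", None, 2),
]


def detect_format(text, definitions):
    """Return the first definition whose signature matches the file's first line."""
    lines = text.splitlines()
    first = lines[0] if lines else ""
    for fmt in definitions:
        if re.search(fmt.signature, first):
            return fmt
    raise ValueError("no format definition matches this file")


def _split(line, fmt):
    """Split one line into fields according to the layout's delimiter."""
    if fmt.delimiter is None:
        return line.split()
    return [field.strip() for field in line.split(fmt.delimiter)]


def extract_metadata(text, fmt):
    """Collect '# key: value' lines from the preamble as metadata."""
    metadata = {}
    for line in text.splitlines()[: fmt.skip_lines]:
        m = re.match(r"^#\s*(\w+)\s*:\s*(.+)$", line)
        if m:
            metadata[m.group(1)] = m.group(2).strip()
    return metadata


def parse(text, fmt):
    """Parse the records after the preamble into dicts keyed by header names."""
    lines = [ln for ln in text.splitlines()[fmt.skip_lines:] if ln.strip()]
    header = _split(lines[0], fmt)
    return [dict(zip(header, _split(ln, fmt))) for ln in lines[1:]]


if __name__ == "__main__":
    # Hypothetical sample file in the whitespace-separated layout
    sample = "# FORMAT TABLE\n# subsystem: traction\ntime speed\n0.0 0.0\n0.5 12.4\n"
    fmt = detect_format(sample, DEFINITIONS)
    print(fmt.name, extract_metadata(sample, fmt), parse(sample, fmt))
```

Under this design, supporting a new layout means adding a definition rather than writing a new converter; the parsed records could then be passed on to the XML and relational conversion stages the abstract describes.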
Keywords/Search Tags: Unstructured Data, Structured Data, Conversion, Metadata, XML