Research and Application of a Method for Optimized Data Extraction from Large Amounts of Data

Posted on: 2009-05-02
Degree: Master
Type: Thesis
Country: China
Candidate: L M Zhang
GTID: 2178360242472658
Subject: Computer applications
Abstract/Summary:
Data extraction, transformation and loading (ETL) is the core technique for achieving high-quality data in a data warehouse. By studying, designing and implementing subject-oriented ETL functions, integrated and accumulated data that supports decision-making in enterprise operations can be obtained from the data sources of the host systems at production sites and loaded into the data warehouse in batch mode.

This thesis focuses on integration techniques and methods for obtaining subject-oriented data from distributed heterogeneous data sources on the application service layer of a 3-tier architecture.

The main research and work carried out by the author are as follows:
i) Designing and implementing data extractors for Oracle databases and for Open Database Connectivity (ODBC) drivers, on Linux and Windows environments respectively;
ii) Designing and implementing a method to map the original metadata of heterogeneous databases to Chinese semantic descriptions, so that the technical field abbreviations intended for database designers and programmers are translated into descriptions that users can easily understand;
iii) Proposing techniques and methods to decompose a request for a statistical data object into single-table extractions from the host database systems, so that multi-table relational operations and numerical calculations are moved off the host database system, which effectively reduces resource consumption on the production-site host database;
iv) Implementing the relational operation algorithms in code on the application service layer, and generating data objects suitable for statistical analysis by performing multi-table operations on the extracted single tables;
v) Proposing a method to automatically match primary keys with foreign keys, based on the relevant metadata of the host database, which may be implicit in multi-table relational operations on the application service layer, so that the efficiency of multi-table joins is effectively improved;
vi) To guarantee that the dataset extracted in response to a multi-table join request is non-empty, participating in the design and implementation of a logical-consequence (inference) method over the relationships among the fields selected from multiple tables, applied before the data request is executed.

The ultimate purpose of this research is to design and implement the "Automatic fare collection (AFC) operation management data analysis system of rail transit", which was checked and accepted by the Shanghai Science and Technology Committee on July 12th, 2007, registered and awarded the certificate of science and technology results (Registered No.: 9312007Y1168) on August 7th, 2007, and awarded the certificate of computer software copyright (Registered No.: 2007SR13214) by the National Copyright Bureau on August 30th, 2007.
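
The approach in items iii) to v) (extract single tables, match keys from metadata, then join on the application service layer) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the metadata layout, the table and column names, the sample rows, and the choice of an in-memory hash join are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical metadata, standing in for what would be read from the host
# database catalog: each table lists its primary key and its foreign keys.
METADATA = {
    "station": {"primary_key": "station_id", "foreign_keys": {}},
    "transaction": {
        "primary_key": "txn_id",
        "foreign_keys": {"station_id": ("station", "station_id")},
    },
}


def match_keys(metadata, left, right):
    """Return (left_column, right_column) linking two tables via a foreign key."""
    for col, (ref_table, ref_col) in metadata[left]["foreign_keys"].items():
        if ref_table == right:
            return col, ref_col
    for col, (ref_table, ref_col) in metadata[right]["foreign_keys"].items():
        if ref_table == left:
            return ref_col, col
    raise ValueError(f"no key relationship between {left} and {right}")


def hash_join(left_rows, right_rows, left_col, right_col):
    """Inner-join two single-table extracts in memory using a hash index."""
    index = defaultdict(list)
    for row in right_rows:
        index[row[right_col]].append(row)
    joined = []
    for row in left_rows:
        for match in index.get(row[left_col], []):
            joined.append({**match, **row})
    return joined


# Hypothetical single-table extracts, each fetched with a plain
# "SELECT ... FROM one_table" so the host database performs no join.
transactions = [
    {"txn_id": 1, "station_id": 10, "fare": 3.0},
    {"txn_id": 2, "station_id": 11, "fare": 4.0},
    {"txn_id": 3, "station_id": 10, "fare": 5.0},
]
stations = [
    {"station_id": 10, "name": "A"},
    {"station_id": 11, "name": "B"},
]

left_col, right_col = match_keys(METADATA, "transaction", "station")
rows = hash_join(transactions, stations, left_col, right_col)

# The numerical calculation also runs on the application layer.
fare_by_station = defaultdict(float)
for row in rows:
    fare_by_station[row["name"]] += row["fare"]
```

In this sketch the host database only serves single-table reads, while key matching, joining and aggregation run on the application side, which is the division of labour the abstract describes.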
Keywords/Search Tags:3-tier architecture pattern, data extraction, optimization, relationship consequence, semantic mapping, data warehouse