
Design and Implementation of Heterogeneous Employment Data Integration Services

Posted on: 2016-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhang
GTID: 2298330467493008
Subject: Computer Science and Technology
Abstract/Summary:
With the development of the Internet, the Web has become the world's largest and richest data source, and it contains a large amount of job information. These resources include not only structured resources, such as traditional databases, but also semi-structured resources spread across a wide range of Web applications. However, because these data sources are heterogeneous and widely distributed, it is very difficult for people to obtain the employment information they need. To take full advantage of these resources and make queries easier, structured and semi-structured data must be integrated and accessed on a unified platform. This is the setting in which heterogeneous data integration arises.

From the perspective of employment data integration services, this paper surveys and summarizes representative structured and semi-structured data integration systems. For structured data integration, it draws on existing grid technology and the idea of middleware-based integration, uses the grid middleware OGSA-DAI to implement the structured data integration subsystem, and solves the problem of dynamically updated information in heterogeneous databases. For semi-structured data integration, it designs a semi-structured data integration subsystem tailored to the characteristics of employment websites, based on the vision-based page segmentation (VIPS) algorithm, and overcomes shortcomings of traditional Web extraction systems.

Employment data integration in this paper is divided into two parts: structured employment data integration and semi-structured employment data integration. The structured employment data integration subsystem uses XML as the common metadata standard, maps employment information to this metadata, and thereby achieves unified storage and querying of heterogeneous data.
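The XML-based mapping used by the structured subsystem can be illustrated with a minimal Python sketch. It does not use OGSA-DAI (a Java grid middleware); instead, two in-memory SQLite databases with deliberately different schemas stand in for heterogeneous employment databases, and a hypothetical mapping table (`SOURCE_MAPPINGS`) translates each local column into common metadata fields (`title`, `company`, `city`). All table names, column names, and sample rows are invented for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mappings from each source's local columns to the common metadata fields.
SOURCE_MAPPINGS = {
    "site_a": {"table": "jobs",
               "columns": {"title": "job_title", "company": "corp", "city": "location"}},
    "site_b": {"table": "positions",
               "columns": {"title": "position_name", "company": "employer", "city": "city_name"}},
}


def build_sources():
    """Create two in-memory databases with different schemas (illustrative data only)."""
    a = sqlite3.connect(":memory:")
    a.execute("CREATE TABLE jobs (job_title TEXT, corp TEXT, location TEXT)")
    a.executemany("INSERT INTO jobs VALUES (?, ?, ?)",
                  [("Java Developer", "Acme", "Beijing"), ("Accountant", "Acme", "Shanghai")])
    b = sqlite3.connect(":memory:")
    b.execute("CREATE TABLE positions (position_name TEXT, employer TEXT, city_name TEXT)")
    b.executemany("INSERT INTO positions VALUES (?, ?, ?)",
                  [("Python Developer", "Globex", "Beijing")])
    return {"site_a": a, "site_b": b}


def to_common_xml(connections):
    """Read every source and express each row as a <job> element in the common schema."""
    root = ET.Element("jobs")
    for name, conn in connections.items():
        mapping = SOURCE_MAPPINGS[name]
        fields = list(mapping["columns"])
        # Table and column names come from the trusted mapping table, not from user input.
        local_cols = ", ".join(mapping["columns"][f] for f in fields)
        for row in conn.execute(f"SELECT {local_cols} FROM {mapping['table']}"):
            job = ET.SubElement(root, "job", source=name)
            for field, value in zip(fields, row):
                ET.SubElement(job, field).text = value
    return root


def query(root, city):
    """Unified query over the common schema: titles of all jobs in the given city."""
    return [job.findtext("title") for job in root.iter("job") if job.findtext("city") == city]


if __name__ == "__main__":
    root = to_common_xml(build_sources())
    print(sorted(query(root, "Beijing")))
    print(ET.tostring(root, encoding="unicode"))
```

Once rows are expressed in the common `<job>` form, a query such as `query(root, "Beijing")` no longer needs to know which database a record came from. In the thesis this role is played by OGSA-DAI and its metadata management, which this sketch does not reproduce.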
The structured employment data integration subsystem uses the OGSA-DAI middleware to register data sources, query data, manage metadata, and update data, which effectively shields the differences between databases and realizes structured data integration. The semi-structured employment data integration subsystem first preprocesses each Web page and generates a visual tree. Second, it uses the VIPS algorithm to locate the employment information on the page and builds an extraction template for that information through manual configuration. Finally, it extracts the employment information from the page using XPath.

This paper presents a service system for employment data integration that achieves both structured and semi-structured data integration. A prototype of the integrated employment data system was built, and the experimental results show that the system design is feasible.
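The last steps of the semi-structured pipeline (locating the job region, applying a manually configured template, and extracting fields with XPath) can be sketched in Python as follows. The sketch assumes that preprocessing has already turned the page into well-formed XHTML and that the VIPS step has already identified the job-list region; VIPS itself is not implemented, and its output is represented only by the hypothetical `block` path in the template. The template format, field names, and sample page are invented for illustration. Python's standard `xml.etree.ElementTree` supports only a subset of XPath, which is enough here; full XPath 1.0 would require a library such as lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical extraction template, configured by hand once per employment site.
# "block" stands in for the page region located by VIPS segmentation (not implemented here).
TEMPLATE = {
    "block": ".//div[@id='job-list']",
    "record": "./div[@class='job']",
    "fields": {
        "title": "./h3",
        "company": "./span[@class='company']",
        "salary": "./span[@class='salary']",
    },
}


def extract_jobs(xhtml, template):
    """Apply the template's XPath expressions to a well-formed XHTML page."""
    root = ET.fromstring(xhtml)
    block = root.find(template["block"])
    if block is None:  # The configured region is missing, e.g. the site layout changed.
        return []
    jobs = []
    for record in block.findall(template["record"]):
        job = {}
        for field, path in template["fields"].items():
            node = record.find(path)
            # A missing or empty field becomes None instead of dropping the whole record.
            job[field] = node.text.strip() if node is not None and node.text else None
        jobs.append(job)
    return jobs


# Invented sample page; the navigation block is noise that the template ignores.
PAGE = """<html><body>
<div id="nav"><a href="/">Home</a></div>
<div id="job-list">
  <div class="job"><h3>Java Developer</h3><span class="company">Acme</span><span class="salary">8000-12000</span></div>
  <div class="job"><h3>Test Engineer</h3><span class="company">Globex</span></div>
</div>
</body></html>"""

if __name__ == "__main__":
    for job in extract_jobs(PAGE, TEMPLATE):
        print(job)
```

Because the site-specific knowledge lives entirely in `TEMPLATE`, supporting another employment site means writing a new template rather than new extraction code, which matches the manual-configuration step described above.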
Keywords/Search Tags: Heterogeneous data integration, OGSA-DAI, Web information extraction, VIPS algorithm