
Research and Implementation of an ETL Model for Resource Integration Based on Web Metadata Extraction

Posted on: 2011-10-06
Degree: Master
Type: Thesis
Country: China
Candidate: K S Hu
Full Text: PDF
GTID: 2198330332984951
Subject: Circuits and Systems
Abstract/Summary:
Digital resource integration is one of the most important tasks in digital library construction. As digital resources grow, librarians face greater difficulty in managing them and a heavier workload, and users find the sheer volume of resources inconvenient to use. Users expect to retrieve all distributed resources quickly and easily through a unified interface, including electronic books, electronic journals and other academic resources, abstracts and titles, patents, research achievements, online teaching resources, conference proceedings and other diverse forms of resources, and to be provided with one-stop information services. This is the problem that resource integration is meant to solve. Relying on the "digital library" project of Hunan Normal University, and after an in-depth study of the ETL model, this thesis presents a resource integration model based on Web metadata extraction. Focusing on metadata integration solutions for digital libraries, the thesis first briefly reviews the current state of research, then analyses the content and patterns of resource integration, with particular attention to the integration mode based on a metadata warehouse. Building on the ETL extraction model, it introduces the relevant Web metadata extraction technologies in detail, including HTML, XHTML, XML, DOM and JAXP. It then analyses the Web information extraction process, which is divided into sample and rule extraction for the page, HTML page cleaning, noise handling, DOM tree parsing, and conversion from XML to the database.
Combining XML, JTidy, DOM and JAXP with related tools such as the JDK, Eclipse, SQL Server and Tomcat, the author proposes a wrapper that extracts metadata from the Web and thereby realizes resource integration based on that metadata. This resource integration platform can effectively bring together different types of digital resources under a unified access and knowledge system, and can improve the comprehensiveness and relevance of digital resource utilization as well as readers' retrieval efficiency. We hope that this work on the integration of digital library resources can offer useful suggestions for the construction and development of the field and serve as a reference for relevant domestic institutions.
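The extraction steps named in the abstract (DOM tree parsing with JAXP, rule-based metadata extraction, and conversion toward a database record) can be illustrated with a minimal sketch. It is not the thesis's wrapper: the class name MetadataWrapperSketch, the XPath rules, the sample page, and the table name "metadata" are all hypothetical. It also assumes the page has already been cleaned into well-formed XHTML without a DOCTYPE or namespace declaration (the step the abstract assigns to JTidy), so only the JDK's built-in JAXP classes are needed.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class MetadataWrapperSketch {

    // Hypothetical extraction rules: metadata field name -> XPath expression.
    // A real wrapper would derive these from sample pages of each resource site.
    static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("title", "//h1[@class='title']");
        RULES.put("author", "//span[@class='author']");
        RULES.put("year", "//span[@class='year']");
    }

    // Parse already-cleaned XHTML into a DOM tree and apply each rule to it.
    static Map<String, String> extract(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, String> record = new LinkedHashMap<>();
        for (Map.Entry<String, String> rule : RULES.entrySet()) {
            // evaluate() returns the text of the first matching node, or "" if none.
            String value = xpath.evaluate(rule.getValue(), doc).trim();
            record.put(rule.getKey(), value);
        }
        return record;
    }

    // Build a parameterized INSERT for the extracted record.
    // The table name "metadata" is an assumption made for this sketch.
    static String insertStatement(Map<String, String> record) {
        String columns = String.join(", ", record.keySet());
        String placeholders =
            String.join(", ", Collections.nCopies(record.size(), "?"));
        return "INSERT INTO metadata (" + columns + ") VALUES (" + placeholders + ")";
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical cleaned page; the navigation div stands in for page noise.
        String page = "<html><body>"
            + "<div class='nav'>Home | Login</div>"
            + "<h1 class='title'>Sample Thesis</h1>"
            + "<span class='author'>A. Author</span>"
            + "<span class='year'>2011</span>"
            + "</body></html>";

        Map<String, String> record = extract(page);
        System.out.println(record);
        System.out.println(insertStatement(record));
    }
}
```

In this sketch the noise (the navigation div) is skipped simply because no rule selects it. The generated statement uses placeholders so that the extracted values could be bound through a JDBC PreparedStatement instead of being concatenated into the SQL text.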
Keywords/Search Tags: Digital Library, Resource Integration, ETL, Metadata