OA Journal Site Resource Extraction And Storage Methods

Posted on:2015-03-18

Degree:Master

Type:Thesis

Country:China

Candidate:Q Zhang

Full Text:PDF

GTID:2298330422470668

Subject:Computer technology

Abstract/Summary:

With the growth of OA (Open Access) journals on the Internet, the OA island problem has become more evident and restricts the effective use of OA resources. One way to solve this problem is online integration. For such integration, how to effectively extract OA journal resources from the Internet and how to store them are two of the core issues. Based on a comprehensive analysis of domestic and foreign research, the paper conducts in-depth research on the extraction and storage of OA journal resources.

Firstly, the paper introduces the general knowledge and methods of Web information extraction, as well as the architecture of the Hadoop distributed file system and distributed computing framework, how they work, and how to use them.

Secondly, to address the defects of traditional OA journal crawlers, which lack knowledge of site and page structure and extract resources from OA journal sites with poor comprehensiveness and accuracy, the paper presents a template-based resource extraction method for OA journal sites together with two methods of generating templates, and on this basis proposes a resource extraction framework for OA journal sites.

Thirdly, because Hadoop does not store OA resources well, the paper presents a merge-based storage method and a multiple-index method. The merge-based method controls the number of files in HDFS by combining small files, which reduces memory consumption on the key node, the NameNode. Indexes can be built on different attributes, which improves query speed.

Finally, the paper tests and analyzes the methods and looks ahead to further research.
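The merge-based storage and multiple-index ideas in the third point can be illustrated with a minimal in-memory sketch. It is not the thesis's implementation and does not touch HDFS; the function names (`merge_small_files`, `build_attribute_index`, `read_file`) and the metadata fields (`journal`, `year`) are hypothetical, chosen only for illustration. The sketch packs many small files into one blob with an offset table, so a file system would track one large object instead of many small ones, and builds one lookup index per attribute.

```python
import io
from collections import defaultdict


def merge_small_files(files):
    """Pack small files into one blob and record where each one lives.

    files: dict mapping file name -> bytes (hypothetical input shape).
    Returns (blob, offsets), where offsets maps name -> (start, length).
    """
    buf = io.BytesIO()
    offsets = {}
    for name, data in files.items():
        offsets[name] = (buf.tell(), len(data))
        buf.write(data)
    return buf.getvalue(), offsets


def read_file(blob, offsets, name):
    """Recover one original file from the merged blob."""
    start, length = offsets[name]
    return blob[start:start + length]


def build_attribute_index(metadata, attribute):
    """Map each value of `attribute` to the names of the files that have it.

    metadata: dict mapping file name -> dict of attributes (assumed shape).
    """
    index = defaultdict(list)
    for name, attrs in metadata.items():
        if attribute in attrs:
            index[attrs[attribute]].append(name)
    return dict(index)


if __name__ == "__main__":
    files = {"a.pdf": b"AAAA", "b.pdf": b"BB"}
    metadata = {
        "a.pdf": {"journal": "J1", "year": 2014},
        "b.pdf": {"journal": "J1", "year": 2015},
    }
    blob, offsets = merge_small_files(files)
    by_year = build_attribute_index(metadata, "year")
    # Look up files by attribute, then read them back from the merged blob.
    for name in by_year[2015]:
        print(name, read_file(blob, offsets, name))
```

Building a separate index per attribute means a query on, say, year does not need to scan the merged blob or the index for another attribute; the trade-off is extra index storage for each attribute indexed.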

Keywords/Search Tags:

OA resource, Extract, Small files, Hadoop, Index

Related items

1	Research And Optimization Of Small Files Processing Techniques In Hadoop
2	Research On Processing Techniques Of Massive Small Files Based On Hadoop
3	Research And Implementation Of Small Files Storage Management Based On Hadoop
4	Study On Processing Of Massive Small Files Based On Hadoop
5	The Research And Implementation Of Method Regarding To The Small Files Problem Of Hadoop
6	Research And Application Of Small File Storage Technology Of Massive Animation Resources
7	Research And Implementation Of Small File Processing Techniques In Hadoop
8	Optimization Study On Storing Massive Small Files Based On Hadoop
9	Design And Implementation Of The Key Techniques For Storing And Retrieving Massive Small Files In Hadoop
10	Research On Access Optimization Of Small Files In Hadoop Cluster