
Design And Implementation Of Book Collection And Storage Operating System Based On Hadoop

Posted on: 2017-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: H L Zhang
GTID: 2348330512951081
Subject: Software engineering
Abstract/Summary:
With the rapid development of Internet technology, the Web 2.0 model has emerged rapidly across the Internet. Users are no longer just viewers of web pages but also producers of web content. As social networking has become ever more deeply embedded in users' lives, a variety of social networking platforms have appeared in China and abroad, yet no mature social networking platform has emerged in the field of books. Building a platform for book reading and social management currently faces several challenges. First, the volume of book data is large and varied, and book information on the web differs greatly from source to source, so an efficient method is needed to crawl book information from the web and to integrate and categorize it. Second, the basic book information and book reviews to be stored are huge in volume; how to store this data while ensuring access speed and system scalability deserves further study. Finally, a key part of the system implementation is how to build a social platform on top of the book information that lets users create their own book circles with books as the link. This thesis designs and implements a book collection and storage application system based on Hadoop, which provides the data foundation and back-end support for a book reading and management social platform. The main work completed is as follows:

(1) Book collection: building on an in-depth study of the theory of web crawler technology, the URL structure for book collection, web page denoising, book information collection rules, data cleansing, execution of book information extraction, and book integration were designed and implemented.

(2) Book storage management: a distributed file system was built on Hadoop HDFS. After studying the underlying theory of the Hadoop platform, a Hadoop cluster environment was set up, and distributed file storage was designed and implemented.

(3) Book social features: based on a preliminary investigation and analysis of user needs, personal study management functions were designed and implemented, providing user management, study management, friends, and the like.
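The collection steps in item (1), namely page denoising, collection rules, information extraction, and data cleansing, can be illustrated with a minimal sketch. The thesis lists the Jsoup parser (a Java library) among its keywords; the sketch below instead uses Python's standard-library `html.parser` as a stand-in so that it runs without dependencies. The rule table `RULES`, the class names such as `book-title`, the helper names, and the sample page are all hypothetical and are not taken from the thesis.

```python
import re
from html.parser import HTMLParser

# Hypothetical collection rule: HTML class name -> book field.
# A real site would need its own rule table.
RULES = {"book-title": "title", "book-author": "author", "book-isbn": "isbn"}


class BookExtractor(HTMLParser):
    """Collects text from elements whose class matches a collection rule.

    Everything else (navigation, ads, scripts) is ignored, which acts as a
    very simple form of page denoising. Nested tags inside a matched
    element are not handled; this is only a sketch.
    """

    def __init__(self):
        super().__init__()
        self.record = {}
        self._field = None  # field currently being captured, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in RULES:
                self._field = RULES[cls]
                self.record.setdefault(self._field, "")

    def handle_endtag(self, tag):
        # Stop capturing at the first closing tag.
        self._field = None

    def handle_data(self, data):
        if self._field:
            self.record[self._field] += data


def clean(record):
    """Data cleansing: collapse whitespace and normalize the ISBN."""
    cleaned = {k: re.sub(r"\s+", " ", v).strip() for k, v in record.items()}
    if "isbn" in cleaned:
        # Keep only digits and the check character X.
        cleaned["isbn"] = re.sub(r"[^0-9Xx]", "", cleaned["isbn"]).upper()
    return cleaned


def extract_book(html):
    """Run extraction and cleansing on one page of HTML."""
    parser = BookExtractor()
    parser.feed(html)
    parser.close()
    return clean(parser.record)


# Made-up sample page; the book, author, and ISBN are fictional.
SAMPLE = """
<html><body>
<div class="nav">Home | Login</div>
<h1 class="book-title">  Example
   Book </h1>
<span class="book-author">A. Author</span>
<span class="book-isbn">ISBN: 978-0-00-000000-0</span>
</body></html>
"""

if __name__ == "__main__":
    print(extract_book(SAMPLE))
```

Keeping the rules in a table separate from the parser means that, under this assumed design, supporting another book site only requires a new rule table rather than new parsing code.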
Keywords/Search Tags:Web crawlers, Jsoup parser, Hadoop, Web information extraction