The Design And Implementation Of Service Platform For Baidu Image Distributed Index Building

Posted on: 2014-11-12  Degree: Master  Type: Thesis
Country: China  Candidate: S N Yang  Full Text: PDF
GTID: 2268330422452005  Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, the volume of data it produces is growing exponentially. On the one hand, storing this data is becoming technically more difficult; on the other hand, processing PB-level data is a technical problem that engineers must face. Traditional single-machine data processing, though convenient, takes far too long to meet current needs, so it is necessary to process the data in a distributed environment. Distributed processing can handle large amounts of data, but it also requires us to consider issues such as parallel programming.

The service platform system was developed to solve the above problems, with the ultimate goal of enabling strategy researchers to build index databases. To build a database for a large amount of data, the main solution is to use a distributed storage and computing architecture. In this system, the open source Hadoop framework handles the entire process. Because Hadoop has strong compatibility and is easy to develop with, the method applies divide and conquer to the large data set and combines this with Hadoop's own sorting mechanism. In the end, the framework completes the directed distribution and processing of the data and produces the final index files for users.

Research and development personnel need to study new sorting strategies and build distributed indexes on the same cluster. To meet this need, the system provides users with a configuration file through which they can customize their program information, so that the jobs of different users can be started in parallel in a distributed fashion. To simplify the database-building process for users, the system adopts a unified service platform interface: users simply provide the appropriate database-building request file, and the final index is completed and transmitted to the designated machine. To keep users informed of the progress of index building, the system sends phased email notifications, and the details they contain make the building process more transparent.

In this paper, we provide a reasonable solution to the above problems. The system uses C/C++ and Shell in a Hadoop distributed environment to implement a service platform for building distributed image retrieval index databases.

Testing shows that the platform is able to serve multiple users building databases. The system is easier to maintain and upgrade because all users share the unified service platform and differ only in their strategies.
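
The divide-and-conquer step combined with sorting, as described in the abstract, can be illustrated with a small single-process sketch. This is not the thesis's implementation (which uses C/C++ and Shell on Hadoop) and it does not call any Hadoop API; the function names `partition` and `build_index`, the CRC32 hash, and the sample records are all assumptions made for illustration only.

```python
import zlib
from collections import defaultdict


def partition(key: str, num_shards: int) -> int:
    """Route a key to a shard with a stable hash (hypothetical choice: CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def build_index(records, num_shards):
    """Simulate divide, sort, and merge on one machine.

    records: iterable of (key, value) pairs, e.g. (feature key, image id).
    Returns {shard_id: {key: [sorted values]}}.
    """
    # Divide: send every record to the shard that owns its key.
    shards = defaultdict(list)
    for key, value in records:
        shards[partition(key, num_shards)].append((key, value))

    # Conquer: sort each shard, then group the values that share a key.
    index = {}
    for shard_id in range(num_shards):
        grouped = {}
        for key, value in sorted(shards[shard_id]):
            grouped.setdefault(key, []).append(value)
        index[shard_id] = grouped
    return index


# Hypothetical sample data: (feature key, image id) pairs.
records = [("cat", "img3"), ("dog", "img1"), ("cat", "img1"), ("bird", "img2")]
index = build_index(records, num_shards=2)
```

Because the partition function is deterministic, every record for a given key lands in the same shard, so each shard can be sorted and written out as an independent index file.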
Keywords/Search Tags:Distributed, Index Building, big data, Parallel, Service platform