Design And Implementation Of A Hadoop Architecture Full-Text Search Engine Based On Virtualisation Technology

Posted on: 2014-07-10
Degree: Master
Type: Thesis
Country: China
Candidate: J Pan
Full Text: PDF
GTID: 2268330425495323
Subject: Computer technology
Abstract/Summary:
As society becomes increasingly information-driven, government departments are steadily digitising their operations as well. As the number of users grows, data volume increases exponentially, which sharply degrades the database full-text search capability of information systems. Under these circumstances, there is an urgent need for a storage and computing platform that is operationally stable, economically feasible and easy to manage, and that can complete intensive parallel data-processing and computing tasks at minimal operating cost. This thesis therefore discusses how to implement a full-text search engine quickly and efficiently on the basis of virtualisation technology in order to address these problems.

When conventional database-based full-text search handles massive data, bottlenecks arise in both querying and processing; these can be resolved with a distributed query engine. The current mainstream Hadoop-based distributed query engines address processing efficiency, but may still have problems with hardware computing utilisation, ease of deployment and convenience of hardware maintenance. These aspects also need to be resolved by introducing virtualisation technology. With these objectives in mind, the author completed the following work:

(1) Analysis of the components and functions of hardware-assisted full virtualisation technology and platforms, along with alternative virtualisation tools. Analysis of the Hadoop-related distributed file system and of the design objectives, structural components and execution flow of the MapReduce module, followed by an analysis of other technical tools that may also be used with the Hadoop system.
(2) Establishing a distributed query engine and a Chinese word segmentation method, optimising query scheduling and improving system query efficiency.
(3) Making effective use of multi-core CPU-based hardware through virtualisation technology, so that the system is easy to deploy, schedule, manage and maintain.
(4) Building the master nodes and backup nodes of the distributed query engine through virtualisation technology, providing a viable complete solution to single-node failure in the distributed query engine and offering higher availability for the private cloud platform.

This thesis analysed technologies related to virtualisation and the Hadoop architecture, and found that integrating virtualisation technology with a Hadoop-based distributed query engine better meets the multi-dimensional demands of constructing and operating our data centre. Implementation verified that the approach based on virtualisation and Hadoop technology can provide a cross-hardware platform for implementing a full-text search engine safely and efficiently, while resolving the hidden single-node fault risk of the distributed query engine, the inefficient utilisation of computing resources, and the difficulty of putting the architecture into practice.
Keywords/Search Tags: Hadoop, virtualization, distributed, full-text search