The Design And Implementation Of The Full-text Indexing Module Based On MapReduce Processing

Posted on: 2012-02-06 | Degree: Master | Type: Thesis
Country: China | Candidate: X F Du | Full Text: PDF
GTID: 2178330332975992 | Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet and digitalization, the data produced by users and digital systems has grown explosively. At the same time, with the rapid development of web technologies, the way software services are provided is shifting from the traditional stand-alone model to a centralized model. Another significant trend is that unstructured data is growing rapidly and becoming more important in our daily lives. Under these trends, efficiently processing and managing massive unstructured data in a centralized way has become an urgent need.

MapReduce is a programming model proposed by Google for processing and generating large data sets. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines, and are highly scalable. MapReduce hides the complexity of distributed computing and makes developing distributed programs easy and efficient.

In this thesis, a module is built that uses the popular MapReduce framework to perform massive unstructured data parsing, information extraction, and full-text index building. The module can fully utilize the processing capacity of a computer cluster. Thorough performance testing and tuning of the module were also carried out, after which its performance improved tremendously. A performance tuning strategy for certain conditions is also summarized. Finally, an application is built on top of this module to demonstrate its ability to support upper-layer application development.
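
As a rough illustration of the map, shuffle, and reduce steps behind full-text index building, the sketch below builds a small inverted index in plain, single-process Python. It is not the thesis's implementation and does not use any real MapReduce framework; the function names (`map_phase`, `shuffle`, `reduce_phase`, `build_index`) and the simple lowercase alphanumeric tokenizer are assumptions made only for this example.

```python
import re
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit one (term, doc_id) pair per token in a document."""
    # Assumed tokenizer: lowercase runs of letters and digits.
    for term in re.findall(r"[a-z0-9]+", text.lower()):
        yield term, doc_id


def shuffle(pairs):
    """Group doc ids by term, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for term, doc_id in pairs:
        grouped[term].append(doc_id)
    return grouped


def reduce_phase(term, doc_ids):
    """Reduce step: collapse one term's doc ids into a sorted posting list."""
    return term, sorted(set(doc_ids))


def build_index(docs):
    """Run map, shuffle, and reduce over a dict of {doc_id: text}."""
    pairs = (
        pair
        for doc_id, text in docs.items()
        for pair in map_phase(doc_id, text)
    )
    grouped = shuffle(pairs)
    return dict(reduce_phase(term, ids) for term, ids in grouped.items())


docs = {
    1: "MapReduce builds the index",
    2: "The index supports full-text search",
}
index = build_index(docs)
```

Here `index["index"]` is `[1, 2]` and `index["mapreduce"]` is `[1]`. In a real cluster the map and reduce functions would run in parallel on different machines, with the framework handling the grouping that `shuffle` imitates here.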
Keywords/Search Tags:unstructured data, MapReduce, information extraction, full-text index, performance tuning