
Research Of Medical Data Processing Technology Based On Cloud Computing

Posted on: 2016-02-12  Degree: Master  Type: Thesis
Country: China  Candidate: Q Wei  Full Text: PDF
GTID: 2308330479455437  Subject: Computer application technology
Abstract/Summary:
With the rapid development and widespread application of information technology, information processing in the medical industry continues to accelerate, and medical data is growing geometrically. By 2020, medical data is projected to reach 35 ZB, roughly 44 times the 2009 volume. The sheer volume of medical data and its complex data types place great pressure on the entire health-care industry in terms of data storage and information processing. At the same time, as medical data receives more and more attention, how to store and process massive medical data efficiently, and how to provide data services and data support for doctors and patients, have become urgent problems. The emergence of cloud computing offers a new way to process massive medical data. Hadoop, an open-source framework and an important part of cloud computing, provides a platform for storing and computing over massive medical data in a distributed manner. To address the problems that arise in processing and analyzing massive medical data, the main work of this paper includes the following aspects:

First, to address the NameNode memory bottleneck and the low efficiency of file retrieval, this paper proposes a new method suited to storing large numbers of small medical files, based on a study of HDFS and MapReduce, the core components of the Hadoop cloud platform. By introducing a document pre-processing module, small files are merged into a sequence file and their corresponding information is written into an extended index, which effectively reduces the number of files stored in the cluster and thereby improves the cluster's memory usage. Using the extended index, the speed of file retrieval is also improved markedly, while user privacy is preserved and files are located accurately.
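The merge-and-index idea described above can be sketched as a minimal single-machine illustration. All names here are assumptions for illustration only: the thesis targets real HDFS sequence files (key-value record containers), whereas this sketch simply concatenates file contents into one blob and keeps an index of (offset, length) entries.

```python
import io


def merge_small_files(files):
    """Merge many small files into one container blob and build an
    extended index mapping each file name to (offset, length).
    Storing one merged container instead of thousands of tiny files
    is what reduces per-file metadata pressure on the cluster."""
    blob = io.BytesIO()
    index = {}
    offset = 0
    for name, data in files.items():
        blob.write(data)
        index[name] = (offset, len(data))
        offset += len(data)
    return blob.getvalue(), index


def retrieve(blob, index, name):
    """Locate a small file directly through the index (no scan of
    the container), mirroring the fast-retrieval property the
    extended index is meant to provide."""
    offset, length = index[name]
    return blob[offset:offset + length]
```

A short usage example: merging two small records and reading one back through the index rather than by scanning the container.

```python
files = {"a.dcm": b"scan-a", "b.dcm": b"scan-bb"}
blob, idx = merge_small_files(files)
print(retrieve(blob, idx, "b.dcm"))
```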
Experiments show that this method effectively solves Hadoop's problems in storing large numbers of small files.

Second, to address the problems of overly large intermediate results and long scan times, this paper improves the Apriori algorithm and ports it to the Hadoop platform, based on a study of Apriori and an analysis of the relationships within medical data. Following the map-and-reduce paradigm, the paper proposes a digital-mapping-and-sequencing Apriori algorithm, which makes data transfer and itemset matching more convenient. A base model and a generation model are used to generate candidate supersets, which improves the efficiency of candidate generation as well as the efficiency of pruning. Porting the improved Apriori algorithm into the MapReduce framework adapts it to highly concurrent environments. Experiments show that, after porting, the Apriori algorithm scales well in parallel.

Third, combining the storage of small medical files with the analysis of medical data, this paper designs and implements a Hadoop-based medical data storage and analysis system and introduces its main functional modules. The paper describes in detail the process of building the Hadoop platform, which underpins the implementation of the system. The system provides functions such as file uploading, file searching, and a user interface for analyzing associations between diseases. Verification through the user interface demonstrates the reliability of the system.
Keywords/Search Tags: Medical Data, Cloud Computing, Hadoop, Small File Storage, Apriori Algorithm