Research on a Massive Electronic Medical Record Storage Method Based on Hadoop

Posted on:2017-01-11

Degree:Master

Type:Thesis

Country:China

Candidate:Z Y Guo

Full Text:PDF

GTID:2428330596457448

Subject:Computer Science and Technology

Abstract/Summary:

As Internet technology penetrates more fields, Internet-based medical services have gradually become part of people's lives. At the same time, medical data is being generated from many sources, and the electronic medical record (EMR), as the main carrier of modern medical information, plays a major role in the medical industry. The steady growth in data volume, however, poses an enormous challenge to traditional medical information platforms. Building an efficient platform to store and manage massive, valuable medical data is therefore an important part of advancing smart healthcare.

Hadoop, an open source framework, provides a reliable and efficient distributed file system called HDFS. Its large storage capacity and high-throughput design suit large-scale data storage, which makes it a strong choice for storing massive electronic medical records. HDFS was designed for large files, however, and it runs into a series of bottlenecks when handling small files. Based on an analysis of related research in China and abroad, and taking the characteristics of health data into account, this thesis proposes an optimized storage scheme to solve the storage problem of small EMR files.

To manage massive electronic medical records effectively, we implement a text clustering algorithm based on MapReduce and optimize the traditional K-means algorithm by combining it with hash sampling and the PAM algorithm. To address small-file storage in HDFS, we propose a strategy that merges small files according to the clustering results and then stores the merged files in HDFS. To improve the lookup efficiency of small files, we also design and implement a prefetching and caching mechanism for small files, which reduces the frequency of I/O operations on HDFS and shortens read time. Finally, the feasibility and effectiveness of the optimized storage scheme are verified through multiple groups of comparative experiments. The experimental results show that the optimized scheme effectively reduces the number of files to be stored, relieves memory pressure on the NameNode, and improves the efficiency of reading small files in HDFS, providing an efficient scheme for storing and managing massive electronic medical records.
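
The merge-then-cache idea described in the abstract can be illustrated with a minimal in-memory sketch. This is not the thesis's implementation: it makes no HDFS or MapReduce calls, the cluster assignment is taken as given rather than computed by K-means, and the names `merge_by_cluster`, `PrefetchingReader`, and `blob_reads` are hypothetical. It only shows why packing related small files into one merged file with an offset index, and caching the merged file on first access, can reduce the number of storage reads.

```python
from collections import OrderedDict


def merge_by_cluster(files, cluster_of):
    """Pack small files into one merged blob per cluster.

    files: dict mapping file name -> bytes (stand-in for small EMR files).
    cluster_of: dict mapping file name -> cluster id (assumed precomputed).
    Returns (blobs, index), where index[name] = (cluster id, offset, length).
    """
    buffers, index = {}, {}
    for name in sorted(files):
        cid = cluster_of[name]
        buf = buffers.setdefault(cid, bytearray())
        index[name] = (cid, len(buf), len(files[name]))
        buf.extend(files[name])
    return {cid: bytes(buf) for cid, buf in buffers.items()}, index


class PrefetchingReader:
    """Reads small files out of merged blobs through an LRU cache.

    Loading a blob for one file implicitly prefetches every other file
    in the same cluster, so later reads of related files hit the cache.
    """

    def __init__(self, blobs, index, capacity=2):
        self.blobs = blobs          # stand-in for merged files in storage
        self.index = index
        self.capacity = capacity    # maximum number of cached blobs
        self.cache = OrderedDict()  # cluster id -> blob, oldest first
        self.blob_reads = 0         # counts simulated storage reads

    def _load(self, cid):
        if cid in self.cache:
            self.cache.move_to_end(cid)  # mark as most recently used
            return self.cache[cid]
        self.blob_reads += 1
        blob = self.blobs[cid]
        self.cache[cid] = blob
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
        return blob

    def read(self, name):
        cid, offset, length = self.index[name]
        return self._load(cid)[offset:offset + length]


# Illustrative data only; the contents are made up for the example.
files = {"a.txt": b"fever", "b.txt": b"cough", "c.txt": b"fracture"}
cluster_of = {"a.txt": 0, "b.txt": 0, "c.txt": 1}
blobs, index = merge_by_cluster(files, cluster_of)
reader = PrefetchingReader(blobs, index)
```

In this sketch three small files become two merged blobs, and reading `a.txt` followed by `b.txt` costs a single simulated storage read because both sit in the same cluster's blob.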

Keywords/Search Tags:

electronic medical record, small file storage, HDFS, text clustering, MapReduce

Related items

1	Research And Implementation Of Small File Optimization Storage Management System Based On HDFS
2	Research And Application Of The Optimization Strategy Of File Storage And Reading Based On HDFS
3	Research On Key Technology Of Small File Storage Based On HDFS
4	Optimization And Implementation Of Small File Storage In HDFS Under Hadoop Platform
5	Improvement Of HDFS Small File Storage Based On Har
6	Research And Implementation Of Small File Storage Model Based On HDFS
7	Research On Efficient Storage Of Small Files In Mobile Ultrasound Detection Based On HDFS
8	Research And Design Of Multi-Tenant Small File Storage System Based On HDFS
9	High-performance File Storage And Management System Based On HDFS
10	Research On Storage And Access Strategy Of Massive Small Files On HDFS