
Research on a Massive Electronic Medical Record Storage Method Based on Hadoop

Posted on: 2017-01-11  Degree: Master  Type: Thesis
Country: China  Candidate: Z Y Guo
GTID: 2428330596457448  Subject: Computer Science and Technology
Abstract/Summary:
With the penetration of Internet technology into various fields, Internet-based medical services have gradually become part of people's lives. At the same time, medical data are being generated from many sources, and the electronic medical record (EMR), as the main carrier of modern medical information, plays a major role in the medical industry. The continued growth in data volume, however, poses an enormous challenge to traditional medical information platforms. Building an efficient platform for storing and managing massive, valuable medical data is therefore an important part of promoting smart healthcare. Hadoop, an open-source framework, provides a reliable and efficient distributed file system called HDFS. Its large storage capacity and high-throughput design make it well suited to storing large-scale data, and therefore a natural choice for storing massive electronic medical records. However, HDFS was originally designed for large files, and it runs into a series of bottlenecks when handling small files. Therefore, based on an analysis of related work in China and abroad and on the characteristics of health data, this thesis proposes an optimized storage scheme for EMR small files. To manage massive electronic medical records effectively, we implement a MapReduce-based text clustering algorithm that optimizes the traditional K-means algorithm by combining it with hash sampling and the PAM algorithm. To address small-file storage in HDFS, we propose a strategy that merges small files according to the clustering results and then stores the merged files in HDFS. To improve the efficiency of retrieving small files, we also design and implement a prefetching and caching mechanism for them, which reduces the frequency of I/O operations in HDFS and shortens read time. Finally, the feasibility and effectiveness of the optimized storage scheme are verified through several groups of comparative experiments. The experimental results show that the scheme effectively reduces the number of files to be stored, alleviates memory pressure on the NameNode, and improves the efficiency of reading small files in HDFS, providing efficient storage and management for massive electronic medical records.
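
The merge-then-prefetch idea described in the abstract can be illustrated with a small sketch. This is not the thesis implementation: it assumes the cluster labels are already available from a prior clustering step, uses in-memory dictionaries in place of HDFS, and every name in it (`merge_by_cluster`, `PrefetchingReader`, `blob_reads`) is hypothetical. It only shows how packing related small files into one container with an offset index, and caching the siblings of a requested file, might reduce the number of container reads.

```python
from collections import OrderedDict, defaultdict


def merge_by_cluster(files, labels):
    """Pack small files into one blob per cluster (hypothetical helper).

    files:  dict mapping file name -> bytes
    labels: dict mapping file name -> cluster id (assumed to be given)
    Returns (blobs, index); index maps name -> (cluster id, offset, length).
    """
    buffers = defaultdict(bytearray)
    index = {}
    for name in sorted(files):  # sorted for a deterministic layout
        cid = labels[name]
        data = files[name]
        index[name] = (cid, len(buffers[cid]), len(data))
        buffers[cid].extend(data)
    blobs = {cid: bytes(buf) for cid, buf in buffers.items()}
    return blobs, index


class PrefetchingReader:
    """Reads small files from merged blobs, with an LRU cache and prefetch."""

    def __init__(self, blobs, index, capacity=128):
        self.blobs = blobs            # stands in for merged files in storage
        self.index = index
        self.capacity = capacity
        self.cache = OrderedDict()    # name -> bytes, least recently used first
        self.blob_reads = 0           # counts simulated reads of merged files

    def _put(self, name, data):
        self.cache[name] = data
        self.cache.move_to_end(name)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used entry

    def read(self, name):
        if name in self.cache:        # cache hit: no storage access needed
            self.cache.move_to_end(name)
            return self.cache[name]
        cid, offset, length = self.index[name]
        blob = self.blobs[cid]        # one simulated read of the merged file
        self.blob_reads += 1
        # Prefetch the other files of the same cluster into the cache.
        for other, (ocid, ooff, olen) in self.index.items():
            if ocid == cid and other != name:
                self._put(other, blob[ooff:ooff + olen])
        result = blob[offset:offset + length]
        self._put(name, result)       # inserted last, so it is evicted last
        return result
```

In this sketch, reading one record of a cluster makes later reads of its siblings cache hits, which is the effect the prefetching and caching mechanism aims for; the design rests on the assumption that records placed in the same cluster tend to be requested together.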
Keywords/Search Tags:electronic medical record, small file storage, HDFS, text clustering, MapReduce