| As hospital informatization continues to advance, the many clinical information systems in hospitals have accumulated rich clinical data resources. The vast body of medical activity information contained in these data is not only first-hand material for medical treatment, teaching, and scientific research, but also a basis for comprehensively evaluating medical quality, technical level, management level, and so on. To analyze and summarize clinical data, the narrative medical text must be processed first. Traditionally, the analysis of narrative medical text has relied mainly on manual processing: doctors, researchers, and administrators extract the information they need by browsing and reading the text. In the current context of medical big data, however, it is increasingly difficult to obtain the needed information from exponentially growing clinical data. At the same time, large amounts of unstructured medical text have become an obstacle to information sharing between hospitals. Research on structuring unstructured clinical data is therefore of great significance. Existing approaches to medical structuring fall into two kinds: the Before Structuring Process (BSP), which is mainly about designing standardized medical record systems, and the After Structuring Process (ASP), which is mainly about applying natural language processing technologies. 
To take full advantage of existing historical clinical data resources, this paper combined the characteristics of clinical text data with techniques such as rule extraction, text clustering, and statistical analysis, and designed and implemented a complete ASP system for medical text data that supports automatic conversion of unstructured medical text into structured data. First, taking the gross examination text in pathology reports as an example, this paper summarized the hierarchy and writing features of pathological text data and designed the overall structuring process. On this basis, it designed the overall framework of the clinical document structuring system, introduced its three core modules (the clinical document data preprocessing module, the pathological sample description template extraction module, and the clinical document immediate structuring module), and described the main functions and tasks of each module in detail. Then, to solve the problem of extracting pathological sample description templates, this paper built a lexicon of pathological sample names and proposed a rule-based indicator extraction algorithm, which screens indicators from pathological text using lexicons, part-of-speech (POS) tags, the position of a word within its snippet, and other information. Combining this with a custom text similarity measure, the paper proposed a dictionary-based clustering algorithm that determines the initial extraction range of each pathological sample description template. The final pathological sample description templates were then obtained by screening with two statistical parameters: IDF and C-value. Finally, new clinical text data was structured immediately by applying the existing pathological sample description templates, achieving the goal of real-time structuring. 
At the same time, the system provided a feedback optimization function: it could be tuned by modifying lexicons, adding data to the library, modifying rules and parameter thresholds, or even modifying templates. In addition, the clinical document structuring system proposed in this paper adopted a browser/server (B/S) architecture, and Web technologies were used to build a user-oriented interface through which users could easily train templates or process data. To verify the usability of the proposed structuring method, this paper tested it on real data sets. Experiments show that the average structuring accuracy per clinical text reached 82.8% with the proposed method, and comparative experiments further confirmed its effectiveness. |
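
The two screening statistics named in the abstract, IDF and C-value, can be illustrated with a minimal sketch. The abstract does not give the exact formulation, thresholds, or tokenization used, so everything below is an assumption for illustration: the function names (`idf`, `is_nested`, `c_value`), the sample data, and the choice of the commonly used C-value definition (log2 of term length times the term frequency, reduced by the mean frequency of longer candidates that contain the term).

```python
import math

def idf(term, documents):
    """Inverse document frequency: log(N / df). Returns 0.0 for unseen terms."""
    n_containing = sum(1 for doc in documents if term in doc)
    if n_containing == 0:
        return 0.0
    return math.log(len(documents) / n_containing)

def is_nested(short, long):
    """True if token tuple `short` occurs contiguously inside the longer tuple `long`."""
    n = len(short)
    return len(long) > n and any(
        long[i:i + n] == short for i in range(len(long) - n + 1)
    )

def c_value(candidate, freq):
    """C-value of a candidate term (a tuple of tokens).

    `freq` maps each candidate term to its corpus frequency.
    Note: with this definition single-token terms score 0, since log2(1) == 0.
    """
    longer = [t for t in freq if is_nested(candidate, t)]
    score = freq[candidate]
    if longer:
        # Subtract the mean frequency of the longer terms that contain the candidate.
        score -= sum(freq[t] for t in longer) / len(longer)
    return math.log2(len(candidate)) * score

# Hypothetical sample data: each document is a set of indicator words,
# and each candidate term is a tuple of tokens with an observed frequency.
documents = [{"size", "color"}, {"size", "texture"}, {"size"}, {"color"}]
term_freq = {("cut", "surface"): 5, ("gray", "cut", "surface"): 2}

idf_color = idf("color", documents)
cv_short = c_value(("cut", "surface"), term_freq)
cv_long = c_value(("gray", "cut", "surface"), term_freq)
```

In a screening step of this kind, candidates whose scores fall outside chosen thresholds would be dropped; the thresholds themselves are tunable parameters and are not specified here.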