
Research On Content Extraction And Visualization Of TCM Medical Records

Posted on: 2022-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Ma
Full Text: PDF
GTID: 2504306485486194
Subject: Software engineering
Abstract/Summary:
With the continuous accumulation of unstructured domain texts, sophisticated text mining techniques and tools are needed in many application fields, which has driven the rapid development of text mining technology. In traditional Chinese medicine (TCM), medical records are an important information source: they contain a wealth of clinical experience and are of great significance for clinical diagnosis and research. However, TCM medical records are not only highly specialized but also written in short, grammatically terse sentences, which poses great challenges for extracting TCM knowledge.

The NKI research group has a framework of semantic taxonomy and description (FSTD) to support the acquisition of domain knowledge and common-sense knowledge. In its research on acquiring knowledge from TCM medical records, the group compiled a set of grammar rules based on this framework (called the semantic grammar Gseed) and implemented a TCM medical record parser. However, the parser relies heavily on the quantity and quality of the semantic grammar rules. If some rules are missing from the semantic grammar, some sentences cannot be parsed, and the domain knowledge they contain may be missed. To address these problems, this thesis improves the TCM medical record parser by extending its semantic grammar, and visualizes the parsing results. The main achievements of this thesis are as follows.

1. An automatic method for extending a domain semantic grammar, based on the fault-tolerant Earley parsing algorithm. This thesis improves the parser by completing the semantic grammar. The method extends both the words and the grammar productions, so that new semantic grammar rules are generated automatically; this helps improve the coverage of the semantic grammar and parse text more accurately. Experiments show that the method can supply a large number of effective semantic grammar rules for manual refinement.

2. A visualization method for the text content extracted from TCM medical records. The TCM medical record parser used in this thesis is based on the semantic grammar; it uses the Earley parsing algorithm to parse the records and produces domain parse trees that contain the knowledge content. These parse trees, however, are not friendly to non-computer professionals. To integrate the extracted domain knowledge into a complete event description, this thesis implements a visualization method on top of conventional TCM text extraction, in which the content parse trees are automatically converted into a domain knowledge model that serves as the model of knowledge description. The method is general-purpose and can be adapted to different application scenarios by changing the domain knowledge model.

3. A prototype system designed and implemented on the basis of these methods. The system consists of two parts: semantic grammar extension and knowledge visualization. For semantic grammar extension, the system provides proofreading, format checking, and feedback functions for the extension results, combining human intervention with automatic learning so as to reduce the cost of extending the semantic grammar manually. For visualization of the extracted knowledge, the system provides an extraction interface for third-party applications, which takes TCM medical records as input and returns the extraction results in the form of the domain knowledge model, and it provides a page display for doctors.
Keywords/Search Tags:Information extraction, Knowledge acquisition, Semantic grammar learning, Error-tolerant Earley parsing, Knowledge visualization
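
The abstract's point that a grammar-based parser misses sentences when a rule is absent can be illustrated with a minimal sketch of a plain Earley recognizer. This is not the thesis's parser or its fault-tolerant extension method: the function `earley_recognize`, the grammar `toy_grammar`, and the English tokens are all hypothetical stand-ins for Gseed and real TCM record text, and the sketch assumes a grammar without empty productions.

```python
from collections import namedtuple

# An Earley item: rule lhs -> rhs, position of the dot, and origin chart index.
Item = namedtuple("Item", "lhs rhs dot origin")


def earley_recognize(grammar, start, tokens):
    """Return True if `tokens` can be derived from `start`.

    `grammar` maps each nonterminal to a list of right-hand sides (tuples).
    Any symbol that is not a key of `grammar` is treated as a terminal.
    Assumes there are no empty (epsilon) productions.
    """
    chart = [[] for _ in range(len(tokens) + 1)]

    def add(i, item):
        if item not in chart[i]:
            chart[i].append(item)

    for rhs in grammar[start]:
        add(0, Item(start, rhs, 0, 0))

    for i in range(len(tokens) + 1):
        for item in chart[i]:  # the list may grow while we iterate
            if item.dot < len(item.rhs):
                sym = item.rhs[item.dot]
                if sym in grammar:
                    # Predictor: expand the expected nonterminal.
                    for rhs in grammar[sym]:
                        add(i, Item(sym, rhs, 0, i))
                elif i < len(tokens) and tokens[i] == sym:
                    # Scanner: the next token matches the expected terminal.
                    add(i + 1, item._replace(dot=item.dot + 1))
            else:
                # Completer: advance every item that was waiting for item.lhs.
                for parent in chart[item.origin]:
                    if (parent.dot < len(parent.rhs)
                            and parent.rhs[parent.dot] == item.lhs):
                        add(i, parent._replace(dot=parent.dot + 1))

    return any(it.lhs == start and it.dot == len(it.rhs) and it.origin == 0
               for it in chart[len(tokens)])


# Hypothetical toy semantic grammar (NOT the thesis's Gseed).
toy_grammar = {
    "Record": [("Symptom", "Treatment")],
    "Symptom": [("patient", "has", "SymptomName")],
    "SymptomName": [("cough",), ("fever",)],
    "Treatment": [("prescribe", "Herb")],
    "Herb": [("ginseng",)],
}

covered = "patient has cough prescribe ginseng".split()
uncovered = "patient has headache prescribe ginseng".split()

# A word-level extension: add the missing term as a new production.
extended_grammar = dict(toy_grammar)
extended_grammar["SymptomName"] = toy_grammar["SymptomName"] + [("headache",)]

print(earley_recognize(toy_grammar, "Record", covered))
print(earley_recognize(toy_grammar, "Record", uncovered))
print(earley_recognize(extended_grammar, "Record", uncovered))
```

Here the second sentence is rejected only because `headache` is missing from the toy grammar, and it is accepted once that single production is added by hand. The thesis describes generating such word and production extensions automatically; this sketch shows only the coverage problem, not that method.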