
Research Of Semantic Similarity Algorithm Based On Ontology In Medical Domain

Posted on: 2014-04-29
Degree: Master
Type: Thesis
Country: China
Candidate: W Q Li
GTID: 2268330401477615
Subject: Computer Science and Technology
Abstract/Summary:
With the development of information science and computer technology, the volume of electronic medical data is increasing rapidly. Electronic medical records and the large body of medical and scientific literature have become an important data resource for clinical research. However, most of these data are stored as unprocessed, heterogeneous text. Properly understanding this text requires integrating structured and heterogeneous clinical resources, medical records and scientific literature. Estimating the semantic similarity between concepts is a central component of understanding text data, and it effectively supports the processing, classification and structuring of these textual resources.
Semantic similarity has been successfully applied to many natural language processing tasks, such as word sense disambiguation, document classification and clustering, automatic detection and correction of spelling errors, ontology learning and information retrieval. In the medical domain, similarity computation can improve the retrieval of medical resources and effectively promote the integration of heterogeneous clinical data.
Semantic similarity measures the likeness between words as their degree of taxonomic proximity. For example, bronchitis and flu are similar because both are disorders of the respiratory system. However, words can also be related in non-taxonomic ways. For example, diuretics help in the treatment of hypertension; this is semantic relatedness rather than similarity. Both semantic similarity and semantic relatedness are assessed from the semantic evidence observed in an ontology or in domain corpora.
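The taxonomic notion of similarity described above can be illustrated with a classic edge-counting measure: the fewer is-a links separate two concepts through a shared ancestor, the more similar they are. This is a minimal sketch under stated assumptions: the taxonomy and concept names are hypothetical (not SNOMED CT content), and the 1 / (1 + path length) scaling is one common choice, not the method proposed in this thesis.

```python
from collections import deque

# Hypothetical toy is-a taxonomy (child -> list of parents); NOT SNOMED CT content.
# "flu" has two parents to show that multiple inheritance is handled.
PARENTS = {
    "concept": [],
    "disorder": ["concept"],
    "drug": ["concept"],
    "respiratory_disorder": ["disorder"],
    "infectious_disease": ["disorder"],
    "cardiovascular_disorder": ["disorder"],
    "bronchitis": ["respiratory_disorder"],
    "flu": ["respiratory_disorder", "infectious_disease"],
    "hypertension": ["cardiovascular_disorder"],
    "diuretic": ["drug"],
}


def ancestor_distances(concept):
    """Map each ancestor (and the concept itself, at 0) to its fewest is-a edges away."""
    dist = {concept: 0}
    queue = deque([concept])
    while queue:  # breadth-first search upward, so the first visit is the shortest
        node = queue.popleft()
        for parent in PARENTS[node]:
            if parent not in dist:
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist


def path_length(a, b):
    """Shortest is-a path between a and b that passes through a common ancestor."""
    da, db = ancestor_distances(a), ancestor_distances(b)
    common = da.keys() & db.keys()
    return min(da[c] + db[c] for c in common)


def path_similarity(a, b):
    """Edge-counting similarity in (0, 1]: identical concepts score 1."""
    return 1.0 / (1 + path_length(a, b))


for a, b in [("bronchitis", "flu"), ("bronchitis", "hypertension"), ("diuretic", "hypertension")]:
    print(a, b, path_length(a, b), round(path_similarity(a, b), 3))
```

In this toy hierarchy, bronchitis and flu meet at respiratory_disorder (path length 2), bronchitis and hypertension meet at disorder (path length 4), and diuretic and hypertension meet only at the root (path length 5), so the relatedness pair gets the lowest score. A purely taxonomic measure therefore does not capture non-taxonomic relatedness such as the link between a drug and the disease it treats.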
Depending on the type of domain knowledge exploited, several families of functions can be identified: semantic similarity algorithms based on the taxonomic structure of an ontology, semantic similarity algorithms that rely on the information content (IC) of concepts, and semantic relatedness algorithms based on word contexts.
This thesis first reviews and analyzes the commonly used semantic similarity and relatedness algorithms. It then analyzes each family to identify its advantages and limitations in terms of expected accuracy, computational complexity and dependency on knowledge sources. Semantic similarity algorithms based on the taxonomic structure of an ontology do not rely on a specific corpus or on data pre-processing, but their accuracy is limited. New IC-based semantic similarity algorithms are obtained by redefining these structure-based measures in terms of the IC of concepts. These algorithms overcome some of the limitations of corpus-based IC while retaining the effectiveness and scalability of the ontology-based model, thereby improving the accuracy of the assessment. In addition, the redefined similarity algorithms can be applied directly in semantic environments and in the medical domain.
Next, a new algorithm based on the taxonomic structure of a biomedical ontology is proposed. It retains the simplicity of path-based algorithms while also considering the available taxonomic instances of each concept. Because the algorithm relies only on the taxonomic structure, it exploits this additional semantic evidence without depending on the availability of suitable data or on data pre-processing, and it provides accurate similarity assessments between concepts. At the same time, it keeps computational complexity low and avoids some of the limitations of path-based algorithms.
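As a minimal sketch of how IC can be derived from the taxonomy itself rather than from corpus frequencies, the code below uses a well-known intrinsic IC formula based on hyponym counts (as proposed by Seco et al.), IC(c) = 1 - log(hypo(c) + 1) / log(N), where hypo(c) is the number of descendants of c and N is the number of concepts. It plugs that IC into the Lin similarity and the Jiang-Conrath distance; the latter can be read as an IC-based analogue of path length. The taxonomy is hypothetical, and these formulas are standard examples of the IC-based family, not necessarily the redefined measures developed in this thesis.

```python
import math

# Hypothetical toy is-a taxonomy (child -> list of parents); NOT SNOMED CT content.
PARENTS = {
    "concept": [],
    "disorder": ["concept"],
    "drug": ["concept"],
    "respiratory_disorder": ["disorder"],
    "infectious_disease": ["disorder"],
    "cardiovascular_disorder": ["disorder"],
    "bronchitis": ["respiratory_disorder"],
    "flu": ["respiratory_disorder", "infectious_disease"],
    "hypertension": ["cardiovascular_disorder"],
    "diuretic": ["drug"],
}

# Invert the parent links so that descendants (hyponyms) can be enumerated.
CHILDREN = {c: [] for c in PARENTS}
for child, parents in PARENTS.items():
    for parent in parents:
        CHILDREN[parent].append(child)


def _reachable(start, edges):
    """All concepts reachable from start by following edges, excluding start itself."""
    seen, stack = set(), [start]
    while stack:
        for nxt in edges[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def intrinsic_ic(concept):
    """Intrinsic IC from hyponym counts: leaves get 1, the root gets 0."""
    hypo = len(_reachable(concept, CHILDREN))
    return 1.0 - math.log(hypo + 1) / math.log(len(PARENTS))


def mica_ic(a, b):
    """IC of the most informative common ancestor (this value is Resnik's similarity)."""
    common = (_reachable(a, PARENTS) | {a}) & (_reachable(b, PARENTS) | {b})
    return max(intrinsic_ic(c) for c in common)


def lin_similarity(a, b):
    """Lin: 2 * IC(MICA) / (IC(a) + IC(b))."""
    denom = intrinsic_ic(a) + intrinsic_ic(b)
    return 2 * mica_ic(a, b) / denom if denom else 1.0


def jiang_conrath_distance(a, b):
    """Jiang-Conrath: IC(a) + IC(b) - 2 * IC(MICA); 0 for identical concepts."""
    return intrinsic_ic(a) + intrinsic_ic(b) - 2 * mica_ic(a, b)


for a, b in [("bronchitis", "flu"), ("bronchitis", "hypertension"), ("diuretic", "hypertension")]:
    print(a, b, round(lin_similarity(a, b), 3), round(jiang_conrath_distance(a, b), 3))
```

Because the root has IC 0, a pair whose only shared ancestor is the root (here diuretic and hypertension) gets a Lin similarity of 0. The IC values come from the ontology alone, so no corpus or pre-processing is needed, which matches the motivation given above for redefining structure-based measures in terms of IC.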
In the experiments, SNOMED CT is used as the input ontology, and the accuracy of the proposed algorithm is evaluated and compared against other algorithms on a standard benchmark of medical terms.
During the research, other ontology-based semantic similarity algorithms were also analyzed, such as attribute-based semantic similarity and relatedness algorithms and hybrid semantic similarity and relatedness algorithms. Finally, although the algorithms discussed in this thesis are based on a medical domain ontology or corpus, they can also be evaluated and applied with general-domain ontologies and corpora.
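A common way to compare similarity measures on such a benchmark is to correlate each measure's scores with expert similarity ratings for the benchmark's term pairs; the abstract does not state which coefficient was used, so this is an assumption. The sketch below computes the Pearson and Spearman correlation coefficients in plain Python. The ratings and scores in the usage lines are made-up placeholders, not values from the benchmark or from the thesis's experiments.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists (neither may be constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


human_ratings = [4.0, 3.0, 1.5, 1.0]      # hypothetical expert ratings
measure_scores = [0.52, 0.33, 0.15, 0.0]  # hypothetical similarity scores
print(pearson(human_ratings, measure_scores), spearman(human_ratings, measure_scores))
```

Pearson rewards a linear agreement with the expert ratings, while Spearman only checks whether the measure orders the term pairs the same way, which is why both are often reported side by side.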
Keywords/Search Tags: semantic similarity, ontology, SNOMED CT, medical domain, information content