Research on Biomedical Information and Annotation Retrieval

Posted on: 2016-05-16 | Degree: Master | Type: Thesis
Country: China | Candidate: T W Zhong
GTID: 2308330473955932 | Subject: Computer application technology
Abstract/Summary:
With the rapid development of biomedical technology, the volume of biomedical literature is growing quickly. For example, the Medline database contains more than 20 million biomedical articles, and by 2015 the number of medical articles had reached 24 million. Finding information in such a large collection is very difficult for researchers. The particular characteristics of biomedical literature make retrieval harder still: medical articles often contain specialized terminology and abbreviations, so general-purpose retrieval methods do not transfer well to medical text retrieval.
This thesis studies three noise-control strategies for UMLS-based query expansion, together with multi-label learning, to improve biomedical retrieval performance. We first review the development of biomedical information retrieval and then introduce the basic theory of information retrieval and multi-label learning. We control the noise introduced by query expansion to improve retrieval performance, and we use multi-label learning to extract MeSH terms from biomedical literature for use in query expansion.
The main contributions of this thesis are as follows:
First, by analyzing the conflicting conclusions reported for ontology-based query expansion, we argue that query expansion often falls short of expectations because of the noise introduced during the expansion process.
Second, we present three methods to suppress noise from expansion terms. Building on phrases, we also present three retrieval models: the Word Model, the Phrase Model, and the Combine Model. Experimental results show that these noise-control methods are effective and significantly improve retrieval performance.
Third, in multi-label learning, we study the CCA algorithm and use it to predict the labels of unseen samples. The results show that the CCA method performs well on multiple evaluation metrics.
Furthermore, using CCA, we extract MeSH terms from documents and queries, which further improves retrieval performance.
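The abstract does not spell out the noise-control strategies or how the three retrieval models work. As a rough illustration only, the Python sketch below assumes two simple noise controls (keeping only the most confident expansion terms and down-weighting them by a factor `alpha`) and reads the Word, Phrase, and Combine Models as word-level matching, whole-phrase matching, and a linear interpolation of the two. The functions `expand_query`, `count_phrase`, `word_model`, `phrase_model`, and `combine_model`, the parameters `alpha`, `max_terms`, and `lam`, and the toy candidate list with made-up confidences (standing in for UMLS synonyms or MeSH terms predicted for a query) are all hypothetical, not the thesis's actual method.

```python
# Hypothetical sketch: noise-controlled query expansion scored by word-level,
# phrase-level, and combined matching. Names, weights, and data are illustrative.
from collections import Counter


def expand_query(original, candidates, alpha=0.3, max_terms=2):
    """Return {term: weight}. Original terms get weight 1.0. Assumed noise
    control: keep only the `max_terms` most confident candidates and scale
    each kept candidate's confidence by `alpha`."""
    query = {term: 1.0 for term in original}
    ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    for term, confidence in ranked[:max_terms]:
        if term not in query:  # never down-weight an original query term
            query[term] = alpha * confidence
    return query


def count_phrase(tokens, phrase):
    """Count contiguous occurrences of the token list `phrase` in `tokens`."""
    n = len(phrase)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase)


def word_model(query, tokens):
    """Word-level score: each word of each query term is matched on its own."""
    counts = Counter(tokens)
    return sum(w * counts[word] for term, w in query.items() for word in term.split())


def phrase_model(query, tokens):
    """Phrase-level score: each query term counts only as a complete phrase."""
    return sum(w * count_phrase(tokens, term.split()) for term, w in query.items())


def combine_model(query, tokens, lam=0.5):
    """Linear interpolation of the word- and phrase-level scores (assumed rule)."""
    return lam * word_model(query, tokens) + (1 - lam) * phrase_model(query, tokens)


if __name__ == "__main__":
    # Toy candidates with made-up confidences; matching is case-sensitive,
    # so everything here is lowercase.
    query = expand_query(
        ["heart attack"],
        [("myocardial infarction", 0.9), ("cardiac arrest", 0.4), ("attack", 0.2)],
    )
    doc = "patients with acute myocardial infarction and heart failure".split()
    print(query)
    print(word_model(query, doc), phrase_model(query, doc), combine_model(query, doc))
```

In this toy example the word-level score gives credit for "heart" in "heart failure", an unrelated concept, while the phrase-level score counts only the complete "myocardial infarction" match. Interpolating the two is one simple way to trade recall against that kind of noise.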
Keywords: Query Expansion, Noise Control, MeSH, CCA, Multi-label Learning