Medical literature, such as medical records, is increasingly being digitised. As with any large growth in digital data, methods must be developed both to manage the data and to extract important information from it. Information Retrieval (IR) techniques, such as search engines, provide an intuitive means of locating important information among large volumes of data. With more and more patient records being digitised, the use of search engines in a healthcare setting is a highly promising way of efficiently overcoming the problem of information overload.

Traditional IR approaches often perform retrieval based solely on term frequency counts, an approach known as 'bag-of-words'. While these approaches are effective in certain settings, they fail to account for more complex semantic relationships that are particularly prevalent in medical literature, such as negation (e.g. 'absence of palpitations'), temporality (e.g. 'previous admission for fracture') and attribution (e.g. 'Father is diabetic'), or even term dependencies (e.g. 'colon cancer'). Furthermore, the high level of linguistic variation and synonymy found in clinical reports gives rise to vocabulary mismatch: a document and a query may refer to the same concept using different textual representations (e.g. 'hypertension' and 'HTN'), so relevant documents are missed. Given the high cost associated with errors in the medical domain, precise retrieval that minimises such errors is imperative.

Motivated by the growing number of shared tasks in the domain of Clinical Natural Language Processing (NLP), this thesis investigates how best to integrate Clinical NLP technologies into a Clinical Information Retrieval workflow in order to enhance the search engine experience of healthcare professionals. To this end, we apply three current directions in Clinical NLP research to the retrieval task.
First, we integrate a Medical Entity Recognition system, developed and evaluated on I2B2 datasets, which achieves an F-score of 0.85. The second technique clarifies the Assertion Status of medical conditions by determining who actually experiences the condition described in the report, whether the condition is negated, and its temporality. In standalone evaluations on I2B2 datasets, this system achieves a micro F-score of 0.91. The final NLP technique applied is Concept Normalisation, whereby textual concepts are mapped to concepts in an ontology in order to avoid problems of vocabulary mismatch. Although it scores 0.509 on the CLEF evaluation corpus, the thesis shows this concept normalisation approach to be the most effective of the three NLP approaches explored in improving Clinical IR performance.
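The contrast between bag-of-words retrieval and the NLP-enhanced representations described above can be illustrated with a toy example. This is a minimal sketch, not the thesis's system: the documents, the concept identifiers (`C_HYPERTENSION`, `C_PALPITATIONS`), the surface-form dictionary standing in for an ontology, and the negation trigger list are all hypothetical. Negation is handled with a simplified NegEx-style rule (a trigger negates every later mention in the same sentence) rather than a trained assertion classifier, and experiencer and temporality are not modelled.

```python
import re
from collections import Counter

# Hypothetical toy documents (not thesis data).
DOCS = {
    "d1": "Patient denies palpitations. History of HTN.",
    "d2": "Patient presents with palpitations.",
    "d3": "No evidence of hypertension.",
}

# Hypothetical surface form -> concept ID dictionary; a real system would
# normalise to an ontology. The IDs are placeholders.
CONCEPTS = {
    "hypertension": "C_HYPERTENSION",
    "htn": "C_HYPERTENSION",
    "palpitations": "C_PALPITATIONS",
}

# Simplified NegEx-style triggers (assumption): any concept mention after a
# trigger, within the same sentence, is treated as negated.
NEGATION_TRIGGERS = {"no", "denies", "without", "absence"}


def tokenize(text):
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Plain term-frequency representation."""
    return Counter(tokenize(text))


def bag_of_concepts(text):
    """Map mentions to concept IDs, dropping mentions in negated scope."""
    bag = Counter()
    for sentence in re.split(r"[.;]", text):
        negated = False
        for tok in tokenize(sentence):
            if tok in NEGATION_TRIGGERS:
                negated = True
            elif tok in CONCEPTS and not negated:
                bag[CONCEPTS[tok]] += 1
    return bag


def score(query_bag, doc_bag):
    """Sum the document frequencies of the query's terms (TF overlap)."""
    return sum(doc_bag[t] for t in query_bag)


def rank(query, represent):
    """Return IDs of documents with a positive score, best first."""
    q = represent(query)
    scores = {d: score(q, represent(text)) for d, text in DOCS.items()}
    hits = [d for d in scores if scores[d] > 0]
    return sorted(hits, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    for query in ("hypertension", "palpitations"):
        print(query, rank(query, bag_of_words), rank(query, bag_of_concepts))
```

With the bag-of-words representation, the query 'hypertension' retrieves only `d3` ('No evidence of hypertension') and misses `d1`, which mentions 'HTN'. With the concept representation, the negated mention is discarded and 'HTN' matches through its shared concept identifier, so `rank("hypertension", bag_of_concepts)` returns `["d1"]`; likewise the negated 'palpitations' in `d1` no longer matches.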