Developing and Implementing Techniques to Harvest Surveillance Information from Existing Veterinary Diagnostic Laboratory Data

Posted on: 2014-07-30
Degree: Ph.D
Type: Dissertation
University: University of Prince Edward Island (Canada)
Candidate: Dorea, Fernanda C
Full Text: PDF
GTID: 1458390005498488
Subject: Biology
Abstract/Summary:
Syndromic surveillance is a tool for continuous, automated extraction of surveillance information from health data sources. The research documented in this dissertation explored informatics and data mining tools in order to develop and implement techniques to harvest additional surveillance information from existing diagnostic laboratory data. Data concerning laboratory test requests for diagnosis in cattle were provided by the Animal Health Laboratory (AHL) at the University of Guelph, Ontario. A thorough review of syndromic surveillance initiatives in animal health was conducted. Documented difficulties in acquiring clinical data, and especially in sustaining systems that depend on voluntary participation of veterinarians or data providers in scattered locations, led to the choice of laboratory data for this research. Automated methods to classify laboratory submission data into clinical syndromes were investigated. One of the challenges of working with laboratory data was determining how to transform diagnostic data into epidemiological information. The most time-consuming step of classification was creating a dictionary of keywords relevant to each classification task and defining the relationships between these words, their co-occurrences, and the target syndromic group. Once defined, however, these relationships were easily translated into a set of rules that achieved high classification performance. After classification, the data were reduced to multiple time series recording daily (or weekly) counts of submissions for each of the syndromes monitored. The time series of daily counts for each syndromic group were evaluated retrospectively in order to identify the temporal effects present and to define methods to model or remove them on-line.
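The keyword-and-rule classification and the reduction to daily counts described above can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the syndrome names, the keywords, the `RULES` layout and the functions `classify` and `daily_counts` are hypothetical and are not taken from the dissertation. Each rule pairs keywords that suggest a syndrome with co-occurring keywords that rule it out.

```python
from collections import Counter, defaultdict

# Hypothetical keyword dictionary. All syndrome names and keywords are
# illustrative assumptions, not the dictionary built in the dissertation.
# "any": at least one of these words must appear in the submission text.
# "none": if one of these words co-occurs, the rule does not fire.
RULES = {
    "respiratory": {"any": {"pneumonia", "lung", "bronchitis"}, "none": set()},
    "gastrointestinal": {"any": {"diarrhea", "enteritis", "feces"}, "none": set()},
    "abortion": {"any": {"fetus", "placenta", "abortion"}, "none": set()},
    "mastitis": {"any": {"milk"}, "none": {"replacer"}},
}


def classify(text):
    """Return the set of syndromes whose rule matches the free text."""
    tokens = set(text.lower().replace(",", " ").split())
    return {
        syndrome
        for syndrome, rule in RULES.items()
        if tokens & rule["any"] and not tokens & rule["none"]
    }


def daily_counts(submissions):
    """Reduce (day, text) submissions to one daily count series per syndrome."""
    counts = defaultdict(Counter)
    for day, text in submissions:
        for syndrome in classify(text):
            counts[syndrome][day] += 1
    return counts


submissions = [
    ("2010-01-04", "Lung tissue, pneumonia suspected"),
    ("2010-01-04", "Lung swab"),
    ("2010-01-04", "Milk replacer sample"),
    ("2010-01-05", "Milk culture"),
]
series = daily_counts(submissions)
```

In this sketch a submission may fall into more than one syndrome, and one that matches no rule is simply left uncounted; a real system would need an explicit policy for both cases.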
A method is presented for the automated removal of excessive noise and past outbreak signals from historical data, in order to construct baselines of normal behaviour. These baselines could be used as training data for the algorithms implemented in the next stages. Lastly, the prospective phases of system development were carried out, that is, the analyses that scan the time series in an on-line process, one day at a time, in order to detect temporal aberrations in comparison to a baseline of historical data. Several aberration detection algorithms were evaluated. Because no single algorithm proved superior in all outbreak scenarios, a scoring system to combine algorithms was developed. All steps were set up using open source software and delivered to the data provider as a simple desktop application scheduled to run daily in an automated manner. Fast development and simple maintenance are expected to lead to the incorporation of this system into the routine of the data provider, so that it becomes an indispensable tool for diagnosticians and epidemiologists and encourages further technical development.
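The idea of combining detection algorithms through a score can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the algorithms evaluated, so a Shewhart-type control chart and an EWMA chart are swapped in here as two common examples, and the function names, the thresholds (`k`, `lam`) and the scoring rule (number of detectors that alarm) are all hypothetical.

```python
from statistics import mean, stdev


def shewhart_alarm(baseline, today, k=3.0):
    """Alarm if today's count exceeds the baseline mean by k standard deviations."""
    return today > mean(baseline) + k * stdev(baseline)


def ewma_alarm(baseline, recent, lam=0.4, k=3.0):
    """Alarm if the exponentially weighted mean of recent days exceeds its limit."""
    mu, sd = mean(baseline), stdev(baseline)
    z = mu  # start the smoothed value at the baseline mean
    for x in recent:
        z = lam * x + (1 - lam) * z
    # Asymptotic EWMA control limit.
    limit = mu + k * sd * (lam / (2 - lam)) ** 0.5
    return z > limit


def alarm_score(baseline, recent):
    """Hypothetical combined score: how many detectors alarm on the latest day."""
    return int(shewhart_alarm(baseline, recent[-1])) + int(ewma_alarm(baseline, recent))


# Outbreak-free daily counts used as the training baseline (made-up numbers).
baseline = [4, 5, 6, 5, 4, 6, 5, 5, 4, 6]

quiet = alarm_score(baseline, [5, 5, 5])      # no detector alarms
spike = alarm_score(baseline, [5, 5, 12])     # both detectors alarm
sustained = alarm_score(baseline, [7, 7, 7])  # only the EWMA detector alarms
```

The third case shows why a combination can help: a small but sustained increase stays below the single-day limit of the Shewhart-type chart, yet accumulates enough in the EWMA to raise an alarm.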
Keywords/Search Tags:Data, Surveillance information, Diagnostic, Automated