
Exploring the data work organization of the gene ontology

Posted on: 2015-10-12
Degree: Ph.D.
Type: Dissertation
University: The Florida State University
Candidate: Wu, Shuheng
Full Text: PDF
GTID: 1478390017492837
Subject: Information Science
Abstract/Summary:
The advent of high-throughput techniques has led to an exponential increase in the size of biological data encoded in various formats and stored in different databases. This growth has made it difficult for biological scientists to retrieve, use, analyze, and integrate data. To meet the urgent need to organize a massive amount of heterogeneous data, there has been a trend towards the development of bio-ontologies. Among the many current bio-ontologies, the Gene Ontology (GO) is one of the most successful and has been widely used across different biological communities. This study applied Activity Theory and Stvilia's Information Quality Assessment Framework to examine the infrastructure supporting the development, maintenance, and use of the GO among different biological communities. Employing a netnographic approach, this study gathered data in a natural setting via archival data analysis, participant observations, and qualitative semi-structured interviews.

The findings indicated that the GO was collaboratively developed and maintained by a consortium of biological communities, mainly model organism databases. Representatives from each GO Consortium member were assigned the role of GO curator and formed into groups working on different aspects of the GO. The division of labor within the GO Consortium ensures that the formidable ontology development process can be divided into manageable projects. The GO Consortium consists not only of biocurators but also of software engineers and bioinformaticians, who provide technical and software support. As an open community, the GO Consortium has been bringing in new groups and welcomes content submissions from any individual for inclusion in the GO database. GO's collaborative development approach can be adopted by other similar ontologies or large-scale sociotechnical systems.

This study also provided a rich description of GO's data quality work and a conceptualization of GO's data quality structure, including a typology of GO's data quality problems and corresponding quality assurance actions. This knowledge base can be used for the design and management of similar sociotechnical systems and for the development of best practices for knowledge organization system curation in molecular biology and biomedicine. The data curation skills that were perceived as important for the GO can not only inform the training of biocurators, but also give new insight into curriculum design and training in Library and Information Science and Data Science. The findings of this study can benefit the GO by identifying various data quality issues and contradictions in its data curation work and by suggesting strategies and actions for improvement. Future research includes developing quantitative models for assessing the quality of different aspects of GO's data curation work. Netnographic studies can also be conducted with different groups and teams within the GO Consortium to investigate their data practices and collaboration patterns, which can inform the design of support repertoires for scientific teams.
Keywords/Search Tags:Data, GO consortium, Work, Biological