
Research on Biology Collaboration: Scientific Software Sharing, Selection and Recommendation

Posted on: 2015-11-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Huang
GTID: 1108330464455367
Subject: Computer software and theory
Abstract:
The explosion of biological data has made biology research both compute-intensive and data-intensive. Solving sophisticated research problems now requires mechanisms for sharing diverse resources and using them in a coordinated way. With the rise of eScience and collaboratories, scientific workflows have become the main form of collaboration in biology research. However, as collaborations grow in scale, new challenges arise. First, data analysis pipelines are becoming increasingly complicated, which makes it harder to understand the procedure and validate the results. Second, growing computing demand requires more software. Although software sharing is a good way to relieve this pressure, existing research focuses mainly on the mechanics of how to share software, and less attention has been paid to the sharing activity itself. Third, software and data sharing make large amounts of scientific resources available, which raises a new problem: how to locate useful resources quickly. To address these problems, this dissertation carries out research in the relevant areas. The main work and contributions are as follows:

1) A collaborative provenance model for biological scientific workflows. The model documents the whole procedure of a workflow execution, including the data, software, users, and tasks involved and the relations among these elements. On this basis, implied relations such as data dependency and run dependency are discussed in detail. The model also introduces a collaboration relation to record users' cooperation on data and software. With the collaborative provenance model, users can better understand workflows, review executions, and analyze collaborations.

2) A scientific software sharing model. We apply participatory observation and semi-structured interviews to collect information about how scientific software is shared.
Inductive and deductive analysis of the collected data shows that the category a piece of software belongs to, the range of sharing, the stage of the software life cycle, and the software's technical attributes all affect sharing activities. Sharing strategies are therefore needed to protect the rights of both sharers and users. Based on these findings, we propose a sharing model grounded in automata theory to guide scientific software sharing.

3) A social-feature-based solution for scientific software selection and composition. We use ethnography to study how software is used and find that biology researchers weigh software's social features more heavily than its technical ones. The social features we uncover, namely users' mentoring relationships, the developer, the software's academic level, and its reputation, form the basis of a quality model. To quantify the model, we draw on the collaborative provenance model and develop a selection algorithm. Building on this, we design a software composition algorithm based on how software tools cooperate with one another. Experimental results show that this solution improves the efficiency of software selection and composition.

4) A user-trust-based recommendation solution for biological data files. Sequence alignment, one of the main research methods, depends heavily on the quality of reference data files, yet the explosion of biological data makes it harder for users to find trustworthy files. We analyze the data-file usage information stored in the collaborative provenance model to derive users' preferences, which we define as data trust. We then compute user trust from the similarity of users' behavior. Using these two kinds of trust, we design a recommendation algorithm that predicts the trust of data files and identifies reliable ones. Experimental results show that the recommended data files increase the success rate of data analyses.

5) Design and development of a protein data analysis platform.
The platform integrates the research results above, including provenance management, the scientific software sharing mechanism, the social-feature-based software selection and composition methods, and the efficient data file recommendation technique. It supports stronger collaboration in biology research.
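Contribution 1 describes a provenance model whose elements are data, software, users, and tasks, from which data dependencies, run dependencies, and a collaboration relation are derived. The Python sketch below is one minimal way to represent such a model, assuming each task records its user, the software it ran, and its input and output data identifiers. The `Task` and `ProvenanceStore` names and the derivation rules are illustrative assumptions, not the dissertation's schema. Collaboration is approximated here as one user consuming data produced by another; cooperation on shared software could be recorded in the same way.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical task record: fields are assumptions chosen to cover the
# elements named in the abstract (data, software, user, task).
@dataclass(frozen=True)
class Task:
    task_id: str
    user: str
    software: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


class ProvenanceStore:
    """In-memory provenance store for workflow executions (illustrative only)."""

    def __init__(self) -> None:
        self.tasks: dict[str, Task] = {}
        self.producer: dict[str, str] = {}  # data id -> id of the task that produced it

    def record(self, task: Task) -> None:
        self.tasks[task.task_id] = task
        for data_id in task.outputs:
            self.producer[data_id] = task.task_id

    def data_dependencies(self) -> set[tuple[str, str]]:
        """(output, input) pairs: each output depends on every input of its task."""
        return {(o, i) for t in self.tasks.values() for o in t.outputs for i in t.inputs}

    def run_dependencies(self) -> set[tuple[str, str]]:
        """(task, upstream task) pairs: a task depends on the producers of its inputs."""
        return {
            (t.task_id, self.producer[i])
            for t in self.tasks.values()
            for i in t.inputs
            if i in self.producer
        }

    def collaborations(self) -> set[tuple[str, str, str]]:
        """(consumer, producer, data) triples where one user reuses another's output."""
        result: set[tuple[str, str, str]] = set()
        for t in self.tasks.values():
            for data_id in t.inputs:
                source = self.producer.get(data_id)
                if source is not None and self.tasks[source].user != t.user:
                    result.add((t.user, self.tasks[source].user, data_id))
        return result
```

For example, if bob's task `t2` consumes `hits.tsv` produced by alice's task `t1`, then `run_dependencies()` returns `{("t2", "t1")}` and `collaborations()` returns `{("bob", "alice", "hits.tsv")}`.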
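Contribution 2 proposes a sharing model based on automata theory, but the abstract does not list its states or events. The sketch below assumes a small deterministic finite-state machine whose states mix two of the factors the study identified, the software life-cycle stage and the range of sharing, and whose transition table encodes a sharing strategy: any event missing from the table is rejected. All state and event names, and the rule that software must be shared within a group before public release, are hypothetical.

```python
# Hypothetical transition table: (current state, event) -> next state.
# States combine life-cycle stage and sharing range; names are illustrative.
TRANSITIONS = {
    ("in_development", "share_with_group"): "group_shared",
    ("group_shared", "revise"): "in_development",
    ("group_shared", "publish"): "public",
    ("group_shared", "retire"): "retired",
    ("public", "retire"): "retired",
}


def run(events: list[str], start: str = "in_development") -> str:
    """Replay sharing events from `start`, rejecting any transition the table forbids."""
    state = start
    for event in events:
        next_state = TRANSITIONS.get((state, event))
        if next_state is None:
            raise ValueError(f"event {event!r} is not allowed in state {state!r}")
        state = next_state
    return state
```

Keeping the strategy in a table rather than in branching code means that a different table could be chosen per software category or technical attribute, the other factors the study reports, without changing `run`.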
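Contribution 3 builds a quality model from four social features: users' mentoring relationships, the developer, the software's academic level, and its reputation. The abstract does not say how these are measured or combined, so the sketch below assumes a weighted sum of features normalized to [0, 1]. The proxies (whether the user's mentor has used the tool, a developer score, a citation count, and a community rating from 0 to 5) and the weights are hypothetical. In the dissertation these values are derived from the collaborative provenance model; here they are passed in directly.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical weights for the four social features named in the abstract.
WEIGHTS = {"mentor": 0.4, "developer": 0.2, "academic": 0.2, "reputation": 0.2}


@dataclass
class Candidate:
    name: str
    mentor_used: bool       # has the user's mentor used this tool? (assumed proxy)
    developer_score: float  # developer standing in [0, 1] (assumed proxy)
    citations: int          # proxy for academic level (assumption)
    rating: float           # community rating in [0, 5], proxy for reputation (assumption)


def quality(c: Candidate, max_citations: int) -> float:
    """Weighted sum of social features, each normalized to [0, 1]."""
    academic = c.citations / max_citations if max_citations else 0.0
    return (
        WEIGHTS["mentor"] * (1.0 if c.mentor_used else 0.0)
        + WEIGHTS["developer"] * c.developer_score
        + WEIGHTS["academic"] * academic
        + WEIGHTS["reputation"] * c.rating / 5.0
    )


def select(candidates: list[Candidate]) -> list[tuple[str, float]]:
    """Rank candidate tools for one analysis step by descending quality."""
    max_citations = max((c.citations for c in candidates), default=0)
    scored = [(c.name, quality(c, max_citations)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With these weights, a tool the user's mentor relies on (50 citations, rating 4) scores about 0.76 and outranks a better-cited tool the mentor has not used (100 citations, rating 5), which scores about 0.6.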
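Contribution 3 also designs a composition algorithm based on how software tools cooperate. One plausible reading, assumed here rather than taken from the dissertation, is to choose one tool per workflow step so as to maximize the total quality of the chosen tools plus a bonus for adjacent pairs that have often been used together, for example co-occurrence scores mined from provenance records. Because the bonus links only neighbouring steps, dynamic programming over the steps (in the manner of the Viterbi algorithm) finds the best chain exactly. The `coop` scores, the weight `alpha`, and the tool names are hypothetical.

```python
from __future__ import annotations


def compose(
    steps: list[list[str]],
    quality: dict[str, float],
    coop: dict[tuple[str, str], float],
    alpha: float = 0.5,
) -> tuple[list[str], float]:
    """Pick one tool per step, maximizing sum(quality) + alpha * sum(adjacent coop)."""
    if not steps:
        return [], 0.0
    # best[tool] = (score of the best chain ending at `tool`, that chain)
    best = {t: (quality[t], [t]) for t in steps[0]}
    for step in steps[1:]:
        nxt: dict[str, tuple[float, list[str]]] = {}
        for t in step:
            # Best predecessor for t, counting its cooperation bonus with t.
            prev = max(best, key=lambda p: best[p][0] + alpha * coop.get((p, t), 0.0))
            score = best[prev][0] + alpha * coop.get((prev, t), 0.0) + quality[t]
            nxt[t] = (score, best[prev][1] + [t])
        best = nxt
    last = max(best, key=lambda t: best[t][0])
    return best[last][1], best[last][0]
```

With qualities `align_a=0.8`, `align_b=0.7`, `call_a=0.6`, `call_b=0.6` and a strong cooperation score between `align_b` and `call_b`, a greedy choice would take `align_a` first and end at a total of 1.5, while the dynamic program returns `["align_b", "call_b"]` with a total of about 1.8.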
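Contribution 4 combines data trust (a user's preference for data files, derived from usage records) with user trust (similarity of behavior) to predict how far a user can trust a data file they have not used yet. The abstract does not give the formulas, so the sketch below assumes that data trust is the fraction of a user's analyses with a file that succeeded, that user trust is one minus the mean absolute difference in data trust over files both users have used, and that the prediction is the user-trust-weighted average of other users' data trust. This last step is standard user-based collaborative filtering. The threshold of 0.6 and all file names are hypothetical.

```python
from __future__ import annotations


def data_trust(usage: dict[str, dict[str, tuple[int, int]]]) -> dict[str, dict[str, float]]:
    """usage[user][file] = (successful runs, total runs) -> success ratio in [0, 1]."""
    return {
        user: {f: ok / total for f, (ok, total) in files.items() if total > 0}
        for user, files in usage.items()
    }


def user_trust(dt: dict[str, dict[str, float]], a: str, b: str) -> float:
    """1 - mean absolute difference in data trust over files both users have used."""
    common = dt[a].keys() & dt[b].keys()
    if not common:
        return 0.0
    return 1.0 - sum(abs(dt[a][f] - dt[b][f]) for f in common) / len(common)


def predict(dt: dict[str, dict[str, float]], user: str, file: str) -> float | None:
    """User-trust-weighted average of other users' data trust in `file`."""
    num = den = 0.0
    for other, trusts in dt.items():
        if other == user or file not in trusts:
            continue
        w = user_trust(dt, user, other)
        num += w * trusts[file]
        den += w
    return num / den if den > 0 else None


def recommend(
    dt: dict[str, dict[str, float]], user: str, threshold: float = 0.6
) -> list[tuple[str, float]]:
    """Files the user has not used whose predicted trust reaches `threshold`."""
    unseen = {f for trusts in dt.values() for f in trusts} - set(dt[user])
    scored = []
    for f in unseen:
        p = predict(dt, user, f)
        if p is not None and p >= threshold:
            scored.append((f, p))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

If alice and bob agree closely on the files they have both used while carol's results differ, bob's success with a file alice has not tried counts for more than carol's when predicting alice's trust in that file.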
Keywords: Computer Supported Cooperative Work, Collaborative Computing, Software Selection, Software Sharing, Data Recommendation