Richard McClatchey, Richard.Mcclatchey@uwe.ac.uk
Academic Specialist - CATE
Data provenance tracking as the basis for a biomedical virtual research environment
Authors
McClatchey, Richard
Abstract
In complex data analyses it is increasingly important to capture information about the usage of datasets, in addition to preserving them over time, in order to ensure the reproducibility of results, to verify the work of others and to confirm the conditions under which data have been used for specific analyses. Scientific workflow-based studies are beginning to realise the benefit of capturing this provenance of data and of the activities used to process, transform and carry out studies on those data. This is especially true in biomedicine, where the collection of data through experiment is costly and/or difficult to reproduce and where those data need to be preserved over time. One way to support the development of workflows and their use in (collaborative) biomedical analyses is through a Virtual Research Environment (VRE). The dynamic and distributed nature of Grid/Cloud computing, however, makes the capture and processing of provenance information a major research challenge. Furthermore, most workflow provenance management services are designed only for data-flow oriented workflows, and researchers are now realising that tracking data or workflows alone, or separately, is insufficient to support the scientific process. What is required for collaborative research is traceable and reproducible provenance support in a fully orchestrated VRE that enables researchers to define their studies in terms of the datasets and processes used, to monitor and visualise the outcome of their analyses, and to log their results so that other users can call upon that acquired knowledge to support subsequent studies. We have extended the work carried out in the neuGRID and N4U projects, which provided a so-called Virtual Laboratory, to lay the foundation for a generic VRE in which sets of biomedical data (images, laboratory test results, patient records, epidemiological analyses etc.) and the workflows (pipelines) used to process those data, together with their provenance data and result sets, are captured in the CRISTAL software. This paper outlines the functionality that the Open Source CRISTAL software provides for a VRE and examines how it can provide the foundations for a practice-based knowledge base for biomedicine and, potentially, for a wider research community.
Field | Value |
---|---|
Journal Article Type | Article |
Conference Name | International Symposium on Grids and Clouds 2017 (ISGC 2017) |
Start Date | Mar 5, 2017 |
End Date | Mar 10, 2017 |
Acceptance Date | Oct 17, 2017 |
Online Publication Date | Dec 6, 2017 |
Publication Date | Jan 1, 2018 |
Deposit Date | Oct 20, 2017 |
Publicly Available Date | Oct 20, 2017 |
Journal | Proceedings of Science |
Print ISSN | 1824-8039 |
Peer Reviewed | Peer Reviewed |
Volume | 293 |
Keywords | data provenance tracking, biomedical, virtual research environment |
Public URL | https://uwe-repository.worktribe.com/output/872785 |
Publisher URL | https://pos.sissa.it/ |
Additional Information | Title of Conference or Conference Proceedings: International Symposium on Grids and Clouds 2017 (ISGC 2017) |
Contract Date | Oct 20, 2017 |
You might also like
Position paper: Provenance data visualisation for neuroimaging analysis
(2014)
Presentation / Conference Contribution
Scientific workflow repeatability through cloud-aware provenance
(2014)
Presentation / Conference Contribution
Data management challenges in paediatric information systems
(2014)
Book Chapter
CRISTAL-ISE: Provenance applied in industry
(2014)
Presentation / Conference Contribution