Poster Abstract

P6.3 Joseph Bochenek (University of Cape Town)

Theme: Local and global cloud infrastructure for processing and storage

Infrastructure for Scalable and Reproducible Analysis

Systems for developing scientific software and performing data analysis have evolved drastically in the last decade. In the same timeframe, both the public and the scientific community have become aware of a crisis in scientific replicability in every field of science, including astronomy. As a result, computing facilities at research institutions are evolving from traditional cluster or single-node environments to flexible cloud computing systems. In addition, a wide array of new applications and tools is changing the workflow of scientific analysis. This presentation will address innovations in infrastructure and applications that make it easier to conduct transparent, shareable, replicable analysis in astronomy, while simultaneously making it easier to scale analyses to very large datasets. We describe a computing framework that combines notebooks, containerization, and cloud infrastructure with a set of custom solutions that bridge the gaps between existing applications. Our goal is to provide a seamless, interactive environment where astronomers can develop scientific analyses, scaling from a small dataset on a laptop to petabyte-scale datasets processed on a large cluster. We also describe a stack of tools to archive and publish each layer of the analysis, including datasets, workflow, environment, and code. Finally, we will explore the state of reproducibility and transparency in radio astronomy, and how this relates to the modern systems and infrastructure that are used to process and analyze large datasets.