New data system feeds a powerful model while serving as a research tool in its own right.
As human-Earth system models grow more complex, so do their data requirements, to the point that stand-alone software systems are needed to track and assemble the data inputs to the main model. A new data system known as “gcamdata” was developed for the Global Change Assessment Model (GCAM) to provide a robust, reproducible, and transparent way to track and prepare hundreds of model inputs and to let researchers easily construct alternative scenarios for research.
While this new data system was made specifically for GCAM, many of its components and approaches to processing are broadly applicable to, and reusable by, other complex model/data systems aiming to improve transparency, reproducibility, and flexibility. As open-source software with a flexible architecture, gcamdata introduces a new way to handle and prepare data to feed complex global models. This saves researchers time and effort, improves traceability and reproducibility, and enables exploratory “what-if” analyses using GCAM.
Modern, integrated human-Earth system models are complex and require correspondingly detailed input datasets. These models are sophisticated attempts to quantify relationships between environmental, social, and economic factors. The new data system software is documented, includes error checking, and can be applied easily to a variety of modeling scenarios. Every data object in gcamdata is required to carry descriptive metadata (including title, units, source, and comments), which allows researchers to track data provenance throughout the system. Because this metadata travels with each object as it flows between the parts of the system, a full, system-wide data map can be constructed, and the upstream and downstream dependencies of any object can be traced and explored in detail. Many parts of the gcamdata package can be repurposed for any data system that involves multiple, potentially interacting, data processing steps, improving the reproducibility and transparency of science in many modeling domains.
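The idea of required metadata plus dependency tracing can be illustrated with a minimal sketch. This is not gcamdata's code or API (gcamdata is an R package); it is a small Python illustration written for this summary. The class names `DataObject` and `DataMap`, the `inputs` field, and the example dataset names are all hypothetical. Only the metadata fields (title, units, source, comments) come from the description above, and treating comments as optional is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """Hypothetical data object; metadata fields follow the text above."""
    name: str
    title: str
    units: str
    source: str
    comments: str = ""  # assumption: comments are optional in this sketch
    inputs: list = field(default_factory=list)  # names of direct upstream objects

    def __post_init__(self):
        # Reject objects whose required metadata is missing or blank.
        for attr in ("title", "units", "source"):
            if not getattr(self, attr).strip():
                raise ValueError(f"{self.name}: missing required metadata '{attr}'")


class DataMap:
    """Hypothetical system-wide map of data objects and their dependencies."""

    def __init__(self):
        self.objects = {}

    def add(self, obj):
        self.objects[obj.name] = obj

    def upstream(self, name):
        """Return names of everything `name` depends on, directly or indirectly."""
        seen = set()
        stack = list(self.objects[name].inputs)
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            obj = self.objects.get(current)
            if obj is not None:
                stack.extend(obj.inputs)
        return seen

    def downstream(self, name):
        """Return names of every object that depends on `name`."""
        return {other for other in self.objects if name in self.upstream(other)}


# Illustrative objects only; these are not real GCAM inputs.
data_map = DataMap()
data_map.add(DataObject("raw_population", "Population", "thousand persons",
                        "example census file"))
data_map.add(DataObject("raw_gdp", "GDP", "million USD", "example accounts file"))
data_map.add(DataObject("gdp_per_capita", "GDP per capita", "USD per person",
                        "derived", inputs=["raw_gdp", "raw_population"]))
data_map.add(DataObject("energy_demand", "Energy demand", "EJ", "derived",
                        comments="depends on income",
                        inputs=["gdp_per_capita"]))

print(sorted(data_map.upstream("energy_demand")))
print(sorted(data_map.downstream("raw_gdp")))
```

In this sketch, asking for the upstream set of a derived object walks back to the raw inputs it was built from, and asking for the downstream set of a raw input shows which derived objects would be affected if that input changed. An object created without a title, units, or source is rejected at construction time.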
Contacts (BER PM)
U.S. Department of Energy Office of Science, Office of Biological and Environmental Research
Climate and Environmental Sciences Division (SC-23.1)
Pacific Northwest National Laboratory
Primary support for this work was provided by the U.S. Department of Energy, Office of Science, as part of research in the Multisector Dynamics, Earth and Environmental System Modeling Program. Additional support was provided by the U.S. Department of Energy Offices of Fossil Energy, Nuclear Energy, and Energy Efficiency and Renewable Energy and the U.S. Environmental Protection Agency.
Bond-Lamberty, B., K. Dorheim, R. Cui, R. Horowitz, A. Snyder, K. Calvin, L. Feng, R. Hoesly, J. Horing, G. P. Kyle, R. Link, P. Patel, C. Roney, A. Staniszewski, S. Turner, M. Chen, F. Feijoo, C. Hartin, M. Hejazi, G. Iyer, S. Kim, Y. Liu, C. Lynch, H. McJeon, S. Smith, S. Waldhoff, M. Wise, and L. Clarke. “gcamdata: An R package for preparation, synthesis, and tracking of input data for the GCAM integrated human-earth systems model.” Journal of Open Research Software 7(6) (2019). [DOI: 10.5334/jors.232]
SC-33.1 Earth and Environmental Sciences Division, BER