Factors in Designing, Developing, and Using the DOE Systems Biology Knowledgebase

The Genomic Science program (formerly GTL) Enterprise is the coordinated operation of the DOE Genomic Science program and its enabling Knowledgebase. Two major functions of the science programs are to provide requirements for creating, maintaining, and operating the DOE Systems Biology Knowledgebase and to establish the data and information that the Knowledgebase would, in turn, supply. These science programs also provide the research community with the resources to use and contribute to the Knowledgebase. Furthermore, the programs would supply data and information as inputs to the Knowledgebase and perform analyses that yield the knowledge sought by program researchers. Information from other databases also would be incorporated into the Knowledgebase as needed. The science programs emphasize systems biology approaches to fundamental scientific challenges in bioenergy, carbon cycling, and contaminant fate and transport. They also pursue a variety of other research objectives described in this report and produce diverse data: results of genomic analyses and accompanying global "omic" information; various types of imaging data; information on the spatial and temporal scales of the systems studied; results from modeling experiments; measurements of physiology, function, and the environment; and provenance data documenting the results of analyses. Analyses conducted by Genomic Science programs include comparative analyses, queries, and simulation experiments.
Design features and requirements envisioned for the DOE Knowledgebase (see figure) involve system architecture; provision for heterogeneous data and metadata; data-integration capacity; intuitive user elements; various assets such as computational hardware in multiple locations; tools; quality control/quality assurance (QC/QA) capabilities; communication among data providers, integrators, and users; and other Knowledgebase services. The resultant Knowledgebase and its infrastructure would be a cooperative endeavor between the biological research community and computational and information scientists who would establish physical Knowledgebase assets, required tools, data repositories, appropriate communications capabilities, services, expert personnel, appropriate resources for users, and standards and practices for data providers and users. Knowledgebase developers will create a governance model outlining oversight; operational requirements; and the roles, responsibilities, authorities, and accountabilities (R2A2) for users and for those maintaining and operating the Knowledgebase. Alongside these components of Knowledgebase management (e.g., standards and processes, QC/QA protocols, program staff, and resources and funding), the Genomic Science program will provide Knowledgebase operational requirements, oversight, and resources for research programs and will define the R2A2 of the DOE Systems Biology Knowledgebase community.

U.S. DOE. 2009. U.S. Department of Energy Office of Science Systems Biology Knowledgebase for a New Era in Biology: A Genomics:GTL Report from the May 2008 Workshop, DOE/SC-113, U.S. Department of Energy Office of Science. (p. 14)

