The publicly accessible database promotes comparative analyses and ground-breaking discoveries through biological translation of sequence data.
A new database dedicated to global viral diversity has been developed by the Department of Energy Joint Genome Institute (DOE JGI). It is the largest publicly available viral database, with 3,908 isolate reference DNA viruses and 264,413 computationally identified viral contigs from more than 6,000 ecologically diverse metagenomic samples. In a series of four articles recently published in Nucleic Acids Research, DOE JGI researchers also report on the latest updates to several publicly accessible databases and computational tools that benefit the global community of microbial researchers.
Microbes play key roles in maintaining the planet’s biogeochemical cycles. Viruses, thought to outnumber microbes 10-fold, exert major influences on microbial survival and community interactions. Advances in sequencing technologies have generated vast amounts of data about these viruses, requiring tools to manage and interpret the information. The recent updates focus on databases and analytical tools for microbial genomics and for viruses relevant to DOE missions in bioenergy and environment.
Providing high-quality, publicly accessible sequence data goes hand-in-hand with developing and maintaining the databases and tools that the research community can harness to help answer scientific questions. In a recent series of articles published in Nucleic Acids Research, researchers at DOE JGI, a national scientific user facility, describe a database called Integrated Microbial Genomes with Virus Samples (IMG/VR). IMG/VR is a comprehensive computational platform integrating all the sequences in the database with associated metadata and analytical tools. IMG/VR follows on the heels of a recent DOE JGI viral diversity study reported in Nature.
Additional articles in the same issue describe updates to several publicly accessible, interactive databases since the last set of reports published in 2014. For example, as of July 2016, there were 47,516 archaeal, bacterial, and eukaryotic genomes in the IMG with Microbiome Samples (IMG/M) system, with researchers noting that number “represents an over 300% increase since September 2013.” IMG/M contains annotated DNA and RNA sequence data of archaeal, bacterial, eukaryotic, and viral genomes from cultured organisms; single cell genomes (SCG) and genomes from metagenomes from uncultured archaea, bacteria, and viruses; and metagenomes from environmental, host-associated, and engineered microbiome samples.
Another paper concerns the Genomes OnLine Database (GOLD), a manually curated data management system that catalogs sequencing projects with associated metadata from around the world. In the current version of GOLD (v.6), all projects are organized based on a four-level classification system in the form of a study, organism (for isolates) or biosample (for environmental samples), sequencing project, and analysis project.
A fourth paper focuses on the IMG Atlas of Biosynthetic gene Clusters (IMG-ABC). Launched in 2015, IMG-ABC enables researchers to search for biosynthetic gene clusters and secondary metabolites. Its latest update now incorporates ClusterScout, a tool for targeted identification of custom biosynthetic gene clusters across several thousand isolate microbial genomes, as well as a new search capability.
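The four-level GOLD classification described above (study, then organism or biosample, then sequencing project, then analysis project) can be pictured as a simple nested data structure. The following is only an illustrative sketch: the class names, field names, and example records are hypothetical and are not taken from GOLD's actual schema or any DOE JGI interface.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class AnalysisProject:
    """Level 4: an analysis performed on sequencing output (hypothetical model)."""
    name: str


@dataclass
class SequencingProject:
    """Level 3: a sequencing effort that groups its analysis projects."""
    name: str
    analysis_projects: List[AnalysisProject] = field(default_factory=list)


@dataclass
class Organism:
    """Level 2 for isolates."""
    name: str
    sequencing_projects: List[SequencingProject] = field(default_factory=list)


@dataclass
class Biosample:
    """Level 2 for environmental samples."""
    name: str
    sequencing_projects: List[SequencingProject] = field(default_factory=list)


@dataclass
class Study:
    """Level 1: the umbrella under which organisms and biosamples are grouped."""
    name: str
    sources: List[Union[Organism, Biosample]] = field(default_factory=list)


def count_levels(study: Study):
    """Return (sources, sequencing projects, analysis projects) under a study."""
    seq = [sp for src in study.sources for sp in src.sequencing_projects]
    ana = [ap for sp in seq for ap in sp.analysis_projects]
    return len(study.sources), len(seq), len(ana)


# Made-up records, used only to exercise the structure.
example_study = Study(
    "Hypothetical soil study",
    sources=[
        Organism(
            "Example isolate",
            [SequencingProject("Isolate genome sequencing",
                               [AnalysisProject("Genome annotation")])],
        ),
        Biosample(
            "Example soil sample",
            [SequencingProject("Metagenome sequencing",
                               [AnalysisProject("Metagenome assembly"),
                                AnalysisProject("Metagenome annotation")])],
        ),
    ],
)

print(count_levels(example_study))
```

In this sketch an isolate and an environmental sample sit at the same level beneath a study, which mirrors the "organism (for isolates) or biosample (for environmental samples)" wording in the description of GOLD v.6.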
Daniel Drell, Ph.D.
Biological Systems Science Division
Office of Biological and Environmental Research
Office of Science, U.S. Department of Energy
Prokaryote Super Program Head
DOE Joint Genome Institute
U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research
U.S. National Institutes of Health Data Analysis and Coordination Center
I.-M. A. Chen, et al., “IMG/M: Integrated genome and metagenome comparative data analysis system.” Nucleic Acids Research (2016). [DOI: 10.1093/nar/gkw929]
S. Mukherjee, et al., “Genomes OnLine Database (GOLD) v.6: Data updates and feature enhancements.” Nucleic Acids Research (2016). [DOI: 10.1093/nar/gkw992]
D. Paez-Espino, et al., “IMG/VR: A database of cultured and uncultured DNA Viruses and retroviruses.” Nucleic Acids Research (2016). [DOI: 10.1093/nar/gkw1030]
M. Hadjithomas, et al., “IMG-ABC: New features for bacterial secondary metabolism analysis and targeted biosynthetic gene cluster discovery in thousands of microbial genomes.” Nucleic Acids Research (2016). [DOI: 10.1093/nar/gkw1103]
JGI News Release: Unveiled: Earth's Viral Diversity
JGI Science Highlight: First Public Resource for Secondary Metabolites Searches