U.S. Department of Energy Office of Biological and Environmental Research

BER Research Highlights


Improving the Reliability of Metagenomic Sequencing Data
Published: June 07, 2012
Posted: August 21, 2012

Natural microbial communities are usually made up of a large variety of species. Knowing a community's composition is important for addressing DOE energy and environmental missions. Sequencing the community's combined genome (the ‘metagenome’) is now the best way to characterize these communities. To make sense of the data, however, researchers must accurately account for all of the experimental and instrumental errors in the process. Until now, instrumental errors have been routinely estimated, but errors introduced during sample collection and preparation have not. As part of the DOE Systems Biology Knowledgebase project, researchers at Argonne National Laboratory have developed an open-source program called DRISEE (duplicate read inferred sequencing error estimation) that accounts for both types of errors. DRISEE identifies errors that could arise from sample collection, intermediary DNA processing techniques, or the instruments themselves. Using DRISEE, the authors reproduce known error rates from a standard data set. They then apply the method to show that many factors, including read length and sample preparation, can contribute to sequencing errors. The method so far applies only to 454 and Illumina sequencing data. Even so, it will help scientists who assemble genomes from metagenomic data decide whether a sequence difference is a true error that should be disregarded or a natural sequence variation that should be included.
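DRISEE's name points to its core idea: errors are inferred from duplicate reads. The sketch below illustrates that idea in a simplified form. It assumes that reads sharing an identical prefix are copies of one template, so any later base that disagrees with the group's per-position consensus is counted as an error. The function name `estimate_error_rate`, the `prefix_len` and `min_group` parameters, and the substitution-only counting are all hypothetical simplifications. They are not DRISEE's published implementation or parameters.

```python
from collections import Counter, defaultdict


def estimate_error_rate(reads, prefix_len=8, min_group=3):
    """Hypothetical, simplified duplicate-read error estimate.

    Assumptions (not DRISEE's published parameters): reads sharing an
    identical first `prefix_len` bases are treated as duplicates of a single
    template; bases after the prefix that disagree with the per-position
    majority (consensus) base are counted as errors. Only substitutions are
    counted; insertions and deletions are ignored.
    """
    # Group reads by their prefix (the assumed duplicate signature).
    groups = defaultdict(list)
    for read in reads:
        if len(read) > prefix_len:
            groups[read[:prefix_len]].append(read)

    mismatches = 0
    compared = 0
    for group in groups.values():
        if len(group) < min_group:
            continue  # too few copies to call a reliable consensus
        max_len = max(len(r) for r in group)
        for pos in range(prefix_len, max_len):
            # Bases observed at this position across the duplicate reads.
            column = [r[pos] for r in group if len(r) > pos]
            consensus, _ = Counter(column).most_common(1)[0]
            compared += len(column)
            mismatches += sum(1 for base in column if base != consensus)

    return mismatches / compared if compared else 0.0


reads = ["ACGTACGTAAAA", "ACGTACGTAAAA", "ACGTACGTAATA", "TTTTGGGGCCCC"]
print(estimate_error_rate(reads))  # 1 mismatch in 12 compared bases, about 0.083
```

In this toy input, the three reads starting with `ACGTACGT` form one duplicate group, and the lone `T` at position 10 disagrees with the consensus `A`. The single `TTTTGGGG` read has no duplicates and is skipped. A grouping rule like this can only measure errors in reads that happen to be duplicated. That is why the minimum group size matters in this sketch.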

Reference: Keegan, K. P., W. L. Trimble, J. Wilkening, A. Wilke, T. Harrison, M. D'Souza, and F. Meyer. 2012. "A Platform-Independent Method for Detecting Errors in Metagenomic Sequencing Data: DRISEE," PLoS Computational Biology 8(6), e1002541. DOI: 10.1371/journal.pcbi.1002541.

Contact: Susan Gregurick, SC-23.2, (301) 903-7672
Topic Areas:

  • Research Area: Genomic Analysis and Systems Biology
  • Research Area: Microbes and Communities
  • Research Area: Sustainable Biofuels and Bioproducts
  • Research Area: Computational Biology, Bioinformatics, Modeling
  • Cross-Cutting: Scientific Computing and SciDAC

Division: SC-23.2 Biological Systems Science Division, BER


BER supports basic research and scientific user facilities to advance DOE missions in energy and environment.
