The 2010 DOE Systems Biology Knowledgebase Implementation Plan describes scientific objectives the Knowledgebase will support in microbial, plant, and metacommunity research areas. An example workflow for microbial scientific objective 2, Define Microbial Gene Expression Regulatory Networks, is illustrated in this figure. Once RNA-Seq data (short sequences) are collected from a particular growth state for a specific species (step 1), they will be mapped back to their associated genome sequence (step 2), and reads per base pair (reads/bp) will be calculated as a measure of each gene’s or operon’s expression level (step 3). The reads/bp will be displayed in conjunction with the genome sequence (step 4) using the latest version of Artemis, which already has this capability. Rules will be generated to define operons (step 5) based solely on these data. The output of this analysis will be a list of operons and their expression levels for each growth state of every species analyzed. Using OrthoMCL to help define orthologous genes (step 8), orthologous operons will be identified in related genomes (step 9) and used to identify as many orthologous promoters as possible (step 10). Next, the transcription factor binding sites (TFBS) for these promoters will be predicted using two separate techniques. The first will involve multiple sequence alignment of the orthologous promoters in an attempt to define the TFBS based on their conservation (step 11). This technique depends on the number of sequenced, related genomes and the total genetic distance between all the organisms in each alignment. The average nucleotide identity (ANI) will therefore be used to estimate whether an alignment contains sufficient sequence divergence. If the orthologous operons can be identified in more distant relatives, attempts will be made to expand the alignments. The second technique will use more traditional TFBS prediction algorithms (step 12), such as those described in Liu et al. 2008 and Conlan et al. 2005.
Results from both techniques will be compared for consistency. Next, cluster analysis will be performed on the differences in gene (operon) expression identified in the RNA-Seq data (step 6). Finally, small regulatory RNAs will be identified from the frequency plot (step 7), as previously described (Passalacqua et al. 2009; Yoder-Himes et al. 2009). In this workflow, white boxes represent current capabilities and procedures. Green boxes are procedures that have not been developed but are expected to be fairly easy to construct (year 1). Red boxes are procedures that will be more difficult to construct (year 2). The blue box depicts a technique that is optional but would increase analysis accuracy. The purple box is the final product of the workflow (year 2).
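As a rough illustration of step 3, the reads/bp measure can be sketched in a few lines of Python. The plan does not specify how reads are assigned to genes, so this sketch assumes that any read overlapping a gene by at least one base counts toward it, that coordinates are 1-based and inclusive, and that all reads have the same length. The names `Gene` and `reads_per_bp` are hypothetical and do not come from the Knowledgebase plan or from Artemis.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Gene:
    """A gene (or operon) as a 1-based, inclusive interval on the genome."""
    name: str
    start: int
    end: int


def reads_per_bp(gene: Gene, read_starts: Iterable[int], read_length: int) -> float:
    """Return the number of overlapping reads divided by the gene length.

    Assumption: a read counts if it overlaps the gene by at least one base.
    """
    gene_length = gene.end - gene.start + 1
    overlapping = sum(
        1
        for start in read_starts
        # The read spans [start, start + read_length - 1].
        if start <= gene.end and start + read_length - 1 >= gene.start
    )
    return overlapping / gene_length


# Hypothetical data: a 100 bp gene and five mapped 20 bp reads.
gene_a = Gene(name="geneA", start=101, end=200)
mapped_read_starts = [50, 90, 150, 200, 201]
expression = reads_per_bp(gene_a, mapped_read_starts, read_length=20)
```

Applying the same function to an operon's interval would give an operon-level value comparable across growth states, provided the samples are first normalized for sequencing depth, which this sketch does not do.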
Credit or Source: Office of Biological and Environmental Research of the U.S. Department of Energy Office of Science. science.energy.gov/ber
U.S. DOE. 2010. DOE Systems Biology Knowledgebase Implementation Plan. U.S. Department of Energy Office of Science (p. 25)