U.S. Department of Energy Office of Biological and Environmental Research

BER Research Highlights

Breaking Through Computational Barriers to Create Designer Proteins
Published: December 01, 2018
Posted: March 16, 2020

Using advanced computing, scientists designed protein pairs that perfectly complement each other.

The Science
Designing proteins is a massive combinatorial problem. Scientists must consider how the building blocks of proteins, amino acids, interact with each other in ways that determine their spatial position and orientation, resulting in three-dimensional protein structures. They then use a protein design algorithm to find proteins that pair perfectly with each other. This is particularly difficult when searching a database of thousands to millions of candidates for combinations of two different proteins that bind exclusively to one another. These protein pairs must have backbone shapes that complement only each other. Using advanced computational methods to find working designs, researchers created six protein pairs of this type in cells.
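The requirement that two proteins bind exclusively to one another can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not the method used in the study: it assumes a hypothetical symmetric table `binds` in which `binds[i][j]` is true when protein i and protein j were observed to bind, and it reports the pairs whose members bind each other and nothing else. The function name `exclusive_pairs` and the toy data are invented for this example.

```python
from typing import List, Tuple


def exclusive_pairs(binds: List[List[bool]]) -> List[Tuple[int, int]]:
    """Return pairs (i, j) with i < j that bind each other and no other protein.

    `binds` is a hypothetical symmetric table; binds[i][i] marks self-binding
    (a homodimer), which also disqualifies a protein from being exclusive.
    """
    n = len(binds)
    pairs = []
    for i in range(n):
        partners_i = [k for k in range(n) if binds[i][k]]
        if len(partners_i) != 1:
            continue  # binds nothing, or binds more than one partner
        j = partners_i[0]
        partners_j = [k for k in range(n) if binds[j][k]]
        # Record each pair once (i < j) and only if j binds i alone.
        if i < j and partners_j == [i]:
            pairs.append((i, j))
    return pairs


# Toy data: proteins 0 and 1 pair exclusively; protein 3 binds both 2 and 4.
T, F = True, False
toy = [
    [F, T, F, F, F],
    [T, F, F, F, F],
    [F, F, F, T, F],
    [F, F, T, F, T],
    [F, F, F, T, F],
]
result = exclusive_pairs(toy)
print(result)
```

In this toy table only the pair (0, 1) qualifies, because protein 3 has two partners. A real screen would start from measured or predicted binding data rather than a hand-written table.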

The Impact
If scientists could engineer pairs of proteins that bind only to one another, they could have much more control over cells in living systems. This ability could enable bioengineering applications with large impacts for medicine and biomaterials. Currently, scientists can design only DNA (not proteins themselves) to form these interactions. The ability to program DNA in this way gave rise to technologies such as DNA origami and artificial circuits. A general method for creating protein pairs would also be very powerful, opening the door to many more possibilities.

This work used the Rosetta software suite, which has a long history in protein modeling, analysis, and design. Past helical bundle design work had focused on single-molecule bundles or on homooligomers (assemblies of multiple copies of the same molecule). When two different proteins must pair, the coiled-coil parameter space becomes vast. Running Rosetta on the Mira supercomputer at Argonne National Laboratory, the team sampled conformations efficiently through a massively parallelized grid search of 11 parameters, finding 87 million (20 million untwisted and 60 million left-handed supercoiled) unique working designs for four-helix backbones (35 residues each). The team then searched exhaustively for unique hydrogen-bond networks that connected all four helices, finding 2,251 unique networks. Low-energy sequences were then identified using the RosettaDesign server to test compatible placements of the hydrogen-bond networks within all four-helix candidates.

Of the 97 computationally selected designs that were stable and satisfied additional criteria, 94 were well expressed in Escherichia coli, 85 had the expected size as measured by size-exclusion chromatography, 65 formed constitutive heterodimers, and 39 were exclusive heterodimers. Four designs were selected for experimental validation by X-ray crystallography; their structures agreed well with the computational models, confirming the hydrogen-bond networks that had been designed into them. The team also investigated rearranging the hydrogen-bond networks in different helical repeat units to expand the heterodimer set. This rearrangement was largely successful, generating 22 new constitutive heterodimers. In the end, the team created six fully orthogonal protein heterodimer pairs in E. coli cells.
This work provides a path forward for computationally designing specific, programmable binding into proteins, previously a property found only in the DNA and RNA world.
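
The massively parallelized grid search described above can be sketched in miniature. The Python below is a toy illustration under stated assumptions, not the Rosetta protocol: the three parameter names, their values, the scoring function, and the cutoff are all hypothetical, and the workers are simulated serially rather than run on a supercomputer. It shows one common way to divide a grid among workers, where each worker takes every n-th point of the full Cartesian product, so that no point is evaluated twice and no coordination is needed during the search.

```python
import itertools
from typing import Dict, Iterator, List, Sequence, Tuple

# Hypothetical grid: names and values are illustrative only
# (the study searched 11 parameters; three are used here for brevity).
GRID: Dict[str, Sequence[float]] = {
    "radius": [6.0, 7.0, 8.0],
    "phase": [0.0, 90.0, 180.0, 270.0],
    "z_offset": [-1.5, 0.0, 1.5],
}

Point = Tuple[float, ...]


def points_for_worker(rank: int, n_workers: int) -> Iterator[Point]:
    """Yield every n_workers-th grid point, starting at index `rank`."""
    all_points = itertools.product(*GRID.values())
    return itertools.islice(all_points, rank, None, n_workers)


def toy_score(point: Point) -> float:
    """Placeholder score (lower is better); stands in for a real energy model."""
    radius, phase, z_offset = point
    phase_penalty = 0.0 if phase % 180.0 == 0.0 else 1.0
    return abs(radius - 7.0) + abs(z_offset) + phase_penalty


def search(rank: int, n_workers: int, cutoff: float = 0.5) -> List[Point]:
    """Score this worker's share of the grid and keep points under the cutoff."""
    return [p for p in points_for_worker(rank, n_workers) if toy_score(p) <= cutoff]


# Simulate four workers one after another and merge their results.
n_workers = 4
total_points = sum(
    len(list(points_for_worker(r, n_workers))) for r in range(n_workers)
)
hits = sorted(p for r in range(n_workers) for p in search(r, n_workers))
print(total_points, hits)
```

Because every grid point is scored independently, the same striding scheme extends to many workers; on a real machine each rank would run `search` in its own process and the results would be gathered at the end.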

BER Program Manager
Amy Swain
U.S. Department of Energy Office of Science, Office of Biological and Environmental Research
Climate and Environmental Sciences Division (SC-23.1) and Biological Systems Science Division (SC-23.2)
Subsurface Biogeochemical Research and Biomolecular Characterization and Imaging Science

Principal Investigator
David Baker
University of Washington

The project received funding from the Office of Biological and Environmental Research, within the U.S. Department of Energy (DOE) Office of Science, and the National Institutes of Health (NIH) at the Advanced Light Source, a DOE Office of Science user facility at Lawrence Berkeley National Laboratory (LBNL). The project used the Argonne Leadership Computing Facility, another DOE Office of Science user facility, to run the Rosetta calculations.

Funding was also received from the Howard Hughes Medical Institute, Schmidt Futures Program, European Research Area Network (ERA-NET) BioOrigami Consortium, National Science Foundation, Burroughs Wellcome Fund Career Award at the Scientific Interface, German Research Foundation, Raymond and Beverly Sackler Fellowship, Institute for Protein Design, and Washington Research Foundation.

Chen, Z., et al. “Programmable design of orthogonal protein heterodimers.” Nature 565, 106–111 (2019). [DOI:10.1038/s41586-018-0802-y].

Related Links
University of Washington news release: Scientists program proteins to pair exactly.

Topic Areas:

  • Research Area: Biosystems Design
  • Research Area: Computational Biology, Bioinformatics, Modeling
  • Research Area: Structural Biology, Biomolecular Characterization and Imaging
  • Research Area: Structural Biology Infrastructure
  • Cross-Cutting: Scientific Computing and SciDAC
  • Cross-Cutting: Light and Neutron User Facilities

Division: SC-33.2 Biological Systems Science Division, BER