A History of Directed Evolution

From Sol Spiegelman to Frances Arnold to David Liu, this Nobel Prize-winning technology has revolutionized chemical biology.

This is my original draft of this essay, which I have uploaded onto this blog for those interested in seeing the editorial process. As of June 2024, the completed, polished version can be found in American Scientist (PDF).

Introduction

It is amazing to think that one discovery could have such a monumental impact, but Charles Darwin's theory of evolution did just that. The idea that species evolve and adapt to their surroundings over multiple generations has had a profound effect on the way we understand the world around us.

In principle, the process of evolution works in three cyclic steps: diversification, selection, and amplification (Darwin, 1859) (Fig. 1). Diversification introduces genetic variation within a population, most commonly through genetic mutation, gene flow, genetic drift, or changes in environmental conditions. Diversification results in organisms with new genetic variants whose corresponding phenotypes may be beneficial for survival in the environment, such as an increased ability to acquire nutrients or a better resistance against high temperatures (Urry et al., 2017). During selection, those variants that provide an advantage to their organism’s survival and reproduction (positive selection pressure) are favored over those that do not (negative selection pressure). During amplification, the surviving organisms reproduce and pass on their traits to their progeny; thus, the new beneficial variants spread throughout the population, further increasing the likelihood of their success in the future. The cycle then repeats, and over millions of successive generations or more, evolution allows species to become more specialized and better adapted to their environment (Urry et al., 2017).

Figure 1. A concept map of the three core steps of Darwinian evolution: diversification, selection, and amplification.

Though Darwin primarily described these processes in the context of nature, he realized that the second step in particular, selection, could be artificially manipulated by humans. In fact, in the first chapter of Darwin's 1859 magnum opus On the Origin of Species, entitled “Variation under Domestication,” Darwin describes how humans can consciously select individual organisms to breed so that certain traits are passed down to offspring (Darwin, 1859). That is, by performing selection manually and controlling which organisms bred and produced offspring, humans, rather than nature, could control which genetic variations were passed down. This realization was a major contribution to the development of modern animal and plant breeding.

Theory and Initial Efforts

Humans have selectively bred animals and plant crops for centuries; in fact, selective breeding was first established as a scientific practice during the British Agricultural Revolution in the 1700s (BBC - History - Robert Bakewell, n.d.), and historical records even show instances of selective breeding of wheat crops in ancient Mesoamerica almost 2000 years ago (Plumer, 2014). However, while selective breeding in plants and animals has been a longstanding practice, the idea of using these same Darwinian principles to develop better biomolecules in the laboratory, such as RNA and proteins, did not arise until much more recently.

In fact, it was not until 1962 that Alexander Rich, professor of biophysics at the Massachusetts Institute of Technology, first suggested that RNA might be able to undergo evolution and develop more desirable traits. While many scientists at the time considered RNA to play a merely informational role, Rich hypothesized that RNA could have other functions, such as driving chemical reactions or regulating cellular processes. As part of his larger “RNA World” hypothesis, Rich argued that because RNAs could have phenotypes of their own, they could in principle be subject to the process of evolution (Gilbert, 1986; Rich, n.d.). If RNA was indeed subject to natural selection and evolution, then it was not much of a stretch for scientists to believe that they could perform artificial evolution of RNA in the laboratory to create RNAs with specific functions. Moreover, because it was widely accepted in the 1960s that nucleic acids encoded proteins (Crick, 1958), it was also not a stretch to hypothesize that performing evolution on nucleic acids in vitro could create improved versions of existing proteins.

However, while artificial selection of plants and animals focuses on the selection aspect of evolution, where humans choose and breed individuals with desirable traits without directly inducing genetic diversification, scientists working with biomolecules sought to take an active role in diversification as well. This was largely due to the limited repertoire of naturally occurring biomolecules; unlike traditional breeding, where organisms possess extensive genetic diversity to facilitate selection, nucleic acid and protein molecules lacked the same inherent diversity. This was especially true given that nucleic acid and protein experiments were predominantly carried out in vitro, where the natural factors that typically drive diversification in living organisms do not occur. Consequently, scientists sought innovative strategies to manually induce diversification, in addition to artificial selection. Though the term was not coined until 1972 by J. C. Francis and P. E. Hansche, this was the birth of the field known as directed evolution (Arnold, 1998; Francis & Hansche, 1972).

The fundamental concepts underlying directed evolution were virtually identical to classical Darwinian evolution (Fig. 2). First, scientists needed a way to increase nucleic acid variants for gene diversification. Then, upon expression of these nucleic acid variants, scientists could then screen or select for the RNAs or proteins with their desired trait. Finally, scientists could then amplify these surviving informational macromolecules and diversify them again for another cycle. In theory, beneficial mutations accumulate over multiple generations, resulting in progressively improved phenotypes over time.

Figure 2. A hypothetical concept map of directed evolution of RNA and proteins in the laboratory: diversification, expression, selection, and amplification.
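To make the cycle concrete, here is a minimal Python sketch of the diversify, select, and amplify loop. Everything in it is an assumption made for illustration: the `TARGET` sequence is invented, "fitness" is simply the number of positions matching that target, and the mutation rate and library sizes are arbitrary. It is a toy model of the concept, not a description of any laboratory protocol.

```python
import random

ALPHABET = "ACGU"
TARGET = "GGAUCCGAUC"  # hypothetical "ideal" sequence, invented for this toy example


def fitness(seq):
    """Toy phenotype score: number of positions that match TARGET."""
    return sum(a == b for a, b in zip(seq, TARGET))


def diversify(seq, rng, rate=0.1):
    """Copy a sequence, redrawing each base at random with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else base for base in seq)


def directed_evolution(start, rounds=30, library_size=50, survivors=5, seed=0):
    rng = random.Random(seed)
    population = [start]
    best_per_round = [fitness(start)]
    for _ in range(rounds):
        # Diversification: build a library of mutated copies (parents are kept too).
        library = population + [
            diversify(rng.choice(population), rng) for _ in range(library_size)
        ]
        # Selection: rank the library by the toy fitness score.
        library.sort(key=fitness, reverse=True)
        # Amplification: the top variants seed the next round.
        population = library[:survivors]
        best_per_round.append(fitness(population[0]))
    return population[0], best_per_round


best, history = directed_evolution("AAAAAAAAAA")
print(best, history[0], history[-1])
```

Because the parents stay in the library each round, the best score can never decrease from one round to the next, which mirrors the idea that beneficial mutations accumulate over generations.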

However, while the conceptual model made sense, directed evolution was not feasible in the mid-twentieth century due to limited technology. In 1964, S. A. Lerner and colleagues tried to use chemical mutagenesis to create different phenotypes in a strain of bacteria known as Aerobacter aerogenes, and they were successful, showing that there was merit to conducting in vitro evolution. However, the mutagens available at the time were not specific, so scientists could not accurately keep track of which mutations were being accumulated (Lerner et al., 1964). Techniques that exist today, such as targeted mutagenesis for diversification and high-throughput screening for rapid analysis of large libraries of biomolecules, were not available then, preventing directed evolution from becoming a reality.

The first application of directed evolution occurred in the 1960s in the twin cities of Champaign and Urbana in Illinois, situated amongst miles of corn and soybean fields, about 130 miles away from Chicago (Laura, 2022). There, a man by the name of Sol Spiegelman had joined the faculty of the University of Illinois. Interestingly, Spiegelman was not interested in studying directed evolution; he instead focused much of his research on bacterial viruses, or bacteriophages. However, during his undergraduate years, Spiegelman conducted research on bacterial mutations, and during his PhD and postdoc years, he explored the ability of bacteria to adapt to new nutrients by creating new digestive enzymes (Sol Spiegelman Biographical Overview, 2019). Given his background in analyzing mutations and adaptations, it seemed almost fitting that Spiegelman would be the one to go on to make a landmark discovery in the field of directed evolution: creating the first artificially selected RNA molecule.

Spiegelman’s Monster

In 1965, Spiegelman and his team were studying a bacteriophage known as “Qβ” whose genetic material was entirely composed of RNA. From previous experiments, they knew that the RNA genome of Qβ encoded four proteins necessary to infect bacterial host cells, such as a “coat” protein to encapsulate and protect the virus as well as a “glue” protein to allow the virus to stick to the bacterial cell (Mills et al., 1967). What Spiegelman was most interested in, however, was not the contents of the Qβ viral genome or the specific proteins it encoded, but rather how the RNA genome replicated itself. Scientists knew that viral RNA genomes replicate to generate multiple copies of themselves, allowing the virus to produce even more viral proteins and propagate the infection within the host organism, but the mechanism of this process was largely unknown (Spiegelman et al., 1967).

Two years earlier, in 1963, Spiegelman had discovered an enzyme known as “Qβ replicase” that was able to synthesize new RNA in vitro using the Qβ RNA genome as a template (Haruna et al., 1963). In this experiment, Spiegelman isolated the Qβ replicase enzyme and the Qβ viral RNA genome in a test tube and tested whether these alone, alongside nucleotide building blocks, were enough to replicate the RNA genome. Indeed, the enzyme was able to synthesize a new RNA strand using the RNA genome as a template, even in the absence of any other parts of the virus or a bacterial host cell (Mills et al., 1967).

However, Spiegelman wished to verify whether this newly synthesized RNA matched the sequence of the original viral RNA. He specifically needed to determine whether the newly synthesized RNA could function as an initiating template for a subsequent replication cycle, which is a crucial step for viral RNA to successfully infect bacterial cells and propagate the infection in living systems. To do this, Spiegelman put together a long line of test tubes, each of which contained only Qβ replicase and RNA nucleotide building blocks, with no Qβ RNA template. Then, into the first test tube, Spiegelman added a small amount of Qβ RNA and allowed RNA synthesis to occur. After approximately 15 minutes, Spiegelman transferred some of the newly synthesized RNA from the first test tube into the second test tube, allowing RNA synthesis to occur there with a new template. Not only did Spiegelman find that the newly synthesized RNA could indeed act as a template for the subsequent replication cycle, but after repeating this process for a total of 74 test tubes, he found that the final RNA was still fully capable of being replicated. Spiegelman had achieved what many believe to be the first synthesis of a biologically competent, infective viral nucleic acid (Mills et al., 1967).

However, this was not the only major discovery of this experiment. Curiously, Spiegelman found that the speed of replication in each subsequent test tube was greater compared to the previous tube. By performing a base composition analysis of the final purified RNA obtained from the 74th transfer, he found that “83 percent of the original genome had been eliminated,” indicating that the RNA was actually becoming shorter and shorter with each round of synthesis (Mills et al., 1967). This final shortened RNA sequence became famously known as “Spiegelman’s monster.”

Figure 3. (a) Concept map of Spiegelman’s in vitro evolution experiment of the Qβ phage RNA. (b) Spiegelman’s raw sedimentation analysis data showing how the quantity of RNA (as measured using radioactive 32P) increases per transfer, indicating more efficient RNA synthesis. This was discovered to be due to the Qβ RNA genome evolving to become shorter upon each generation.

It is important to realize that the main takeaway of this experiment was not just the Qβ RNA becoming smaller and simpler; that in itself is not an inherently interesting finding. The significance was that the Qβ RNA sequence had adapted to its new environment. In the wild, the role of the Qβ RNA genome is to synthesize proteins, such as lysis and coat proteins, in order for the virus to infect its bacterial host cell, so there is a selective advantage for Qβ to retain these genes in its genome. However, in its new environment in the test tube, there were no bacterial cells for Qβ to infect, so Qβ no longer had to synthesize these proteins; there was no selective pressure to retain these genes. Moreover, because there was only a limited amount of time between Spiegelman’s transfer of RNA between test tubes, that meant that RNA that was rapidly-replicating would be more likely to be transferred to the subsequent test tube than RNA that was slowly-replicating. That is, the new environment in the test tube was selecting for a different phenotype — how quickly the genome of Qβ could replicate — and the viral RNA was able to adapt specifically to these requirements. Spiegelman repeated these experiments several times and always converged upon the same shortened RNA sequence, providing evidence for convergent evolution in a test tube.
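The selection pressure in the serial transfer setup can be illustrated with a small Python sketch. It assumes, purely for illustration, that the number of doublings an RNA species achieves in one incubation is inversely proportional to its length, and that each transfer samples species in proportion to their abundance. The lengths, starting fractions, and rate constant are hypothetical and are not taken from Spiegelman's data.

```python
def serial_transfer(lengths, start_fractions, transfers, doublings_per_unit_length=100.0):
    """Track each RNA species' share of the pool across serial transfers.

    Assumption: in one incubation, a species of length L doubles
    doublings_per_unit_length / L times, so shorter RNAs copy faster.
    """
    fractions = list(start_fractions)
    history = [fractions]
    for _ in range(transfers):
        grown = [
            f * 2 ** (doublings_per_unit_length / length)
            for f, length in zip(fractions, lengths)
        ]
        total = sum(grown)
        # Transferring a small aliquot samples species in proportion to abundance.
        fractions = [g / total for g in grown]
        history.append(fractions)
    return history


# Hypothetical pool: a full-length genome (100 units) and a rare variant that
# has lost 83% of its length (17 units), starting at one part per million.
history = serial_transfer(lengths=[100, 17], start_fractions=[1 - 1e-6, 1e-6], transfers=10)
for n, (full, short) in enumerate(history):
    print(f"transfer {n}: full-length {full:.4f}, shortened {short:.4f}")
```

Under these assumptions the shortened variant goes from a trace contaminant to nearly the whole pool within a handful of transfers, even though nothing in the setup "knows" which sequence is preferred. The only selection criterion is how many copies make it into the next tube.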

Spiegelman’s work was widely acclaimed, and several scientists attempted to conduct follow-up experiments (Chedd, 1972). Prominently, German scientists Günther Strunk and Tobias Ederhof at the Max Planck Institute for Biophysical Chemistry were curious whether the RNA from Spiegelman’s experiment could evolve in other ways when exposed to different environmental stress conditions. In their experiment, they repeated the protocol that Spiegelman performed but also added ribonuclease A (RNase A) to the test tubes, which cleaves single-stranded RNA at the 3’ side of all cytosine and uracil nucleotides (Strunk & Ederhof, 1997). If the evolutionary hypothesis were true, then the RNA sequences over subsequent generations would be expected to contain fewer cytosine (C) and uracil (U) residues in order to develop resistance against RNase A cleavage.

Their data showed exactly that. After 80 test tube cycles, the RNA evolved from containing approximately 50% C and U residues in its 87-nucleotide strand to containing approximately 35% C and U residues in its 65-nucleotide (-) strand (Strunk & Ederhof, 1997). In addition, because RNase A could only cleave unpaired, single-stranded RNA, the RNA also evolved intramolecular base pairing in the form of loops, protecting many of its remaining C and U residues from cleavage (Fig. 4). Ultimately, it was found that in the entire final RNA strand, only 2 nucleotides could be cleaved by RNase A, a major improvement (Strunk & Ederhof, 1997).

Figure 4. The final minus strand of Qβ RNA that was evolved to be resistant to RNase A, containing only 2 nucleotides susceptible to cleavage. Original figure from Günther Strunk and Tobias Ederhof (Strunk & Ederhof, 1997).

Similarly, researchers Craig Tuerk and Larry Gold from the University of Colorado, Boulder pioneered a breakthrough technique that allowed for the iterative evolution of nucleic acids, such as DNA or RNA, to bind to specific targets. This method became known as “systematic evolution of ligands by exponential enrichment”, or SELEX for short (Tuerk & Gold, 1990). Using Spiegelman’s experiments as a foundation, chains of RNA could be evolved to match desired traits.

While Spiegelman's experiment laid the groundwork for understanding in vitro evolution, it focused specifically on RNA replication using a viral RNA and did not involve the deliberate manipulation of genes or proteins. This limited scope made it challenging to extend the findings directly to other biological systems or to apply them in a broader context of directed evolution. Because of this, in addition to technological constraints, the scientific community did not immediately build directly upon Spiegelman's experiments. Researchers recognized the importance of his work in demonstrating the selective pressure and evolution of RNA molecules, but they needed to develop the necessary tools and methodologies to expand the concept of directed evolution to a wider range of biological systems.

Directed Evolution of Enzymes

With technological advancements in the decades following Spiegelman’s experiments, interest in directed evolution began to re-emerge. In 1984, citing the experiments of Spiegelman, scientists Manfred Eigen and William Gardiner at the Max Planck Institute for Biophysical Chemistry proposed a theoretical model for performing directed enzyme evolution (Eigen & Gardiner, 1984). The proposal was simple: one could perform a series of diversification, expression, selection, and amplification steps on proteins instead of RNA, following the principles of Darwinian evolution (Fig. 5). Specifically, one could evolve nucleic acids such as RNA, but instead of keeping the RNA and analyzing its phenotype, one could use the RNA as a coding template to create better and better proteins.

Figure 5. Original copy of Eigen and Gardiner’s model for the directed evolution of enzymes (Eigen & Gardiner, 1984).

Importantly, one must start with an enzyme that is already close to having the properties of the desired enzyme. For instance, if one wants to use directed evolution to engineer a polymerase that can function at high temperatures, the enzyme must at least be able to act as a polymerase at normal temperatures. This is because it is impossible to test every single possible set of mutations of a protein, known as the protein’s “mutational space.” In fact, a peptide that is just 8 amino acids long (much shorter than the average protein sequence length of 400 amino acids (Brocchieri & Karlin, 2005)) can have over 25 billion possible amino acid sequence combinations. In other words, it is impossible to achieve comprehensive coverage of a protein’s “sequence space,” a term coined by British evolutionary biologist John Maynard Smith (Maynard Smith, 1970).
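The size of this sequence space follows from simple counting: with 20 standard amino acids available at each position, a chain of length n has 20 to the power n possible sequences. A short Python check of the figure quoted above (the function name is my own, chosen for illustration):

```python
AMINO_ACIDS = 20  # the standard proteinogenic amino acids


def sequence_space_size(length, alphabet_size=AMINO_ACIDS):
    """Number of distinct sequences of a given length."""
    return alphabet_size ** length


# An 8-residue peptide.
print(f"{sequence_space_size(8):,}")
# Number of decimal digits in the count for a 400-residue protein.
print(len(str(sequence_space_size(400))))
```

The first line prints 25,600,000,000, matching the "over 25 billion" above. The second shows that the count for an average-length protein runs to more than 500 digits, which is why exhaustive screening is out of the question.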

As a result, the goal of directed evolution is to start with an enzyme that is already close to the desired property and to take mutation steps towards an improved phenotype. As American geneticist Sewall Wright first described in 1932, one can plot the possible sequence space of a protein on a flat 2D plane, and the corresponding phenotypic fitness of each sequence, or how close it is to the desired trait, can be plotted on the z-axis, with a higher level indicating a more desirable phenotype (S. Wright, 1932) (Fig. 6). Using this concept, the goal of directed evolution is to climb towards the peak through the accumulation of beneficial mutations over subsequent generations.

Figure 6. (a) Schematic of an evolutionary fitness landscape, which visualizes the relationship between genotypes (xy plane) and reproductive success or the closeness of a phenotype to a desired trait (z axis) (Packer & Liu, 2015). (b) Geneticist Sewall Wright’s original depiction of a fitness landscape, similar to the left plot but depicted as a contour map (top-down view), with (+) indicating desirable phenotypes and (-) indicating undesirable phenotypes (S. Wright, 1932).
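The importance of the starting point can be seen in a toy version of such a landscape. The Python sketch below uses a small, made-up grid of fitness values with one low peak and one tall peak, and a simple uphill walk that always steps to the fittest neighboring genotype. The grid values and the four-neighbor step rule are assumptions chosen for illustration; real sequence spaces have far more dimensions.

```python
# Hypothetical fitness values on a small grid of genotypes (rows and columns
# stand in for the xy plane; the numbers stand in for height on the z axis).
LANDSCAPE = [
    [1, 2, 1, 0, 0, 0],
    [2, 4, 2, 0, 1, 2],
    [1, 2, 1, 1, 3, 5],
    [0, 0, 1, 3, 6, 8],
    [0, 0, 2, 5, 8, 9],
]


def neighbors(row, col):
    """Genotypes one step away (up, down, left, right) that lie on the grid."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < len(LANDSCAPE) and 0 <= c < len(LANDSCAPE[0]):
            yield r, c


def climb(start):
    """Repeatedly move to the fittest neighbor until no neighbor is fitter."""
    position = start
    while True:
        best = max(neighbors(*position), key=lambda p: LANDSCAPE[p[0]][p[1]])
        if LANDSCAPE[best[0]][best[1]] <= LANDSCAPE[position[0]][position[1]]:
            return position
        position = best


print(climb((0, 0)))  # starts near the low peak
print(climb((2, 3)))  # starts on the slope of the tall peak
```

The walk from `(0, 0)` stops at the low peak at `(1, 1)`, while the walk from `(2, 3)` reaches the tallest point at `(4, 5)`. An uphill walk can only find the peak it starts near, which is one way to see why the choice of starting enzyme matters.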

It was not until 1993, almost 10 years after Eigen and Gardiner’s theoretical model, that somebody put this model of directed evolution of enzymes into practice. This occurred in Pasadena, California, at the California Institute of Technology, where then-associate professor Frances Arnold was curious about this technology. Frances Arnold grew up in Pittsburgh, Pennsylvania. From an early age, Arnold spent much of her time in the workshop of her father, nuclear physicist William Howard Arnold, and grew up loving technology and problem-solving (Hoban, 2019). These passions led her to study mechanical and aerospace engineering at Princeton University, and she subsequently undertook graduate studies in chemical engineering and biochemistry at the University of California, Berkeley under the tutelage of Professor Harvey Warren Blanch (The Nobel Prize in Chemistry 2018, n.d.).

It was not until Arnold arrived at Caltech, however, that she first glimpsed the potential of directed evolution, an idea that would go on to define her career. Arnold wished to use Eigen and Gardiner’s theoretical model of directed evolution to obtain a serine protease that could maintain high enzymatic activity, even in a highly denaturing environment (Chen & Arnold, 1993). She specifically chose to study the serine protease subtilisin E, as it was readily available, well-characterized, and had diverse substrate specificity. Arnold knew that the wild-type form of the serine protease experienced a sharp decrease in catalytic efficiency upon exposure to high concentrations of dimethylformamide (DMF), so she was curious whether she could harness the principles of directed evolution to obtain a more DMF-tolerant variant of the enzyme (Chen & Arnold, 1993).

To achieve this, Arnold first selected a starting enzyme that was close to being able to perform the function she was looking for (cleaving the peptide substrate succinyl-Ala-Ala-Pro-Phe-p-nitroanilide, or sAAPF-pna, in a denaturing environment). Next, she constructed a DNA sequence library encoding this enzyme, and randomly mutated the genes in the library using error-prone PCR to create a pool of variants. Then, each of the enzyme variants was screened for the desired property: in this case, how well it maintained its catalytic efficiency in a denaturing environment. Afterward, the variants that showed promise were selected and further optimized through additional rounds of mutation and selection. Through a series of directed evolution cycles, Arnold ultimately obtained an evolved variant (named “PC3”) that exhibited 256 times higher relative efficiency than the wild-type enzyme under the same denaturing environment (Fig. 7) (Chen & Arnold, 1993). As the first scientist to put Eigen and Gardiner’s model of directed evolution into practice, Frances Arnold became widely recognized for her work and eventually won the Nobel Prize in Chemistry in 2018.

Figure 7. Catalytic efficiency of wild-type (—) versus evolved (--) forms of the serine protease subtilisin E over increasing concentrations of dimethylformamide (DMF), which creates a denaturing environment (Chen & Arnold, 1993). This was measured by how quickly the enzyme could cleave the substrate peptide succinyl-Ala-Ala-Pro-Phe-p-nitroanilide (sAAPF-pna).
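The diversification step of the cycle above, random mutagenesis by error-prone PCR, can be mimicked in a few lines of Python. This sketch treats each base as being miscopied independently with a fixed probability, which is a simplification of how a low-fidelity polymerase behaves. The error rate, library size, and the `parent` sequence are placeholders, not values from Chen and Arnold's paper.

```python
import random

BASES = "ACGT"


def error_prone_copy(gene, error_rate, rng):
    """Copy a DNA string, substituting a different base at each position
    with probability `error_rate` (a simplification of polymerase errors)."""
    copy = []
    for base in gene:
        if rng.random() < error_rate:
            copy.append(rng.choice([b for b in BASES if b != base]))
        else:
            copy.append(base)
    return "".join(copy)


def make_library(gene, size, error_rate=0.02, seed=1):
    """Generate `size` independently mutated copies of `gene`."""
    rng = random.Random(seed)
    return [error_prone_copy(gene, error_rate, rng) for _ in range(size)]


def count_mutations(parent, variant):
    """Number of positions at which the variant differs from the parent."""
    return sum(a != b for a, b in zip(parent, variant))


parent = "ATGGCTAAAGGTCTGACC" * 5  # placeholder sequence, not the real subtilisin E gene
library = make_library(parent, size=200)
print(sum(count_mutations(parent, v) for v in library) / len(library))
```

In a real experiment, each variant in such a library would then be expressed and screened, and the best performers would become the parents for the next round of mutagenesis.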

Ever since this experiment, Arnold and other scientists have worked to further optimize each step of directed evolution. For instance, while error-prone PCR is one way to introduce mutations into a gene, these mutations tend to affect only single nucleotides or short regions of the DNA sequence. To generate an even more diverse pool of variants, scientists also began to harness the natural process of homologous recombination, a DNA repair system that involves the exchange of genetic material between two similar DNA sequences. By introducing designed DNA fragments with specific variations into the genome alongside the target gene, researchers can facilitate homologous recombination events, leading to the generation of chimeric DNA sequences. Notably, Willem P. C. Stemmer of the Affymax Research Institute in Palo Alto, California developed a technique called DNA shuffling, which enables in vitro homologous recombination. In DNA shuffling, the DNA of interest is first digested into small fragments using an enzyme called DNase I. Then, through a primer-free PCR-like assembly step, these fragments are randomly reassembled into full-length sequences. This process generates new chimeric DNA sequences with different combinations of genetic information. DNA shuffling allows for the mixing and recombination of genetic material from related genes or sequences, resulting in diverse hybrid variants that may possess improved characteristics compared to the original sequences (Stemmer, 1994). Together with advances such as machine learning for automated selection and new methods of introducing mutations, better recombination, screening, and selection techniques have significantly enhanced the efficiency of directed evolution (Fig. 8).

Figure 8. An outline of many of the possible techniques available for each of the steps of directed evolution today. (Steps in the Directed Evolution Cycle.Jpg, n.d.)
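As one example of these diversification techniques, the recombination idea behind DNA shuffling can be sketched in Python. This is a deliberately simplified model: it assumes the parent genes are already aligned and of equal length, cuts them at fixed intervals rather than at random DNase I sites, and picks the donor for each fragment at random instead of modeling homology-driven reassembly. The parent sequences are placeholders.

```python
import random


def shuffle_genes(parents, fragment_length, rng):
    """Build one chimera from aligned, equal-length parent genes.

    Simplification: the gene is cut at fixed intervals, and each fragment of
    the chimera is taken from a randomly chosen parent. Real DNase I digestion
    produces random cut sites, and reassembly depends on sequence homology.
    """
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("this sketch assumes aligned parents of equal length")
    pieces = []
    for start in range(0, length, fragment_length):
        donor = rng.choice(parents)
        pieces.append(donor[start:start + fragment_length])
    return "".join(pieces)


rng = random.Random(7)
parents = ["AAAAAAAAAAAA", "CCCCCCCCCCCC", "GGGGGGGGGGGG"]  # placeholder genes
chimeras = [shuffle_genes(parents, fragment_length=4, rng=rng) for _ in range(5)]
print(chimeras)
```

Each chimera is a patchwork of blocks inherited from different parents, so mutations that arose separately can end up combined in one sequence, something point mutagenesis alone would rarely achieve in a single step.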

Continuous Evolution

While directed evolution was seen as a technological breakthrough, traditional directed evolution methods still had inherent limitations that hindered their effectiveness. Firstly, traditional directed evolution cycles are time-consuming and resource-intensive. Each cycle of diversification, expression, selection, and amplification can take several weeks, and each directed evolution experiment requires several evolution cycles (Yuan et al., 2005). The repetitive nature of this process can significantly slow down progress, impeding the ability to explore a broad range of mutations and limiting the capacity to optimize the evolution of complex biological systems. Secondly, traditional directed evolution cycles are discretized in that they are confined to distinct time intervals, necessitating periodic interruptions for screening and selection steps. This interruption in evolutionary pressure can disrupt the continuous improvement and evolution of biomolecules, potentially resulting in the loss of promising intermediate variants. This limitation poses a significant obstacle when working with slow-evolving systems or when searching for rare, highly desirable traits (Morrison et al., 2020). As a result, scientists were interested in combining the diversification, selection, and amplification processes into a continuous workflow, in order to maintain a steady evolutionary pressure on the organisms or biomolecules of interest.

The first documented attempts to create a continuous directed evolution system were made in 1997 by Martin Wright and Gerald Joyce of the Scripps Research Institute in San Diego, California, who performed 300 rounds of continuous evolution of RNA ligase ribozymes in just 52 hours (M. C. Wright & Joyce, 1997) (Fig. 9). Their experiment took advantage of the ability of RNA molecules to catalyze their own joining process (ligation) and replicate themselves through amplification. By transferring the RNA molecules from one environment to another, they were able to maintain the evolving population of RNA ligase ribozymes. However, their methodology was specific to RNA ligase ribozymes and could not easily be applied to the directed evolution of other enzymes.

Figure 9. A schematic of the first in vitro continuous evolution experiment by Martin Wright and Gerald Joyce, harnessing coupled catalysis and amplification (M. C. Wright & Joyce, 1997).

Scientists were actually unable to develop an effective method for continuous evolution until 2011, when David R. Liu of Harvard University reported the first in vivo continuous evolution experiments through a method known as phage-assisted continuous evolution (PACE) (Esvelt et al., 2011). Liu attempted to harness the life cycle of a bacteriophage in order to perform directed evolution of a T7 RNA polymerase (RNAP) so that it could recognize a distinct promoter and initiate transcripts with ATP instead of GTP (Esvelt et al., 2011).

To do this, Liu drew on the principles of “phage display,” developed by George P. Smith in 1985 (Smith, 1985) and expanded upon by Sir Gregory P. Winter in 1990 (McCafferty et al., 1990). Phage display involves the creation of a library of bacteriophages that each display a different protein of interest. As the bacteriophages reproduce, each new generation carries the inserted gene in its genome. The displayed proteins are then subjected to selection steps that identify those with the desired characteristics, typically by exposing the library to an external stimulus such as a specific chemical compound or a cell surface marker, and the cycle repeats for additional rounds of amplification and binding (Fig. 10). Smith and Winter’s work on phage display earned them a share of the Nobel Prize in Chemistry in 2018, alongside Frances Arnold.

Figure 10. A brief schematic of phage display, which facilitates directed evolution by providing a physical coupling between phenotype and genotype. (Source: BioRender.com)

Following in their footsteps, Liu attempted to use similar technology to develop continuous protein evolution. He first obtained the protein coding sequence of RNAP and placed the gene into bacteriophage viruses. Then Liu allowed these bacteriophage viruses to inject the RNAP gene into E. coli host cells. He called this injected DNA the “selection phage.” These E. coli cells contained a “mutagenesis plasmid,” which, once activated, created many mutations in the inserted RNAP gene. This allowed for each E. coli cell to mutagenize the RNAP gene in a unique way, and some of these mutations were likely beneficial for RNAP function (Esvelt et al., 2011) (Fig. 11).

Figure 11. Schematic of the infection and mutation stages of the phage-assisted continuous evolution (PACE) system.

In order to then distinguish which phages contained helpful RNAP mutations, Liu limited which phages reproduced; that is, he only allowed the phages that carried the helpful RNAP mutations to reproduce. To do this, Liu took advantage of the fact that a bacteriophage needs a protein called pIII in order to leave an E. coli cell and infect another bacterial cell (Esvelt et al., 2011). Normally, the phage genome contains the gene that encodes pIII (gene III), which is then expressed by the E. coli cell to form pIII. However, in his PACE experiments, Liu removed gene III from the phage genome so that the phage would not be able to escape the E. coli cell by itself. Instead, he placed gene III on an “accessory plasmid” in the E. coli host cell, where it could only be expressed if the phage in that cell carried helpful RNAP mutations. That way, only the phages with beneficial mutations could leave their E. coli cells, ready to infect more host cells and propagate (Fig. 12). As a result, only the fittest gene variants survive and reproduce, all without human intervention (Esvelt et al., 2011).

Figure 12. Schematic of the selection stage of the phage-assisted continuous evolution (PACE) system, done through the expression of gene III.

A question that may arise is: how did Liu distinguish between the E. coli host cells that contained improved RNAP variants and those that did not? To do this, Liu made use of a “lagoon” of bacterial cells, where there is a constant inflow and outflow of bacterial host cells and medium (Esvelt et al., 2011). When the lagoon was inoculated with phages carrying the RNAP gene to be evolved, the phages infected E. coli cells and mutation occurred. Phages lacking beneficial mutations could not leave their E. coli cells, and over time, those E. coli cells would be washed out by the constant outflow and replaced by new cells. In contrast, phages with helpful mutations could reproduce and infect new cells before being washed out. Over time, only the bacteriophages containing variants with the most helpful mutations would remain in the lagoon (Esvelt et al., 2011) (Fig. 13).

Figure 13. Putting it all together: an overview of the phage-assisted continuous evolution (PACE) system (Esvelt et al., 2011).
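The washout logic of the lagoon can be captured with a very small Python model. It assumes, for illustration only, that in each hour a phage population multiplies by a fixed replication factor and then loses a fixed fraction to the outflow. The specific numbers (a threefold replication factor and 50% dilution per hour) are hypothetical and are not parameters reported by Esvelt et al.

```python
def lagoon(replication_per_hour, dilution_per_hour, hours, start=1.0):
    """Relative phage abundance over time in a continuously diluted vessel.

    Assumption: each hour the population multiplies by its replication factor
    and a fixed fraction is then washed out with the outflow.
    """
    abundance = start
    trajectory = [abundance]
    for _ in range(hours):
        abundance *= replication_per_hour * (1.0 - dilution_per_hour)
        trajectory.append(abundance)
    return trajectory


# Hypothetical numbers: an inactive variant cannot trigger pIII production, so
# it makes no infectious progeny (factor 1.0); an active variant triples each hour.
inactive = lagoon(replication_per_hour=1.0, dilution_per_hour=0.5, hours=10)
active = lagoon(replication_per_hour=3.0, dilution_per_hour=0.5, hours=10)
print(f"inactive: {inactive[-1]:.4f}, active: {active[-1]:.1f}")
```

A variant persists only if its replication outpaces the dilution. In this model that means the product of the two factors must exceed 1, so the flow rate itself acts as a dial for how stringent the selection is.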

Compared to single-pass directed evolution, PACE is much faster and more efficient, allowing for a much higher number of mutations to be introduced in a shorter period of time and resulting in a greater number of desirable mutants. And it has only improved in the past decade, allowing for even better optimization of a trait over a shorter period of time. Phage-based continuous evolution has been used to evolve polymerases (Carlson et al., 2014; Dickinson et al., 2013), proteases (Packer et al., 2017), Cas9 proteins (Hu et al., 2018), and more, including industrial enzymes, new antibiotics, and novel therapeutics.

Looking Forward

Throughout its history, directed evolution has played an important role in society, and it will continue to advance the field of molecular biology. Some may argue that modern gene editing technologies such as TALENs or CRISPR-Cas9 will make directed evolution obsolete, given that we can now make specific, targeted edits to virtually any nucleic acid sequence at the level of individual bases. However, although our genome editing technologies are indeed effective, we are still limited by an incomplete understanding of how gene, protein, and function are related. We may know which phenotype we want from an enzyme without knowing which mutations or base edits are required to achieve it. Because rational design is not always possible, directed evolution remains a powerful tool for producing better RNAs and proteins without first having to understand which sequence changes are responsible for the result.

This small glimpse into the history of directed evolution serves as a reminder that countless minds have shaped its trajectory, with each scientist leaving an indelible mark on the canvas of discovery. The story of directed evolution is not just a tale of remarkable individuals, but a testament to the relentless pursuit of knowledge and the boundless potential of science and technology. By creating novel proteins or RNAs with specific functions, we can continue to develop improved versions of existing proteins and therapeutic agents, as well as crop varieties and other plants with improved traits. Directed evolution gives us the ability to make enzymes with novel activities, RNAs with enhanced functionality and specificity, and DNA sequences with optimized properties, enabling us to engineer molecules and biological systems with unprecedented precision and efficiency. As we move towards the future, we can only imagine the wonders that await us, as new breakthroughs and innovations continue to propel us forward.

Harrison Ngue is an MD/PhD student at Harvard University. He graduated in 2023 with a degree in Biomedical Engineering and a minor in the History of Science. He is the founder of the animated educational YouTube channel "Powerhouse of the Cell" and regularly writes about science education, the history of science, and modern scientific research. You can follow Harrison on Twitter @harrison_ngue.

Figure 2. A hypothetical concept map of directed evolution of RNA and proteins in the laboratory: diversification, expression, selection, and amplification.

Figure 3. (a) Concept map of Spiegelman’s in vitro evolution experiment of the Qβ phage RNA. (b) Spiegelman’s raw sedimentation analysis data showing how the quantity of RNA (as measured using radioactive 32P) increases per transfer, indicating more efficient RNA synthesis. This was discovered to be due to the Qβ RNA genome evolving to become shorter upon each generation.

Figure 4. The final minus strand of Qβ RNA that was evolved to be resistant to RNase A, containing only 2 nucleotides susceptible to cleavage. Original figure from Günther Strunk and Tobias Ederhof (Strunk & Ederhof, 1997).

Figure 5. Original copy of Eigen and Gardiner’s model for the directed evolution of enzymes (Eigen & Gardiner, 1984).

Figure 6. (a) Schematic of an evolutionary fitness landscape, which visualizes the relationship between genotypes (xy plane) and reproductive success or the closeness of a phenotype to a desired trait (z axis) (Packer & Liu, 2015). (b) Geneticist Sewall Wright’s original depiction of a fitness landscape, similar to the left plot but depicted as a contour map (top-down view), with (+) indicating desirable phenotypes and (-) indicating undesirable phenotypes (S. Wright, 1932).

Figure 7. Catalytic efficiency of wild-type (—) versus evolved (--) forms of the serine protease subtilisin E over increasing concentrations of dimethylformamide (DMF), which creates a denaturing environment (Chen & Arnold, 1993). This was measured by how quickly the enzyme could cleave the substrate peptide succinyl-Ala-Ala-Pro-Phe-p-nitroanilide (sAAPF-pna).

Figure 8. An outline of many of the possible techniques available for each of the steps of directed evolution today. (Steps in the Directed Evolution Cycle.Jpg, n.d.)

Figure 9. A schematic of the first in vitro continuous evolution experiment by Martin Wright and Gerald Joyce, harnessing coupled catalysis and amplification (M. C. Wright & Joyce, 1997).

Figure 10. A brief schematic of phage display, which facilitates directed evolution by providing a physical coupling between phenotype and genotype. (Source: BioRender.com)

Figure 11. Schematic of the infection and mutation stages of the phage-assisted continuous evolution (PACE) system.