
Can Machines Understand Complex Biology?

Visualization of strange attractors, which are fractal-like mathematical constructs taken from the field of chaos theory. Image via Creative Commons.

Emergent systems are hard for humans to grasp. But machines can recognize patterns in biology that elude us.

Moore’s Law observes that the number of transistors on an integrated circuit doubles every two years. Although the cadence has slowed, for 50 years engineers have found new ways to cram ever more transistors onto each square millimeter of silicon wafer. But today’s 10nm features are approaching a fundamental limit: at such scales, electrodes sit so close together that quantum effects, such as electrons tunneling across barriers, sap the performance of transistors. Soon, an entirely new architecture based on quantum computing—with its own language, tools, principles, and technologies—will be necessary to sustain the pace of computing advances that we’ve come to expect.

In biology, we’ve been treated to decades of progressively deeper insights into the fundamental workings of life: They include genomics, stem cells, transcriptomics, the microbiome, immuno-oncology, and genome editing. With each breakthrough has come a new set of hypotheses, mechanisms, technologies, targets—and, eventually, therapeutics.

But what if we’re reaching a fundamental limit in biology, too—one not of size but of complexity? What if modern biology is too complex for our scientific framework, and new information—collected by layering technology upon technology in order to parse the workings of proteins, cells, and the body—does not provide new insights but only confounds? Then the data deluge, instead of helping us understand biology more deeply, is only invoking reductionism, in a vain attempt to make sense of the information.

Perhaps we also need a new architecture in biology, one that will unravel biology’s complexity with new principles and a new language. Such an approach in biology may come from the world of computing and the newly unlocked power of machine learning.

How we know what we know

Usually, understanding follows from knowledge. The more you know about a system and how its parts interact, the better you can predict how the system will behave.

Take the periodic table. Because we know the structures of electronic orbitals and the rules for their occupancy, we can accurately predict that carbon reacts like silicon and nitrogen reacts like phosphorus, and that fluorine, chlorine, and bromine will form salts with lithium, sodium, and potassium. Or consider Mendelian genetics. Because we know that heritable features are passed via genes from parent to offspring and follow the rules of segregation and independent assortment, we can accurately predict the three-to-one ratio of purple and white flowers from pea plant crosses and explain the prevalence of hemophilia in the Romanov family tree.

Much of our knowledge about how the world works comes from this type of reductive search for fundamental principles. The history of scientific progress—about everything from the motion of the planets to the molecules in cells—is a story of our increasingly detailed insights into the governing principles of nature.


Emergent behavior frustrates understanding

When systems become sufficiently complex, however, new behaviors emerge that can no longer be derived from the rules that govern the component parts. This concept is called emergence. It is a fundamental idea with an ancient origin—first described by Aristotle more than two millennia ago—about the properties of complex systems.

Aristotle identified “things which have several parts and in which the totality is not, as it were, a mere heap, but the whole is something besides the parts.” With such emergent systems, knowing more doesn’t mean that you understand more. For example, ant colonies swarm toward food, attack neighboring colonies, and abide by complex social structures—and none of that can be intuited by studying the behavior, biology, or physics of an individual ant. The modern conception of emergent phenomena in biology derives from the British philosopher John Stuart Mill, who observed in A System of Logic (1843) that although life emerged from nonliving matter, the existence of life could not be predicted from the properties of life’s “inorganic ingredients.” Since Mill, the role of emergence in biology has become clearer the closer we look.

Emergent behavior is ubiquitous in biology because biology is so complicated. Consider again the example of genetic diseases. While a simple Mendelian model explains strong monogenic phenotypes—such as hemophilia, sickle cell anemia, and Tay-Sachs disease—and gross chromosomal abnormalities, it cannot explain the incomplete penetrance and probabilistic associations of the genetic links to Alzheimer’s, Parkinson’s, or diabetes.

The picture gets murkier the more we learn about genes. Beyond genetic mutations, we now find alterations in the three-dimensional structure of the genome, changes in the rates at which genes are transcribed and their transcripts translated into proteins, post-translational modifications of the resulting proteins, segregation of proteins into phase-separated gel-like structures, and many more peculiarities. Furthermore, gene products in humans are themselves influenced by the food we eat, the bacteria in our gut, how much we exercise, and more. The components must interact, but we don’t know how.

The reductionist search for first principles that has guided biology for centuries is a poor fit for systems that exhibit emergent behavior. The deeper we probe the biology of disease, the harder it becomes to understand how the system functions, and the harder it will be to develop treatments that reliably work.

Strange Attractor
"Attractor Poisson Saturne" by Nicolas Desprez via Creative Commons.

Emergence demands a new approach in biology

In the face of this irreducible complexity, however, modern biology has doubled down on reductionism. New tools let us generate rich multidimensional data sets at scales and resolution that outstrip our human capacity for interpretation, yet we insist on shoehorning the data into targets, pathways, and other conventional mechanistic frameworks.

For example, we now know that tumors consist of a heterogeneous population of cancer cells, immune cells, and fibroblasts that interact with one another dynamically. A cancer’s progression is clearly dependent on its microenvironment; but how do we identify the connections between the biology and disease? In one experiment performed regularly by academic labs, clinics, and biotechnology companies, researchers sequence the RNA from individual cells from a patient’s biopsied tumor to identify the specific gene expression patterns unique to each of the cell types.

On a good day, the effect is clear. A specific gene is turned on in a specific cell type only in patients who respond to a particular therapy. This gene may serve as a biomarker for segmenting patients in the clinic, or may suggest a new target for pharmacological inhibition.
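As a toy sketch of that "good day" analysis—using synthetic data and placeholder gene names, not any real study—a per-gene comparison between responders and non-responders might look like:

```python
import random
import statistics

random.seed(0)

# Hypothetical, synthetic data: expression of three genes measured in
# tumor cells from responders and non-responders to a therapy.
# Gene names are placeholders, not real biomarkers.
genes = ["GENE_A", "GENE_B", "GENE_C"]

def sample_patient(responder):
    # In this toy model, only GENE_B is truly "turned on" in responders.
    base = {"GENE_A": 5.0, "GENE_B": 5.0, "GENE_C": 5.0}
    if responder:
        base["GENE_B"] += 4.0
    return {g: random.gauss(mu, 1.0) for g, mu in base.items()}

responders = [sample_patient(True) for _ in range(30)]
non_responders = [sample_patient(False) for _ in range(30)]

# Rank genes by a simple effect size: difference in group means divided
# by the pooled standard deviation (a crude stand-in for a t-statistic).
def effect_size(gene):
    a = [p[gene] for p in responders]
    b = [p[gene] for p in non_responders]
    pooled_sd = statistics.stdev(a + b)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd

ranked = sorted(genes, key=effect_size, reverse=True)
print(ranked[0])  # the candidate biomarker
```

In this idealized case the signal is strong and univariate, so a simple ranking recovers it; the article's point is that real tumor data rarely cooperate this way.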

All too frequently, however, once the data comes in, it’s too complex to understand. Multiple proteins and pathways are turned on in the responding patients, with no discernible logic connecting them. Instead of accepting that these data may point to emergent laws of the tumor and its microenvironment, we often pick the most over- or under-represented proteins and pathways and spin a story about them that we can understand. If something fits a dominant, simple mechanistic hypothesis, it gets called out; if it doesn’t, it gets ignored.

How much true biology is lost in discarded data? How much information about how tumors grow or how the immune system responds would be uncovered if we approached the biology emergently, rather than reductively? What if we were open to the emergent biological laws hidden in our observations?

Enter the machines

One solution may be machine learning. We have ample evidence that machines can process complex information in ways the human brain cannot. In recent years, computers, DNA sequencers, and other tools have generated unique insights into difficult challenges that long eluded human understanding. For all the current limitations of artificial intelligence in doing many things humans find easy, modern algorithms are good at something we find nearly impossible: extracting the governing principles of complex systems.

Whereas humans used to teach machines the rules of grammar—this is a noun, this an adjective, here’s a verb phrase—now algorithms trawl examples of written text unsupervised and learn not just grammar but also the principles of meaning, and are thus able to generate completely new sentences and paragraphs. Artificial intelligence algorithms now beat humans at games like chess and Go. In these remarkably complex systems—chess has more possible games than there are atoms in the observable universe, and Go is more complex still by hundreds of orders of magnitude—the algorithms extract, without human instruction, not only the relatively simple rules of the game but also the vastly more nuanced emergent strategies for successful gameplay.
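The idea that a model can absorb statistical structure from raw text and then produce new text can be sketched, in miniature, with a word-level Markov chain (a toy stand-in for the far richer models the article alludes to; the corpus here is invented):

```python
import random
from collections import defaultdict

random.seed(1)

# Toy corpus; a real language model trains on vastly more text.
corpus = (
    "the cell divides and the cell grows and the tumor grows "
    "and the immune system attacks the tumor"
).split()

# Learn, without supervision, which word tends to follow which:
# a first-order Markov chain over words.
transitions = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev].append(nxt)

def generate(start, length):
    words = [start]
    for _ in range(length - 1):
        followers = transitions.get(words[-1])
        if not followers:
            break
        words.append(random.choice(followers))
    return " ".join(words)

# Emits a new word sequence assembled from the learned transitions.
print(generate("the", 8))
```

Nobody told the model what a noun or a verb is; it extracted the regularities from examples and can now generate sequences it never saw—the same generative principle, at trivially small scale, that the game-playing and language systems exploit.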

Inherent in these examples is the idea that algorithms can be generative. Once the governing principles are learned, new examples of the systems can be created. Machines don’t care whether the logical rules are first principles understood by humans or are emergent laws that mysteriously appear with complexity: rules are rules, generation is generation.

In complex systems where emergent behaviors hinder human understanding, machines can capture governing principles and generate novel examples and insights.


The future of biology

Yet a future where biological solutions are generated from the extracted principles underlying emergent behaviors won’t arrive by simply putting biologists and machine-learning experts in the same room.

There is an urgent need for bilingual scientists. Only those who speak the languages of both biology and computation will be able to frame biological problems in machine-understandable grammar and improve the ways machines learn from the data, enabling AI to generate the most predictive insights. The path from problem to data to machine to insight will be iterative, not linear, with multiple points that require human interpretation and testing—in the same way that humans augmented by machine-learning programs play better chess than either computers or humans alone.

For a machine to learn effectively, someone must choose the right algorithm to apply to a given problem, which depends entirely on the specifics of the problem. Biological systems rarely behave like the systems where machine-learning algorithms have typically been deployed. In chess or Go, the rules of the game are the same every time; in biology, the rules are highly dynamic. The data may be very noisy, in ways that are hard to understand. The same molecules that are anti-inflammatory in one context can be inflammatory in another. Cells that are wired one way in health are wired differently when they are diseased. The molecular components that drive unique cellular behaviors are even more complex. Machine-learning experts unversed in biology might guess that analyzing protein structures would be an image-recognition problem. But unlike most physical objects, proteins are highly dynamic, vastly modifiable, and not scale-invariant, which means conventional algorithms will often fail. In each case, deep expertise in the biology and computation is essential.

Similarly, biologists and data scientists need to work together to design and run experiments in ways that maximize a computer’s ability to extract the molecular drivers of emergent behaviors. This is not just a question of generating more data but of accepting complexity as a vital aspect of useful data. Scientists should continue to knock out all genes systematically with CRISPR, measure the levels of all mRNA transcripts in every cell, read the metabolomic and proteomic profiles of primary tissues, and generate cryo-electron micrographs of complex protein assemblies. But an emergent perspective does not treat these data sets merely as finer-grained information, an opportunity to better separate the signal from the noise. Instead, the bilingual biologist will see all the data as positive examples of the emergent properties of the system and design experiments to extract those patterns. As Miles Davis said about jazz, the silence is as important as the sound.

Furthermore, an emergent perspective will deprioritize lab-adapted experimental models in favor of primary, patient-derived, as-true-to-human-disease-state-as-possible systems. Instead of seeking models that can be automated and stripped down to probe single mechanistic hypotheses, the bilingual biologist must recognize that nature is the system from which the most relevant and useful governing principles can be extracted. It will do little good to learn the emergent behavior of a model cell line or the governing principles of a cancer-free mouse.

The successful therapeutics companies of the future won’t apply just one or two of these technologies to see more deeply into data. Rather, they will combine multiple lenses to capture the most nuanced view of the system possible, focus on complex human tissues in primary settings, and leverage machines to extract the fundamental rules that drive this complexity.

Uncovering complexity will provide new control

At Flagship Pioneering, we believe that biology and medicine will progress by using powerful platforms to uncover biological complexity and machine computation to extract the governing principles of emergent behaviors, leading to insights, predictions, and molecules to treat or even cure intractable diseases.

We believe that the proliferation of high-resolution tools across biological systems, explosion of machine-learning algorithms, democratization of computational power, redesigning of organizational structures, and breakdown of traditional domains of research will herald revolutions in multiple fields.

From single-cell multi-omics data, we can extract how cells choose their state, and create drugs to turn exocrine cells into insulin-producing islets in the pancreas of a diabetes patient. From high-resolution crystal structures, we can extract the universal rules for how proteins fold, act, and interact to generate new protein drugs to target undruggable disease pathways. From satellites high above the earth and biochemical sensors underground, we can extract the governing principles that predict crop growth under environmental stress and create novel seeds and varietals to feed the world’s growing population on a planet roiled by climate change. From blood metabolite levels, microbiome composition, and molecular profiling of nutrients, we can extract the governing principles of how the foods we eat regulate our holistic health.

There isn’t an area of biology that won’t be touched. When we contemplate biology with an emergent lens, it will look different—leading to new classes of drugs, new breeds of plants, and new models of transformative companies that marry machine learning and deep biological insights. Flagship is pioneering the development of these companies.

Story By

Jordi Mata-Fink

Jordi Mata-Fink joined Flagship Pioneering in 2013 after completing the firm's Fellows Program and left the firm in September 2019. At Flagship, Jordi worked as part of a Flagship Labs team, exploring innovative ideas and opportunities, and…

Nicholas Plugis

Nicholas Plugis joined Flagship Pioneering as an associate after completing the firm's Fellows Program. At Flagship, Nicholas conducts explorations to discover unexplored biological mechanisms and new biotechnologies. As part of a team of…

