Towards a unified theory of cancer risk

Martin Nowak and Bartlomiej Waclaw conclude their recent commentary [1] on the “bad luck and cancer” debate with a look to the future:

“The earlier analysis by Tomasetti and Vogelstein has already stimulated much discussion… It will take many years to answer in detail the interesting and exciting questions that have been raised.”

I agree. When a couple of journalists [2, 3] contacted me for comments on the latest follow-up paper from Christian Tomasetti, Bert Vogelstein and Lu Li, I emphasized what can be gained from rekindling the decades-old debate about the contribution of extrinsic (or environmental, or preventable) factors to cancer risk. In particular, the diverse scientific critiques of Tomasetti and Vogelstein’s analysis suggest important avenues for further inquiry.

My own take is summarized in the figure below. This diagram (inspired by Tinbergen) reframes the question in terms of proximate mechanisms and ultimate causes. It also provides a way of categorizing cancer etiology research.

Causes of cancer

Tomasetti and Vogelstein’s 2015 paper [4] demonstrated that the lifetime number of stem cell divisions is correlated with cancer risk across human tissues (part A in the figure). Colleagues and I have argued [5, 6] that, although characterizing this association is important, it cannot be used to infer what proportion of cancer risk is due to intrinsic versus extrinsic factors. This is because cancer initiation depends not only on mutated cells, but also on the fitness landscape that governs their fate, which is determined by a microenvironment that differs between tissues (figure part B).

Moreover, the supply of mutated cells and the microenvironment are both shaped by an interaction of nature and nurture (figure part C). In a recently published paper [7], Michael Hochberg and I draw attention to the relationship between cancer incidence and environmental changes that alter organism body size and/or life span, disrupt processes within the organism, or affect the germline (figure part D). We posit that “most modern-day cancer in animals – and humans in particular – are due to environments deviating from central tendencies of distributions that have prevailed during cancer resistance evolution”. We support this claim in our paper with a literature survey of cancer across the tree of life, and with an estimate of cancer incidence in ancient humans based on mathematical modelling [7].

To understand why cancer persists at a certain baseline level even in stable environments, we must further examine the role of organismal evolution (figure part E). If cancer lowers organismal fitness then we might expect selection for traits that reduce risk. But continual improvement in cancer prevention is expected to come at a cost, and the net effect on fitness will depend on life history. For example, more stringent control of cell proliferation might reduce cancer risk and so lower the mortality rate at older ages, while also increasing deaths in juveniles and young adults due to impaired wound healing. We can predict outcomes of such trade-offs by calculating selection gradients, which is what I’ve been doing in a research project that I presented at an especially stimulating MBE conference in the UK last week.
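To make the selection-gradient idea concrete, here is a minimal sketch in Python. Everything in it is invented for illustration (it is not the model from my project or from [7]): a trait c represents stringency of cell-proliferation control, with hypothetical functional forms for the resulting juvenile survival cost and old-age cancer-mortality benefit. The sign of the gradient dW/dc tells us which way selection pushes the trait.

```python
def fitness(c):
    """Toy expected lifetime reproduction for control stringency c in [0, 1].
    Functional forms are hypothetical, chosen only to encode the trade-off."""
    s_juvenile = 1.0 - 0.2 * c**2   # wound-healing cost accelerates with c
    s_old = 0.2 + 0.7 * c           # cancer mortality at older ages falls with c
    # Reproduce once as a young adult, and again (weighted 0.9) if surviving
    # to old age; juveniles must survive to reproduce at all.
    return s_juvenile * (1.0 + 0.9 * s_old)

def selection_gradient(c, h=1e-6):
    """Central finite-difference approximation of dW/dc."""
    return (fitness(c + h) - fitness(c - h)) / (2 * h)

print(selection_gradient(0.0))  # positive: selection favours more control
print(selection_gradient(1.0))  # negative: but not maximal control
```

With these toy numbers the gradient changes sign, so selection favours an intermediate level of proliferation control rather than the most cancer-proof possible organism.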

The quest to understand cancer risk must then encompass not only cell biology, but also ecology and evolution at both tissue and organismal levels. One of my goals is to make connections between these currently disparate lines of research in pursuit of a more unified theory.


  1. Nowak, M. A., & Waclaw, B. (2017). Genes, environment, and “bad luck”. Science, 355(6331), 1266–1267.
  2. Ledford, H. (2017). DNA typos to blame for most cancer mutations. Nature News.
  3. Chivers, T. (2017). Here’s Why The “Cancer Is Caused By Bad Luck” Study Isn’t All It Seems. BuzzFeed.
  4. Tomasetti, C., & Vogelstein, B. (2015). Variation in cancer risk among tissues can be explained by the number of stem cell divisions. Science, 347(6217), 78–81.
  5. Noble, R., Kaltz, O., & Hochberg, M. E. (2015). Peto’s paradox and human cancers. Philosophical Transactions of the Royal Society B: Biological Sciences, 370(1673), 20150104.
  6. Noble, R., Kaltz, O., Nunney, L., & Hochberg, M. E. (2016). Overestimating the Role of Environment in Cancers. Cancer Prevention Research, 9(10), 773–776.
  7. Hochberg, M. E., & Noble, R. J. (2017). A framework for how environment contributes to cancer risk. Ecology Letters, 20(2), 117–134.

The Box-Einstein surface of mathematical models

As a mathematical modeller in evolutionary biology, my seminar bingo card has four prime boxes. Watching a talk about evolution, I count down the minutes to the first appearance of Dobzhansky’s “nothing in biology” quote (or some variant thereof) or a picture of Darwin’s “I think” sketch. For mathematical modelling, it’ll be either Albert Einstein or George Box:

“All models are wrong but some are useful” – George Box

“Everything should be made as simple as possible, but not simpler” – probably not Albert Einstein

Of course, such quotes are popular for good reason, and I’m not criticising those who use them to good effect, but all the same it can be fun to try to find a new way of presenting familiar material. That’s why in spring 2015 I came up with and tweeted a visual summary of the latter two aphorisms, which I named the Box-Einstein surface of mathematical models:


The grey region in the plot ensures that all possible models have some degree of “wrongness”, but the contours in the remaining region tell us that some models are useful all the same. To find the most useful description of a particular phenomenon, we must reduce complexity without overly increasing wrongness.

A key thing to understand about this diagram is that although the boundary of the grey region is invariant, the surface is changeable. If our empirical knowledge of the system becomes richer, or if we change the scope of our enquiry, the most useful model may be more or less complex than before.
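To play with that idea, here is a toy calculation (all functional forms invented purely for illustration). Suppose the least wrongness achievable at complexity k is 1/k, tracing the boundary of the grey region, and usefulness penalises both wrongness and a context-dependent cost of complexity. Changing that cost, as richer data or a wider scope of enquiry would, moves the most useful model along the complexity axis:

```python
def most_useful_complexity(cost_of_complexity):
    """Toy frontier: the least wrongness achievable at complexity k is 1/k;
    usefulness penalises wrongness plus a context-dependent complexity cost."""
    grid = [0.1 + 9.9 * i / 999 for i in range(1000)]  # complexities in [0.1, 10]
    return max(grid, key=lambda k: -1.0 / k - cost_of_complexity * k)

# When complexity gets cheaper (better data, broader scope of enquiry),
# the most useful model becomes more complex.
print(most_useful_complexity(0.60))  # ≈ 1.3
print(most_useful_complexity(0.05))  # ≈ 4.5
```

The boundary 1/k stays fixed, but the usefulness contours above it shift with the cost term, which is the sense in which the surface is changeable while the grey region is invariant.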

Einstein’s quote can be seen as simply paraphrasing Occam’s razor, but I think it has additional meaning with regard to what Artem Kaznatcheev calls heuristic and abstract mathematical models, of the kind generally used in biology. In statistics, a simple model has few degrees of freedom, which is desirable to reduce overfitting. However, statisticians should also beware what JP Simmons and colleagues termed “researcher degrees of freedom”:

“In the course of collecting and analyzing data, researchers have many decisions to make: Should more data be collected? Should some observations be excluded? Which conditions should be combined and which ones compared? Which control variables should be considered? Should specific measures be combined or transformed or both?

“It is rare, and sometimes impractical, for researchers to make all these decisions beforehand. Rather, it is common (and accepted practice) for researchers to explore various analytic alternatives, to search for a combination that yields “statistical significance,” and to then report only what “worked.” The problem, of course, is that the likelihood of at least one (of many) analyses producing a falsely positive finding at the 5% level is necessarily greater than 5%.”
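The arithmetic behind that last sentence is easy to check: if each of n analytic alternatives carries a 5% false-positive rate and we report whichever one “worked”, then (assuming, for simplicity, that the analyses are independent) the chance of at least one false positive is 1 − 0.95ⁿ:

```python
# Family-wise false-positive rate across n independent analyses,
# each run at the conventional 5% significance level.
for n in (1, 5, 10, 20):
    print(f"{n:2d} analyses: {1 - 0.95**n:.0%} chance of a false positive")
# prints 5%, 23%, 40%, 64%
```

In practice the analyses are correlated rather than independent, which dampens the inflation somewhat, but the qualitative point stands: the nominal 5% is a floor, not the actual error rate.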

Likewise, when a researcher makes a mathematical model of a dynamical system – be it a set of differential equations or a stochastic agent-based model – he or she makes numerous decisions, usually with more or less full knowledge of the empirical data against which the model will be judged.

But there’s an important difference between the process of collecting data and that of creating a mathematical model. Ideally, the experimentalist can minimise researcher degrees of freedom by following a suitable experimental design and running controls that enable him or her to test a hypothesis against a null according to a predetermined statistical model. For most mathematical models there is no such template, and a process of trial and improvement is unavoidable, forgivable, and even desirable (inasmuch as it strengthens understanding of why the model works). The role of mathematical modeller is somewhere between experimentalist and pure mathematician. By making our models as simple as possible, we shift ourselves further toward the latter role, and our experimentation becomes less about exploiting our freedom and more about honing our argument.

For further reading, check out Artem Kaznatcheev’s insightful post about what “wrong” might mean, and why Box’s quote doesn’t necessarily apply to all types of model.