Science is a way of trying not to fool yourself. The first principle is that you must not fool yourself, and you are the easiest person to fool. – Richard Feynman

Epidemiology is the science of trying to find out what makes people healthier. Epidemiologists look at data to identify links between improved health and other factors. It is a correlational science, which means it can never really prove a causal link; it can only suggest that a connection between two or more variables is unlikely to be caused by chance.

Correlation is a tricksy business. Perfect correlations tend not to exist, so the relationships epidemiologists find are always, to a greater or lesser extent, imperfect. No matter how clear a signal we think we’ve identified, there is always noise.

A correlation coefficient of +1.0 is a perfect positive correlation and -1.0 is a perfect negative correlation. If r = 0.5, there’s a medium-strength correlation between the two sets of variables, which is a lot more than you’d expect to see by chance. But even though a medium-strength correlation might appear plausible, that doesn’t mean that one variable is causing the other. Correlation, as the saying goes, is not causation.
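If it helps to see what that number actually measures, here’s a minimal sketch in Python. The sleep-and-scores figures are invented purely for illustration, and it assumes Python 3.10 or later, where the standard library gained statistics.correlation:

```python
# A toy illustration of Pearson's r; the data are made up.
from statistics import correlation  # requires Python 3.10+

hours_of_sleep = [5, 6, 6, 7, 8, 8, 9]
test_scores = [52, 60, 58, 65, 70, 74, 71]

r = correlation(hours_of_sleep, test_scores)
print(f"r = {r:.2f}")  # close to +1.0: a strong positive correlation

# r near 0 would mean no linear relationship, and r near -1.0 a strong
# negative one. Nothing in the number tells us WHY the two variables
# move together.
```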

Some correlations are obviously spurious, such as the finding that the rise in global temperature has mirrored a decline in international piracy.

But other correlations can seem more plausible. Take the hyperbolic reports claiming that the rise in autism diagnoses is caused by vaccinations. Yes, autism diagnoses have increased over a similar period to the rise in vaccination, but for this to represent a causal connection we’d need to see more than a correlation; there would also need to be a testable hypothesis linking autism with vaccines. Such hypotheses have been tested and have failed, and any reports that the link is established are just junk science. Rises in autism correlate with pretty much everything that has risen over the past twenty years. As this article points out, autism also correlates with “the rise in chemtrail sightings, terrorist attacks on U.S. soil, the New England Patriot’s cumulative win total—and organic food sales”.

Even where epidemiologists find reasonable correlations between, say, eating spinach and living longer, we still can’t be clear about the causal connection. It may be that spinach increases our lifespan, but it could also be that the correlation is produced by a confounding factor. It’s well established that men live longer if they’re married, and it could be that married men eat more spinach to please their wives. Maybe the spinach has nothing to do with it. Equally, maybe marriage has nothing to do with men’s life expectancy; if they just ate spinach as bachelors, maybe they’d be fine.[i]

Anyway, you can see the tangled skein produced when correlations are all we have to rely on. The correlational claims made by other sciences, such as behaviour genetics, are more robust because they have methods for establishing the direction of causation, and for checking whether the cart has been put before the horse. Adoption studies are really useful for isolating the effects of environment and heritability: if there’s no correlation between the traits of adopted siblings raised together, we can be pretty sure differences are due to heritable causes. Likewise, twin studies do a decent job of showing the reverse: if identical twins share 100% of their genes, then any differences between them must be down to environmental factors.

So, what does all this have to do with education? Well, for us to take claims about ‘what works’ in education seriously, we have to establish whether education research is more like epidemiology or behaviour genetics. Does a piece of education research just show a correlation, or does it make testable claims that we can use to see whether the claim being made is false? Also, what is the correlation with? If research claims to be increasing something we’re not really sure how to measure reliably, like creativity, or something too vaguely defined to be meaningful, like character, then we have a right to feel suspicious. At least with epidemiology there’s a very definite point of measurement: is the patient dead or alive after x years?

If we are to invest time and money in an educational intervention, we ought to be reasonably sure the claims being made are robust. Here are some suggested questions to ask about any piece of research purporting to demonstrate a causal connection:

  1. Is there a testable claim being made? If there’s nothing to test then there’s not much point in reading further. Just observing connections will only lead to confirmation bias.
  2. Under what circumstances could the claim be shown to be false? For a claim to qualify as scientific, especially in this context, it must be falsifiable.[ii]
  3. Is there a correlation between the effects of the intervention and a widely accepted measure of academic success? If the intervention is claiming to affect children’s character, or any other aspect of personality, how is this being measured? If the intervention can’t be shown to improve students’ test scores in something, then we should steer clear. That is not to say that all of learning can be summed up by test scores, but if your claim doesn’t show up on some sort of metric somewhere, it’s probably wrong. As Lord Kelvin put it, “when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind: it may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the stage of science, whatever the matter may be.”
  4. What strength is the correlation? A very small correlation can be statistically significant, but we need to remember that statistical significance can occur by chance, especially if you keep repeating an experiment until it ‘works’, or if you conduct factor analyses until you find something significant. (This is what’s known as p-hacking or data-dredging; see the first sketch after this list.)
  5. Is there a plausible explanation as to why the intervention might be causing the increase in academic success? Claims of far transfer tend to be dubious. If, as in the case of the EEF’s pilot study into the effects of Philosophy for Children, there’s an implausible claim (in this case the idea that discussing ‘big questions’ about fairness or friendship can improve maths results), then the most likely interpretation is that any positive correlation is a fluke.
  6. Does the claim support or contradict the findings of associated fields of research? One of the problems with some of the claims made by growth mindset researchers is that they contradict the scientific consensus in intelligence research. The claim that “the brain is like a muscle” is demonstrably false. If the brain were like a muscle then specific practice would produce global benefits: if you do exercises to strengthen your leg muscles, you get better at running, jumping and anything else which benefits from stronger legs. But if you practise a particular form of mental exercise, you don’t get better at anything other than that specific mental exercise. Brain training games only make you better at brain training games. The claims made by intelligence researchers fulfil the first two criteria: they are testable and they have not (yet) been shown to be false. As such, correlational studies which don’t concur with laboratory findings should be treated with caution.
  7. Does the intervention result in an effect size of above 0.4? I’ve written before of my scepticism about effect sizes, but Hattie’s point that everything has an effect is an important one (see the second sketch after this list for how an effect size is calculated). If pretty much everything teachers are likely to try will result in some sort of increase in test scores, then claiming that your intervention ‘worked’ is small potatoes. I’ve often heard the maxim “Everything works somewhere but nothing works everywhere” trotted out as very weak support for teachers being allowed to do whatever they fancy, regardless of how slight the evidence base is. I find this maxim more useful: “Some things work in most circumstances, other things rarely work anywhere.”
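To make the p-hacking point in question 4 concrete, here’s a toy simulation: two groups are drawn from exactly the same distribution, so any ‘significant’ difference is pure noise. The set-up is invented for illustration and assumes SciPy is installed:

```python
# Simulating p-hacking: rerun a null experiment until p < 0.05 turns up.
import random
from scipy.stats import ttest_ind

random.seed(1)

def null_experiment(n=30):
    """Two groups sampled from the SAME distribution: no real effect."""
    control = [random.gauss(0, 1) for _ in range(n)]
    treated = [random.gauss(0, 1) for _ in range(n)]
    return ttest_ind(control, treated).pvalue

for attempt in range(1, 101):
    p = null_experiment()
    if p < 0.05:
        print(f"'Significant' result on attempt {attempt}: p = {p:.3f}")
        break

# Roughly 1 run in 20 will cross p < 0.05 by chance alone, which is why
# repeating an experiment until it 'works' demonstrates nothing.
```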
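And for question 7, an effect size in Hattie’s sense is (roughly) Cohen’s d: the gap between the group means expressed in units of their pooled standard deviation. A minimal sketch, with invented score lists:

```python
# A minimal sketch of Cohen's d; the score lists are made up.
from statistics import mean, stdev

def cohens_d(treated, control):
    """Standardised mean difference between two groups."""
    n1, n2 = len(treated), len(control)
    v1, v2 = stdev(treated) ** 2, stdev(control) ** 2
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (mean(treated) - mean(control)) / pooled_sd

intervention = [68, 72, 75, 70, 74, 71]
control = [65, 69, 70, 66, 71, 67]
print(f"d = {cohens_d(intervention, control):.2f}")

# Hattie's point is that almost any intervention yields d > 0, so only
# effects comfortably above ~0.4 deserve to be singled out as 'working'.
```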

I’m not a scientist and I’m certainly no expert in educational research. That said, I’ve become reasonably research literate over the past few years, and this list has proved helpful in separating wheat from chaff. There are plenty of other questions worth asking (such as these), and just because a piece of research doesn’t seem able to satisfy your curiosity doesn’t automatically make it wrong. This is just a rough guide for thinking critically about education research.

See also this post by Greg Ashman.

[i] I know I’ve reduced a complex science almost to absurdity – that is not my intent. I know epidemiology is a lot more sophisticated than I’m painting it.

[ii] I’m well aware there’s a debate about whether falsifiability is a fair test of what’s scientific. If you want to argue the toss about string theory or the multiverse, this is not the place. Instead, if you want to debate the way I’m using falsifiability here, I’d ask you to come up with an example of an education claim that is a) unfalsifiable, and b) has some practical classroom application.