Science is a way of trying not to fool yourself. The first principle is that you must not fool yourself, and you are the easiest person to fool. – Richard Feynman
Epidemiology is the science of trying to find out what makes people healthier. Epidemiologists look at data to identify causal links between improved health and other factors. It is a correlational science, which means that it can never really prove a causal link; it can only suggest that a connection between two or more variables is unlikely to be caused by chance.
Correlation is a tricksy business. Perfect correlations tend not to exist, so the relationships epidemiologists find are always, to a greater or lesser extent, imperfect. No matter how clear a signal we think we’ve identified, there is always noise.
A correlation coefficient, r, runs from +1.0 (a perfect positive correlation) to -1.0 (a perfect negative correlation). If r = 0.5, that means there’s a medium correlation between the two sets of variables, which is a lot more than you’d expect to see by chance. But even though a medium-strength correlation might appear plausible, that doesn’t mean one variable is causing the other. Correlation, as the saying goes, is not causation.
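To make this concrete, here’s a minimal sketch (in Python, using invented illustrative figures rather than data from any real study) of how a correlation coefficient is computed:

```python
import numpy as np
from scipy import stats

# Invented illustrative data: weekly portions of spinach vs. a health score
spinach = np.array([0, 1, 2, 3, 4, 5, 6, 7])
health = np.array([58, 55, 63, 60, 66, 64, 71, 69])

# Pearson's r runs from -1.0 (perfect negative) to +1.0 (perfect positive)
r, p_value = stats.pearsonr(spinach, health)
print(f"r = {r:.2f}, p = {p_value:.3f}")
```

The p-value estimates how likely a correlation this strong would be if the two variables were in fact unrelated; it says nothing at all about which variable, if either, is doing the causing.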
Some correlations are obviously spurious, such as the finding that the rise in global temperature has mirrored a decline in international piracy.
But other correlations can seem more plausible. Take the hyperbolic reports claiming that the rise in autism diagnoses is caused by vaccinations. Yes, autism diagnoses have increased over a similar time period to the rise in vaccinations, but for this to represent a causal connection we’d need more than a correlation; there’d also need to be a testable hypothesis linking autism with vaccination. Such a hypothesis has failed the test, and any reports that the link is established are just junk science. Rises in autism correlate with pretty much everything that has risen over the past twenty years. As this article points out, autism also correlates with “the rise in chemtrail sightings, terrorist attacks on U.S. soil, the New England Patriots’ cumulative win total—and organic food sales”.
Even where epidemiologists find reasonable correlations between, say, eating spinach and living longer, we still can’t be sure of the causal connection. It may be that spinach increases our lifespan, but it could also be that the correlation is the work of a confounding factor. It’s well established that married men live longer, and it could be that married men eat more spinach to please their wives. Maybe the spinach has nothing to do with it. Equally, maybe marriage has nothing to do with men’s life expectancy; if they’d just eaten spinach as bachelors, maybe they’d have been fine.[i]
Anyway, you can see the tangled skein produced when correlations are all we have to rely on. The correlational claims made by other sciences, such as behaviour genetics, are more robust because those fields have methods for analysing the direction of causation and for establishing whether the cart has been put before the horse. Adoption studies are really useful for isolating the effects of environment and heritability: if there’s no correlation between the traits of adopted siblings, who share an environment but not genes, we can be pretty sure the differences are due to heritable causes. Likewise, twin studies do a decent job of showing the reverse: since identical twins share 100% of their genes, differences between them must be down to environmental factors.
So, what does all this have to do with education? Well, for us to take claims about ‘what works’ in education seriously, we have to establish whether education research is more like epidemiology or behaviour genetics. Does a piece of education research merely show a correlation, or does it make testable claims that we can use to see whether what’s being claimed is false? And what is the claimed causal relationship with? If research claims to be increasing something we’re not really sure how to measure reliably, like creativity, or something too vaguely defined to be meaningful, like character, then we have a right to feel suspicious. At least with epidemiology there’s a very definite point of measurement: is the patient dead or alive after x years?
If we are to invest time and money in an educational intervention, we ought to be reasonably sure the claims being made are robust. Here are some suggested questions to ask about any piece of research purporting to demonstrate a causal connection:
- Is there a testable claim being made? If there’s nothing to test then there’s not much point in reading further. Just observing connections will only lead to confirmation bias.
- Under what circumstances could the claim be shown to be false? For a claim to qualify as scientific, especially in this context, it must be falsifiable.[ii]
- Is there a correlation between the effects of the intervention and a widely accepted measure of academic success? If the intervention is claiming to affect children’s character, or any other aspect of personality, how is this being measured? If the intervention can’t be shown to improve students’ test scores in something, then we should steer clear. That is not to say that all of learning can be summed up by test scores, but if your claim doesn’t show up on some sort of metric somewhere, it’s probably wrong. As Lord Kelvin put it, “when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind: it may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the stage of science, whatever the matter may be.”
- How strong is the correlation? A very small correlation can still be statistically significant, but we need to remember that statistical significance can occur by chance, especially if you keep repeating an experiment until it works, or if you conduct factor analyses until you find something significant. (This is what’s known as p-hacking or data-dredging; the first sketch after this list shows it in action.)
- Is there a plausible explanation as to why the intervention might be causing the increase in academic success? Claims of far transfer tend to be dubious. If, as in the case of the EEF’s pilot study into the effects of Philosophy for Children, there’s an implausible claim (in this case, the idea that discussing ‘big questions’ about fairness or friendship can improve maths results), then the most likely interpretation is that any positive correlation is a fluke.
- Does the claim support or contradict the findings of associated fields of research? One of the problems with some of the claims made by growth mindset researchers is that they contradict the scientific consensus in intelligence research. The claim that “the brain is like a muscle” is demonstrably false. If the brain were like a muscle, then specific practice would produce global benefits: if you do exercises to strengthen your leg muscles, you get better at running, jumping and anything else which benefits from stronger legs. But if you practise a particular form of mental exercise, you don’t get better at anything other than that specific mental exercise; brain training games only make you better at brain training games. The consensus claims of intelligence research fulfil the first two criteria on this list: they are testable and they have not (yet) been shown to be false. As such, correlational studies which don’t concur with laboratory findings should be treated with caution.
- Does the intervention result in an effect size above 0.4? I’ve written before of my scepticism about effect sizes, but Hattie’s point that almost everything has an effect is an important one: if pretty much anything teachers are likely to try will result in some sort of increase in test scores, then claiming that your intervention ‘worked’ is small potatoes. (The second sketch after this list shows how an effect size is calculated.) I’ve often heard the maxim “Everything works somewhere but nothing works everywhere” trotted out as very weak support for teachers being allowed to do whatever they fancy, regardless of how slight the evidence base is. I find this maxim more useful: “Some things work in most circumstances; other things rarely work anywhere.”
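Two of the points above lend themselves to a quick demonstration. First, on p-hacking: the sketch below (Python, simulated data, no real intervention anywhere in sight) runs the same null experiment 100 times. Both groups are drawn from an identical population, yet some comparisons still come out ‘statistically significant’.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_experiments, n_per_group = 100, 30
false_positives = 0

for _ in range(n_experiments):
    # Control and 'treatment' are drawn from the SAME population,
    # so any significant difference is a false positive by construction.
    control = rng.normal(loc=0, scale=1, size=n_per_group)
    treatment = rng.normal(loc=0, scale=1, size=n_per_group)
    _, p = stats.ttest_ind(control, treatment)
    if p < 0.05:
        false_positives += 1

print(f"{false_positives} 'significant' results out of {n_experiments}")
# Expect around 5: repeat an experiment often enough and chance alone
# will eventually hand you a publishable 'finding'.
```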
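Second, on effect sizes: Hattie’s 0.4 benchmark refers to standardised mean differences such as Cohen’s d. Here is a minimal sketch of the calculation, again using invented scores rather than real study data:

```python
import numpy as np

# Invented post-test scores for a control group and an intervention group
control = np.array([61, 55, 70, 58, 66, 62, 59, 64])
treatment = np.array([68, 60, 74, 63, 71, 66, 65, 70])

# Cohen's d: difference in means divided by the pooled standard deviation
n1, n2 = len(control), len(treatment)
pooled_sd = np.sqrt(((n1 - 1) * control.var(ddof=1) +
                     (n2 - 1) * treatment.var(ddof=1)) / (n1 + n2 - 2))
d = (treatment.mean() - control.mean()) / pooled_sd
print(f"d = {d:.2f}")
```

Assuming roughly normal score distributions, a d of 0.4 means the average pupil in the intervention group outscores about two-thirds of the control group; given that almost any intervention produces some positive effect, results below that benchmark tell us very little.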
I’m not a scientist and I’m certainly no expert in educational research. That said, I’ve become reasonably research literate over the past few years, and this list has proved helpful in separating wheat from chaff. There are plenty of other questions worth asking (such as these), and just because a piece of research doesn’t seem able to satisfy your curiosity doesn’t automatically make it wrong. This is just a rough guide for thinking critically about education research.
See also this post by Greg Ashman.
[i] I know I’ve reduced a complex science almost to absurdity – that is not my intent. I know epidemiology is a lot more sophisticated than I’m painting it.
[ii] I’m well aware there’s a debate about whether falsifiability is a fair test of what’s scientific. If you want to argue the toss about string theory or the multiverse, this is not the place. Instead, if you want to debate the way I’m using falsifiability here, I’d ask you to come up with an example of an education claim that is a) unfalsifiable, and b) has some practical classroom application.
As always a fascinating post – there’s possibly another post where you could discuss the coefficient of determination, r squared, which gives you the proportion of shared variance, i.e. a measure of the strength of the relationship between two variables.
So if we have two variables (A, the independent variable, and B, the dependent variable) and we have a correlation coefficient of +0.5, then our coefficient of determination will be 0.25, which means that, at best:
- A might cause B 25% of the time, OR
- B might cause A 25% of the time, OR
- C causes them both (and is not present all of the time), OR
- it is a pure coincidence, OR
- it is some of the above in some unknown combination.
In other words, there is some relationship explaining about a quarter of the overlap, but we don’t know anything about the cause.
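That arithmetic in code form (a trivial Python sketch; the +0.5 is the hypothetical figure above):

```python
r = 0.5             # hypothetical correlation between A and B
r_squared = r ** 2  # coefficient of determination
print(f"r^2 = {r_squared:.2f}")             # 0.25: a quarter of the variance is shared
print(f"unexplained: {1 - r_squared:.0%}")  # 75% of the variance in B is unaccounted for by A
```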
In addition, when it comes to Hattie and effect sizes, Bob Slavin’s post from last year is worth a look – http://www.huffingtonpost.com/robert-e-slavin/what-is-a-large-effect-si_b_9426372.html – which suggests that the average effect size of interventions is much smaller.
Indeed, in a recent meta-analysis of the relationship between teachers being ‘coached’ and pupil outcomes, the pooled ES is around 0.11, with the effect size tending to halve when over 100 teachers are involved in a study. Kraft, M. A., Blazar, D., & Hogan, D. (2016). The Effect of Teacher Coaching on Instruction and Achievement: A Meta-Analysis of the Causal Evidence.
In other words, you are absolutely right to raise issues with correlation and effect size.
Gary
David, I enjoyed this post and think it stands to make a useful contribution to the debate on research in education.
However, I’d like to raise a few issues for further consideration. You state at the outset that:
“Epidemiology is the science of trying to find out what makes people healthier. Epidemiologists look at data to identify causal links between improved health and other factors. It is a correlational science, which means that it can never really prove a causal link; it can only suggest that a connection between two or more variables is unlikely to be caused by chance.”
I guess in the broadest sense epidemiology is USED to try to make people healthier, but examination of the literal meaning of the word is also important – it means “upon the people” – so it is actually the study of factors that exert influence “upon the people” (i.e., a particular group of people), at some kind of population level.
It seems logically inconsistent, though, to claim that epidemiologists look for *causal links* by virtue of their use of *correlation*. As any 101 Research Methods student will tell you, correlation does not establish causality; it only establishes the conditions under which further questions can be asked about a range of possible causal pathways. Having taught and researched alongside some very fine epidemiologists, I am pretty sure they would be unhappy with the “correlational science” descriptor, as they design and implement studies using a range of methodologies: case-control, cross-sectional, pre-post designs, mixed within- and between-subjects designs, interrupted time series, randomised controlled trials (with and without blinding), and so on. Some of these methodologies are in fact very effective tools for establishing cause and effect, at least under experimental conditions.
Modern epidemiology is heavily influenced by Bradford Hill’s Criteria for Establishing Causality – see for example http://www.who.int/bulletin/volumes/83/10/792.pdf
I think it would be both useful and interesting to see these criteria discussed in relation to education research.
Best wishes
Pam
Thanks Pamela – your points are fair and, of course, correct. I wasn’t intending to take epidemiologists to task – merely to highlight how thinking in one domain can be picked up and misunderstood in another. I didn’t know the root of the word, which is fascinating, and to some extent we could therefore argue that education researchers are actually epidemiologists. If this is the case, they could certainly learn a thing or two from Hill’s criteria.
Just an example of a claim which is credible but, I believe, unfalsifiable. In second language acquisition research the well-known scholar Stephen Krashen puts forward his comprehension hypothesis: that second language acquisition, like first, only occurs through understanding messages, i.e. through “comprehensible input”. Furthermore, he claims that consciously learned language (declarative knowledge practised in the skill-acquisition fashion) cannot become unconscious or “acquired”.
Scholars have argued that this is unfalsifiable, since we cannot be certain how the knowledge found its way into the unconscious (procedural) domain. On the other hand, the hypothesis is credible, many would argue, since if it works for young native language learners, why should it not work for foreign language learners?
It’s a huge debate in second language acquisition research which sits awkwardly with some of the cognitive theory you blog about from time to time.
Can vocabulary and syntactic knowledge go straight into long-term memory, or the unconscious, without going through working memory? I get out of my depth at this point!
“Can vocabulary and syntactic knowledge go straight into long term memory or the unconscious without going through working memory?” My immediate reaction is: obviously not. How could it? That would require some sort of spooky osmosis. Now, of course, I don’t know for sure that there *isn’t* some sort of spooky osmotic process going on, but this theory falls foul of Occam’s razor. We already have a perfectly workable theoretical model (the working memory model, WMM) that explains language acquisition without recourse to something unknown or unknowable. This, in my view, is precisely the problem with unfalsifiable claims. Yes, they can be interesting, and I suppose a few can prove useful as thought experiments, but they do not provide a useful foundation from which we can make justifiable claims.
However, most scholars in the field believe that the majority of language acquisition does occur at a subconscious level (by hearing and perhaps reading large amounts of language). Working memory has never, as far as I know, been brought in to support the comprehension hypothesis. In fact, some scholars categorically reject the idea that linguistic knowledge can be proceduralised through skill practice.
Would you be able to explain your own acquisition of English by the WM/LTM model? Babies and toddlers certainly bring no declarative knowledge or conceptual thinking to the task. Osmosis is too vague a word to use with reference to language acquisition, but cognitive models of learning don’t seem to work. The issue for foreign language teachers is to what extent cognitive and “natural” approaches should be employed. The jury is still out.
The mistake here is to equate processing in WM with skill practice. It might be more useful to think of WM as attention: we don’t learn things we don’t attend to. And yes, L1 acquisition can certainly be explained by the WMM with the addition of Geary’s evolutionary educational theory. We are evolutionarily adapted to learn the language of our social group incredibly rapidly. The speculation is that we possess innate grammar modules that adapt to the language of our group. There is no discrepancy between cognitive and ‘natural’ approaches to language learning; all learning is cognitive.
The incredible thing about language learning is that it all depends on groupness. Group socialisation theory predicts that culture is transmitted via social groups – peers – and this is why L2 immersion works so well: there are real evolutionary pressures to pick up the culture of the social group we’re in. If you take, say, a Russian speaker and put her in an English school for deaf children, she will learn to sign within six months. Even more astonishing, if you take a group of children with no shared language, they will rapidly invent one, teach it to each other and become fluent in it. This has been documented not just in children of immigrants who only speak a pidgin language but also in newly established deaf schools; the deaf community in Nicaragua is a remarkable example.
The common factor behind the piracy/global warming correlation is fossil-fuel-burning ships. As the industrial revolution took over, ships got bigger, faster and harder to pirate. Also, more and more carbon fuels were burnt, and the temperature rose.
Which is a good example of how things can correlate without being causal.
Thanks David for another excellent analysis. Another trend which people should be aware of is the converting of correlation into an effect size. An effect size is meant to be calculated from a proper experimental design (random assignment of participants into a control and an experimental group, etc.). Thus the correlational study is given the guise of proper science and can therefore be used to infer causation. Hattie does this all the time. Details can be found here – http://visablelearning.blogspot.com.au/p/effect-size.html
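For anyone wanting to see what that conversion involves, one commonly used formula is d = 2r / sqrt(1 - r^2). A short sketch (Python; the r value is invented) shows how a modest correlation can clear Hattie’s 0.4 benchmark once converted:

```python
import math

def d_from_r(r: float) -> float:
    """Convert a correlation coefficient r to a Cohen's d effect size."""
    return 2 * r / math.sqrt(1 - r ** 2)

# A weak correlation becomes a benchmark-beating 'effect size', even though
# the underlying study may never have had a control group at all.
print(f"r = 0.20 -> d = {d_from_r(0.20):.2f}")  # about 0.41
```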