Like an ultimate fact without any cause, the individual outcome of a measurement is, however, in general not comprehended by laws. This must necessarily be the case.
Wolfgang Pauli

A month or so back I met Professor Steve Higgins from Durham University’s Centre for Evaluation and Monitoring. He presented at researchED’s primary literacy conference in Leeds and what he had to say was revelatory. His talk was on the temptations and tension inherent in the EEF’s Pupil Premium Toolkit. As most readers will know, the toolkit is a bit of a blunt instrument and presents interventions in terms of how many months of progress teachers can expect to add if they have a crack at them.
[Image: the EEF Toolkit's summary of interventions and their average months of progress]
This leads to all sorts of misunderstandings and mistakes. Well-intentioned school leaders leap on the top-scoring interventions and confidently conclude, “Yay! If we do feedback and metacognition our students will make a whole 16 months of extra progress!” Sadly, it’s all a bit more complicated than that.
The reported impact for an intervention is an average. Research on each of the different interventions is aggregated to show a normal distribution of effects. So for an effect size of 0.8 we might get a distribution a bit like this:

[Image: a normal distribution of effect sizes centred on 0.8]

So, what does this actually tell us? Well, for the headline figures to be meaningful, we really have to look at the shape of the distribution to see just how good our implementation of an intervention would have to be to get an average effect. Consider this example of one of the ‘best bets’ like feedback or metacognition:
[Image: a wide bell curve of effects for a ‘best bet’ intervention, with the area around the headline average shaded in mauve]
The wide distribution tells us that some studies will have shown the intervention to have fairly poor impact, whereas other studies will have demonstrated extraordinary impact. The area shaded in mauve indicates the sort of impact we would have to aim at in order to get anywhere near the +8 months reported by the Toolkit. In a best bet, our intervention only has to be of average effectiveness in order to reap rewards. But a spread this wide cuts both ways: the more the effects vary from study to study, the more scope there is for an implementation to go badly, which helps to explain why the possible negative impacts of feedback are so powerful. As Hattie says, “Feedback is one of the most powerful influences on learning and achievement, but this impact can be either positive or negative.”
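To put some rough numbers on that, here’s a quick sketch in Python (using scipy). The mean and spread below are illustrative guesses of mine, not figures taken from the Toolkit:

```python
from scipy.stats import norm

# Purely illustrative numbers - not the Toolkit's actual figures.
mean_effect = 0.8   # headline average effect size for a 'best bet'
spread = 0.6        # an assumed (wide) standard deviation across studies

best_bet = norm(loc=mean_effect, scale=spread)

print(f"P(effect >= headline average): {best_bet.sf(mean_effect):.2f}")  # 0.50
print(f"P(effect > 0): {best_bet.sf(0):.2f}")                            # ~0.91
print(f"P(effect < 0): {best_bet.cdf(0):.2f}")                           # ~0.09
```

Under these made-up numbers, an average implementation already matches the headline figure, most below-average ones still do some good, and only a small minority dip below zero.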
However, if we turn our attention to interventions with good, but more modest impacts, like, say, digital technology or small group tuition, both reported as providing +4 months progress, the bell curve will look more like this:
[Image: a narrower bell curve for an intervention with a more modest average effect]

What this demonstrates is that our intervention will have to be slightly better than the average implementation of this approach in order to be as worthwhile as we might want. And if our intervention goes badly, there’s a real risk it could have a negative impact on progress.
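Re-running the same sort of sketch with a more modest average (again, the parameters are my assumptions rather than anything published by the EEF) shows how the picture shifts:

```python
from scipy.stats import norm

# Illustrative parameters only - not the EEF's.
modest = norm(loc=0.3, scale=0.3)   # modest average effect, narrower spread

worthwhile = 0.4                    # an arbitrary 'worth the effort' threshold
print(f"P(effect >= {worthwhile}): {modest.sf(worthwhile):.2f}")  # ~0.37
print(f"P(effect < 0): {modest.cdf(0):.2f}")                      # ~0.16
```

Only the better-than-average implementations clear that (arbitrary) bar, and roughly one in six of these imaginary implementations actually goes backwards.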
Which brings us to some of the riskier approaches. Somewhat controversially, the EEF reports that there’s fairly robust evidence (indicated by the 3 padlocks) that implementing Learning Styles offers +2 months of progress for a very modest outlay of time and resources. That’s not too shabby, is it?
[Image: the Toolkit’s entry for Learning Styles, showing +2 months’ progress, a very low cost and three padlocks of evidence]

But surely Learning Styles has been thoroughly debunked and dismissed? What’s going on? Let’s have a look at the bell curve:
[Image: a bell curve for Learning Styles, centred just above zero]

What this tells us is that it may actually be possible to implement Learning Styles in a way that benefits pupils’ progress. Maybe your school will be one of the lucky few. But the average effects are fairly negligible and probably not worth even a modest outlay. And there, on the left-hand side of the distribution, is why implementing a strategy like Learning Styles is so risky: 50% of the studies will show impacts of less than +2 months, and an unacceptably high number will have reported negative impacts, meaning implementations that actually impeded pupils’ progress.
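To see what that left-hand side means in practice, here’s a small simulation over imaginary schools. The distribution is an illustrative guess of mine, not anything taken from the Toolkit’s data:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# An illustrative low-average, fairly wide distribution of realised effects.
n_schools = 10_000
effects = rng.normal(loc=0.15, scale=0.3, size=n_schools)

below_headline = np.mean(effects < 0.15)   # below the headline average
went_backwards = np.mean(effects < 0)      # actively harmful

print(f"schools below the headline figure: {below_headline:.0%}")      # ~50%
print(f"schools where progress went backwards: {went_backwards:.0%}")  # ~31%
```

Half the imaginary schools fall below the headline figure by definition; the worrying part is the sizeable minority who would have been better off doing nothing at all.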
I should add that the bell curves shown in this post are from Steve’s presentation and don’t represent the actual distributions for the particular interventions I’ve discussed. Apparently feedback has one of the widest distributions of effects, whereas other interventions have much sharper peaks with less leeway either side. Dylan Wiliam provided this distribution of the 607 effect sizes of feedback found by Kluger & DeNisi in their seminal 1996 meta-analysis.
[Image: distribution of the 607 feedback effect sizes from Kluger & DeNisi (1996)]
As you can see, the distribution is anything but normal: the effects average around 0.41, but 38% of the effect sizes were negative. If this is anything to go by, it tells us our attempts to give students feedback must be very carefully thought out indeed. Doing feedback averagely well looks to be a waste of time! There are, however, some very intriguing outliers, which is where further research and experimentation ought to be focussed.
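A quick bit of arithmetic on the figures quoted above shows just how strong those outliers must be. The average size of the negative effects (-0.3 below) is purely my assumption, to make the sums work:

```python
# Reported by Kluger & DeNisi (as quoted above): 607 effect sizes,
# an overall average of ~0.41, and 38% of the effects negative.
n_effects = 607
overall_mean = 0.41
share_negative = 0.38
assumed_negative_mean = -0.3   # my assumption, for illustration only

negative_count = round(share_negative * n_effects)
# For the overall mean to come out at 0.41, the positive 62% must average:
required_positive_mean = (overall_mean - share_negative * assumed_negative_mean) / (1 - share_negative)

print(f"roughly {negative_count} of the 607 effects were negative")                 # ~231
print(f"the positive effects would need to average ~{required_positive_mean:.2f}")  # ~0.85
```

That’s roughly 230 comparisons in which feedback made things worse, offset by positive effects which, on this assumption, would have to average well over 0.8.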
After listening to the presentation, I asked Steve if the actual distributions are available. He explained that, when the Toolkit was being put together, it was decided that displaying the bell curves would be too complicated and would just over-burden poor, unsophisticated teachers. Nonsense!
I don’t know about you, but I said so at the time and I still think it’s nonsense now. Maybe it is too complex for some, but then, as I used to tell my students, nobody ever rises to low expectations. Hell, I’m only an English teacher! If I can get it, so can you! If the information were available, some of us would make the effort to understand it; if it isn’t, we guarantee that the lowest common denominator is all anyone achieves. (This is very much the problem with differentiating resources, by the way.)

Happily, Steve agreed and committed himself to doing something about it. I saw him again yesterday and lo! he’s in the process of making the bell curves available on the EEF website. Hopefully, before long, we’ll be able to click through from the headline figures and actually examine the shape of the curve for any intervention we’re considering implementing. This is a minor triumph for teachers and might be a small step along the road to greater professionalism.