Contrary to popular opinion, I’m not all that bothered about test scores. I mean, obviously I’d far prefer pupils did well rather than poorly on a summative exam, particularly if it is likely to have some bearing on their future life chances – who wouldn’t? – but I’m certainly not interested in raising test scores for the sake of raising test scores.

Which is why I feel taken aback when people say things like this:

The simple answer to this leading question is, no. Like most people involved in education I want students to have the best possible chance of leading happy, productive lives; to go out into the world and flourish. Academic success is not necessarily an important thing in and of itself. I just happen to believe that doing better in school increases the range of options open to us and is thus more likely to result in what I want for young people. Children will learn a great many things – positive and negative – that no test will ever measure and for which there will be no certification. Test scores are a very imperfect proxy for establishing whether children have in fact achieved some measure of academic success. And that’s it. They have no inherent value.

That said, test scores are a pretty good proxy for establishing whether an educational intervention is likely to be worth investing in. Most education research uses effect sizes to make it possible for us to compare which interventions are likely to be more profitable than others, and these effect sizes are based on scores in tests.* We’re all prone to a wide range of cognitive biases which prevent us from being able to evaluate the effectiveness of a strategy in isolation. We may think something is working well, but our hopes and preferences blind us to reality; if our preferred approach leads to no or negligible impact on test scores then we should start to consider the prospect that we might be wrong. In order to raise ourselves above the anecdotal we design studies to try to establish if what we want to believe is actually real. Failure to do so means we are piddling around with naive, pre-scientific ideas about how the world works and thus we can’t expect anyone to take our claims seriously.
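For what it’s worth, the effect sizes in question are usually standardised mean differences such as Cohen’s d. The sketch below (in Python, with entirely invented scores) shows roughly how such a figure is calculated from two groups’ test scores; the exact formula varies between studies, so treat this as illustrative rather than definitive.

```python
# Illustrative only: a standardised mean difference (Cohen's d) computed
# from two sets of made-up test scores. Effect sizes in education research
# are often calculated along these lines, though the precise formula
# (pooled SD, gain scores, etc.) differs between studies.
import statistics

def cohens_d(treatment_scores, control_scores):
    """Standardised mean difference between two groups of test scores."""
    mean_t = statistics.mean(treatment_scores)
    mean_c = statistics.mean(control_scores)
    sd_t = statistics.stdev(treatment_scores)
    sd_c = statistics.stdev(control_scores)
    n_t, n_c = len(treatment_scores), len(control_scores)
    # Pooled standard deviation across the two groups
    pooled_sd = (((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)) ** 0.5
    return (mean_t - mean_c) / pooled_sd

# Entirely invented scores, just to show the arithmetic
intervention = [62, 71, 68, 75, 66, 70, 73, 69]
comparison   = [60, 65, 63, 68, 61, 64, 67, 62]
print(round(cohens_d(intervention, comparison), 2))
```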

But, this doesn’t mean the only tests that can yield valid results are academic tests. Let’s say you want to claim not that your preferred methodology increases academic performance, but that it increases creativity. Or collaboration, or whatever. The first thing you have to do is to clearly define the construct you want to see an increase in. In the case of creativity, this is tricky as not everyone will agree on a definition. Most of the tests cited in support of efforts to raise creativity actually measure something called ‘divergent thinking’. This is normally defined as coming up with as many different solutions to a problem as possible. Here’s what Wikipedia says:

Divergent thinking is a thought process or method used to generate creative ideas by exploring many possible solutions. It is often used in conjunction with its cognitive colleague, convergent thinking, which follows a particular set of logical steps to arrive at one solution, which in some cases is a ‘correct’ solution. By contrast, divergent thinking typically occurs in a spontaneous, free-flowing, ‘non-linear’ manner, such that many ideas are generated in an emergent cognitive fashion. Many possible solutions are explored in a short amount of time, and unexpected connections are drawn. After the process of divergent thinking has been completed, ideas and information are organized and structured using convergent thinking.

This is reasonably clear and you can design tests to measure this construct without too much difficulty – tests such as asking participants to think of as many uses of a paperclip as possible in a limited time. This means we could design a study where we split kids into two or more randomised groups and give them all the paperclip test. Then we would give one group our creativity intervention and the others would either get no intervention or some other strategy designed to increase performance in a test of divergent thinking. Then, all the participants would redo the test – or a variant of it – and we could see whether anyone’s test scores increased, determine whether any increase might have occurred by chance, and check whether one group’s increase is greater than the others’.
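To make that concrete, here is a toy simulation of that kind of design. Everything in it is invented – the scores, the group sizes, the assumed boost from the intervention – and it is not a real study; it simply shows the logic of randomising, testing before and after, and asking whether the difference in gains is bigger than chance alone would produce.

```python
# A rough sketch (not a real study) of the design described above:
# randomly allocate participants, measure divergent thinking before and
# after, then ask whether the difference in gains between groups is
# bigger than we'd expect by chance. All numbers are invented.
import random
from scipy import stats

random.seed(1)

participants = list(range(60))
random.shuffle(participants)
intervention_group = set(participants[:30])
control_group = set(participants[30:])

def paperclip_test():
    """Stand-in for a timed 'uses for a paperclip' score (invented distribution)."""
    return random.gauss(12, 3)

# Pre-test everyone
pre = {p: paperclip_test() for p in participants}

# Post-test: pretend the intervention adds a small boost (purely for illustration)
post = {p: paperclip_test() + (1.5 if p in intervention_group else 0.0)
        for p in participants}

gains_intervention = [post[p] - pre[p] for p in intervention_group]
gains_control = [post[p] - pre[p] for p in control_group]

# Independent-samples t-test on the gain scores
t_stat, p_value = stats.ttest_ind(gains_intervention, gains_control)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```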

Although I’d be more than happy to agree that one group was measurably better than the others, I’d still be wary of claiming we’d found a way of increasing creativity because divergent thinking isn’t the same thing as creativity. We probably don’t actually want to encourage people to list lots of improbable uses for paperclips; what we really want is for people to have new and useful ideas.

The point is this: if there’s no way for us to measure what you think is important, then we only have your word to go on that what you propose is worthwhile. We know your word isn’t good enough because we know how prone human beings are to making very predictable mistakes. That’s why we have science. If you want to suggest spending curriculum time on increasing students’ situational engagement instead of on more traditional academic pursuits, then the burden of proof is with you. I suggest that you establish how you will measure the benefits you hope to see and then conduct a fair test in which spending time on engagement activities is compared against teaching academic content. If you can show that your approach leads to a measurable improvement in something then I promise to take your claims seriously and consider whether this improvement is more beneficial than helping students get the best exam results possible.