The Learning Spy - David Didau

Jul 07, 2016

I've been writing enthusiastically about using Comparative Judgement to assess children's performance for some months now. Some people, though, are understandably suspicious of the idea. That's pretty normal. As a species we tend to be suspicious of anything unfamiliar and like stuff we've seen before. When something new comes along there will always be those who get over-excited and curmudgeons who suck their teeth and shake their heads. Scepticism is healthy.

Here are a few of the criticisms I've seen of comparative judgement:

It's not accurate.
Ranking children is cruel and unfair.
It produces data which says whether a child has passed or failed.
It attempts to rank effort as well as attainment and this isn't possible without knowing the child involved.
It's just about data.
It's worse than levels. Bring back levels! Everything was better in the past.
It leads to focussing on cohorts instead of individuals.
Parents don't want their children compared to other children.
Learning shouldn't be measured as an abstract thing.
We should take our time marking students' work because they've taken time to produce it.

There may be others.

Let's deal with each in turn.

I can absolutely understand why we might feel sceptical that a fast, intuitive judgement can tell us as much as slow, analytical marking. Surely spending 10 minutes poring over a piece of writing, cross-referencing against a rubric, has to be better than making a cursory judgement in a few seconds? On one level this may be true. Reading something in detail will obviously provide a lot more information than skim reading it. There are, however, two points to consider. Firstly, is the extra time spent marking worth the extra information gained? This, of course, depends. What are you planning to do as a result of reading the work? What else could you do with the time? Secondly, contrary to our intuitions, the reliability of aggregated judgements is much greater than that achieved by expert markers in national exams. The reliability of GCSE and A level marking for essay-based examinations is between 0.6 and 0.7. Put crudely, this suggests there's a 30-40% probability that a different marker would award a different mark, which is why so many papers have their marks challenged every year. But if we aggregate a sufficient number of judgements (5 x n) then we end up with a reliability above 0.9. Although any individual judgement may be wildly inaccurate, in aggregate they will produce much more accurate marks than an expert examiner.
It may well be both cruel and unfair to rank children; I'm genuinely ambivalent about that. However, a comparative judgement doesn't attempt to rank children, just their work. Teacher assessments, on the other hand, are much more likely to judge the child rather than the work, as investigations into the 'halo effect' have consistently shown. We are all inadvertently prone to biases which end up privileging students based on their socio-economic background, race and gender. If anything, comparative judgement is less cruel and less unfair than marking.
We might feel squeamish about the idea of an assessment that produces data about whether children have passed or failed, but that is the purpose of assessment. Think about it: what's the point of setting an assessment which fails to provide you with information about a child's current performance?
I would be completely against using comparative judgement to rank students' effort as well as attainment. It really isn't possible to say anything meaningful about a child's effort without knowing them. Thankfully, I've never heard of anyone using CJ in this way.
The criticism that CJ is just about data collection is bizarre. The purpose of the judgements is to focus on the actual work produced by students rather than on trying to use a rubric to assign a mark. Numbers are entirely optional, and some of my most successful experiences of using CJ have been when we didn't make any effort to collect or record data.
I too miss NC Levels. There was a lot about them to like. A group of subject experts spent months thinking deeply about children's progression and produced an astonishingly detailed and useful set of documents. I'm saddened by the mad rush of many schools to create their own inferior versions. The point is that using CJ has nothing to do with whether you also choose to use some kind of level system. We just have to understand what levels can and can't do: they're great at helping us understand why one piece of work is better than another, but terrible at helping us assign a mark. My advice: use Levels after CJ to make sense of the rank order you end up with.
Some people are concerned that producing a rank order means that teachers will end up generalising about an amorphous cohort rather than thinking about children as individuals. This anxiety is understandable but, thankfully, misplaced. As mentioned above, CJ focuses teachers on the work students produce. The judging is only the first part of the process. Once work has been ranked, the rank order provokes detailed conversations about that work. If anything, this helps us better understand why an individual may be falling down and helps us pinpoint how we can help them.
As a parent, I'm squeamish about the idea of my children being compared to others. My youngest daughter is in Year 6 and about to collect her SATs results, and we're all waiting with bated breath. Inevitably, she'll find out how she compares to her classmates. But what's the alternative? Not giving parents grades at all? I may not want my daughter to feel upset about how well she's done compared to others, but I don't think I'm alone in being pretty keen to get some kind of objective measure of how she's done. The real point is that comparing children has nothing at all to do with comparative judgement, as we saw in point 2 above. That said, what CJ does offer is the ability to show progress much more reliably than any other assessment method. Most parents are, I think, very interested in knowing whether their children are making progress.
We should absolutely try to avoid talking about learning in the abstract. This is hard because learning is abstract. You can't see it, touch it, or taste it. Because of this we come up with metaphors to try to make it more tangible. This is how we end up having conversations revolving around sub-levels of progress, or predicted grades, as if they actually meant something concrete. All assessments provide us with a proxy; the point is whether or not it's a good proxy. I would argue that CJ allows us to make better inferences about learning as an abstract thing because it's so focussed on the concrete. The absence of rubrics means we are one step nearer the thing itself. Additionally, not having a rubric means we are likely to get a more valid sample of students' ability within a domain. Because a rubric depends on attempting to describe indicative content, it warps both teaching and assessment: teachers use mark schemes to define the curriculum, and examiners search for indicative content and ignore aspects of great work that didn't make it into the rubric.
In an ideal world we would put the same effort into reading students' work as they put into creating it. Sadly, this thinking has led to the steady rise in teachers' workload and mounting feelings of guilt and anxiety. No teacher, no matter how good they are, will ever be able to sustain this kind of marking for long. But maybe we've been asking the wrong question. Maybe instead we should ask: if students have put all this effort into their work, is it fair that we assess it unfairly and unreliably? The other point is that the 30-second intuitive judgement is only desirable during the judging process. In order to provide meaningful feedback, of course, you actually have to spend time reading the work too.
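
For anyone who wants to see the aggregation argument from the first response in action, here is a rough simulation sketch. It is not the algorithm any particular CJ tool uses. Everything in it is an assumption made for illustration: the made-up "true quality" scores, the Bradley-Terry style judging model, the smoothing constant, reading n as the number of scripts, and the function names. It also measures agreement with the hidden true quality rather than the reliability coefficient quoted above, so don't expect it to reproduce the 0.9 figure. It simply compares what a handful of noisy judgements can recover with what 5 x n of them can.

```python
import math
import random


def simulate_judgements(true_quality, n_judgements, rng):
    """Simulate noisy pairwise judgements; return (winner, loser) index pairs."""
    results = []
    for _ in range(n_judgements):
        a, b = rng.sample(range(len(true_quality)), 2)
        # Assumed judging model: the better script wins more often, not always.
        p_a_wins = 1.0 / (1.0 + math.exp(true_quality[b] - true_quality[a]))
        results.append((a, b) if rng.random() < p_a_wins else (b, a))
    return results


def fit_bradley_terry(n_items, results, iterations=100, smoothing=0.5):
    """Estimate one log-strength per script from pairwise results (MM updates)."""
    wins = [0.0] * n_items
    pair_counts = {}
    for winner, loser in results:
        wins[winner] += 1.0
        key = (min(winner, loser), max(winner, loser))
        pair_counts[key] = pair_counts.get(key, 0) + 1
    strengths = [1.0] * n_items
    for _ in range(iterations):
        denominators = [0.0] * n_items
        for (i, j), count in pair_counts.items():
            share = count / (strengths[i] + strengths[j])
            denominators[i] += share
            denominators[j] += share
        # Smoothing (an assumption of this sketch) keeps every strength positive
        # and finite, even for scripts that won or lost every comparison.
        strengths = [
            (wins[i] + smoothing) / (denominators[i] + smoothing)
            for i in range(n_items)
        ]
    return [math.log(s) for s in strengths]


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_agreement(n_scripts, n_judgements, trials=20, seed=0):
    """Average correlation between hidden true quality and fitted scores."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        true_quality = [rng.gauss(0.0, 1.5) for _ in range(n_scripts)]
        results = simulate_judgements(true_quality, n_judgements, rng)
        scores = fit_bradley_terry(n_scripts, results)
        total += pearson(true_quality, scores)
    return total / trials


if __name__ == "__main__":
    n = 30  # hypothetical number of scripts
    print("few judgements (n // 3):", round(mean_agreement(n, n // 3), 2))
    print("many judgements (5 x n):", round(mean_agreement(n, 5 * n), 2))
```

Any single simulated judgement here can easily go the "wrong" way, which is the point: the fitted scores only start to track the hidden quality once enough judgements are pooled. Real judges and real scripts will behave differently from this toy model, so treat the numbers it prints as illustrative only.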

Another criticism I've spotted is that CJ is new and a fad. This too is wrong: the idea has been around for decades.

One final point. Assessment is one of the least well-understood and most important aspects of education. Every teacher ought to have a working knowledge of the concepts of reliability and validity. Dylan Wiliam, in typically bullish form, argues that, “it would be reasonable to say that a teacher without at least some understanding of these issues should not be working in public schools.”

I hope that's clarified some of the misunderstandings out there. If there are any others, please add them to the comments and I'll address them there.

10 Misconceptions about Comparative Judgement