I spent a good part of the past year or so railing against the injustices of lesson grading:

My impatience with some Ofsted inspectors 24th July 2014
Ofsted: The end of the (lesson grading) affair 4th June 2014
Should Ofsted judge ‘quality of teaching’? 26th May 2014
A horror story: Does Ofsted get it wrong again? 23rd May 2014
Ofsted inspectors continue to do whatever they like 21st May 2014
Watching the watchmen: Is Ofsted fit for purpose? 16th March 2014
The mystery of Oldfield School’s missing Ofsted report 17th March 2014
What inspirational teaching looks like according to Ofsted 18th February 2014
What I learned from my visit to Ofsted 19th February 2014
Are we any clearer? Ofsted explain what they do and don’t do 21st February 2014
Ofsted’s Evaluation Form: the next skirmish! 22nd February 2014
The shocking mediation of Ofsted criteria by ‘rogue’ inspectors 10th November 2013
Who inspects Ofsted? 5th February 2012 – who indeed?
Myths: what Ofsted want 17th March 2012 – do they even know?

The good news is that Ofsted listened: 

Ofsted’s new Inspection Handbook – a cause for celebration 30th July 2014

But, as the cliché goes, it takes time to turn a tanker. The tanker in this metaphor is some of the more entrenched school leaders still labouring under the misapprehension that grading lessons is valid, reliable or desirable.

This post comes from a teacher who works in a school where individual lessons continue to be graded. To avoid a potential backlash, they have made the hard decision not to publish it on their own blog. But this stuff needs saying so I’m more than happy to publish here. Also, one of my posts is referenced so it must be good!

Defence Against the Dark Arts

Pick up your wand. Now, remember to swish and flick as you enunciate the words: Mysterium tremendum et fascinans!

Did it work? Did anything happen?

Of course it didn’t. And that’s not just because you are a muggle.

It didn’t work because mysterium tremendum et fascinans isn’t a magic spell.

In fact, mysterium tremendum et fascinans is the phrase used by theologian Rudolf Otto in his 1917 work, The Idea of the Holy, in an attempt to schematise the non-rational ideas of the spiritual and the divine. Otto coined the term ‘numinous’ to describe this “wholly other”, which he said has three elements, indicated with the Latin phrase ‘mysterium tremendum et fascinans’: the mystery which both repels and fascinates; that which we tremble before and are simultaneously attracted by; that which provokes terror just as it compels us.

Thus waving around a wand whilst you incant that phrase won’t bring about magic.

So let’s try again.

Now pick up your wand once more. Again, remember to swish and flick as you enunciate the word: Legilimens!

Did that one work? Did anything happen? No? That’s odd, because that one actually is a spell. It’s the Legilimency Spell, used in the Harry Potter books by both Voldemort and Dumbledore “to delve into the minds of their victims and to interpret their findings correctly.” It’s basically a mind-reading spell (from Latin legere, ‘to read’, and mens, ‘mind’).

So why didn’t it work? Well, obviously it didn’t work because you and I both know that magic spells don’t really exist. And there certainly isn’t a way we can look inside someone’s mind and know exactly what they are thinking. I can barely tell what’s happening inside my own mind for the majority of the school term.

Yet if you observe a lesson and grade it on the learning taking place in that lesson, you are making an assumption that you can read minds. Because learning is something that happens inside the mind of the pupil. It is invisible. As psychologist Robert Bjork tells us, the goal of teaching a lesson is:

“…to facilitate learning, which must be inferred at some point after instruction. Learning, however, must be distinguished from performance, which is what can be observed and measured during instruction or training.”

So what you actually observe in a lesson is performance, not learning. Thus what we actually do in a lesson observation is merely try and infer learning from what we can see. We are just guessing using a set of proxies. Proxies which, according to Professor Robert Coe (Director of CEM at Durham University), are usually ineffectual for observing learning – we assume that seeing these things tells us that someone is learning, when they do no such thing. Here are some of his examples:

From ‘Improving Education: A triumph of hope over experience’ by Professor Robert Coe, Durham University. http://www.cem.org/attachments/publications/ImprovingEducation2013.pdf

Furthermore, the proxies we use for observation and grading differ from person to person. The huge MET study – funded by the Gates Foundation in the U.S. – offered up startling data on the reliability of lesson grading judgements. Professor Coe calls this study “the gold standard in observation”, given the significant amount of training and validation that went into preparing observers to make judgements. Coe says of the data produced from the study:

“One way to understand these values is to estimate the percentage of judgements that would agree if two raters watch the same lesson. Using Ofsted’s categories, if a lesson is judged ‘Outstanding’ by one observer, the probability that a second observer would give a different judgement is between 51% and 78%.

For observations conducted by Ofsted inspectors or professional colleagues, ‘training’ in observation is generally not of the quality and scale used in these studies, and no evidence of reliability is available. Hence, we are probably justified in assuming that the true value will be close to the worst case. In other words, if your lesson is judged ‘Outstanding’, do whatever you can to avoid getting a second opinion: three times out of four you would be downgraded. If your lesson is judged ‘Inadequate’ there is a 90% chance that a second observer would give a different rating.”

So the probability that a lesson will be graded differently by two observers is simply staggering: a one in four chance of a second observer grading an ‘Outstanding’ lesson the same; a one in ten chance of a second observer grading an ‘Inadequate’ lesson the same. Frighteningly inaccurate.
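To make the arithmetic behind those fractions explicit, here is a minimal sketch in Python. It does nothing more than take the disagreement figures quoted from Coe above (51% to 78% for ‘Outstanding’, 90% for ‘Inadequate’) and turn them into the chance that two observers agree. The function name `agreement_chance` is my own invention for illustration, not anything from Coe or the MET study.

```python
def agreement_chance(disagreement_pct: int) -> int:
    """Chance (in %) that a second observer gives the same grade,
    given the chance (in %) that they give a different one."""
    return 100 - disagreement_pct


# Disagreement figures as quoted from Coe above (no new data).
outstanding_best_case = 51
outstanding_worst_case = 78
inadequate = 90

# 'Outstanding', best case: just under an even chance of agreement.
print(agreement_chance(outstanding_best_case))
# 'Outstanding', worst case: roughly one in four.
print(agreement_chance(outstanding_worst_case))
# 'Inadequate': one in ten.
print(agreement_chance(inadequate))
```

Coe’s point is that, for everyday observations with little observer training, we should probably assume the worst-case figure rather than the best.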

And even if observers do agree, it doesn’t follow that they have correctly identified the effectiveness of the teaching. This 2011 study by Strong et al. identified “effective” and “ineffective” teachers by their ability to raise student achievement. It then asked observers to identify which teachers were which through watching their lessons. Even though their research “resulted in high agreement among judges” as to which teachers they judged were effective and which were ineffective, the study concluded that they were largely wrong in these judgements (over 60% were incorrect judgements). This is the report’s rather damning verdict on grading observations: “judges, no matter how experienced, are unable to identify successful teachers”.

Yet, despite overwhelming support for not grading lesson observations, it is still happening in the majority of schools. Why?

Because it is the mysterium tremendum et fascinans of teaching.

We all know, deep down – and quite aside from the evidence – that grading lessons is arbitrary and non-rational (David Didau calls it “witchcraft”): the mysterium.

And I’m certain that we are all, from the top down, terrified of the idea of being damned by it (we are all surrounded by testimonies of this, so I shan’t offer any): the tremendum. 

Yet it seems we are still drawn into its spell enough to continue with it, with both leaders ignoring the evidence against it and teachers craving to be graded (headteacher Tom Sherrington likens this to Stockholm Syndrome): the fascinans.

Grading individual lessons is a practice that simply must end. Ofsted have declared (and reiterated, twice) that they do not grade individual lessons. It is time for schools to follow suit. It is a process that is damaging to teachers, and therefore it is damaging to pupils and the profession. I am certainly not against appraising teaching – I think that it is a necessity to ensure standards – but all of the evidence tells us that grading lessons doesn’t appraise teaching accurately or effectively.

We need to disarm those using the dark arts of lesson grading. There are two ways we can go about doing this: 1) we can be relentless in continuing to disseminate the evidence that condemns grading individual lessons; or 2) we can pick up our wands, swish, flick and enunciate: Expelliarmus!

I think 1) might be our best bet.