Turning the tanker: lesson grading

I spent a good part of the past year or so railing against the injustices of lesson grading:

My impatience with some Ofsted inspectors 24th July 2014
Ofsted: The end of the (lesson grading) affair 4th June 2014
Should Ofsted judge ‘quality of teaching’? 26th May 2014
A horror story: Does Ofsted get it wrong again? 23rd May 2014
Ofsted inspectors continue to do whatever they like 21st May 2014
Watching the watchmen: Is Ofsted fit for purpose? 16th March 2014
The mystery of Oldfield School’s missing Ofsted report 17th March 2014
What inspirational teaching looks like according to Ofsted 18th February 2014
What I learned from my visit to Ofsted 19th February 2014
Are we any clearer? Ofsted explain what they do and don’t do 21st February 2014
Ofsted’s Evaluation Form: the next skirmish! 22nd February 2014
The shocking mediation of Ofsted criteria by ‘rogue’ inspectors 10th November 2013
Who inspects Ofsted? 5th February 2012 – who indeed?
Myths: what Ofsted want 17th March 2012 – do they even know?

The good news is that Ofsted listened:

Ofsted’s new Inspection Handbook – a cause for celebration 30th July 2014

But, as the cliché goes, it takes time to turn a tanker. In this metaphor, the tanker is made up of those more entrenched school leaders still labouring under the misapprehension that grading lessons is valid, reliable or desirable.

This post comes from a teacher who works in a school where individual lessons continue to be graded. As such they have made the hard decision not to publish on their blog to avoid a potential backlash. But this stuff needs saying so I’m more than happy to publish here. Also, one of my posts is referenced so it must be good!

Defence Against the Dark Arts

Pick up your wand. Now, remember to swish and flick as you enunciate the words: Mysterium tremendum et fascinans!

Did it work? Did anything happen?

Of course it didn’t. And that’s not just because you are a muggle.

It didn’t work because mysterium tremendum et fascinans isn’t a magic spell.

In fact, mysterium tremendum et fascinans is the phrase used by theologian Rudolf Otto in his 1917 work, The Idea of the Holy, in an attempt to schematise the non-rational ideas of the spiritual and the divine. Otto coined the term ‘numinous’ to describe this “wholly other”, which he said has three elements, indicated with the Latin phrase ‘mysterium tremendum et fascinans’: the mystery which both repels and fascinates; that which we tremble before and are simultaneously attracted by; that which provokes terror just as it compels us.

Thus waving around a wand whilst you incant that phrase won’t bring about magic.

So let’s try again.

Now pick up your wand once more. Again, remember to swish and flick as you enunciate the word: Legilimens!

Did that one work? Did anything happen? No? That’s odd, because that one actually is a spell. It’s the Legilimency Spell, used in the Harry Potter books by both Voldemort and Dumbledore “to delve into the minds of their victims and to interpret their findings correctly.” It’s basically a mind-reading spell (from Latin legere, ‘to read’, and mens, ‘mind’).

So why didn’t it work? Well, obviously it didn’t work because you and I both know that magic spells don’t really exist. And there certainly isn’t a way we can look inside someone’s mind and know exactly what they are thinking. I can barely tell what’s happening inside my own mind for the majority of the school term.

Yet if you observe lessons and grade them on the learning taking place in them, you are assuming that you can read minds. Because learning is something that happens inside the mind of the pupil. It is invisible. As psychologist Robert Bjork tells us, the goal of teaching a lesson is:

“…to facilitate learning, which must be inferred at some point after instruction. Learning, however, must be distinguished from performance, which is what can be observed and measured during instruction or training.”

So what you actually observe in a lesson is performance, not learning. Thus what we actually do in a lesson observation is merely try to infer learning from what we can see. We are just guessing using a set of proxies: proxies which, according to Professor Robert Coe (Director of CEM at Durham University), are usually poor indicators of learning. We assume that seeing these things tells us that someone is learning, when they do no such thing. Here are some of his examples:

From ‘Improving Education: A triumph of hope over experience’ by Professor Robert Coe, Durham University. http://www.cem.org/attachments/publications/ImprovingEducation2013.pdf

Furthermore, the proxies we use for observation and grading differ from person to person. The huge Measures of Effective Teaching (MET) study – funded by the Gates Foundation in the U.S. – offered up startling data on the reliability of lesson grading judgements. Professor Coe calls this study “the gold standard in observation”, given the significant amount of training and validation that went into preparing observers to make judgements. Coe says of the data produced from the study:

“One way to understand these values is to estimate the percentage of judgements that would agree if two raters watch the same lesson. Using Ofsted’s categories, if a lesson is judged ‘Outstanding’ by one observer, the probability that a second observer would give a different judgement is between 51% and 78%.

For observations conducted by Ofsted inspectors or professional colleagues, ‘training’ in observation is generally not of the quality and scale used in these studies, and no evidence of reliability is available. Hence, we are probably justified in assuming that the true value will be close to the worst case. In other words, if your lesson is judged ‘Outstanding’, do whatever you can to avoid getting a second opinion: three times out of four you would be downgraded. If your lesson is judged ‘Inadequate’ there is a 90% chance that a second observer would give a different rating.”

So the chance that two observers will grade the same lesson differently is simply staggering: in the worst case, there is only a one in four chance that a second observer would also grade an ‘Outstanding’ lesson as ‘Outstanding’, and a one in ten chance that a second observer would also grade an ‘Inadequate’ lesson as ‘Inadequate’. Frighteningly unreliable.

And even if observers do agree, it doesn’t follow that they have correctly identified the effectiveness of the teaching. A 2011 study by Strong et al. identified “effective” and “ineffective” teachers by their ability to raise student achievement, then asked observers to identify which teachers were which by watching their lessons. Even though this “resulted in high agreement among judges” as to which teachers were effective and which were ineffective, the study concluded that the judges were largely wrong (over 60% of their judgements were incorrect). This is the report’s rather damning verdict on grading observations: “judges, no matter how experienced, are unable to identify successful teachers”.

Yet, despite the overwhelming evidence against grading lesson observations, it is still happening in the majority of schools. Why?

Because it is the mysterium tremendum et fascinans of teaching.

We all know, deep down – and quite aside from the evidence – that grading lessons is arbitrary and non-rational (David Didau calls it “witchcraft”): the mysterium.

And I’m certain that we are all, from the top down, terrified of the idea of being damned by it (we are all surrounded by testimonies of this, so I shan’t offer any): the tremendum.

Yet it seems we are still drawn into its spell enough to continue with it, with both leaders ignoring the evidence against it and teachers craving to be graded (headteacher Tom Sherrington likens this to Stockholm Syndrome): the fascinans.

Grading individual lessons is a practice that simply must end. Ofsted have declared (and reiterated, twice) that they do not grade individual lessons. It is time for schools to follow suit. It is a process that is damaging to teachers, and therefore it is damaging to pupils and the profession. I am certainly not against appraising teaching – I think that it is a necessity to ensure standards – but all of the evidence tells us that grading lessons doesn’t appraise teaching accurately or effectively.

We need to disarm those using the dark arts of lesson grading. There are two ways we can go about doing this: 1) we can be relentless in continuing to disseminate the evidence that condemns grading individual lessons; or 2) we can pick up our wands, swish, flick and enunciate: Expelliarmus!

I think 1) might be our best bet.

David Didau, October 29th, 2014 | Featured

12 Comments

Ian Lynch October 29, 2014 at 3:22 pm

Should we be grading individual pupils then? A lot of the same arguments seem to apply.
Ruth Powley October 29, 2014 at 3:38 pm

If Strong et al are implying that there is a more objective way of working out ‘effective’ and ‘ineffective’ teachers, are they right, and if so, shouldn’t we be using it?
Dani Quinn October 30, 2014 at 12:16 am

A DADA post without Dolores Umbridge is truly a missed opportunity.

One thing that intrigues me is who is invested in maintaining the status quo. It’s either people who are scared of changing how things have always been done (a problem of leadership?), or who are better off as things stand (a problem of integrity?). Being judged purely on results/outcomes is frightening (more akin to what we have in our school – it feels fair, but very exposing) and more reliant on consistent work over time. Thinking about anecdata from colleagues across many schools…it’s interesting how many of the people who grade lessons are also people who often have cover for their classes, or deprioritise their own teaching (whilst making clear it should be everyone else’s top priority), or…..find it easier to do a one-off good lesson than actually invest in students’ learning in the long run. That’s taking a very cynical read, but I struggle to take a generous one.

The only generous read I can take is that, other than results for external exams, it’s REALLY difficult to make a fair assessment of effectiveness in terms of results, as it’s hopelessly easy to game systems (if there is a culture where that is acceptable….or expected). I know you can’t reply here but, if there is a means to do so, I would be interested to know more about your ideas when you say “I am certainly not against appraising teaching – I think that it is a necessity to ensure standards.”
- Ian Lynch October 30, 2014 at 10:04 am
  
  At NAACE, we have tested about 50,000 children so far this term on their computing knowledge. This is before any significant teaching has taken place. Externally set and marked test. If we give this test again in 3 years’ time to all year 7s we will know how much difference primary teaching has made in that 3 year period. We can automatically feed back results to those schools without them needing to do anything except put the children into the on-line test that takes about 1 hour to complete. We will be providing a new test every 6 months in order to measure progress. Now I’m not claiming this is perfect but there are some advantages. The most important is it saves teacher time. It will provide schools with objective data that probably obviates the need for any summative lesson observations, only those designed to improve teaching would be needed. We can measure departmental progress in relation to the national, individual pupil progress and forecast grades based on consistency of position in cohort. We can feed back to schools information about which things most children find easy and difficult and common misconceptions. For a medical analogy, think of giving a blood sample to get checked out so you can then decide what diet, exercise etc is working or not.
  
  Stats here for anyone interested. https://theingots.org/community/baseline_test_statistics
Hugo Kerr October 30, 2014 at 7:37 am

Frank Smith said, somewhere, that the greatest danger education faces is evaluation. And we do little else, it sometimes seems to me (as an outsider). Evaluation is the basis for denigration, enthusiastically applied in Britain and elsewhere. Education is almost impossible to bottle or measure. The best I had, some truly magical teaching, I didn’t really recognise as such until a decade or so after the event, and I was the recipient of it! Instead of stern evaluation, why don’t we try kindness?
- Ian Lynch October 30, 2014 at 8:31 am
  
  So are you suggesting we have a public service costing hundreds of billions of taxpayers’ money and take no steps to evaluate it? I can’t see any government ever going down that route. If the police, NHS or any private company said we shouldn’t evaluate our services or the value for money we provide would it have any credibility? No. So I think we can knock on the head any likelihood that public education will be absolved from evaluation. The method of doing it is open to debate but if those methods are heavily influenced only by those with a particular interest in a particular direction it’s not too likely that the rest of society will see this as objective and fair. All part of being in a democracy. Lesson grading might or might not be effective but to anyone who is not a teacher, arguments by teachers that no teacher can be assessed when they themselves are assessed in their own jobs by possibly less objective methods are not going to hold water. That might or might not be fair but it is the way the world works.
Quality Improvement through Collaboration with Others | Pearltrees October 30, 2014 at 8:22 am

[…] the Headteachers’ Roundtable meeting with Michael Gove and Sir Michael Wilshaw as reported here: Turning the tanker: lesson grading. I spent a good part of the past year or so railing against the injustices of lesson grading: My […]
Hugo Kerr October 31, 2014 at 9:14 am

No, of course I am not suggesting no assessment whatsoever. (And “hundreds of billions” is a bit OTT.) All I meant, and I think it is self-evident, is that everything is now intensely measured, even where it is not clear that it is actually measurable, what is actually being measured or whether anything needs to be measured. This is particularly so in education, I think. A perfectly good example would precisely be the lesson grade. This is a dubious measure (e.g. see above) and serves to undermine teachers – perhaps by design? (The blob discourse springs to mind.) And of course the profession will gild the lily to some degree, but it is a lily which needs gilding! (The blob again. Our media also.) There is an often malign agenda and measurement can be a tool of it.
Ian Lynch October 31, 2014 at 11:41 am

Ok, I stand corrected, 90.8 billion for 2014. But then it is hundreds of billions over time. It’s interesting that the people that make the most fuss about lesson grading are those subjected to it. (I can see the hate mail lining up now 🙂 ) I don’t think that is a coincidence. Maybe the politicians just want to know who is competent and to what degree? Lesson grading was dreamt up by HMI not politicians. Why is there a conspiracy inherent in that? Whether it works or not is a different issue but it seems to me a classical political struggle between different interest groups. No-one likes the stress of accountability – children don’t but we inflict it on them rather readily. I don’t do it now, but I used to be an RgI (Registered Inspector) and while it is difficult to be sure about the details of whether an average performer is just having a good or bad day, there are some I’d bet my house on that were not going to be much different in any other lesson. Can we be sure someone taking a driving test is going to be able to sustain it on the road afterwards? Can we be sure that the assessment we made of that child was fair and not going to damage them? There are still not too many proposals to remove the need for a driving test or high stakes school exams. Now it might be that the “damage” caused by the measurement is not worth making it but I think a lot of the emotive stuff about grading lessons is largely driven by politics not rationality. Getting lesson grading dropped is winning a political battle, but the interesting thing will be whether or not the measures and accountability that replace that are any fairer.
dodiscimus October 31, 2014 at 10:44 pm

This post quite closely mirrors Rob Coe’s original inferences from the MET project and the Strong et al (2011) paper. Every time this comes up and I comment on it, I worry that people will think I’m supporting isolated, high-stakes lesson observations (whether by Ofsted or SLT). I’m not; I think these are unacceptable; I would refuse to carry out this policy if it was demanded of me; those who think holding teachers to account in this way is good leadership I profoundly disagree with. However, I think we are still building this entire argument on Rob’s ResearchEd 2013 session and his one blog post. The MET project used US observation protocols that are nothing like UK ones, and the Strong paper doesn’t seem very relevant to me. In torpedoing Ofsted, I think Rob has done teachers (and children) a big favour. Maybe sinking graded observations below the waterline is a necessary project to get SLT to follow Ofsted. But I’m not convinced that observations are not useful at all in evaluating teaching. I think this matters because the alternative of being totally data-driven concerns me a lot (@JackMarwood has been pointing this out pretty vigorously recently). Anyway my thoughts on the research in more depth are here http://wp.me/p44DHA-Q but read this first (in the event you want to read at all) http://wp.me/p44DHA-2L so you have a better idea of my view of the big picture.
Annckendrick | Pearltrees November 12, 2014 at 9:43 am

[…] Turning the tanker: lesson grading. I spent a good part of the past year or so railing against the injustices of lesson grading: My impatience with some Ofsted inspectors 24th July 2014; Ofsted: The end of the (lesson grading) affair 4th June 2014; Should Ofsted judge ‘quality of teaching’? […]
Why sacrificing chickens will not help us evaluate teachers’ performance | David Didau: The Learning Spy September 16, 2015 at 9:44 pm

[…] Turning the tanker: lesson grading […]