How to get assessment wrong

It is the duty of the human understanding to understand that there are things which it cannot understand, and what those things are.

Søren Kierkegaard

With the freedom to replace National Curriculum Levels with whatever we want, there’s a wonderful opportunity to assess what students can actually do rather than simply slap vague, ill-defined criteria over students’ work and then pluck out arbitrary numbers as a poor proxy for progress. But there’s also an almost irresistible temptation to panic, follow the herd and get things badly wrong. Levels are by no means the worst we could do. In fact there was much to like about them, and if we’re not careful we’ll replace them with something truly awful.

We need always to remember that any system of assessment is an attempt to map a mystery with a metaphor. There’s no way we can ever really know everything about what students are learning. All we get to measure is their performance on a given day. Because we can’t see learning we come up with metaphors to make it easier to conceptualise. Levels, ladders, thermometers, graphs are all metaphors. They’re meant to help us to think about something so complex and mysterious it makes the mind boggle. Unfortunately, they often end up concealing the truth that learning is messy and unpredictable. My favourite metaphor for learning is Robert Siegler’s ‘overlapping waves’ model; the tide may be coming in, but individual waves roll in and recede unpredictably. Siegler suggests we make the following assumptions:

At any one time children think in a variety of ways about most phenomena;
These varied ways of thinking compete with each other, not just during brief transition periods but rather over prolonged periods of time;
Cognitive development involves gradual changes in the frequency of these ways of thinking, as well as the introduction of more advanced ways of thinking.

How can you show that on a spreadsheet? Obviously it’s much easier to just ignore all this complexity and pretend learning is predictable.

Here then are a few simple principles for getting things wrong.

Assessment and tracking systems should (not):

display an ignorance of how students actually learn
assume progress is linear and quantifiable, with learning divided into neat units of equal value
predefine the expected rate of progress
limit students to Age Related Expectations (ARE) that have more to do with the limits of the curriculum than any real understanding of cognitive development
invent baseless, arbitrary, ill-defined thresholds (secure, emerging etc.) and then claim students have met them
use a RAG rating (Red, Amber, Green) based on multiples of 3 to assess these thresholds
apply numbers to these thresholds to make them easy to manipulate
provide an incentive to make things up

Picking holes in other people’s work is easy, but in all seriousness, if you want to work out whether your levels replacement is fit for purpose try asking Michael Tidd’s 7 questions. Also, do your best to resist the myths of progress: the best we can do is to approximate what we think students are learning by looking at what they can actually do here and now.

Many thanks to @jpembroke for helping come up with the silly ideas.

And here are the slides I used at The Key’s conference on Life After Levels.

David Didau | May 20th, 2015 | assessment

27 Comments

mmiweb May 20, 2015 at 12:45 pm

Gosh David, we’re in danger of fully agreeing on something ;-D

I do like the lost of “nots” and have been talking about very similar things for a while now (since we ‘lost’ levels) especially what I consider to be the biggest lie in education that children progress up the beautiful straight progression curve that fills DfE and Ofsted’s mind when it comes to the nature of the progression of individuals and schools.

I would argue that the core issue is that over the last x years (it seems to have been forever) we have moved away from a system of assessment to a system of tracking and that we need to hold in mind only two core questions when thinking about the assessment of the individual:

(i) What has the child achieved and how do I know?
(ii) What do they need to do next in order to progress?

Not that either of these is easy to do – but that’s what makes teaching (and learning) a complex, skilled process.

The other question that SCHOOLS or school leaders might want to ask is:

If I compare my school’s and my pupils’ achievements and attainments to a school which looks and feels like mine in its make-up and children, then are they doing better or worse? (And this should be on a more complex set of measures than 5 A*-C GCSE, though this is a perfectly good measure.) If so, then can I have a dialogue with this school to learn from them, or help them?

One question: what is the RAG in your point 6?
- James Pembroke May 20, 2015 at 12:53 pm
  
  If we stop focussing on devising progress measures, and just concentrate on your 2 points above, the world would be a better place.
  
  By the way, RAG = red, amber, green. Generally translates as below, at, above age-related expectations. It’s oversimplified nonsense.
- mmiweb May 20, 2015 at 1:01 pm
  
  Arrgh – my sloppy fingers, that should have been the “list of nots”. Thanks to James (below) for the RAG translation (were the brackets there earlier??). I was in a school recently that went way beyond that: they had, in order, purple – blue – green – orange – yellow – pink – red. It was pretty, or rather pretty meaningless!
julietgreen May 20, 2015 at 8:19 pm

Yes. I too, agree. And we are. Getting it wrong, I mean. I am fighting hard but not having much success.
thinklish May 20, 2015 at 8:38 pm

Hi David, Is slide 16 an example of how to do it right? It looks good to me but includes the categories of ‘Secure’ and ‘Working towards’ which are listed under your list of the pitfalls of assessment. Thanks for the clarification… enjoying your recent prolific productivity!
- David Didau May 20, 2015 at 10:35 pm
  
  Haha! Yes I was mindful of that when writing. I would argue (of course I would) that the descriptions aren’t arbitrary – they actually mean something and are distinct from each other. It’s intended to be clear for teachers, parents, students…
  
  Convinced?
  - James Pembroke May 21, 2015 at 9:08 am
    
    It’s the use of arbitrary thresholds to define ‘emerging’, ‘developing’, ‘secure’ that concerns me most. For example, a pupil achieving (secure in) 33% of objectives by Christmas moves from emerging to developing. The 33% threshold is not linked to any meaningful rate of learning (is there such a thing?) but rather it exists because it’s neat and convenient (one third of objectives per term). We are creating systems built around neat thresholds to make categorising learning and progress easier. Problem is that the actual data is so far removed from what actually happens in classrooms it’s next to useless for teachers.
    
    Well, that’s my opinion anyway.
  - thinklish May 22, 2015 at 12:11 pm
    
    Thanks for the clarification. Yes, fine by me! My main unease with assessment in English is when the nuanced descriptions (like those in your table) get turned into precise numbers and then the numbers are over-extrapolated in ways which lose the original nuance…
Catherine May 21, 2015 at 4:04 am

Hi David. In this post https://www.learningspy.co.uk/featured/what-does-feedback-look-like/#more-7156 you mentioned that you liked the sound of Kev Lister’s RAG123 system, but in this current one you have it listed under the ‘not’ category. I am trying to rejig my own systems of marking and I came across the RAG123 one; it sounded interesting to me. I value your opinions and I wonder if you could explain why your thinking changed? Thank you 🙂
- David Didau May 21, 2015 at 8:04 am
  
  Hi Catherine – My thinking has not changed.
  
  Using traffic lights to claim progress towards an arbitrary system of assessment is one thing; using them as a tool to mark students’ work books more quickly is another.
  - Catherine May 21, 2015 at 11:15 pm
    
    Ah, I see! That makes sense. Thank you for replying and clarifying. 🙂
klootmeJen May 21, 2015 at 9:45 am

David,

Whilst I totally agree with what you are saying here and I have shared this with many people……I’m trying to support schools with their assessment systems to a) inform their teaching b) keep track of progress in the national curriculum c) have a clear system of formative and summative assessment ready for Ofsted inspections…..

So, whether we like it or not, the national curriculum has to be at the heart of what we are assessing and I feel that your assessment grid is one step too far from the curriculum of skills and knowledge that we are required to teach.

Is there any way that I can send you a draft of what I’ve been working on to get your feedback?

Thanks!
- David Didau May 21, 2015 at 9:57 am
  
  My grid works perfectly in KS3 where there isn’t really any curriculum at all. The problem with the KS2 curriculum is that because the vast majority of it isn’t assessed it won’t be taught. When push comes to shove, teachers teach what’s assessed.
  
  My grid does a good job of a) in that it encourages teachers to go beyond where they would normally stop, it does b) in a way which is manageable, and I just don’t care about c). Sean Harford has made it clear that as long as school leaders can explain their system, Ofsted have no requirements. And summative assessment is made clear from end of Key Stage tests.
  
  By all means send me your draft: ddidau@gmail.com
  - klootmeJen May 21, 2015 at 10:05 am
    
    Brilliant! I only care about c) because I care about the schools I support being beaten over the head with their data……And I really do want to support schools devising effective systems that they CAN explain to Ofsted rather than buying really expensive assessment materials that they don’t really understand and that encourage them to test continually.
    
    I’ll email you! Thanks
Discrimination, Assessment and the Making of the Classroom Culture | SurrealAnarchy May 21, 2015 at 11:10 am

[…] to remember that any system of assessment is an attempt to map a mystery with a metaphor.” David Didau uncovers an inherent problem in assessment and that is where we believe our own hype… […]
Tim Jefferis (@tjjteacher) May 21, 2015 at 5:28 pm

What a wonderful post. All stuff I heartily agree with. I come under constant pressure to increase tracking, monitoring and measuring at my school; but I have always been rather skeptical about it all. It seems to me you can spend an awful lot of time chasing your own tail – time that would be better spent dreaming up more imaginative lessons.

Anyway, I would love to see a post from you on the RIGHT way to assess/track/monitor. I am charged with redesigning the system in my school and I want to create something that is not a monster, that is intellectually coherent, and that works.

Any answers?
Crispin Weston May 21, 2015 at 6:53 pm

It feels to me that I agree with the first half of many of your sentences/paragraphs, but then end up not agreeing with the second half.

For example:

“There’s no way we can ever really know everything about what students are learning. All we get to measure is their performance on a given day”.

I agree that’s all we can get to *measure*. But measuring is not the same as knowing. If we measure performance repeatedly, we can get to infer by statistical analysis what is the learner’s disposition to repeat that performance (I will call it “capability”). That means we can get to know, with a reasonable degree of accuracy, what they have learnt.

“Because we can’t see learning we come up with metaphors to make it easier to conceptualise. Levels, ladders, thermometers, graphs are all metaphors”.

Not really. These metaphors are ways of presenting the idea of scalable progression – that performance b is better than performance a. They are more colourful than a percentage. But what we are ranking is the progression of performance, not the way the neurons are lining up or what internal processes are occurring. A thermometer is not an attempt to visualise internal processes, which do not really concern the assessor.

You might say that the idea of ranking performances is itself suspect. Who is to say that performance b is better than performance a? But if that is what we believe, then what are we doing trying to teach anybody anything? Maybe what they are doing already is just as good as anything they will do at the end of the course. Surely no teacher believes that?

Data analytics will also help us establish what is more difficult than what else (not quite the same, I will admit, as what is better). If everyone who can do b can also do a but not the other way round, then it is fairly safe to assume that b is more difficult than a. Analytics will also be able to show how people learn in the sense of showing what sequences work best – e.g. by showing that if you want people to learn c, you are more likely to be successful if you teach them b first, rather than if you teach them a first. That information is more useful to the teacher than knowing how the neurons are lining up.

Similarly with your myths of progress. I am sure that it is true that progress may often be circuitous and unpredictable and not linear and certain – but how we make progress is a different question to how we measure progress. One step back and two steps forward is still progress – one step forwards and two steps back is still regression.

Thresholds are always artificial but may be necessary. Some people end up passing, some don’t, some get the job, some don’t. We can’t get away from thresholds. Simple categories might make the metric more understandable and might convey a sense of uncertainty, which I think any credible assessment system ought to be aware of. Saying that your mastery of objective x is 67.82% gives a false idea of precision. Like your metaphors, I think thresholds and categorisations might well be useful visualisations, though they should be layered on top of more sensitive, underlying metrics (for which digital analytics systems will really be necessary).

At root, my disagreement with you is in picturing this as something of an inscrutable mystery. I think it is more a question that we do not yet have the analytics tools to do it properly. Uncertainty is not an insoluble problem: it can itself be quantified.

I hope that this is a conversation we might be able to continue at some point face-to-face – maybe at ResearchEd.

Thanks for the interesting read. Crispin.
Sleepwalking into the Past – Life After Levels? Be Very Careful. | Just Trying To Be Better Than Yesterday May 22, 2015 at 11:55 am

[…] fact that I thought of nothing else for a good wee while kind of worries me. In addition to reading David Didau’s post on Assessment, I realise that we tend to get on our moral high horse about education in Scotland. David opens his […]
bt0558 May 22, 2015 at 7:21 pm

Blimey, you should be a consultant.

Surely it is clear to all that learning does not always take time. Well it does in the trivial sense.

And it is possible to test whether someone has learned something.

Did I mention this, you should be a consultant.
Tom Sherrington May 23, 2015 at 8:13 am

This is great. I know so many schools inventing new ladders and overly complicated grids. It’s so deeply ingrained. They say it’s what parents want but I dispute that. My pet hate is all these data consultants – and some inspectors – who talk lovingly about ‘measuring progress accurately’. Fools!! We need to learn to value ‘judging progress approximately’ and not seek to work out an average for everything, focusing on the detail of what students can and can’t do.
Is it possible to get assessment right? | David Didau: The Learning Spy May 23, 2015 at 2:07 pm

[…] my last blog on how to get assessment wrong, various readers got in touch to say, OK smart arse, what should we […]
Assessment – it’s all in our heads | Reflecting English May 23, 2015 at 9:36 pm

[…] have read David Didau’s recent two posts on assessment – here and here – with interest. David is rightly sceptical about the efficacy of assessment rubrics […]
Emma Simmons October 6, 2015 at 9:24 am

Please could you explain a little more (or point me to where you already have explained) why we should not use RAG, assuming that this is linked to clear skills progression and pupils can see which areas they need to work on, e.g. using a wider range of punctuation or remembering to use language appropriate for the audience?
Assessment: evolution vs. design | David Didau: The Learning Spy October 13, 2015 at 4:21 pm

[…] the many and various ways schools have found to replace levels with something a bit worse. I wrote here about how to get assessment wrong. There are a lot of very bad ideas out there. It’s really hard to think yourself out of the […]
Proof of progress – Part 1 | David Didau: The Learning Spy January 30, 2016 at 2:23 pm

[…] progress is a big deal. I’ve written before about the many and various ways we get assessment wrong but, increasingly, I’m becoming convinced there are some ways we might get it right. As […]
Teaching maters, but there are more important things to get right | David Didau: The Learning Spy July 8, 2018 at 12:34 pm

[…] all of which work to prevent teachers doing the best possible job of teaching their students. Here’s a list of ways to get assessment wrong. As Becky Allen has pointed out, if we can’t […]