Optimization hinders evolution.
Alan J. Perlis
As we all know, the DfE decided to ditch National Curriculum levels from September 2014 without plans for a replacement. Some have reacted to this with glee, others with despair.
On the one hand, we have Tim Oates, an assessment expert and advocate for the removal of levels, saying:
We need to switch to a different conception of children’s ability. Every child needs to be capable of doing anything dependent on the effort they put in and how it’s presented to them. Levels get in the way of this… The new national curriculum really does focus on fewer things in greater depth. It emphasises key concepts, key ideas and is full of skills. It includes wide reading, practical work in science and application of maths…. The shift in ideas about ability and in assessment practice means that teachers will have to become experts in assessment in a way they have not had to before. They need to think hard about questions they put to children both through question and answer and on paper. They need to really probe pupils’ understanding.
Dylan Wiliam, another member of the Expert Panel which decided to do away with levels, has said:
But then schools started reporting levels every year, and then every term, and then on individual pieces of work, which makes no sense at all since the levels had been designed to be a summary of the totality of achievement across a key stage. And then Ofsted inspectors insisted students should make a certain number of levels of progress each year and started asking students what level they were working at, in response to which schools started training students to answer appropriately. And don’t get me started on sub-levels…
So that is why, when I was appointed as a member of the Expert Panel to advise the Secretary of State on revisions to the national curriculum, I recommended that national curriculum levels should be abolished. Not because the levels were a bad idea, but the way they were being used was getting in the way of children’s learning.
Fair enough. He goes on to say, “It will be up to each school to decide how to determine whether children are learning what they need to be learning.”
The other side of the coin is the many and various ways schools have found to replace levels with something a bit worse. I wrote here about how to get assessment wrong. There are a lot of very bad ideas out there. It's really hard to think your way out of old constraints and design an approach to assessment which doesn't repeat the mistakes of the past, as I've found to my cost. So, wouldn't it have been better if the DfE had designed a new assessment system, presented it to us and let us get on with the business of teaching? Anecdotally, I was told that the reason the DfE didn't attempt to design a replacement was that they were too knackered after all the conflict involved in coming up with a new National Curriculum. This may or may not be true, but if it is, it's pretty irresponsible. After all, wouldn't we get the best possible solution by getting the best thinkers on assessment into the same room and not letting them leave until they'd produced the perfect system? Surely this would be better than leaving the process to chance?
Well, maybe not. In Black Box Thinking, Matthew Syed presents the example of how Unilever went about designing a new nozzle to turn raw chemicals into washing powder. The nozzle they relied on was inefficient and was wasting time and precious resources. They began by assembling a team of mathematicians to work out how best to redesign the nozzle. None of their attempts produced a more effective nozzle and, as a last resort, Unilever turned to its biology department and asked them to have a go. Their strategy was startling in its simplicity. They made ten minor variations to the current design and worked out which of the ten produced the best results. They then repeated this process through 45 generations until they arrived at a 'perfect' design. Could the DfE have inadvertently done us a favour?
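To make the biologists' strategy concrete, here's a minimal sketch of that kind of generate-and-select loop. Everything in it is a stand-in of my own devising: the 'design' is just a list of numbers and the fitness function is invented, whereas Unilever's real test was physically running each variant nozzle and seeing which made the best powder.

```python
import random

def fitness(design):
    # Invented stand-in for 'how good is this nozzle?'; in reality each
    # variant was manufactured and tested physically.
    return -sum((x - 0.7) ** 2 for x in design)

def mutate(design, step=0.05):
    # One minor random variation on the current design.
    return [x + random.uniform(-step, step) for x in design]

# Start from the existing (inefficient) design, represented here as a
# handful of arbitrary numeric parameters.
current = [random.random() for _ in range(8)]

for generation in range(45):                          # 45 generations, as in the Unilever story
    variants = [mutate(current) for _ in range(10)]   # ten minor variations each time
    current = max(variants + [current], key=fitness)  # keep whichever performs best

print(fitness(current))
```

Notice that no intelligence about nozzles (or assessment) is needed anywhere in the loop; all it requires is variation and a reliable way of telling better from worse.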
The process of 'natural' selection significantly outperformed intelligent design. It strikes me that there might be an opportunity to harness the evolutionary process to come up with a really great replacement for National Curriculum levels. In order to get one outstanding nozzle, Unilever generated 449 failed attempts. In order for us to arrive at an excellent way of assessing children, we might have to put up with many more failures than that, because evolution depends on learning from failure, and we operate in a system where we are all very proud of our attempts to redesign assessment systems. We're unlikely to acknowledge that what we've produced is a failure, let alone learn from it.
Evolution works because it’s a cumulative process. It depends on each generation having a ‘memory’ of previous generations. Sadly, pretty much everyone working on assessment is working alone and in the dark with no idea how well anyone else’s system is performing.
I have no idea whether this would work, but what I propose is that the DfE should collect examples of every replacement for levels that schools have generated and subject them to testing. Ideally the testing would take the form of randomised controlled trials (RCTs) to minimise the narrative thinking to which we so easily fall victim. Maybe – and I realise I'm on shaky ground here – it would even be possible to use some sort of computer modelling? I don't know precisely how this could be done, but I'm sure that if there were a will there would be a way.
One possible way to overcome the problem of needing a quantifiable definition of success to determine which assessment systems got the chance to 'pass on their genes' would be to use a comparative judgement approach like the one used by Chris Wheadon's No More Marking system. Here, all we would need is a bunch of relatively savvy PhD students (or, who knows, maybe even teachers?) who would simply compare two systems and state which one they felt was better. With enough comparisons, we could arrive at a rank order which could then inform the next generation of systems. Have a look at this blog for a bit more information on how we might go about this. I don't have the technical know-how to make this work, but am reasonably confident that it could be done relatively quickly and cheaply.
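For what it's worth, the ranking step might be sketched along these lines. The system names, the judge function and the simple win-counting are all hypothetical placeholders of mine; a real comparative judgement engine such as No More Marking fits a statistical model to the judgements rather than just tallying wins, but the basic idea is the same: lots of quick pairwise decisions add up to a rank order.

```python
import random
from collections import defaultdict
from itertools import combinations

# Hypothetical names for the assessment systems being compared.
systems = ["System A", "System B", "System C", "System D"]

def judge(a, b):
    # Stand-in for a human judgement: a PhD student or teacher looks at
    # two systems and says which they feel is better. Random here.
    return a if random.random() < 0.5 else b

wins = defaultdict(int)
for a, b in combinations(systems, 2):
    for _ in range(20):                 # each pair is judged many times
        wins[judge(a, b)] += 1

# A crude rank order from win counts; the top systems would 'pass on
# their genes' to inform the next generation of designs.
ranking = sorted(systems, key=lambda s: wins[s], reverse=True)
print(ranking)
```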
Evolution by natural selection is based on clear feedback loops, i.e. the organism reproduces or it doesn't, the nozzle does the 'measurable' task better or it doesn't…
Such clear feedback loops would be difficult with an assessment system. Is it clear when it has succeeded? How long would a trial need to be? Would the same staff be there to answer the question as were there when the question was set?
The whole levels thing was a factor of politics hitting education and headteachers being unable/unwilling to stay sensible under the pressure that came with that.
A leadership problem imo, not a tool problem. I'm agreeing with DW there I think, but would have voted no myself. I'm no ostrich!!!!
I appreciate both how natural selection works and how difficult it would be to replicate this process with assessment which is why I suggested computer modelling. I have no idea if that could work.
Probably not. Computer modelling also requires feedback loops, i.e. some quantifiable definition of success.
Me personally, I liked APP, 6 chn. in the class. Sit down with colleagues from time to time and thrash your judgements out… and still only be probably sure you were right. Doubt is our friend!! … I think.
You don't think it would be possible to quantifiably define success?
Not quantifiable for a computer model.
e.g. Parents are happy the system informs them = opinion
It informs teaching effectively = subjective leader opinion- difficult to isolate an infinite number of other variables
Best I can imagine is falsifiable, i.e. that other like schools/benchmarks do it better. But then you have the inability to isolate variables, and then what's a 'like school', etc.
Schools can form a view of course, but with non-standardisable feedback loops. Most certainty possible I’d say. But then if that school reproduces more effectively ….
But that would be perfect…
The issue is that schools and teachers moan about change. If they’ve started a new system in September, their mindset isn’t ‘how can we tweak this?’ But ‘oh no, not more change’.
My favourite word in my teaching is ‘trial’; ‘I’m going to trial this’. This gives space for failure, development and success. Unfortunately many schools rushed into a single system and won’t review it and tweak it.
My PM target is to ‘tweak’ my new assessment model. I’m happy with that.
That’s a really good point – thanks
David: for me, what's important is that there should be no national system of monitoring progress. As we discovered with the system of national curriculum levels, when there is a national system, Ofsted expect a certain amount of progress each year. The goal is then always to make the required amount of progress each year, even if that undermines future learning. I think that having some bad systems is preferable to the stultifying effect of a national model, but of course, I could be wrong…
I see your point, but I'm not advocating for a national system. I do think that the DfE has a responsibility to help schools come up with increasingly better systems. The (potentially) great thing about this idea is that as soon as you come up with the 'perfect' system, someone can tweak it and make it better. I realise there's a fair bit of naivety in this, but it seems like it would be at least worth a shot.
Do you not think, Dylan, that if schools have different systems, Ofsted inspectors will still be looking for that kind of linear model of progress, as part of the process of judging whether or not pupils are making 'expected progress' over time? I'm expecting an inspector faced with a novel assessment/tracking system to be asking what 'expected progress' looks like within that system and how the school knows that this is a suitably high expectation. Under threat from predators, evolution suggests herding is a successful strategy. I think we may get a plethora of stultifying models, instead of just one.
Years ago, when computers were still big machines spitting out long lists of numbers if you fed them, one was bought for the Archaeological Department here. The old professor, determined to master it, would set himself up in the cupboard where it was placed (for safety!) and feed it with data, measurements of all kinds, to see if it would come up with something he had not thought of. It never did…
I know they work faster these days, can swallow more, etc, but as for judgments they are just machines.
You know I’m not suggesting we use the judgement of machines, right?
I think the best way to compare two assessment systems would be to ask experienced teachers moving school. A good assessment system should set clear learning goals and then assess children’s progress in relation to these, providing feedback to teachers so they can spot the gaps and help children to close them. After a while, a teacher moving school should be able to judge whether the new system helps this to be done better, worse, or about the same. Given enough systems, each one with carefully categorised features, and enough teachers moving between them and you might get some useful data.
I suppose what you are suggesting here for schools David is not a generational approach to evolution (a la Unilever’s nozzles) but a large scale comparison between different schools concurrently using different assessment systems to see which have the best outcomes.
The difficulty here is that there are so many other factors at play that could affect outcomes within each school (quality of staff, quality of leadership, contextual factors, curriculum design and innumerable others). The school that shows the best outcomes may not be the school with the best assessment system – their system might be only averagely good but they could have an awesome staff team, expertly tailored curriculum etc.
Trying to control for all these factors would be a practical impossibility I fear, even with computer assistance.
That said, some kind of comparative study might be able to identify some common features amongst the assessment systems used by the schools with the best outcomes which would be very valuable information. Equally (and perhaps even more useful) it might identify common factors in the schools with the weakest outcomes.
Of course, as there are so many complex factors at play in schools which make every school’s circumstances unique, the idea that we could find one assessment system that is optimal for all schools is almost certainly misguided. Still, all children are unique and yet we can still find approaches that will work better than others for the vast majority of them (e.g. synthetic phonics) – so I would strongly argue that we should not be using “schools are unique” as an excuse not to look for information about the features of more and less effective assessment systems.
No, I really am suggesting a generational approach. Comparison could reveal a 'winner' for each generation which would help those interested to iterate towards even better systems. The idea that we would or could find one assessment system to suit every school is as daft as it is impossible. Complexity is a feeble excuse to deliberately continue making sub-optimal decisions.