Optimization hinders evolution.

Alan J. Perlis


As we all know, the DfE decided to ditch National Curriculum levels from September 2014 without plans for a replacement. Some have reacted to this with glee, others despair.

On the one hand, we have Tim Oates, an assessment expert and advocate for the removal of levels, saying

We need to switch to a different conception of children’s ability. Every child needs to be capable of doing anything dependent on the effort they put in and how it’s presented to them. Levels get in the way of this… The new national curriculum really does focus on fewer things in greater depth. It emphasises key concepts, key ideas and is full of skills. It includes wide reading, practical work in science and application of maths…. The shift in ideas about ability and in assessment practice means that teachers will have to become experts in assessment in a way they have not had to before. They need to think hard about questions they put to children both through question and answer and on paper. They need to really probe pupils’ understanding.

Dylan Wiliam, another member of the Expert Panel which decided to do away with levels, has said,

But then schools started reporting levels every year, and then every term, and then on individual pieces of work, which makes no sense at all since the levels had been designed to be a summary of the totality of achievement across a key stage. And then Ofsted inspectors insisted students should make a certain number of levels of progress each year and started asking students what level they were working at, in response to which schools started training students to answer appropriately. And don’t get me started on sub-levels…

So that is why, when I was appointed as a member of the Expert Panel to advise the Secretary of State on revisions to the national curriculum, I recommended that national curriculum levels should be abolished. Not because the levels were a bad idea, but the way they were being used was getting in the way of children’s learning.

Fair enough. He goes on to say, “It will be up to each school to decide how to determine whether children are learning what they need to be learning.”

The other side of the coin is the many and various ways schools have found to replace levels with something a bit worse. I wrote here about how to get assessment wrong. There are a lot of very bad ideas out there. It’s really hard to think your way out of old constraints and design assessment in ways which don’t repeat the mistakes of the past, as I’ve found to my cost. So, wouldn’t it have been better if the DfE had designed a new assessment system, presented it to us and let us get on with the business of teaching? Anecdotally, I was told that the reason the DfE didn’t attempt to design a replacement was that they were too knackered after all the conflict involved in coming up with a new National Curriculum. This may or may not be true, but if it is, it’s pretty irresponsible. After all, wouldn’t we get the best possible solution by getting the best thinkers on assessment into the same room and not letting them leave until they’d produced the perfect system? Surely this would be better than leaving the process to chance?

Well, maybe not. In Black Box Thinking, Matthew Syed presents the example of how Unilever went about designing a new nozzle to turn raw chemicals into washing powder. The nozzle they relied on was inefficient, wasting time and precious resources. They began by assembling a team of mathematicians to work out how best to redesign it. None of their attempts produced a more effective nozzle and, as a last resort, Unilever turned to its biology department and asked them to have a go. Their strategy was startling in its simplicity: they made ten minor variations to the current design and worked out which of the ten produced the best results. They then repeated this process through 45 generations until they arrived at a ‘perfect’ design. Could the DfE have inadvertently done us a favour?

Unilever’s ‘evolutionary’ nozzle design process

The process of ‘natural’ selection significantly outperformed intelligent design. It strikes me that there might be an opportunity to harness the evolutionary process to come up with a really great replacement for National Curriculum levels. In order to get one outstanding nozzle, Unilever generated 449 failed attempts. In order for us to arrive at an excellent way of assessing children, we might have to put up with many more failures than that. To evolve, we would have to learn from those failures, and we operate in a system where we are all very proud of our attempts to redesign assessment systems. We’re unlikely to acknowledge that what we’ve produced is a failure, let alone learn from it.

Evolution works because it’s a cumulative process. It depends on each generation having a ‘memory’ of previous generations. Sadly, pretty much everyone working on assessment is working alone and in the dark with no idea how well anyone else’s system is performing.
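The select-and-mutate loop Syed describes is simple enough to sketch. Here’s a minimal illustration in Python; the function names and the toy scoring problem are my own stand-ins, not Unilever’s actual method, but the shape (ten variations per generation, keep the best, repeat) is the one described above:

```python
import random

def evolve(initial, mutate, score, variants=10, generations=45):
    """Greedy evolutionary search: each generation, make `variants`
    mutated copies of the current best design and keep the winner."""
    best = initial
    for _ in range(generations):
        candidates = [mutate(best) for _ in range(variants)]
        # Include the current best so a generation can never get worse.
        best = max(candidates + [best], key=score)
    return best

# Toy stand-in for 'nozzle performance': the ideal design is x = 3,
# and score falls off with distance from it.
random.seed(0)
result = evolve(
    initial=0.0,
    mutate=lambda x: x + random.uniform(-0.5, 0.5),
    score=lambda x: -(x - 3) ** 2,
)
```

The point of the sketch is that nobody needs a theory of why one design beats another; the loop only needs a way to generate variations and a way to compare them.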

I have no idea whether this would work, but what I propose is that the DfE should collect examples of every replacement for levels that schools have generated and subject them to testing. Ideally the testing would use randomised controlled trials (RCTs) to minimise the narrative thinking to which we so easily fall victim. Maybe – and I realise I’m on shaky ground here – it would even be possible to use some sort of computer modelling? I don’t know precisely how this could be done, but I’m sure that if there were a will there would be a way.

One possible way to overcome the problem of needing a quantifiable definition of success to determine which assessment systems got the chance to ‘pass on their genes’ would be to use a comparative judgement approach like the one used by Chris Wheadon’s No More Marking system. Here, all we would need is a bunch of relatively savvy PhD students (or, who knows, maybe even teachers?) who would simply compare two systems and state which one they felt was better. With enough comparisons, we could arrive at a rank order which could then inform the next generation of systems. Have a look at this blog for a bit more information on how we might go about this. I don’t have the technical know-how to make this work, but am reasonably confident that it could be done relatively quickly and cheaply.
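To show how pairwise judgements alone can produce a rank order, here is a deliberately crude sketch: repeatedly show a judge two systems, record the winner, and sort by win rate. (Real comparative judgement tools like No More Marking fit a proper statistical model such as Bradley–Terry; the item names and the noiseless `judge` below are hypothetical, purely for illustration.)

```python
import random
from collections import defaultdict

def rank_by_comparison(items, judge, rounds=200):
    """Derive a rank order from pairwise judgements alone:
    repeatedly pick two items at random, ask the judge which is
    better, and sort by each item's win rate."""
    wins = defaultdict(int)
    trials = defaultdict(int)
    for _ in range(rounds):
        a, b = random.sample(items, 2)
        wins[judge(a, b)] += 1
        trials[a] += 1
        trials[b] += 1
    return sorted(items, key=lambda i: wins[i] / trials[i], reverse=True)

# Hypothetical example: four assessment systems with hidden 'quality'
# scores, and judges who always prefer the higher-quality one.
random.seed(1)
systems = ["A", "B", "C", "D"]
quality = {"A": 2, "B": 4, "C": 1, "D": 3}
order = rank_by_comparison(
    systems, judge=lambda a, b: a if quality[a] > quality[b] else b
)
```

No judge ever assigns a score or sees the hidden quality values; the ordering emerges purely from accumulated comparisons, which is what makes the approach attractive when ‘better’ is hard to define numerically.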