Why teacher assessment is less fair than standardised testing

~~Tests~~ Guns don’t kill people, rappers do
Goldie Lookin Chain

I spent the day yesterday at the Department for Education thinking about how best to cut down on the “unnecessary workload” associated with marking. Today I spent far too much time bandying words with the children’s writer Michael Rosen about the value of testing over teacher assessment. It strikes me that both experiences offer an opportunity to set out my objections to teacher assessment and my support for standardised testing.

Let’s start with teacher assessment. My first concern is that any expectation on teachers to assess students’ work adds to their workload. If we’re going to ask teachers to work harder we ought to be pretty sure that the additional work we’re asking them to do is worthwhile. So, is it? Well, contrary to many people’s intuitive beliefs, teacher assessment is both less reliable and more unfair than standardised testing. This is difficult for some teachers to hear, but as Daisy Christodoulou puts it, “Teacher assessment is biased not because it is carried out by teachers, but because it is carried out by humans.”

There’s all kinds of evidence that humans are subject to predictable and unconscious bias. Firstly, there’s the research into heuristics and biases like the halo effect, confirmation bias, the anchoring effect, overconfidence bias, and many more. In Thinking, Fast and Slow, Daniel Kahneman relates how the halo effect led him to systematically mis-grade students’ essays. Quite reasonably, if a student’s first essay was awarded a high score, mistakes in later essays were ignored or excused. But Kahneman noticed a problem:

If a student had written two essays, one strong one weak, I would end up with a different final grade depending on which essay I read first. I had told students that the two essays had equal weight but this was not true: the first one had a much greater impact on the final grade than the second. (p. 83)
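
To make the order effect concrete, here is a minimal sketch in Python. The 70/30 weighting and the essay scores are invented purely for illustration (Kahneman gives no figures); the point is only that a marker who leans more heavily on the first essay will give the same pair of essays different final grades depending on which one is read first.

```python
def final_grade(first_read, second_read, first_weight_pct=70):
    """Combine two essay scores, weighting whichever essay was read first.

    first_weight_pct is a hypothetical stand-in for the halo effect:
    50 means the essays really do count equally, anything above 50
    means the first impression dominates.
    """
    second_weight_pct = 100 - first_weight_pct
    return (first_weight_pct * first_read + second_weight_pct * second_read) / 100


# Invented scores for one student's two essays.
strong, weak = 80, 50

strong_first = final_grade(strong, weak)   # marker reads the strong essay first
weak_first = final_grade(weak, strong)     # same essays, opposite order
unbiased = final_grade(strong, weak, first_weight_pct=50)

print(f"Strong essay read first: {strong_first}")
print(f"Weak essay read first:   {weak_first}")
print(f"Equal weighting:         {unbiased}")
```

With equal weighting the reading order makes no difference; with any other weighting it does, which is the problem Kahneman describes.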

In studies where teachers were told that a student had a learning disability, they rated that student’s performance as weaker than did other teachers who were told nothing at all about the student before the assessment began. There’s also evidence to suggest teachers are unconsciously biased against children from ethnic minorities. And here’s another study which investigates the bias against children due to race, gender and ability. Conversely, as Suskind & Rasmussen show, we also routinely assume “well-behaved students are also bright, diligent, and engaged.”

Daisy Christodoulou has unearthed further evidence here:

Both high and medium weight evidence indicated the following: there is bias in teachers’ assessment (TA) relating to student characteristics, including behaviour (for young children), gender and special educational needs; overall academic achievement and verbal ability may influence judgement when assessing specific skills. (Harlen, 2004)

Studies of the National Curriculum Assessment (NCA) for students aged 6 and 7 in England and Wales in the early 1990s, found considerable error and evidence of bias in relation to different groups of students (Shorrocks et al., 1993; Thomas et al., 1998). (Ibid)

It is argued that pupils are subjected to too many written tests, and that some should be replaced by teacher assessments… The results here suggest that might be severely detrimental to the recorded achievements of children from poor families, and for children from some ethnic minorities. (Burgess and Greaves, 2009)

Teachers tended to perceive low-income children as less able than their higher income peers with equivalent scores on cognitive assessments. (Campbell 2015)

Understandably, you might not want to be bothered wading through all that lot. Instead, why not simply watch this video of Robert Coe explaining why teacher assessment is so problematic?

Frustratingly, teaching is not set up in such a way as to produce the conditions needed for reliable judgement to develop. Of course, we all believe we’re immune from these biases which affect everyone else. There’s a name for that too: the bias blind spot. In one study, only one out of 661 survey respondents admitted to being more biased than the average person! Claiming, “It works for me!” is just further evidence of bias.

All this suggests that adding to teachers’ workload by making them assess students’ work is a bit of a waste of time, especially when we can rely on standardised tests instead.

But what’s that you say? Standardised tests are evil? I beg to differ. Although there are certainly problems associated with testing, these problems are not a function of tests being standardised. Standardisation just means that we are better able to compare outcomes because we can be more certain of their reliability.

The problem isn’t tests themselves; it’s the purpose to which they’re put. When tests are high-stakes we create all sorts of accountability problems which often result in schools and teachers pursuing perverse incentives. Not only that, high stakes can result in crippling anxiety for children. We’re right to worry about the consequences of stress caused by exam pressures, but these pressures should only ever be transferred to students when they mean something tangible. A levels and GCSEs are qualifications which, whether you agree with them or not, have meaning. KS2 SATs are not. There’s nothing wrong with the government deciding to test children in KS1 and 2 to determine their attainment, but we must remember that while these tests assess students they are used to hold schools accountable. I really get that these results matter to schools, but they are (or should be) irrelevant to pupils.

What we really ought to object to is the way schools feel compelled to pass on these anxieties to children and allow testing to warp the curriculum. Rather than attacking testing, our time might be better spent attacking the foolish, ill-thought-out accountability measures which create these incentives.

David Didau | November 4th, 2015 | assessment

31 Comments

Paul November 4, 2015 at 11:33 pm

Was so prepared to shoot loudly at this article, but you are right, the problem is not testing (although unfair tests are an issue) but the use made of the test results. How we alter a government’s mindset about the amount of information we can extract from limited data is a difficult task.
- David Didau November 5, 2015 at 8:54 am
  
  It is indeed. But I’m of the belief that trying to do so sensibly from the inside will be more effective than lobbing grenades over the walls of Sanctuary Buildings 🙂
joiningthedebate November 4, 2015 at 11:50 pm

Most maths teachers rejoiced when coursework disappeared. It was too subjective and vague. The idea that there exists a child who was good at coursework but poor at exams is perhaps a red herring on the whole. There was a general correlation (I believe) between exam marks and quality of coursework. As teachers we can be our own worst enemy…for example I would happily use a state recommended text book (and add to it if necessary with my own personality etc) unlike teachers who believe it is their right to decide what exercises to set. Very time consuming trawling for stuff and creating stuff these days. Governments since the 90s should have invested in producing text books rather than glossy ring binders for NC which were not much use. All this is perhaps easier for a maths teacher to say by nature of our subject.
https://joiningthedebate.wordpress.com/2015/10/28/my-assessment-policy/
Andy November 4, 2015 at 11:50 pm

This argument appears to be contradictory, David. Compare “my support for standardised testing” and “When tests are high-stakes we create all sorts of accountability problems”. My question is: when is a standardised test not a high stakes test?
The quote “Teacher assessment is biased not because it is carried out by teachers, but because it is carried out by humans” seems a complete red herring. Teachers assess all day long. It’s the nature of the job, surely.
- David Didau November 5, 2015 at 8:39 am
  
  1) You appear to be under the impression that “standardisation” and “high-stakes” are in some way synonymous. The fact that test scores are more fairly comparable in no way creates de facto high stakes. Likewise, the fact that scores are not standardised doesn’t lower the stakes.
  2) Teachers do make judgements all day every day and this does, of course, lead to biased decisions and unfair outcomes for more vulnerable groups of children. What mitigates this is that the stakes for most assessments undertaken in the classroom by individuals are low because most are not reported in any way. But raising the stakes for these judgements, as is the case for teacher assessments, creates all sorts of unwanted probabilities.
  - Abena November 15, 2015 at 6:14 am
    
    David, would you mind just clarifying what you mean by ‘teacher assessment’? From the response (2) above, I take it you mean ‘teacher assessment that counts externally’ rather than the formative assessments we undertake daily in our classrooms. I am not in the UK at the moment, so unsure if this is a reaction to a particular new initiative. I’d love to be enlightened.
    Also, are you applying this to GCSE coursework too? That would imply you are a fan of a ‘1 paper to assess the 2-year course’ approach. Is that what you believe? I’m genuinely interested.
    Thanks.
    - David Didau November 15, 2015 at 10:43 am
      
      Teacher assessment takes place whenever a teacher assesses students’ work. This is *always* subject to unconscious bias but when the stakes are low it doesn’t matter that much. When the stakes are raised, as in the case of externally moderated assessment, bias (unconscious and otherwise) skews results in favour of the most privileged and advantaged. I’m not necessarily “a fan of a ‘1 paper to assess the 2-year course’ approach” as it leads to questions of validity, but controlled assessment is so unreliable as to be largely meaningless.
Michael Rosen November 4, 2015 at 11:56 pm

So, to recap: what is taking place in, say, a local authority comp for year 9 and year 10 students. The tailback from the high stakes exam of GCSE (an exam that is increasingly becoming superfluous) has resulted in my daughter being tested about once a week. So, this is a school marked as ‘outstanding’ by Ofsted but of course one tiny slip or two and it could end up being forcibly converted.
It’s pretty clear from Parents’ Evenings that some of the teachers’ anxiety is directed to pupils who are being predicted As but there is the possibility of A*. This is a re-run of the anxiety that was directed towards my daughter at the time of KS2 SATs. She had been given one level but because it was a high level (I’ve forgotten the pesky numbers), she was told that the school wanted to put her in for the level appropriate for Yr 7s. As a result she had a miserable month doing test papers at home and at school because if she did get this higher level it would be good for the school…not for her…but for the school. At that very moment, she had an idea for a novel. She wrote about four or five chapters but under pressure of the SATs, she ran out of steam on the novel. She never picked it up again.
Now, there’s something similar going on where the English teacher says, ‘Don’t forget to read round ‘Jekyll and Hyde’ – read some Gothic novels..’ What?! When?! The combination of homework and permanent test revision means there isn’t time to read a big Victorian novel!
To be clear, I do not blame the teachers for any of this. They are locked into a test-crazy regime.
- David Didau November 5, 2015 at 8:45 am
  
  Michael, what you’re describing are the lamentable and perverse incentives created by a high-stakes accountability system. Although I have no information about your daughter’s school I would say that if the situation really is as you describe it, it is anything but an “outstanding” institution. Were my daughters’ school behaving similarly I would in the first instance complain, and, if that was unproductive, move them to a school less run on the principles of fear.
  In contrast, my experience of the new GCSE in a subject like English is that teachers are relieved that the burden of controlled assessment (high-stakes teacher assessment) has been removed and that terminal exams mean that they can teach with much more freedom and have much more scope to instill a love of the subject than has been possible in recent years. As I’m sure you know, reading a Victorian novel is something that for the first time in years is happening in schools and it’s all because we have removed the burden of teacher assessment.
  - Michael Rosen November 5, 2015 at 8:58 pm
    
    I’m not going to engage in a one-man dispute with a school over the amount of pre-testing testing they do. It’s across the board, part of the fabric of the place. I suspect it’s a result of the situation LA schools are in: unless they’re ‘outstanding’ they face forced conversion. So they apply the (unproven) principle that the more practice tests you do, the better the final result. I don’t see any ‘freedom’ here. As for the Victorian novel, I don’t belong to the school of thought that a ‘Victorian novel is better than any other kind of novel’ or that there is some special virtue attached to it, or any real purpose in reading it rather than say a translation of a great French, German, Italian or Spanish novel or indeed a twentieth century novel, a US novel etc.
    - David Didau November 5, 2015 at 9:15 pm
      
      Of course your daughter’s education is none of my business but were I in your position I would move heaven and earth to do something about it. I visit about 3 secondary schools a week and few of them behave in the way you describe: it really isn’t either normal or acceptable.
      If you’re in any doubt about how English teachers view the removal of controlled assessment, just ask on Twitter. Less teacher assessment categorically does mean a less constrained curriculum and, by extension, greater freedom to teach what and how you want.
      As for the Victorian novel, I only mentioned it because you brought it up.
      - Michael Rosen November 6, 2015 at 7:56 pm
        
        Well, the rationale is a) the students need practice doing timed questions b) the Toby Young justification – how else would we know if the students have learned what they’ve been taught? c) motivation to do better.
        re an AfL justification: I asked my daughter if they get the papers back and discuss with the teacher where things aren’t right, with a view to getting more teaching in those areas. She said that that is generally the case in relation to maths and science but not in other subjects.
        (I know this sounds like crazy dad-boasting but…I sense that sometimes the students predicted to get A* are put through the wringer the most. I sense from parents’ evenings that the teachers are desperate to up their A* count. Same applied at the time of KS2 SATs where she had to do loads and loads of extra work so that she ‘could’ (we didn’t want her to) do a ‘higher’ SAT.)
Leon Cych November 5, 2015 at 12:11 am

So how do you propose we decouple these? That would be the next question.
- David Didau November 5, 2015 at 8:47 am
  
  That, Leon, is the million dollar question. The answer is one that I’m currently trying to piece together but it lies in the realm of more intelligent accountability: https://www.learningspy.co.uk/featured/intelligent-accountability/
Nick November 5, 2015 at 5:50 am

To add to what you have described above, the following is from Daniel Kahneman as well from when he was assessing leadership for the Israeli army… he did win a Nobel. This debate is of particular interest to me because in my jurisdiction (Alberta, Canada) we seem to be moving towards more “competences” (like leadership) at the same time as more weight is being placed upon teacher assessment.
“We would watch them in action, they were in groups of eight, and the stunning thing is when you watch a thing like that, you see personalities, it’s just very, very convincing, and so we saw them in action. There’s a feeling you get a real direct access to how good the person is, as a leader, each one of them.
Some leaders and some wimps, you see all characters. But then about once a month, we would have the Friday of statistics on the last day of the week, and they would come and tell us how well we were doing, in predicting success in officer training. And the fact was consistently, we were not doing anything. I mean, we were just guessing. What we were seeing had nothing to do with reality. But the striking thing, and which influenced my work a lot, is that even after you’re told statistically that it’s nothing, that you can’t do this, you cannot predict how well they’ll do. The next Sunday, there is a batch of people, you take them to the obstacle course, you watch them, and you see them just as clearly as ever. So this disconnect between knowing that there is something you can’t do, and feeling that you personally can do it, that’s been with me all my life, and I think there is a lot of it on Wall Street.”
http://www.forbes.com/sites/steveforbes/2013/01/24/nobel-prize-winner-daniel-kahneman-lessons-from-hitlers-ss-and-the-danger-in-trusting-your-gut/
…and when asked why these sorts of biases are so stubborn- and I hope Mr. Rosen is reading this- he said that:
“It persists because you get the immediate feeling that you understand something. That is much more compelling than the knowledge of statistics that tell you that you don’t know anything. And, that again, is the officer story. But it’s writ large, that you really see it at work in many domains. In the financial domain, where people feel that they can do things that, in fact, we know and they should know they can’t do”
It seems to me that TA, in assessing future performance in these sorts of things, might be better done with a random number generator. If we are talking about identifying common misconceptions about some set material then I can see the value in its immediacy. But I find it hard to believe that someone who may have a few days’ worth of one-on-one time with my child over the course of a year (at best) can judge their future “leadership” or “collaboration” abilities and assign a grade that should have more weight than a standardized assessment.
Unfortunately, Daniel Kahneman’s work is one of those things that one cannot un-read… would that I could. When my child is assessed- in English- on whether he “Represents ideas and creates understanding through a variety of media” and, upon asking how this is judged, am told that “there is a rubric that is used with professional evaluation” I honestly wish I had never picked up Kahneman’s damn book in the first place.
Thanks to people like you, Daisy, Greg and Andrew for continuing to draw back the curtain on some of the assumptions that lead to this sort of magical thinking.
Fiona Lingard November 5, 2015 at 8:30 am

re Year 2/KS1 assessment: currently, writing standards are teacher assessed, i.e. a range of writing samples will be taken and a teacher will assess a standard (or level in old money) for that child. Not an exact science but surely better than judging a child’s ability on one day, one piece. Are we also to presume that if writing standards are to be assessed externally, that these will no longer be done by humans? Just how do we assess writing standards any other way? Or will it be purely judged on SPAG? Marvellous.
- David Didau November 5, 2015 at 8:53 am
  
  Hi Fiona – the way in which tests are made valid is through domain sampling. Psychometricians establish what they believe to be the domain of the subject being tested and select test items which allow us to discriminate between test scores in a meaningful way. They deliberately avoid items being too hard (no one passes), or too easy (everyone passes). The time, expertise and deliberation that goes into designing a test can never be matched by a teacher in a classroom.
  In answer to your question on the assessment of writing standards I agree that assessing only SPaG would be a grave error because the domain of writing would have been reduced to that of accuracy. We all agree that writing is much more than that and so any meaningful assessment will need to be designed to sample the skills we do value.
Bill Lord November 5, 2015 at 9:16 am

As a Primary Head, I find it hard to argue the case for teacher assessment when so many of us are using inflation of grades at KS1 as a reason for progress issues between KS1 and KS2. LAs were put under pressure 4 to 6 years ago to get KS1 scores up so this filtered down to schools.
At the same time, you only need to look at the jump in L4+ levels in writing in Primary Year 6 results after the marking went from external to internal with some moderation.
When Ministers are giving statements talking about removing Head teachers for poor results, accuracy in teacher assessment is always going to be at risk.
Ephemeral321 November 5, 2015 at 12:23 pm

Human (teacher) bias and how to deal with it. Humans cannot create an education system that is free from idiosyncrasies and bias. What we can do is remember systems won’t be perfect and review them.
So, how best to minimise bias in the system for the good of its students and public accountability?
Each part of a system needs to be understood but not in isolation. Assessments and tests exist to measure student learning and the effectiveness of the education system itself, so both the test and the conditions in which students study and are prepared to sit tests matter. As you note regarding teaching assessments:
“Frustratingly, teaching is not set up in such a way as to produce the conditions needed for reliable judgement to develop” on “race, gender and ability”.
Even shifting from teaching assessments to tests this remains important with regard to the bias students encounter during their studies and preparation for the tests.
In a US gender study on maths [which I can’t locate – anyone?] males and females together sat two tests. In the second test the invigilator intentionally stated that the test favoured males scoring better (it didn’t). Results of the second test seem to reflect that females internalised the bias and performed worse than males.
While the working group looks to minimise bias in assessment/testing it is equally important that children are taught daily with the minimum of bias in order to receive the optimum opportunity to do well in those tests – can’t test what isn’t taught if your ability is missed (in primary there is no mechanism for unbiased evaluation of a child’s minimum ability; IQ tests occur in Y7).
All systems should allow teachers, parents, analysers to spot anomalies, report them, and act on them; triangulation is important as are human relationships with the child. Staff need to be properly trained and supported to the desired level of competence.
While national comparative standardised testing will be a better comparison than within a single school year itself, it still won’t show if an absolute standard of learning has been met.
I note some of the reasoning for testing vs assessment is teacher workload. I’m still not convinced that, for a good teacher who wants to understand their pupils’ thinking, it will reduce it. To achieve workload reduction I would hope that the working group is looking at well-researched textbooks combined with teacher knowledge to reduce teachers constantly searching for class materials.
It remains a concern that teachers continue to point to the system, the tests, the government, parents, and everywhere else to explain why children in their care are anxious. The professional creates the desired environment for the child. If the senior leadership team are spreading panic I would suggest seeking out the teacher representative who sits on the school’s Governing Body. Raise it with them and ask the Governors to look at it (in a school with a healthy accountability system this will be possible). Governors have a responsibility to ensure the school culture is healthy for its students and staff. If you have a lot of anxious staff and/or pupils I would suggest there is a problem with the management of the school and its culture.
Glen Gilchrist (@mrgpg) November 6, 2015 at 8:44 am

Couldn’t agree more – “it’s not the tests, it’s what we (or the politicians) do with the results” that is the heart of the issue.
We take a test (GCSE say) which is a measure of the attainment of an individual at a point in time, which is supposed to represent the summation of the accumulation of a (prescribed) body of knowledge and the education system then uses the performance of an individual in comparison to a year group, to a cluster and across the national cohort. The measure morphs from data linked to an individual to data that somehow represents the performance of a teacher, school or wider education policy.
Then somehow, we view schools as “education factories” where the concept of annual improvement actually makes sense – and in some way these data points associated with the attainment of an individual have become metrics in which we expect to see annual increases – just like expecting better sales, more TVs to be made and higher bonuses to be paid to bankers. Grade inflation, success inflation.
Then at the highest level politicians repurpose this learner level data and use it to substantiate (or refute) the success of their education policies, and we all get sucked down the rabbit hole.
We make teacher assessment high stakes (for the schools and teachers), so the veracity / rigour of the outcomes is under constant question, the results show annual increases and politicians can claim that this reflects the success of their latest initiative.
We constantly value what we measure as opposed to measuring what we actually value for our learners.
How to solve this?
We’ve been tinkering with this literally since education became a formalised system – but in the first instance, I’d decouple Education (with a capital E) from the politicians – much like the Bank of England. Sure, let them set targets, but establish a learned body of educational professionals to actually drive the changes.
Oh, and on a final note – let’s reclaim the lower grades at GCSE – nationally and especially in the media and from politicians, we seem to have forgotten that for many learners achieving a D, E, F, G grade isn’t always a sign of failure, but either (a) the culmination of years of hard work or, unpopular as it may be (b) a grade that actually reflects the attainment and preparation that a learner put in. For many learners an E grade is definitely a “Good GCSE”.
David, as always a thorny problem laid bare.
- Michael Rosen November 6, 2015 at 2:30 pm
  
  And you can add the fact that because the results become decoupled from the students, it’s possible, where there is a high turnover of students (e.g. where there is migration, refugees etc), for a substantial percentage of the cohort not to be the same cohort as entered the system. When confronted by this, Ofsted inspectors have been known to say that they don’t care.
  - chestnut November 7, 2015 at 10:53 pm
    
    Students in year 10 and 11 at our school are frequently preparing for a controlled assessment as all subjects but Maths and RE have controlled assessment. High stakes tests all through the year for two years not practice exams – real ones.
    GCSE exams at the end of two years and only then – bring it on.
    Of course add to that teachers paid by results and it becomes even more high stakes.
Elysa Alton November 11, 2015 at 9:42 pm

Interesting point, we need to turn the telescope back in the right direction, then we might see the bigger picture.
julietgreen December 9, 2015 at 9:12 pm

Of course I totally agree. I’m outspoken in my contempt for the promotion of ‘teacher assessment’ as a magic panacea. Any argument which hinges on the high-stakes nature of standardised tests fails miserably because one of the main reasons why teacher assessment is iniquitous, is its use for high-stakes purposes as I try to point out here:
https://julietgreen.wordpress.com/2015/08/28/the-nonsense-of-teacher-assessment-an-analogy/
and why I am so frustrated with the ASE in their failure to grasp this as exemplified in their unwieldy and unlikely assessment system here:
https://julietgreen.wordpress.com/2015/08/21/primary-science-assessment-no-miracles-here/