No, I’m not using evidence, but I’m not using prejudice either. I am exercising my professional judgement.
Sue Cowley
It doesn’t make a difference how beautiful your guess is. It doesn’t make a difference how smart you are, who made the guess, or what his name is. If it disagrees with experiment, it’s wrong.
Richard Feynman
A few days ago I wrote about why we shouldn’t credulously accept evidence, and that it wasn’t as simple as suggesting that teachers either use evidence or prejudice to inform their decisions. We are all guilty of using prejudice whether or not we use evidence. I proposed that we should ask two questions when reviewing evidence: first, what is the evidence? And second, what is my prejudice?
One reader left this comment in favour of an anti-evidence stance: “I think maybe when Sue Cowley and others say they know that certain kinds of activity are beneficial, perhaps they know this … from long and deep experience?” This is a position I’ve addressed a number of times, but particularly here. The problem is that teaching may be a ‘wicked domain’ in which expert judgement doesn’t routinely develop as a result of ‘long and deep experience’. There are two main barriers to teacher improvement. One is that we often fail to notice whether we are actually any good at helping children make progress. The other is the way we are held to account. We are asked to justify and explain why students failed to make the grade; the pressure is to make excuses and conceal mistakes in order to avoid being blamed. Instead of admitting that what we’re doing doesn’t appear to be effective we shrug and say, “These things happen” and “What can you expect with kids like these?”
Instead, if we want to improve as teachers we have to acknowledge our errors. If school leaders want this to happen they must create a culture where doing so is both safe and normal. We should change the norm from using evidence to confirm our prejudices to using it to explore how we might think and act differently.
After asserting her professional judgement in the comment thread, the educationalist Sue Cowley then wrote this blog post. Exercising the principle of charity, I should make clear that there’s quite a lot we agree on. She points out – as I have done – that there are problems with evidence in education. Yes, of course. I’m not aware of anyone who would dispute this.
Then she argues – as I have done – that part of the problem is that we don’t agree on what education is actually for. No matter how much empirical evidence we might come up with proving the effectiveness of rote learning, corporal punishment, circle time or group hugs, if it comes into conflict with your moral and ethical beliefs about the world you will ignore it. Again, I don’t think anyone disputes this.
From here, Sue then wants to claim that schools are not like labs and that research conducted in labs is therefore unhelpful. Well, it’s certainly true that classrooms are very different environments from psychology labs but – as I argued here – that doesn’t mean we should dismiss the findings of psychologists. Good science has the power to make useful predictions; if research can be used to inform our actions then it is useful. It’s unnecessary to control and predict exactly how every student in every context will behave or learn, just as a physicist has no need to control or predict how every single atom will behave in an experiment. All that’s necessary is that we can predict an outcome that is both meaningful and measurable. The insights of cognitive science, gleaned over more than a century and predicated on well-designed, repeatable tests that build on prior research and which produce broadly consensual, meaningful and measurable outcomes, should not be dismissed as unlikely to work in the classroom. Of course a specific intervention may not be effective with a particular student, but that doesn’t mean it will not be effective in the majority of cases.
She then suggests it’s ridiculous to assess the efficacy of an intervention such as Philosophy for Children by whether children’s SATs results improve. She would have a point if the claim underlying P4C weren’t that it “aims to improve children’s reasoning, social skills, and overall academic performance.” If an increase in overall academic performance doesn’t show up in SATs results then we can reasonably conclude that this claim isn’t correct. Of course, the P4Cers can still say they improved children’s social skills, as no one would expect that to show up in SATs. To test that claim – as I explained here – we’d need to design an assessment of social skills.
Her next sally is to suggest that trials are expensive and that the DfE is wasting money which could be spent elsewhere. We should be clear that the trials the DfE wants to fund are completely different to the laboratory trials Sue railed against a few paragraphs earlier. These are randomised controlled trials conducted in real classrooms with real teachers and real children. She’s right to say that the rhetoric around ‘closing the gap’ is misguided; the best we can probably do is seek to move the entire bell curve to the right. I also agree that there is good reason to be sceptical about the EEF. She also makes a fair point about the hypocrisy of the DfE rolling out new grammar schools in the face of all the very clear evidence that this is a bad idea. But does that mean it’s a waste of money to fund trials into the efficacy of classroom interventions?
Sue points out that every intervention is likely to come with negative side effects. That’s true. But it’s true of interventions whether they’re researched or not. It’s well established that feedback can have a powerfully negative impact, but no one’s suggesting that we should never give children feedback. In fact, wouldn’t it be better if these side effects were understood through well-designed tests? As Sue says, “history tells us a story about the issues this has caused in the field of medicine.” It does indeed. Consider the woeful tale of Ignaz Semmelweis. His clear evidence that doctors washing their hands reduced infection rates was ignored by the learned profession, who saw the idea as demeaning and pointlessly trivial. Doctors’ professional judgement cost lives. Ben Goldacre catalogues many other instances where medical trials contradicted professional judgement here.
Sue concludes by saying she’s not happy for her children to be guinea pigs in classroom trials. But, weirdly, she’s “fine for my children’s class teachers to try out new approaches that they think will suit my child.” I may be missing something, but this seems a very right-wing, capitalist approach to education. She seems to be saying that she’s fine for her children to be experimented on as long as the experiment doesn’t involve any reputable protocols. Because that’s what happens when we ignore evidence: we just footle about with ‘what we reckon’ is a good idea without ever finding out if we’re wrong. And if we’re wrong we’re gambling with children’s life chances. We create a closed circle in which we put our vanity, our prejudices, and our misplaced sense of professional pride ahead of what’s best for children.
Other professions worthy of the name have set aside such naive notions of ‘professional judgement’ in favour of being critical consumers of research. If we really care about children’s life chances we should set great store by that.
Let’s give the last word to Douglas Carnine:
Until education becomes the kind of profession that reveres evidence, we should not be surprised to find its experts dispensing unproven methods, endlessly flitting from one fad to another. The greatest victims of these fads are the very students who are most at risk.
David – Comes close to what I wrote here: https://3starlearningexperiences.wordpress.com/2017/01/10/will-the-educational-sciences-ever-grow-up/
Thanks Paul – not intentionally I should make clear 🙂
Here is a very explicit example concerning Professor Peter Tymms of Durham University and his misunderstandings of measurement. https://paceni.wordpress.com/2017/02/22/peter-tymms-misunderstands-the-nature-of-measurement-in-psychology-and-education
Professor Tymms and colleagues have seen the paper but have chosen not to reply. Probably best, since authorities in psychometrics have sided with the view presented.
This is not a specific example of David’s point. It is another unintelligible polemic by Dr Hugh Morrison. I don’t blame Prof Tymms for not replying; I doubt he can figure out what he is supposed to reply to.
An intellectual conspiracy theory, written in a style that makes postmodernist writings look clear and concise, is not a good argument.
This will likely be my only reply if you respond in your usual manner.
When something is beyond your grasp it is best to take the advice of the philosopher and mathematician Ludwig Wittgenstein, who suggested in the Tractatus Logico-Philosophicus: “Wovon man nicht sprechen kann, darüber muss man schweigen.” English translation: “About what one cannot speak, one must remain silent.”
Try to adopt his advice.
Why not try my incredible new idea of Cricket*-4-schools? Pupils simply play cricket all day. This way they learn valuable interpersonal skills, operate as a team and have individual responsibility, learn to use maths in a ‘real world’ setting, appreciate winning and losing, develop good manners and how to speak appropriately to peers and adults, have reflection time while fielding, develop fine motor skills and visual awareness, develop an appreciation of history through a real world context by learning about cricket through the ages, as well as developing job skills (i.e. being a cricketer) that they can apply in the real world. Cricket-4-schools. All day, every day. What is there to argue against here? Anyone who disagrees with this probably doesn’t even like children.
*other sports, hobbies or activities can be shoehorned in here without having to change much of the wording. Can I have some money now?
Ask Ben Goldacre to design and fund one of his government funded RCTs. If you need a fast bowler let me know.
If people were writing this sort of gobbledegook about medicine they wouldn’t be allowed to work in the profession. Not all medicines work for all patients – some of them have severe side effects – but if we don’t run serious, proper controlled trials then we are living in the dark ages of medicine! No one would dream of arguing otherwise – why is this any different for education? Ridiculous!!!
Some doctors do still prescribe alternative medicine under the guise of choice. I think this still does happen.
They shouldn’t be prescribing alternatives that have often been shown to be ineffective though – that is considered immoral as it wastes money and can prevent people accessing treatments that have been proven – Goldacre makes this point very well, repeatedly – there is no “alternative”, only that which has been tested and that which hasn’t – I write as a biochemist who went into teaching and practices “alternative” therapies as a sideline!
In total agreement with all of this – but you’ll find that controlling variables in medicine is also enormously difficult, and Goldacre is in the business of trying to improve this incredibly inconsistent area for exactly the same reasons as we talk about in education. I qualified in biochemistry, am a teacher and also practice “alternative” therapies. There is no bad temper, but some impatience with a lack of understanding as to how research and evidence work (with both sides – I get just as impatient with those convinced by their “evidence” into ignoring all alternatives – both in education and in science!!!)
What alternative therapies do you practice? Most of my knowledge about medicine comes via Goldacre as well. I believe that, just like education, there is a mixture of approaches to evidence-based practice in medicine, though the percentages of people in each camp may differ between the professions.
I know this example isn’t strictly involving doctors but I recently had a physiotherapy appointment and was prescribed four exercises.
One exercise involved a foam roller to stretch my Achilles tendon. As I like sports I had read a little about myofascial release (namely that it was still poorly supported), so I asked what evidence there was for this treatment with regard to lengthening my tendon. (It doesn’t seem to stretch it in a conventional manner and it won’t directly strengthen it.) I expected a reference to a study or an admission that it is not currently researched adequately.
I did not have a pleasant five minutes after that question. The physiotherapist reacted as if I had questioned her entire practice. She insisted all her practice was evidence based. I tentatively tried to say that was unlikely, as that’s not really how it works (you have to look at the evidence for each individual aspect/treatment). I tried to use teaching as an example but all I managed to do was convince her that teachers are incompetent.
When I left I tried to do a search and, as far as I could find (with my limited experience), there was no evidence that using this technique helped. Another of my treatments was very strongly supported. I ordered a roller anyway and will give it a go, but it demonstrated to me that the issues around understanding evidence seem to apply in fields other than education.
Ha – why doesn’t that surprise me!
I took up Tai Chi when I got Chronic Fatigue some 20 years ago – western medicine was still at the stage of offering nothing for this. I tried a huge number of “alternatives” and found most of them pointless, but Tai Chi has been tested in a very rigorous and scientific manner in China – in that, as a martial art, if it works you get stronger and beat more people, and if you don’t you take the hit! Much of what we have in the west is utter nonsense, but the style I practice is based entirely on biomechanics.
I have a great issue with physios as they don’t practice on their own bodies – and the studies are done almost entirely over a six-week period – many issues can be helped over a short time without apparent side effects, but the same thing done over a long time will be very damaging (taking mind-altering drugs for depression, for example – over a few weeks this makes the user feel great, but I’m fairly sure that doesn’t mean it works long term!!!).
In medicine, as in education, the scientific community suffers from huge amounts of poor science – selecting data to suit the hypothesis, ignoring inconvenient data, not really considering or controlling variables properly, ignoring things that work because they don’t agree with a current paradigm…
If we could get your average user in both fields to understand and work with the science better, in a more structured way, it would have to improve the situation in both fields.
The rest of the prescription was good, and the physio I had seen the week earlier was more evidence-aware. When she found out that I barefoot run she asked me if I had reviewed the evidence. She was surprised when I said that, when I last checked, it was poorly supported, having only a basic theory and some tentative and pretty poor studies behind it. She mentioned some new research, which I checked later, suggesting obvious issues including damaging the back tendon through overuse.
But the trouble is that it’s incredibly difficult, practically impossible, to run a proper scientific trial in education because everything is so complicated: there are so many variables, and conditions are never quite the same. Reading the ‘learning spy’ blog has made me question so many assumptions I had… BUT I think it’s also true to say that many discoveries in science started with people observing, noticing things, having intuitions and beautiful theories! (And then the theories have to be tested…) You can’t just say that experience counts for nothing, nor can you say that just because there is no ‘scientific’ evidence for something (yet) then that thing is definitely worthless. It might well be, or the version that many people implement might be, but the art/craft of teaching as a whole is so complex that you can’t just reduce all of it to ‘this is good, that is bad’, nor should we really be having bad-tempered arguments with each other (good-humoured ones would be fine) when actually there are people like Ministers to save the bad tempers for, surely?
But that’s not what’s happening. There is a considerable body of evidence indicating that minimally guided approaches to instruction are less effective than explicit instruction, for instance. If you then say, “But it works for me! I’m just going to ignore this evidence and do whatever the hell I like in the name of professional judgement!” then you are, in my view, being unprofessional. The same applies if you want to advocate learning styles or a whole host of other edu-quackery.
The debate is not about good vs bad; it’s about more effective vs less effective.
Plus, does this seem bad tempered to you? It doesn’t to me. Perceptions of tone are *so* subjective it’s much better to just get over it.
Aren’t there some things which are neither direct instruction nor ‘minimally guided’? I thought the title of this blog post was maybe a bit unkind? But then I am very grumpy and weary after hearing this morning that SLT would be coming on learning walks ‘to see the students making progress in their lessons’, and this evening having to persuade Nancy Blackett, Amazon Pirate and Terror of the Seas, that she did have to go to bed now… but then I read Sue Cowley’s unscientific but lovely post about World Book Day and it cheered me up. I like to read both of you, for balance 🙂
What’s neither explicit nor implicit?
Neither explicit nor implicit?
So if your form time activity is ‘talk about the news’ and you get up a news website and someone asks about Trump repealing ‘Obamacare’, and you find yourself telling your form about health insurance in the U.S., and then you end up having a debate about the role of the state vs personal responsibility, that’s not direct instruction (it wasn’t carefully planned and sequenced, and (controversial!) it followed the learners’ interest and curiosity), but it isn’t ‘minimally guided’, with students floundering about trying to find out something without knowing what they’re looking for (and there are lots of facts involved)?
No, that’s a great example of explicit instruction 🙂
She’ll block you David!
Carnine’s quote at the end of Paul Kirschner’s paper reminded me of Ben Riley’s excellent blog during his time in New Zealand. Here’s a quote: “The Hipkins paper illustrates what I contend is a fundamental problem that plagues education systems today, whether in New Zealand, the US, or anywhere else as far as I know. Unless and until we reach consensus that (1) learning is primarily a cognitive function; (2) education-related decisions should aim primarily to improve cognitive function so as to facilitate better learning; and therefore (3) we should strive to use the best available evidence of how learning takes place when making education-related decisions – we are all but doomed to an endlessly repeating cycle of edu-philosophical navel gazing.”
http://kuranga.tumblr.com/
This and your blog David keep me sane!
This seems to resonate with Black Box Thinking by Matthew Syed and the parallels he draws between education and medicine as opposed to the aviation industry, where failures are embraced, the reporting of them is encouraged within a tight system, they are unpicked by professionals and lessons are learnt to bring about future improvements. It seems that education could learn a lot from this approach and bring about some rapid progress based on sound evidence, rather than the black magic that currently pervades schools around QA data and lesson observation.
Interesting article, thank you.
Another issue is why the ‘evidence’ of some of the so-called experts differs so much. For example, Hattie claims homework is not an effective strategy, whereas the EEF says it is; Hattie says visual representation of abstracts is not effective, but Marzano says it’s a top 10 strategy. Here are some more examples – http://visablelearning.blogspot.com.au/p/other-researchers.html
This is what we would expect evidence to look like. There is massive variation both in interpretation and in how the evidence is gathered in the first place. Your implied conclusion, that the ‘evidence’ of so-called experts is no better than anyone else’s, is a common one. It is also dangerously wrong.
Just as a plumber needs more sophisticated knowledge than simply sticking pipes together, researchers need to compile varying sources, weighing their relevance and opening their conclusions to scrutiny.
The resulting knowledge is always subject to error and must be constantly challenged and reviewed, even when consensus has been reached. If we do this we can gain important and useful insights into some aspects of our practice. The fact that there seem to be more unanswered questions than answered ones doesn’t nullify this advantage.
When man invented fire we had an advantage, even though it had many severe limitations in its use.
Coal gave us power plants that produce electricity. The downsides of which we are still trying to resolve.
Renewable energy helps reduce our environmental impact but is limited in its capacity.
Nuclear theory produced potentially world ending weapons but can be used to produce life sustaining power.
In education, an understanding that, in general, explicit teaching is more effective can help us teach more efficiently.
Understanding the dangers of over supporting students can help us to improve their achievement more effectively.
Knowledge that phonics teaching improves outcomes for all readers is a powerful tool.
When looking at conflicting research, try looking for points of overlap rather than disagreements.
(By the way, the effectiveness of homework is sensitive to age, type of homework, etc.; this is an example of how conflicting conclusions are not necessarily conflicting evidence.)
Replication is a major tenet of the scientific method, so I would expect good evidence to be replicated, not disparate and contradictory as in Hattie, Marzano, et al. Finding the common ground in each of these “experts’” conclusions is not replication.
Your example of homework shows that the strategy of representing an educational influence as one stat – usually effect size – is misleading and should be questioned.
The danger has been, and still is, leaders of educational jurisdictions presenting the findings of these experts as facts, without scrutiny or analysis.
The other danger is the use of this type of evidence, for political purposes, in educational policy. An example is the use of Hattie’s evidence to deny a more equitable distribution of educational resources for disadvantaged students in Australia.
Another example comes from Professor John O’Neill’s analysis of Hattie’s influence on New Zealand education policy:
“public policy discourse becomes problematic when the terms used are ambiguous, unclear or vague” (p1). The “discourse seeks to portray the public sector as ‘ineffective, unresponsive, sloppy, risk-averse and innovation-resistant’ yet at the same time it promotes celebration of public sector ‘heroes’ of reform and new kinds of public sector ‘excellence’. Relatedly, Mintrom (2000) has written persuasively in the American context, of the way in which ‘policy entrepreneurs’ position themselves politically to champion, shape and benefit from school reform discourses” (p2).
Science begins with scepticism; however, in the hierarchical leadership structures of educational institutions, sceptical teachers are not valued, although, ironically, the sceptical skills of questioning and analysis are valued in students. This paves the way for the many ‘snake oil’ remedies and the rise of policy entrepreneurs who ‘shape and benefit from school reform discourses’.
This is the real danger that few people talk about.
I love the Feynman quote, but education is as similar to physics as pupils are to particles. When someone conducts an experiment on a particle, they can reasonably assume that the same experiment conducted on another particle of the same type will have the same outcome. Someone cannot make the same assumption if they conduct an experiment on one class and want to apply the results, even to a very similar class. That’s not to say that we can’t learn from a well conducted experiment in education, but (like physics, in fact) what we learn from such an experiment is something theoretical. The raw outcome of an experiment does not tell us ‘what works’: at best, it tells us that ‘something worked’ and the chain of argument from that to ‘what will work in other circumstances for other people’ is much longer than the chain of argument in physics. Knowing ‘something worked’, in and of itself, has no real value. But each experiment is conducted in a theoretical context: the researcher starts from the belief that some intervention ought to work because some mechanism will be activated. That mechanism may be more active in some pupils and in some circumstances than for other pupils or in other circumstances and a full programme evaluation should be aimed at drawing out a theory of the form “what works, for whom, in what circumstances” and why. When we have that, teachers can use that level of evidence to decide whether or not those circumstances apply in their classroom, whether their pupils are those for whom such an intervention might work.
Unfortunately that’s not what is currently happening in much educational policy making: either people using ‘professional judgement’ may or may not have some understanding of mechanisms but very often – even if they are able to articulate those mechanisms – these are not tested or testable understandings of how pupils learn. Others look to experimental research but incorrectly extrapolate from ‘it worked here’ to ‘it will work everywhere’ or, worse, argue that intervention X is more effective than intervention Y because one experiment (or set of experiments) has a higher ‘effect size’, without accounting for different tests, different samples, different comparison groups and so on. Without checking these basic assumptions, one cannot even conclude that particular intervention X was more effective than particular intervention Y, let alone that interventions of type X will be better than interventions of type Y. No-one seems to bother to check these assumptions (and if they did, they would find they are very, very rarely met). You may be right in saying that “good science has the power to make predictions” but at the moment the ‘science’ in education is rather bad; and good science doesn’t just predict that something will happen, but also explains why it should happen by positing mechanisms which underlie causes and effects.
Few people seem focussed on articulating mechanisms by which an educational intervention might work and then evaluating trials of that intervention across different circumstances and different pupils to see if they can improve our understanding of that mechanism in such a way that a teacher can reasonably think about how the intervention might work on their pupils in their circumstances.
I have this concern that people get fixated by ‘evidence’ without asking what it might be evidence for. For example, ‘professional judgement’ is a form of evidence, but it may be mainly evidence of the beliefs and biases of the ‘professional’ involved rather than evidence of things which may have a positive impact on learning. Similarly, ‘effect size’ is also not evidence for the educational impact of an intervention. At best, the ‘effect size’ from a randomised controlled trial is evidence for the impact of an intervention on an underlying mechanism, compared to the impact of the comparison group’s activity on that mechanism, measured on some particular population in some set of particular circumstances and evaluated with a particular test (which may be a better or worse test of that underlying mechanism). We can’t talk about being more or less effective (even in rough and general terms) by comparing trials with different populations, comparison group activities, tests etc.
This is not to say that we can simply rely on someone’s ‘professional judgement’, but trying to combine trial outcomes (especially those which help determine underlying mechanisms and the circumstances in which interventions impact on them) with professional judgements, smaller-scale qualitative research, big-data analyses etc. to develop practical theory (in the sense of underlying mechanisms) will be much more productive than either ‘going with our gut’ or abdicating our professional responsibility to a poorly understood statistic.
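To make the point about effect sizes concrete, here is a minimal sketch (in Python, with invented, purely illustrative numbers) of how Cohen’s d is typically calculated and why values from trials using different tests and samples aren’t directly comparable: the same raw gain yields very different effect sizes once the spread of scores changes.

```python
# A minimal, purely illustrative sketch: all numbers below are invented.
# Cohen's d divides the difference in group means by the pooled standard
# deviation, so the same raw improvement yields very different effect sizes
# when the outcome test (and hence the spread of scores) changes.
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Difference in means divided by the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    pooled_var = ((n1 - 1) * stdev(treatment) ** 2 +
                  (n2 - 1) * stdev(control) ** 2) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / pooled_var ** 0.5


# Trial A: a narrow test on which scores cluster tightly.
trial_a = ([55, 51, 59, 54, 58, 52, 57, 54],   # treatment group
           [51, 48, 55, 50, 54, 47, 53, 50])   # control group

# Trial B: the "same" intervention judged with a broader test and wider spread.
trial_b = ([58, 45, 70, 52, 63, 48, 66, 55],
           [51, 42, 65, 47, 60, 44, 62, 50])

for name, (treat, ctrl) in (("Trial A", trial_a), ("Trial B", trial_b)):
    gain = mean(treat) - mean(ctrl)
    print(f"{name}: raw gain = {gain:.1f}, effect size d = {cohens_d(treat, ctrl):.2f}")

# Both trials show a similar raw gain, but the tighter test in Trial A produces
# a much larger d. Comparing the two figures says more about the tests and
# samples than about which intervention 'worked better'.
```

A real comparison would also need to account for different comparison-group activities, sample sizes and measurement error, which is exactly why a single headline figure can mislead.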