To get anywhere, or even live a long time, a man has to guess, and guess right, over and over again, without enough data for a logical answer.
Robert A. Heinlein
I’ve been thinking hard about the nature of education research and I’m worried that it might be broken. If I develop a theory but have no evidence for it then it is dismissed as ‘mere speculation’. “Show me the evidence!” comes the crowd’s shout, and currently in the sphere of education, evidence is all. But can we really trust the evidence we’re offered?
Clearly, sometimes we can. I don’t want to be cast as dismissing all evidence. My point is that we place too much faith in it, and we might possibly be mistaken to do so. Especially in education. As Bertrand Russell pointed out, “The most savage controversies are those about matters as to which there is no good evidence either way. Persecution is used in theology, not in arithmetic.” So too, I contend, in education.
The following is an attempt to untangle Egan’s objections to the bloated claims of education research made in his fascinating but frustratingly self-indulgent tome, Getting it Wrong from the Beginning.
Let’s imagine we want to conduct some research on the effectiveness of a new teaching strategy (Strategy X). How would we go about it? Well, we’d probably want to test its effectiveness across a range of different groups of pupils, and we’d probably consider getting several different teachers to try it out. We’d also want a control group who didn’t get the intervention, so that we could try to establish what sorts of things happen without Strategy X. A particularly reputable researcher might also want to set up a double-blind to try to avoid such confounds as the Hawthorne Effect, but it’s pretty tricky to keep teachers in the dark about how they’re teaching their pupils, so in practice this very rarely happens. We’d then need to decide our success criteria – how will we know if Strategy X works? For that we need something to measure, but what? Test results maybe?
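For concreteness, here’s a minimal sketch (in Python, with entirely invented numbers) of the sort of analysis that typically sits beneath a claim that ‘Strategy X works’: simulate end-of-term test scores for an intervention group and a control group, then ask whether the difference between them is bigger than we’d expect by chance. None of this is real data or anyone’s actual study design; it’s purely illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical end-of-term test scores (out of 100) for two groups of 30 pupils.
# The 5-mark gap between the group means is invented purely for illustration.
control = rng.normal(loc=60, scale=12, size=30)      # taught as usual
strategy_x = rng.normal(loc=65, scale=12, size=30)   # taught with Strategy X

# Independent-samples t-test: how surprising would this gap be
# if Strategy X made no difference at all?
t_stat, p_value = stats.ttest_ind(strategy_x, control)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

Notice how much of what actually matters in a classroom – teacher enthusiasm, what the test measures, the Hawthorne Effect – never makes it into the calculation.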
Ok, so we’ve set up our study and, guess what? It turns out Strategy X is effective! It works! The overwhelming majority of studies show successful implementation of ideas, frameworks, teaching materials, methods, technological innovations etc. It doesn’t seem to matter whether studies are well funded, small scale or syntheses of other studies: almost everything studied by education researchers seems effective. Of course there are some studies which report failures, but they’re rare. We have such acres of information on how to improve pupils’ learning that it seems inconceivable that learning does not then improve. But almost every one of these successful studies has absolutely no impact on system-wide improvement. Why is this?
I’m sure readers will be able to point me in the direction of hugely important international studies which conclusively prove all sorts of things, and that clearly we are on the brink of major breakthroughs. This, as far as I can see, has been the case for decades, but again, I’m sure the studies cited will be claimed to be free of the misunderstandings and technological limitations which bedevilled previous studies. But where does it end? And, most importantly, when will we start reaping the rewards?
One of the problems we have is the limitation of the scientific method in telling us what’s effective in education. Biesta, who I’ve been critical of before, tells us education is so saturated with values, and so contested in its aims, that it cannot really be operated on in the same way as the physical sciences. We pay lip service to this fact but still make the mistake of believing that learning is a part of the natural world and therefore conforms to the same rules that govern the rest of nature. I’m not sure that’s true. Rather, learning is shaped by a combination of evolution, culture, history, technology and development, and as such it’s a slippery devil; the scientific method has to be appropriate to whatever it’s being applied to. Methods may not be transferable between fields of study.
Another issue is that our methodology is not always properly matched to the problems we want to solve, resulting in what Wittgenstein described as a clash of “experimental methods and conceptual confusion”. As has been pointed out to me before, if the most reliable empirical evidence suggested beating children was the most effective way to get them to learn, we would reject this finding as being both unpalatable and at odds with our values. Likewise, if a progressively aligned academic found rote drilling to be more effective than discovery methods, they would find it straightforward to dismiss the finding as being too narrowly defined or harmful in some other, less empirical way. This is something we all do. Any evidence, no matter how robust, has to align with our ideologies and values, otherwise it is useless.
And that’s a third problem: empirical evidence in education isn’t empirical in the right way. As Wittgenstein observed, “The existence of the experimental method makes us think we have the means of solving the problems which trouble us; though problem and method pass one another by.” Education research is founded on the proposition that it’s possible to establish causal links between discrete things, such as the link between Strategy X and pupils’ test results. But can it? That depends on the degree of conceptual confusion. Let’s say I wanted to conduct an experiment to determine how many students in Birmingham schools were under the age of 20. I could do all kinds of data analysis and design as many questionnaires as I pleased; whatever I found would be banal, as the causal connection I’m seeking to establish already exists as a conceptual connection. All I need to know is that school education in Birmingham ends at age 19 to work out that the existence of 20-year-old students is a logical impossibility. This is an obviously stupid example, but it would appear this is exactly the mistake made in much education research; it’s just that the pre-existing conceptual connection is more subtle and the findings are pseudo-empirical. Egan offers the example of a research study attempting to establish how we should teach by using such principles as “To develop competence in an area of inquiry, students must a) have a deep foundation of factual knowledge, b) understand facts and ideas in the context of a conceptual framework, and c) organize knowledge in ways that facilitate retrieval and application”. He points out that a), b) and c) are definitions of ‘competence in an area of inquiry’. No amount of empirical research could ever demonstrate that these things are not connected!
Added to all this we have the research finding (O! The irony) that less than 1% of the education research that gets published consists of replication studies. (A replication study is one where researchers attempt to replicate results with different test subjects.) Now, apparently the majority of replication studies in education (68%) manage to replicate the original findings, but when replication studies are conducted by completely different teams of researchers, only 54% of studies are found to be replicable. A cynic might suggest that there’s a degree of vested interest at work here.
This might suggest that, instead of relying so enthusiastically on evidence, we could put a little more faith in reasoning and analysis. If I present a reasoned analysis of why I think Strategy X is likely to be effective with no supporting data, it’ll be dismissed as ‘mere speculation’. But my contention is this: I could conduct research on something that is analytically sound and be all but certain it will produce favourable evidence. Yes, there will be all sorts of variation between different groups of students and their teachers, but where a teacher is enthusiastic, the research will likely produce favourable findings. This seems obvious. If I can convince a teacher of the merits of Strategy X, they’ll work hard to get me the positive data I’m after with no connivance needed. Similarly, if they were sure I was a charlatan, there’s no way they’d use Strategy X unless they were forced, and in that case the likelihood that the research findings would be positive is remote in the extreme.
Maybe, rather than being so quick to say, ‘the research shows…’ we might be better to formulate our thinking with ‘analysis has concluded…’? Of course we would still have to contend with just as much nonsense and dogma, but we’d waste a lot less cash!
Yes. I’d go with all of that.
I think there is a bigger problem. Our entire system is test driven. As a result, if you put a group of UK teachers in a room with a test, they can figure a way around it in less than 30 minutes. Test evidence can show the effectiveness of spoon-feeding or teaching to the test. Because test data is so important in our system (it can ultimately cost senior leaders their jobs), I don’t believe they really want very scientific or rigorous test conditions. They want instead proof that they are doing a good job, so they create the conditions which make them look good. Unfortunately none of these problems is easy to solve because they stem from a systemic problem, beginning with league tables and the inspection model.
Interesting stuff. However, I think it is too tempting to treat all research evidence as equally flawed. Not all of the research evidence that we apply to education is in the form of weakly controlled trials. Goldacre’s much-maligned RCTs have something to offer, particularly where it is not obvious as to which group is the experimental group or where computers are used to deliver the instruction. Similarly, much lab-based psychological research has implications for education but is more rigorous. Take, for instance, Sweller’s worked examples studies. The participants would have had no idea which was the experimental condition and which was the control. This variation between study designs is one of the reasons why I find Hattie’s effect sizes to be such a blunt instrument.
Thanks Harry
I take the point about lab based work and discussed the power of this to enable reasonable predictions to be made here: https://www.learningspy.co.uk/myths/good-research-trumps-intuition/
But as to whether RCTs, well-designed or not, have much to offer I am sceptical.
Good post. A fundamental problem with education research is that the process we are attempting to observe and evaluate is not a single discipline with a widely accepted methodology.
Education research is a blend of sociology, anthropology, ethnography, philosophy with artistic and performative dimensions. What possible methodology could effectively cover that??
Furthermore, university research on education is very different to teacher research on education. Teachers often feel they have been given obtuse answers to problems that were not really problems in the first place. I work a lot with teachers in trying to frame an enquiry around a particular problem and am often amused at the difference between what those in the university call a ‘problem’ and what those in the classroom call a ‘problem.’
For example: ‘How do kinaesthetic learners navigate the gender divide within the iPad-centric classroom?’ is a redundant question with an even more redundant answer, and ultimately a waste of time for all concerned.
What gives me hope however, is that the divide between theory and practice is finally being traversed and practitioners and academics are engaging in ways not seen before in education research.
Your debate with Dylan Wiliam here at Wellington was to my mind, a singularly unique and important event in the public discourse between the academy and the classroom practitioner. Furthermore, Tom Bennett’s ResearchED, the work that Rob Coe and the EEF are doing with Alex Quigley and John Tomsett at Huntington, and the way in which David Weston and NTEN are facilitating teacher research are all very exciting (and robust) developments which represent very different and novel approaches to mobilising and applying effective education research.
Maybe we need to stop thinking in terms of seeking a finalised, singular ‘answer’ and instead think in terms of an ongoing enquiry that provides a series of answers that engender and facilitate what education research is really about: being an informed and reflective practitioner.
So, are you arguing that research is useful for the journey it takes us on as individuals? If so, I think I agree, but with one caveat: there has already been enough enquiry into ‘what works’ in classrooms to convince me that we don’t really need to know anything more about what works. Maybe instead the research in which we ought to be engaged is critical analysis of what’s already out there?
Yes I totally agree. The starting point of an enquiry should be the wider evidence and a body of literature around that. A good example of which is the EEF toolkit.
I liken this process to the ‘standing on the shoulders of giants’ trope, as opposed to the image of two kids in a long overcoat and fedora trying to pass themselves off as an adult, which characterises many schools’ approach to evidence-based education.
I’m heading into the weekend feeling battered about evidence in education.
Try a few recent posts from @Jack_Marwood http://icingonthecakeblog.weebly.com/
Also read this http://blogs.nature.com/news/2011/09/reliability_of_new_drug_target.html which suggests those figures about reproducibility are not just an education issue
Of course, Ofsted don’t ever seem to have been terribly engaged with research, preferring to trust their expert judgement but that’s a load of tosh too http://jtbeducation.wordpress.com/2014/06/29/whats-the-easiest-way-to-a-secondary-ofsted-outstanding/
Maybe it’s just late!
Being battered is a resoundingly negative stance – maybe all this critique should make us feel encouraged?
Thank you for such an important analysis of the situation. The major problem I see is that we never know exactly what technique is being adopted in the experiment. That’s why in Hattie’s list of top techniques we are offered a ranking of abstractions: feedback, for example. ‘Feedback’ is a concept that encompasses very many different techniques. And so teachers trying to replicate the success of these high-scoring approaches very probably employ different techniques to those that had been tested. Or the same technique (chosen purely by chance) but executed in quite a different manner.
Even if these techniques were described with what the researchers think is procedural precision, we’d still not be able to escape what Dylan Wiliam terms the ‘lethal mutation’ of ideas. Not only do teachers unwittingly engage in Chinese whispers (is this term still acceptable?) but the (mis)interpretation starts with the reading of the experiment. Or with the author’s interpretation of the experiments. It’s just all too vague.
Thanks Oliver – I’ve critiqued Hattie’s advocacy of feedback here: https://www.learningspy.co.uk/featured/reducing-feedback-might-increase-learning/
Is mutation lethal? Certainly not if we look at it in Darwinian terms – it is the process by which natural selection occurs. Now, I’m not such a fool as to suggest only the most ‘fit’ ideas will survive – there are millennia of human history to prove any such assertion wrong – but maybe the melting pot of classrooms is no bad thing? The problem with any mutation of an initially sound theory is that it then becomes a matter of compulsion, with teachers expected to do silly things.
Well, Dylan Wiliam seems to think such mutations end up with the essential components of a successful strategy being watered down. I concur with you that we need to assign worth to teachers’ adaptation of known techniques to their particular contexts. But how then do teachers share their practice when it becomes so individualised, with only superficial features in common with other similar techniques living under the same banner, as with AfL?
Regarding my point about researchers not specifying exactly the procedural components of the teaching techniques being tested, do you know of any source that provides such information? And what do you make of the effect size ranking if none of the listed strategies can be specified sufficiently for teachers to know they are adopting the same ones as those tested?
Wiliam’s position vis-a-vis the bastardisation of AfL is reasonable – but then, I’d argue AfL is a fundamentally flawed concept anyway: https://www.learningspy.co.uk/myths/afl-might-wrong/
As to teachers sharing individualised practice, I’m not sure I want them to. It may be interesting to see how far off piste they’ve travelled, but I certainly don’t want anyone being told how to teach based on such peregrinations.
Finally – effect sizes seem more than a little problematic – here’s a critique: https://www.learningspy.co.uk/myths/things-know-effect-sizes/
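To give a flavour of why: an effect size like Cohen’s d is just the raw difference between group means divided by the spread of the sample, so the same teaching gain looks ‘bigger’ in a homogeneous group (a setted class, say) than in a mixed-ability one. A quick illustrative sketch with entirely invented numbers, not a claim about any particular study:

```python
import numpy as np

rng = np.random.default_rng(0)

def cohens_d(treated, control):
    """Raw difference in means scaled by the pooled standard deviation."""
    pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
    return (treated.mean() - control.mean()) / pooled_sd

# The same (invented) 5-mark gain measured in two different populations:
# a mixed-ability cohort with a wide spread of scores, and a setted class
# with a narrow spread.
mixed_control = rng.normal(60, 15, 200)
mixed_treated = rng.normal(65, 15, 200)
setted_control = rng.normal(60, 5, 200)
setted_treated = rng.normal(65, 5, 200)

print(f"mixed ability: d = {cohens_d(mixed_treated, mixed_control):.2f}")
print(f"setted class:  d = {cohens_d(setted_treated, setted_control):.2f}")
```

The underlying gain is identical, but the second d comes out roughly three times larger, which is one reason comparing effect sizes across very different studies and populations is so risky.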
Thanks David, two great links. I’m a little confused by your apparent view that ‘you don’t want teachers to share their individualised teaching practices’, and I don’t know how that became teachers ‘being told how to teach’.
Is it because once a particular type of practice is valued, it becomes known as ‘best practice’ and what follows is an immediate notion that everyone else should adopt it? If so I understand this very real danger. But, nonetheless, don’t teachers want to learn from each others’ practice in a way that offers precise information of the techniques used?
What often happens is that Ms X uses a new strategy with her class and it all goes swimmingly. SLT then say we all need to do what Ms X does, and it backfires like a wet fart. The problem with teachers learning from each other’s practice is that it’s very hard to isolate “precise information of the techniques used”.
Very interesting post. However, from the beginning a positivist stance is adopted as to what counts as evidence, in a text that is too method-driven. Educational research, like any research, is not really about methods but about constructing educational theories that (maybe) work, or just explain, or just add more knowledge to science education. So what you call evidence seems to me to be treated as theory-free; I think evidence is bounded by theory just as much as (educational) practice is bounded by beliefs.
Hi Paulo – it seems to me that maybe one of us has the wrong end of the stick – I was labouring under the belief that positivism is the belief that truth can only be derived from empirical evidence and that I was critiquing that proposition. How fascinating that my post can be read as a defence of positivism!
Ioannidis published his famous paper in 2005 (http://buster.zibmt.uni-ulm.de/dpv/dateien/DPV-Wiss-False-Research-Findings.pdf) claiming that most medical research papers are flawed – particularly small studies and ‘hot’ topics where researchers are in haste to get to print. Rather like educational research – very few studies were ever retested.
He showed that of the 49 most highly cited medical papers, only 34 had been retested and of them 41 per cent had been convincingly shown to be wrong. And yet they were still being cited [from recent R4 broadcast – http://www.bbc.co.uk/programmes/b04f9r4k]
Similar studies have shown that research papers in economics, social sciences and other areas of science are similarly flawed.
So do we reject the scientific process for Medicine, Economics or Social Sciences?
With all its flaws, the scientific process still leads to progress. And what are the alternatives?
So the question for educationalists must surely be – how do we subject education to the scrutiny of the scientific process so that we diminish the false positives?
Scientific evidence leads to what we could call progress in many areas but seemingly not in education.
There is no lack of evidence about what works in education and it makes not the least difference.
Maybe educationalists should acknowledge that until they agree on the purposes of education there’s little point conducting research which doesn’t concern itself with values?
I agree that, overall, the ‘purpose of education’ would be likely to cast a long shadow over the worth of any edu-research. Even in the physical sciences, the conceptual theory within which the research has been conceived will significantly affect the design of the experiment. At least in the physical sciences, when the findings don’t then conform to the theory, alarm bells could go off (but often don’t). As you say, in education the conceptual context isn’t agreed, and interpretation is even more problematic. My experience in science is that the research undertaken by academics is seldom directly applicable to those of us working in the applied sector (much as it is for teachers)… a significant amount of research needs first to be synthesised and distilled, then re-tested in a focussed way, before we can use it with confidence…
Hmm! Interesting stuff David, and very relevant to the conundrums we grapple with day-to-day. Reminds me of this piece about ‘what happens if it’s not gold standard’, following on from last year’s ResearchED http://www.lkmco.org/article/so-whata%3F%3Fs-it-doing-research-ed-stuff-08092013