“Optimism and stupidity are nearly synonymous.” Hyman G. Rickover — Speech to the US Naval Postgraduate School, March 16, 1954
In this post I picked up on a rather odd comment made by Professor Hattie at a recent conference:
“…tests don’t tell kids about how much they’ve learnt. Kids are very, very good at predicting how well they’ll do in a test.”
Are they? In my response I argued that he’s wrong:
Most students are novices – they don’t yet know much about the subject they’re studying. Not only do they not know much, they’re unlikely to know the value of what they do know or have much of an idea about the extent of their ignorance. As such they’re likely to suffer from the Dunning-Kruger effect and over-estimate the extent of their expertise. All of this creates a sense of familiarity with subject content which leads to the illusion of knowledge. The reason tests are so good at building students’ knowledge is that they reveal surprising information about what is actually known, as opposed to what we think we know. Added to that, our ability to accurately self-report on anything is weak at best.
With thanks to George Lilley, a bit of investigation has revealed the potential source of Hattie’s mistake. One of the interventions rated most highly in Visible Learning is ‘self-reported grades’ with a whopping effect size of d=1.44.* According to Hattie’s calculations this would represent an incredible advance of over three years’ additional progress. If Hattie’s right, it would be criminally negligent not to harness such an unimaginable force. So, what is this voodoo?
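For anyone unfamiliar with where such figures come from, here is a minimal sketch, assuming the standard definition of Cohen’s d and Hattie’s ‘hinge point’ convention that d ≈ 0.40 corresponds to roughly one year of typical progress; the arithmetic below is mine and purely illustrative, not taken from the underlying studies:

```latex
% Cohen's d: the standardised mean difference between two groups
d = \frac{\bar{x}_{\text{intervention}} - \bar{x}_{\text{control}}}{SD_{\text{pooled}}}

% Hattie's "hinge point" treats d of about 0.40 as roughly one year of typical progress,
% so an effect of d = 1.44 gets read as
\frac{1.44}{0.40} = 3.6 \ \text{years of additional progress}
```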
It turns out that self-reported grades is… students predicting what grades they hope they are going to get. If you predict you’re going to get an A, then you will! It’s as simple – and as improbable – as that.
Hattie used these five meta-analyses to arrive at the average of d = 1.44:
• Mabe & West (1982): Validity of self-evaluation of ability. (pdf)
• Falchikov & Boud (1989): Student Self-Assessment in Higher Education.
• Ross (1998): Self-assessment in second language testing. (pdf)
• Falchikov & Goldfinch (2000): Student Peer Assessment in Higher Education.
• Kuncel, Credé & Thomas (2005): The Validity of Self-Reported Grade Point Averages, Class Ranks, and Test Scores.
But, as Lilley points out here, two of the studies didn’t even attempt to measure the effect of self-reported grades. Falchikov (2000) was studying the effects of peer assessment, whilst Kuncel (2005) was testing whether students were able to remember their test scores from the previous year. At least part of the effect cited by Hattie as evidence for self-reported grades is actually evidence of something entirely different.
The authors of several of the studies themselves go to the trouble of warning against Hattie’s interpretation:
Since it is often difficult to get results transcripts of student previous GPA’s from High School or College, the aim of this study is to see whether self-reported grades can be used as a substitute. This obviously has time saving administration advantages. Kuncel et al (2005) p.64
We conceive of the present study as an investigation of the validity of peer marking. Falchikov and Goldfinch (2000) p.288
The intent of this review is to develop general conclusions about the validity of self-evaluation of ability. Mabe and West (1982) p.281
Not only were the studies cited in Visible Learning not in fact measuring what Hattie claims they were; worse, Falchikov and Boud (1989) actually state that “the greater the effect size, the less the self-marker ratings resemble those of staff markers.” (p. 417) Or, in other words, high effect sizes are more likely to be down to students’ inability to accurately predict their grades than to over-prediction causing increased performance, as Hattie appears to have concluded.
The hammer blow comes from Dr Kristen Dicerbo:
The studies that produced the 1.44 effect size did not study self-report grades as a teaching technique. They looked at the correlation of self-report to actual grades, often in the context of whether self-report could be substituted for other kinds of assessment. None of them studied the effect of changing those self-reports. As we all know, correlation does not imply causation. This research does not imply that self-expectations cause grades. [my emphasis]
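It’s worth spelling out how a correlation can end up dressed as an enormous ‘effect’. Here is a minimal sketch, assuming the standard conversion from a correlation coefficient r to Cohen’s d; Visible Learning doesn’t document exactly how the d = 1.44 was produced, so the value of r below is purely illustrative:

```latex
% Standard conversion from a correlation coefficient r to Cohen's d
d = \frac{2r}{\sqrt{1 - r^{2}}}

% Purely illustrative: a correlation of r = 0.58 between predicted and actual grades
% converts to an apparently enormous "effect"
d = \frac{2 \times 0.58}{\sqrt{1 - 0.58^{2}}} \approx 1.42
```

On this reading, a big d simply tells us that students’ predictions track the grades they eventually get; it tells us nothing about whether changing a prediction would change the grade.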
All this strongly suggests not only that getting students to predict their grades is unlikely to have much of an effect on increasing said grades (Who knew!) but that Hattie is very likely fooling himself when he says students are “very, very good at predicting how well they’ll do in a test.”
*I’ve critiqued the idea of effect sizes here.
Thanks for this. It’s very timely given our teaching and learning group have just started investigating the top 6 aspects identified in the effect sizes in relation to our outcomes, approaches and performance in school. We did find the Self-Reported Grades title to be slightly odd and assumed we were missing something. If your interpretation of Hattie’s definition is accurate then it’s very difficult to argue with your critique. Predicting your grades does not make you better.
Furthermore, given that boys perennially over-inflate their potential and performance and girls do the reverse, should boys not be achieving higher grades? I appreciate this is a rather crass question and that we may be missing some nuance of this title, which does not seem to be explained in Hattie’s print anywhere.
When looking at Hattie’s effect sizes, the first two can be ignored as they claim to be correlations, not effects. The first claims that students are accurate at predicting their grades; the second that if, instead of testing your students on their subject knowledge, you test their thinking level using a Piagetian test, you get a good correlation.
The first has nothing practical to offer teachers as far as I can tell.
The second is more interesting. Anecdotally: I did this myself with the entire yr9 (300 students). Assessed them using a Piagetian reasoning task and compared this with the KS3 results they had just taken.
The correlation was so good I was left wondering why we got them to sit the SATs! The Piagetian task even revealed high ability students who were underachieving. We put some in higher sets for KS4… and it seemed to work.
There is a lot wrong with Hattie’s list, but if we simply extract the classroom methods from the list (ignoring Hattie’s interpretations) and compare them with the lists from Marzano and the EEF, we see that Hattie’s effect sizes are inflated compared to the EEF’s (probably because he included low-quality research), but that the three lists are largely comparable. See http://www.ebtn.org.uk/evidence/combined-list
It does not seem reasonable [as some seem to do] to say “Hey, look at this error in Hattie’s work – that means it’s all rubbish.”
Hi Mike – I’m not saying Hattie’s work is all rubbish just that his interpretation is often eccentric.
As to test results correlating to Piagetian tests, they also correlate pretty well to CATs and other IQ tests. You *could* ask why not just give kids an IQ test and not bother with GCSEs, but that would miss an important point – that we teach what’s assessed, and KS3 assessments in particular are about holding schools accountable for curriculum delivery.
The clear argument in Hattie’s book is that the influences that have the highest effect sizes should not be ignored. “An achievement continuum has been developed, along which many effects can be located… The barometer of achievement can be used to assist in seeking the explanation of what leads to successful learning …” His constant mantra of “know thy impact” reinforces his rankings.
He also constantly calls into question teachers’ “professional” judgments: “Another reason for the lack of change is the over reliance on teacher judgments rather than evidence.” He is definitely implying he has “the evidence” and that it should not be ignored.
Many, if not most, of the effect sizes are based on correlation studies, not true experiments; surely that must call into question all his effect sizes and rankings?
If that doesn’t, then the US government-funded study into “effect size benchmarks” by year level, using US national testing, must throw significant doubt on Hattie’s effect sizes and rankings.
See also my post from a while back https://academiccomputing.wordpress.com/2013/08/05/book-review-visible-learning/ which only mentions the self-reported grades briefly, but goes more deeply into all the other many statistical problems with Hattie’s Visible Learning book. It’s consistently the most-read post on my blog; shame it’s about the flaws in someone else’s work, not about my own work!
Thank you Neil
Thank you David. I have posted to our Google community for the Distance Learning PGCE with these comments (sorry I can’t find a way of highlighting but the comment ends with the exclamation mark):
I used this quotation in my thesis to critique evidence-based teaching.
“The tiny dead bodies of knowledge disinterred by systematic review hold little power to generate new understandings, and are more likely, I suggest, to incapacitate researchers than to contribute to research ‘capacity’” (MacLure 2003).
If you want a copy of the article just let me know.
I suppose that it is a lesson to us all – especially me, to question everything. We don’t all have the time to forensically examine the methodology in academic literature, but we do need to be cautious about anything that purports to ‘have the answer’ to complex educational issues. Hattie’s work is considered to be seminal, relied upon by most, and offered as a key text on our PGCE. I still think it is an important piece of research, but thanks to David Didau’s blog I am far less taken with the findings than previously. If you cite Hattie’s work in your assignment, a little ‘however’ and reference to the key arguments in his blog will go a long way to raise the criticality of your work.
I went on to raise the issue of citing academic sources in Master’s level work – it is considered bad form to cite blog posts unless there is a weight of reputation behind the blogger – I am happy for them to cite your blogs – so there’s a backhanded compliment.
The quotation can be found here: http://www.esri.mmu.ac.uk/respapers/papers-pdf/Paper-Clarity%20bordering%20on%20stupidity.pdf
I hope that you can find time to read Maggie’s article and her book Discourse in Educational and Social Research. It helped me to emulsify the snake oil.
sorry, I meant to say the G+ comment ends with the word ‘work’.
Much to question about the validity and reliability of the Hattie research. Much to question about anyone’s research, and much to question about the relevance of some of your own arguments. It comes down to considering everything and forming your own perceptions. And as Hattie himself said (or perhaps just repeated ;)) – there’s no such thing as the immaculate perception! My perception – what Hattie gave at the VLworld conference was a masterclass of sense without any of the data.
I question everything, especially myself. Plenty of posts on here about how and why I’ve been wrong and changed my mind.
But I don’t doubt that those who paid to attend a VL conference were thoroughly convinced by Hattie 🙂
This piece seems to be based on a complete misconception. Nowhere does Hattie advise that self reported grades can be operationalised to improve student achievement in any way, let alone in the unlikely way assumed by you, and by Dr Kristen Dicerbo.
Even a casual reading of any of Hattie’s Visible Learning books shows that instead he advocates teachers assessing their impact and improving from there, not ticking off individual effect sizes – he specifically warns against this.
If you read pages 43 to 44 on Self-Reported Grades in his first Visible Learning book, Hattie uses the high effect to warn teachers against the problem of weak students setting themselves low expectations, which you and I would both agree with I’m sure. He does not suggest the odd strategy you accuse him of.
Mike Bell above is right that self-reported grades is not a teaching technique; it is research looking at the correlation between what students actually achieve and how well they think they will achieve. It is of interest, but it is not a strategy.
Many effects on Hattie’s table cannot be operationalised for example ‘Family Structure’ is not easy for a teacher to affect. I trust you or Dr Dicerbo are not preparing an article accusing Hattie of suggesting that teachers meddle with the family structures of their students, but you never know, odder things appear on the web.
Given your great influence, twitter following etc, it would be helpful if your piece could be updated, I know you generously ask for constructive feedback, I hope you will accept this as such.
Hi Geoff – the reason for this article is the fact that Hattie is going round saying “Kids are very, very good at predicting how well they’ll do in a test.” They are patently not. I may of course be wrong about this but I assumed that Hattie’s misconception may be connected to the stuff on self-reported grades.
I’m guessing the comment “Nowhere does Hattie advise that self reported grades can be operationalised to improve student achievement in any way, let alone in the unlikely way assumed by you, and by Dr Kristen Dicerbo” is in relation to “It turns out that self-reported grades is… students predicting what grades they hope they are going to get. If you predict you’re going to get an A, then you will! It’s as simple – and as improbable – as that.” Yes, I was being mischievous, but thankfully I know of no school which tries to operationalise this because it’s clearly nonsense. I can’t speak for Dicerbo but she doesn’t seem to be making any such claim either.
The point is that Hattie’s reported correlation is spurious. You seem to be trying to excuse this by claiming I’ve misunderstood Hattie’s work. I really haven’t and, as such, I can’t see any reason for updating the article.
The book is absolutely about which influences have the most impact on achievement. Why does Hattie do a ranking and rank “self report” as No.1? What else can that mean?
Also, you say nothing about Hattie’s misrepresenting 2 of the 5 studies. Does that not concern you?
Well, I took “All this strongly suggests not only that getting students to predict their grades is unlikely to have much of an effect on increasing said grades (Who knew!)” to mean you thought that was what Hattie was advocating.
Most students, especially older, able ones, are good at predicting their grades (summative); hence the high effect size for self-reported grades. What they don’t do so well, as you rightly point out, is understand what mistakes they made (formative). The self-reported grades research is only about predicting summative outcomes. Your critique seems based on the assumption it’s about formative assessment, as when you write “The reason tests are so good at building students’ knowledge is that they reveal surprising information about what is actually known, as opposed to what we think we know”.
However, my worry is more general than this. There are a lot of problems with effect sizes and meta-studies, but they are the only remotely objective way we have to compare influences, and so prioritise how best to improve achievement. If there is only one horse pulling your carriage, it’s not a good idea to shoot it.
Ah, OK, so your argument is along the lines of “democracy is the least-worst form of government”? I can accept that, but to then say “don’t critique when people do it badly” seems absurd. Surely it’s more sensible to accept that improvement is iterative and that by acknowledging our mistakes and failures we can refine and improve the process of measurement and comparison?
No, criticism is vital, I agree, and I take the point that sometimes studies are lumped together in a rather carefree way. This needs to be looked at. But some people will read criticism of Hattie and then ignore him AND all the excellent studies he cites, and then, fatally, assume their own professional judgement is better. There are more than 300,000 RCT-like studies and in nearly all of them the professional judgment used by the teacher in the control group is beaten by a strategy suggested by the researcher. Pause for thought for us teachers!
Regular readers of my blog know I am no fan of teachers’ judgement: https://www.learningspy.co.uk/leadership/it-works-for-me-the-problem-with-teachers-judgement/
An additional perspective on this issue comes from the famous “Dr. Fox” experiments of the 1970s. In brief, students who were given upbeat, engaging presentations thought they learned more than students who had been given high-content lectures presented in a boring monotone. But in fact, when they were tested, they had not. Students do not always know when they are learning and when they are not.
References
Abrami, P. C., Leventhal, L., & Perry, R. P. (1982). Educational seduction. Review of Educational Research, 52(3), 446-464.
Naftulin, D. H., Ware, J. E., & Donnelly, F. A. (1973). The Doctor Fox lecture: A paradigm of educational seduction. Journal of Medical Education, 48(7), 630-635.
Ware, J. E., & Williams, R. G. (1975). The Dr. Fox effect: A study of lecturer effectiveness and ratings of instruction. Journal of Medical Education, 50(2), 149-156.
Thanks David for your timely article on Hattie. We were just presented with the top 10 learning effects by our principal, the implication being that this is what all the evidence proves. None of us have time to read the book, let alone the background studies, so summaries like this are helpful.
Looks like Hattie’s work is not as certain as they make out. Keep up the great work.
Thanks David for sorting out this topic in Hattie’s work. It so happens that I helped a teacher conduct a small research project in his own classes on the effect of students predicting their own grades. Hattie’s awesome 1.44 effect size seemed very attractive to us. This teacher actually did have students predict what they would score on their next test. He also tried to get them to agree to a higher grade than they initially predicted. We found an effect size of 0.74. I must admit it’s the highest effect size I have encountered in recent years doing small-scale action research with teachers. I hope to conduct another experiment next year to see if the effect will repeat itself.
Can you give more details on your experiment, Jan? How did you organise the control and experimental groups? Were the kids divided into each group randomly? What was the subject and what were you testing? The age of the kids? The time over which the experiment ran?