Since my last foray into the world of intelligence testing, I’ve done a lot of reading about the ideas that a) IQ tests are culturally biased and b) the entire concept of intelligence is culturally biased. I want to preface my conclusions by reiterating the following points:

  1. I do not believe we should ever use IQ tests in schools to classify students, or to predict their academic achievement.
  2. I do not believe that any group of people is in any way superior to any other group. The fact that various studies show differences in the IQ scores of men and women, different ethnic groups and people of different socio-economic status has no bearing whatsoever on whether we should pursue a progressive political agenda.
  3. I am convinced that all differences in the average IQ scores of different groups are caused by unfair distribution of wealth, access to education and systematic discrimination. These are things that social policy should seek to address.
  4. IQ scores, whatever their validity, can never predict the worth of an individual. Everyone deserves to be treated fairly and with respect. The idea that we can or should select for some people’s idea of desirable traits, or engage in any other form of social engineering is reprehensible.

So, with those disclaimers made, let’s examine the argument that IQ tests are culturally biased. This argument usually rests on the finding that the average IQ scores of different identified groups are not the same. If we find that the average score of group x is higher than that of group y, we might be tempted to say that those in group x are naturally more intelligent than those in group y, or we might conclude that the life experiences of those in group x are substantially different to those in group y, giving them an unfair advantage in the test. So, while some differences between groups may have something to do with nature, many will be due to differences in the environment. If we accept that environmental differences cause the difference in the performance of different groups, we then have two choices. We can either try to identify what these environmental factors might be – poverty, poor diet, lack of access to education etc. – and try to rectify these unfair disadvantages, or we can reject the validity of the mechanism from which we obtained the information.

The first option is hard, the second is relatively easy. In 1979, Robert Serpell found that the medium in which a task is presented makes a great deal of difference to how children from different cultural backgrounds perform. His study compared the performance of British and Zambian children and found that when they were asked to reproduce a figure using wire, the Zambian children out-performed their British counterparts, but when the children were given pen and paper, the British children did best. Serpell concluded that we develop “highly specific perceptual skills” depending on our experiences and what we have practised. Almost all British children have extensive experience of using pen and paper to make drawings, whereas Zambian children have less access to expensive pens and paper, but much more practice at manipulating found objects, such as wire, to represent the shapes of animals and people.

Serpell recommended that tests should be designed to reflect the contexts that subjects are most familiar with. He suggests that Western-educated subjects “have acquired the skills (such as drawing, interpreting pictures, assembling jig-saws and building patterns with blocks) which form an infrastructure on which the test performer must draw” and that “To measure with non-verbal pictorial tests the abilities of children whose cultural experiences does little to impart pictorial skills is just as hazardous an enterprise as testing children in a second or non-dominant language.”

Fair points. In his masterwork, Guns, Germs and Steel, the geographer and biologist Jared Diamond goes further. For many years Diamond lived and worked in Papua New Guinea, and the book is an attempt to explain why people of Eurasian origin, and not the natives of New Guinea, were the first to export the building blocks of colonialism. The question he sets out to answer is: “Why did wealth and power become distributed as they now are rather than in some other way?”

Diamond presents a typically racist explanation:

White immigrants to Australia built a literate, industrialized, politically centralized, democratic state based on metal tools and on food production, all within a century of colonizing a continent where the Aborigines had been living as tribal hunter-gatherers without metal for at least 40,000 years. Here were two successive experiments in human development, in which the environment was identical and the sole variable was the people occupying that environment. What further proof could be wanted to establish that the differences between Aboriginal Australian and European societies arose from differences between the peoples themselves?

Diamond goes on to suggest that there are two problems with the data showing differences in IQ between peoples of different geographic origins now living in the same country:

First, even our cognitive abilities as adults are heavily influenced by the social environment that we experienced during childhood, making it hard to discern any influence of preexisting genetic differences. Second, tests of cognitive ability (like IQ tests) tend to measure cultural learning and not pure innate intelligence, whatever that is.

This, as we shall go on to explore, is an empirical statement that bears some investigation.

Diamond then makes a fascinating, if anecdotal, observation that New Guineans are, “on the average more intelligent, more alert, more expressive, and more interested in things and people around them than the average European or American is.” He argues that living in a ‘civilised’, urbanised environment makes it relatively easy for anyone to pass on their genetic material, as opposed to living in a more ‘primitive’ hunter-gatherer society. He argues that “…natural selection promoting genes for intelligence has probably been far more ruthless in New Guinea than in more densely populated, politically complex societies”.

Could it be that people from non-Western societies are actually cleverer and that all IQ tests are doing is providing flimsy evidence that those people with access to Western education are better at those things most valued by the Western educated elite? This is certainly a point of view with much popular appeal, but how does it stack up against the data?

For IQ tests to be unfair, getting the right answer to a question would have to depend on factors other than intelligence, such as education, social class etc. So, is this the case? In his new book The Neuroscience of Intelligence, professor of psychology Richard J. Haier argues that no, this is not the case. Part of the problem is distinguishing between fair, valid and biased. Now of course, we mustn’t get mixed up between the properties of tests and those of our inferences; a test is just a tool and it is our interpretation that gives it meaning or validity. That said, most people would be happy to say that a question is fair if they get the right answer, but is a question biased against you if you can’t answer it?

Does getting a low score on an IQ test mean you’re not intelligent? Probably. There are several possible reasons why you might not know the answer to a question. It might be you were never taught the answer, you never learned it on your own, you might have forgotten it, you might not know how to reason it out or, knowing how, you’re not able to reason it out. Haier argues that most of these reasons relate, in some way, to general intelligence.

Getting a high score, on the other hand, means you know the answers. But, does it matter how you know the answers? Have you had the advantage of a good education? Have you got a better than average memory? Are you perhaps just one of those people who are good at taking tests? Haier suggests once again that general intelligence covers all these things.

Test bias is different. For a test to be biased against a particular group, scores would have to consistently under- or over-predict the performance of that group in the real world. By predict, obviously I’m not talking about certainties; no test is ever 100% accurate in its predictions. Instead we need to talk about correlations and probability.

Haier gives the example of the SAT:

…if people in a particular group with high SAT scores consistently fail college courses, the test is overpredicting success and it is a biased test. Similarly, if people with low SAT scores consistently excel in college, the test is underpredicting success and it is biased. A test is not inherently biased just because it may show an average difference between two groups. (p.18)
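
To make that definition concrete, here is a minimal Python sketch of one way to look for over- or under-prediction. It fits a single straight line predicting an outcome from test scores for everyone, then averages the prediction errors within each group. The scores, outcomes, group labels and function names are all invented for illustration; real studies of predictive bias use far larger samples and more careful regression models.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x. Returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def mean_residual_by_group(scores, outcomes, groups):
    """Fit one common line, then average (actual - predicted) per group.

    A mean near zero suggests the common line predicts that group about
    right; a clearly positive mean suggests under-prediction, a clearly
    negative mean suggests over-prediction.
    """
    a, b = fit_line(scores, outcomes)
    residuals = {}
    for s, y, g in zip(scores, outcomes, groups):
        residuals.setdefault(g, []).append(y - (a + b * s))
    return {g: mean(r) for g, r in residuals.items()}


# Hypothetical data: group A scores higher on average than group B.
scores = [400, 500, 600, 700, 300, 400, 500, 600]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

# Case 1: the same score leads to the same outcome in both groups.
fair_outcomes = [1.0 + 0.004 * s for s in scores]

# Case 2: group B does better than its scores alone would suggest.
shifted_outcomes = [
    y + (0.5 if g == "B" else 0.0) for y, g in zip(fair_outcomes, groups)
]

fair = mean_residual_by_group(scores, fair_outcomes, groups)
shifted = mean_residual_by_group(scores, shifted_outcomes, groups)
print(fair)
print(shifted)
```

In the first made-up case the groups differ in average score, yet the mean prediction error is essentially zero for both, so there is no sign of bias in this sense. In the second, the common line under-predicts group B (positive mean error) and over-predicts group A (negative mean error), which is the pattern the quoted definition would call bias.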

But let’s say you come across a test item you feel may be unfairly culturally biased. Thankfully, this is something IQ test designers take seriously and there’s a mechanism for raising your concerns. The mechanism is Differential Item Functioning. Basically, people from different subgroups who got the same overall score on a test are compared to see whether the suspect question is measuring differently for different subgroups. If people of equal overall ability from one subgroup get that question right noticeably less often, the question is flagged. By examining how different groups score on a question, test designers can establish whether it is biased against minority groups. In this way, IQ tests strive to systematically eliminate any questions that suffer from obvious cultural bias.
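
As a rough illustration of how Differential Item Functioning can be checked, here is a small Python sketch using the Mantel-Haenszel common odds ratio, one widely used statistic for this purpose. I am not claiming this is the procedure any particular test publisher follows; the group labels, counts and function names below are made up. Test-takers are matched on total score, and within each matched band we compare how often each group got the suspect item right.

```python
from collections import defaultdict


def mantel_haenszel_odds_ratio(records, reference="ref", focal="focal"):
    """Common odds ratio of answering one item correctly, reference vs focal.

    records: iterable of (group, total_score, item_correct) tuples.
    A value near 1 suggests the item behaves alike for matched test-takers;
    a value well above 1 suggests it favours the reference group.
    """
    # counts[total] = [ref_correct, ref_wrong, focal_correct, focal_wrong]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for group, total, correct in records:
        if group == reference:
            counts[total][0 if correct else 1] += 1
        elif group == focal:
            counts[total][2 if correct else 3] += 1

    numerator = denominator = 0.0
    for a, b, c, d in counts.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("not enough data to compute an odds ratio")
    return numerator / denominator


def make_records(group, total, n_correct, n_wrong):
    """Helper for building hypothetical data."""
    return [(group, total, True)] * n_correct + [(group, total, False)] * n_wrong


# Hypothetical item A: matched test-takers succeed at the same rate.
item_a = (
    make_records("ref", 10, 6, 4) + make_records("focal", 10, 3, 2)
    + make_records("ref", 20, 8, 2) + make_records("focal", 20, 4, 1)
)

# Hypothetical item B: matched focal-group test-takers succeed far less often.
item_b = (
    make_records("ref", 10, 6, 4) + make_records("focal", 10, 1, 4)
    + make_records("ref", 20, 8, 2) + make_records("focal", 20, 2, 3)
)

ratio_a = mantel_haenszel_odds_ratio(item_a)
ratio_b = mantel_haenszel_odds_ratio(item_b)
print(ratio_a, ratio_b)
```

With these invented counts, item A comes out at a ratio of 1 (no sign of differential functioning) and item B at about 6, which is the kind of result that would get a question flagged for review. In practice a significance test and much larger samples would be needed before drawing any conclusion.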

Consistency is key here. If predictions are inaccurate in a few cases, that’s not bias, that’s noise. Bias shows up where predictions consistently fail to point in the right direction. Predictions not coming true aren’t evidence of bias either: if a test predicts nothing, that means it’s invalid. (Or at least, that it’s not possible to draw a valid inference from it.)

“Intelligence, whatever that is”

As I discussed here, there’s a considerable body of evidence that IQ scores have quite impressive predictive power. IQ is particularly good (although not perfect) at predicting academic success, even when we control for SES, age, sex, ethnic origins and other variables. Haier also points out that IQ predicts various brain characteristics such as cortical thickness, or cerebral glucose metabolic rate. If intelligence tests were meaningless, they wouldn’t be able to predict anything. The fact that they can and do predict a wide range of things – including quantifiable brain characteristics – proves they have meaning.

What IQ tests don’t do is tell us what scores mean. That is a matter of interpretation. Something real and meaningful is being measured which we have decided to call intelligence. We know IQ has implications for areas as diverse as functional literacy, job performance and being involved in road traffic accidents, but it doesn’t tell us whether these things are good or bad.

For instance, the US military won’t generally take recruits with scores below 90 because, on average, such people tend to find it harder to learn what’s required and run a greater risk of being killed in training. Is this fair? At the level of an individual, probably not. But at the level of thousands of potential recruits, that’s what the military has decided is in everyone’s best interest.

This is a numbers game, and while I can see its attractions, I’m utterly convinced that making such decisions in an education context would be entirely wrong. These are ethical considerations and, as I’ve suggested before, science makes a poor guide in determining what is right. What it does do is give us a better picture of reality as it actually is, not as we would like it to be. This is, despite their limitations and imperfections, the value of IQ tests.

These then are my conclusions:

  1. Comparing people from widely varying cultures using an IQ test is probably pointless and unfair.
  2. IQ tests do include cultural information (general knowledge, vocabulary etc.) but great care is taken to avoid this being biased against different groups.
  3. I haven’t been able to find any evidence that the predictions made by IQ scores are consistently wrong. If you know of any, I’d be grateful if you could pass the information on below.
  4. IQ tests clearly measure something real because they make clear predictions about performance on other metrics. It makes sense to call this thing ‘intelligence’.
  5. None of this tells us anything about what should be. For my money, what ought to be the case is that everyone, no matter their background, is treated fairly. As I’ve argued here, fairness is different to equality.
  6. Selection by IQ is both abhorrent and unnecessary. The most useful and fair form of testing in schools is to test children on what they have been taught: what do they know and what can they do.