Intelligence DNA? An evolutionary biology public service announcement
0.891 (prologue and part one)
11 Apr 2022
In memoriam
Stephen Jay Gould (1941-2002) and Richard Lewontin (1929-2021), who got it right where it counted.
Prologue
The concept of progress acts as a protective mechanism to shield us from the terrors of the future. 1
Astrological genetics2 is again on the rise, and seems to be recapping its best hits from the seventies and nineties, like "intelligence" and "gay genes", as well as some real oldies, such as dysgenics. As population geneticist Marcus Feldman, a long-time critic of simplistic hereditarianism, put it recently: "To those who were around when Jensen’s monograph appeared in 1969, it must seem like déjà vu all over again."3. Feldman refers to Arthur Jensen's infamous paper about racial differences in intelligence4. At the outset, I emphasize that this is not an essay on (alleged) racial differences. But the publication of that article was a watershed event in the history of behavior genetics, whose methods and findings I am going to discuss in detail. So a brief historical account of the field may help to set the background.
Behavior genetics (also called behavioral genetics) is, very broadly, an academic field dedicated to the study of inheritance and genetic aspects of human and animal behavior. Behavior genetics started out in the 1950s as a new transdisciplinary field, following the darker days of eugenics-mongering hereditarian psychometricians, during the first half of the twentieth century. It was to be a combined effort of animal geneticists, human geneticists, psychologists, and other researchers from across the social, biological and medical sciences. Eminent geneticists, like Theodosius Dobzhansky, were associated with behavior genetics back then. An implicit agreement was put in place, to concentrate on "pure science", and avoid controversial issues, like eugenics and race. Aaron Panofsky, on whose book5 I am basing this brief historical account, called this period the 'golden era' of behavior genetics.
All of that fell apart in 1969, when Jensen published his monograph on racial differences in intelligence, and during the "race-IQ controversy" that followed in the early 1970s (to which Feldman refers as "déjà vu"). The field has changed significantly since then. The dominant research program has become, practically, a version of human quantitative genetics, as practiced by psychologists with psychometrician and hereditarian leanings. Most importantly, behavior genetics has developed an internal culture of frowning upon criticizing other "club members". The field has become internally fragmented with little interaction, for example, between animal and human research. This internal culture and fragmentation have allowed all kinds of dubious research (such as on race, eugenics, "gay genes", and so on) to take refuge within behavior genetics, under a uniquely loose interpretation of "academic freedom". Provocation and perpetual controversy generation became major tools for raising the field's profile. Which brings me back to the recent resurgence of "astrological genetics"6 – a collection of popular beliefs and crude forms of scholarship, advocating that the key to understanding (and also possibly treating) human behavioral traits (in particular, "politically sensitive" traits, such as intelligence, criminality, addiction, etc.) is a person's genome.
In this essay I write about the alleged "genetic basis" of differences in intelligence and education attainment (and to a lesser extent, other measures of cognitive abilities) among individuals. I am not a psychologist or a practitioner of "social science". My background is mathematical modeling, evolutionary biology and ecology. To borrow from Dobzhansky, when he himself wrote in 1973 on the race-IQ controversy7, "my excuse for intruding on this domain" is that the backbone of evolutionary biology since the 1940s has been population genetics. And it is population genetic models that the psychologists, who call themselves behavior geneticists, use (and arguably abuse).
My focus is individual differences. So I am not going to discuss differences between races, social classes, women and men, and so forth. These topics have been covered elsewhere8. Similarly, I am not going to tackle eugenics or scientific racism, to which "intelligence research" and behavior genetics are, at least historically, connected. I will also not discuss the validity of IQ or g (so-called general cognitive ability) as measures of intelligence (again, elsewhere9).
My purpose is to examine the empirical evidence, as reported in the scientific literature, the application of population genetics theory, and the rhetoric that lead to the claims of behavior geneticists. So I discuss results of genomic studies, and I discuss studies of resemblances among relatives (identical twins, fraternal twins, siblings, parent-offspring, etc.). But I do not address potential problems and biases in study design and data collection, or the validity of the twin method, in general10. Nor will I discuss the usefulness of heritability, or whether it is nonsensical11, a "bad statistic", or a "lousy scientific concept"12. This topic, in particular, has been discussed ad nauseam13, to little avail.
Part One: Pies and Pandemias
Ever sift sand through a screen? ... We Bene Gesserit sift people to find the humans. 14
I write this paragraph as the first wave of the COVID-19 pandemic is waning, and governments are starting to lift restrictions on movement and travel. Like that of viruses, the spread of human genes has also been controlled by historical restrictions on movement (and their subsequent easement), albeit at a much slower pace and on the much grander time-scale of human history and prehistory.
Current geographic differences in DNA among humans partly reflect such historical restrictions – from mountains separating Italian villages15, or rivers separating counties and regions in England, France and the Netherlands16; to borders of medieval kingdoms in Finland and France17, or different linguistic regions in South Africa 18; to the Mediterranean sea separating Sardinia from the rest of Italy19; to differences across entire continents and across the entire globe, due to sheer distance20. But DNA geography also reflects overcoming of such restrictions – for example, historical migrations21 and trading routes22. Nowadays, you can swab the inside of your cheek, and get your DNA pinpointed (with percentages) on a world map. The meaning of this map in terms of deep ancestry is questionable 23. But nonetheless, such a map is a reflection of the scientific fact that different flavors of DNA (called alleles or polymorphisms in biology-speak) are either more common or more rare in different parts of the world. That is because, as humans have moved around the globe, each has carried along a unique set of DNA markers.
DNA can be represented as long sequences of four basic letters A, T, G, and C24. Each person's genome contains over three billion of these letters organized in long sequences, such that each position along the genome contains one such letter, either A, T, G or C. So for example, looking at five different positions in the genome, I might have a sequence A-G-T-C-C, while you might have G-G-T-T-C. So our DNA sequences differ at the first and fourth positions. I have A and C, while you have G and T. A hypothetical researcher may compare thousands of people, and may find out that in the first position, 90% of people in Helsinki have A, and only 10% have G; while in Turku it is about 50-50. Similarly, in the fourth position we may find that 60% of Helsinki residents have C, while in Turku it is only 20%. By keeping count of such differences (in allele frequencies) for hundreds of thousands of different positions in the genome, our hypothetical researcher can then say whether a person is more likely to be from Helsinki or from Turku (see caption of Figure 1). These differences are, in part, a manifestation of the historical pattern that people from Helsinki used to stay in or around Helsinki and marry Helsinki people. Same for Turku.
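The counting argument above can be sketched as a small likelihood comparison. This is a minimal illustration, not the method of any particular study: it uses only the two made-up positions and frequencies from the text, and it assumes that the positions are independent, that each has just two possible letters, and that each person carries a single letter per position. The names `ALLELE_FREQ`, `city_likelihoods` and `more_likely_city` are hypothetical.

```python
# Frequencies of each letter at two genome positions, per city.
# The numbers are the made-up ones from the text; T at position 4 is
# assumed to be the only alternative to C.
ALLELE_FREQ = {
    "Helsinki": {1: {"A": 0.9, "G": 0.1}, 4: {"C": 0.6, "T": 0.4}},
    "Turku":    {1: {"A": 0.5, "G": 0.5}, 4: {"C": 0.2, "T": 0.8}},
}


def city_likelihoods(genotype):
    """Probability of seeing this genotype in each city,
    assuming the positions are independent of one another."""
    result = {}
    for city, freqs in ALLELE_FREQ.items():
        likelihood = 1.0
        for position, letter in genotype.items():
            likelihood *= freqs[position][letter]
        result[city] = likelihood
    return result


def more_likely_city(genotype):
    """City under which the genotype is more probable."""
    likelihoods = city_likelihoods(genotype)
    return max(likelihoods, key=likelihoods.get)


me = {1: "A", 4: "C"}
you = {1: "G", 4: "T"}
print(more_likely_city(me), city_likelihoods(me))
print(more_likely_city(you), city_likelihoods(you))
```

With only two positions the call is weak. The same bookkeeping over hundreds of thousands of positions is what makes it informative about geography, and about nothing else.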
Isolation, however, is not all-or-nothing; it varies continuously in degree. And so, consistent differences in allele frequencies exist on all levels of geographic separation – from differences among continents, to differences among countries, to differences among neighboring regions, cities and villages. Add to that the tendency of humans, even in recent times, to pair based on geographical proximity25, language26, cultural and national identity, skin color, socioeconomic status, and so on, and you can see how such differences in allele frequencies will tend to be preserved even in the modern 'global village'. Geneticists call that population structure27.
This global population structure is a reflection of human history. It is an outcome of random processes operating over thousands of years, resulting in the slow but steady accumulation of such geographic differences. The differences occur because those processes have operated (sometimes more, sometimes less) independently in different parts of the world, and in different regions within continents and countries28. Biologists have many names for such random processes – mutation, genetic drift29, founder effect30, population bottleneck, isolation by distance31, and other similar concepts from the scientific field of population genetics. But the main point is that geographic separation translates to systematic differences in DNA, simply because of random chance operating more-or-less independently in different places and over long periods of time. Human DNA is an incomplete and jumbled record of the many fortunes and misfortunes of human history, including pandemics32.
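How random chance alone can produce such differences may be illustrated with a toy simulation of genetic drift. The sketch below uses a standard textbook idealization (random resampling of gene copies every generation, in the style of a Wright-Fisher model), which is my choice for illustration and not something taken from the studies cited here. The population size, number of generations and random seeds are arbitrary assumptions, and the function name `drift` is hypothetical.

```python
import random


def drift(start_freq, pop_size, generations, seed):
    """Allele frequency over time in one isolated population.

    Each generation, every one of the pop_size gene copies is drawn
    at random from the previous generation (no selection, no migration).
    """
    rng = random.Random(seed)
    freq = start_freq
    history = [freq]
    for _ in range(generations):
        copies = sum(1 for _ in range(pop_size) if rng.random() < freq)
        freq = copies / pop_size
        history.append(freq)
    return history


# Two populations that start identical and then evolve independently.
village_a = drift(start_freq=0.5, pop_size=200, generations=100, seed=1)
village_b = drift(start_freq=0.5, pop_size=200, generations=100, seed=2)
print(f"after 100 generations: A={village_a[-1]:.2f}, B={village_b[-1]:.2f}")
```

Nothing in this simulation favors one allele over the other. Any difference between the two villages at the end is an accident of history, which is the sense in which geographic separation translates into systematic differences in DNA.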
But there is another aspect of DNA. Parts of our genome are crucial for building proteins, the molecular building blocks of our cells and body. Some alleles manifest themselves as genetic diseases and disorders. So obviously DNA has also an important functional significance. That is the fundamental duality at the core of all biology. All organic structures, from DNA, to birds' beaks, to human brains, have both functional aspects and (evolutionary) historical ones.
DNA, however, is special, at least in one respect. As the hereditary material that is passed from parents to children, DNA has risen to prominence in the public eye as the potential source of all that makes us human – from diseases and disorders, to personality traits and intelligence. This has been termed the DNA mystique33. For example, assume that a researcher would seek to establish the 'genetic basis' of eating Karelian pies (a type of Finnish pastry). This researcher finds that 80% of Helsinki residents eat at least one Karelian pie a week, while in Turku it is only 40%. Coupled with the differences in allele frequencies (Figure 1), our hypothetical researcher concludes that having A in the first genome position or C in the fourth is associated with increased "risk of consuming Karelian pies".
Similarly, because of global population structure, some alleles will be more common in Finland than in other countries. Given that Karelian pies are mostly consumed in Finland, our hypothetical researcher would again conclude that their consumption has a "genetic basis". Obviously, this association is completely spurious. The differences in allele frequencies and the differences in consumption habits are correlated through their dependence on a third common factor, geography in this case. This is of course a highly contrived example. Correlation does not entail causation, and real-world studies are not that sloppy (though some approach it). But it serves to demonstrate the fundamental problem with such research – because DNA has such a significant historical aspect, as manifested by population structure, it is not straightforward to disentangle any potential functional aspect.
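The spurious association can be made explicit with a little arithmetic. The sketch below reuses the made-up numbers from the Helsinki-Turku example (90% versus 50% carrying A at the first position, 80% versus 40% eating pies) and adds two assumptions of its own: the two cities contribute equally to the pooled sample, and within each city pie eating is completely independent of DNA. The names `CITIES` and `pie_rate_given_allele` are hypothetical.

```python
# city: (share of pooled sample, P(A at position 1), P(eats pies))
CITIES = {
    "Helsinki": (0.5, 0.9, 0.8),
    "Turku":    (0.5, 0.5, 0.4),
}


def pie_rate_given_allele(has_a):
    """P(eats pies | allele) in the pooled sample of both cities.

    Within a city, pies and DNA are independent by construction, so
    P(pies and allele | city) = P(pies | city) * P(allele | city).
    """
    joint = 0.0     # P(eats pies and carries the allele)
    marginal = 0.0  # P(carries the allele)
    for share, p_a, p_pie in CITIES.values():
        p_allele = p_a if has_a else 1.0 - p_a
        joint += share * p_allele * p_pie
        marginal += share * p_allele
    return joint / marginal


print(f"P(pies | A) = {pie_rate_given_allele(True):.3f}")
print(f"P(pies | G) = {pie_rate_given_allele(False):.3f}")
```

In the pooled sample, carriers of A come out at roughly 66% pie eaters against roughly 47% for carriers of G, even though within each city the rate is identical for both alleles. The whole "effect" is carried by geography.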
This is especially difficult in humans, because we possess a parallel and much more efficient mechanism of inheritance – cultural transmission. Humans carry their habits, customs, traditions, beliefs, prejudices, sentiments, skills and slang with them, not only their DNA. So any correlation between behavioral traits and DNA markers may simply be a passive consequence of the fact that humans come in lumps – discrete packages of both DNA and culture. Obviously, local or national food recipes have nothing to do with DNA. They are traditions transmitted culturally from one generation to the next. But what about other kinds of individual differences, such as professions, personality, behavioral patterns and tendencies, socio-economical status, educational attainment, or intelligence?
So in order to study functional genetic influences on such complex behavioral traits, it is essential to first remove the confounding historical influences. Several techniques have been invented to estimate and remove effects of population structure. Despite the tall tales and exaggerations of some researchers and in the general media about the importance of DNA in determining human behavior and intelligence, there are only two consistent and robust patterns in human genetic variation, and both are historical. First, within any (large enough) population, most of the global human genetic variation can be found. Second, consistent systematic differences among (sub)populations are of the historical-genealogical type, simply confirming our common descent from Africa, as well as the fact that each human, both past and present, inherited DNA from two other persons. While of anecdotal interest, in terms of seeing the footprints of historical events (ancient migrations, wars, colonizations, trade routes, pandemics, urbanization etc.) in the geographic patterns of DNA variation, such geographic patterns do not offer any insight into functional aspects of DNA.
In other words, it is the historical-genealogical patterns that are consistently observed in human DNA variation. But once you remove them, is the remaining variation truly functional? There are several reasons to doubt that. First, despite decades of attempts, no major-effect genes on behavioral and psychological traits have been found. In particular, no major-effect genes of intelligence-related traits, such as IQ and educational attainment, have been discovered. (Disregarding here major genetic disorder syndromes that impair cognitive abilities among many other effects.)
Secondly, and consequently, behavior geneticists have adopted a different approach in recent years – polygenic scores. Returning to our hypothetical researcher, rather than studying the "genetic basis" of "Helsinkiness" or "Turkuness", or of "risk of living in Turku" or "risk of consuming Karelian pies", he now switches to education attainment. He finds that the average length of formal education among Helsinki people is 13 years. If the national average is 12, he concludes that having A in the first genome position or C in the fourth genome position (Figure 1) has the average effect of improving one's education attainment. By further analysis, he invents a formula to "predict" education attainment from DNA sequence:
polygenic score of education attainment = 12 years + (8 months, if A in first genome position) + (7 months, if C in fourth genome position) + ...
This formula can be extended (hence the "+ ..."), by studying a wider range of populations and subpopulations, and by documenting variation in many other genome positions.
For example, Turku people who move to Helsinki are generally better educated – moving to study in university, or for jobs in the tech sector. As a consequence, within Helsinki, "Turku genomes" (G in first position, T in fourth position; Figure 1) will be associated with higher education attainment (compared to the average Helsinki resident). And so, our hypothetical researcher may, in this case, conclude that
polygenic score of education attainment = 12 years + (8 months, if A in first genome position) + (7 months, if C in fourth genome position) + (1.5 years, if G in first genome position) + (2 years, if T in fourth genome position) + ...
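The researcher's formula is nothing more than a weighted lookup, which the following sketch makes concrete. It implements only the four terms spelled out in the text, working in months to keep the arithmetic exact; the terms hidden behind the "+ ..." are left out rather than guessed. The names `BASELINE_MONTHS`, `WEIGHTS` and `polygenic_score_months` are hypothetical.

```python
BASELINE_MONTHS = 12 * 12  # the "12 years" national average, in months

# (genome position, letter) -> months added to the score.
# Weights are the made-up ones from the two formulas in the text.
WEIGHTS = {
    (1, "A"): 8,
    (4, "C"): 7,
    (1, "G"): 18,   # 1.5 years
    (4, "T"): 24,   # 2 years
}


def polygenic_score_months(genotype):
    """Baseline plus the weight of every listed allele the person carries."""
    score = BASELINE_MONTHS
    for position, letter in genotype.items():
        score += WEIGHTS.get((position, letter), 0)
    return score


helsinki_genome = {1: "A", 4: "C"}
turku_genome = {1: "G", 4: "T"}
print(polygenic_score_months(helsinki_genome) / 12)  # score in years
print(polygenic_score_months(turku_genome) / 12)     # score in years
```

Note that the function knows nothing about why a letter carries a given weight. The weights encode where people came from and where they moved, and the score simply adds them up.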
Overall, when calculating polygenic scores, researchers combine a large number (tens of thousands) of alleles that each individually makes a very tiny contribution. But when all those alleles are combined together (with the proper weighting of contributions from each), one may achieve some statistical significance and can show some correlation between the observed trait (for example, education attainment) and the calculated polygenic score. Therefore, in theory at least, using such formulas, researchers could calculate a person's tendency to be highly educated, or risk of developing a certain disease or mental disorder, solely based on that person's genome. It is important to emphasize that there is no mechanistic understanding of how such alleles may affect traits like educational attainment or risk of schizophrenia. This method can only uncover some statistical association with such traits. The mechanistic causes of the association are left unknown.
Moreover, polygenic scores typically explain only tiny fractions of the variance in human traits – no more than a few percent. Now, that is perfectly fine, if the purpose is just to identify candidate genes for disorders, or make initial exploratory first steps, before embarking on more detailed research. But it is absolutely reckless to claim to be able to "predict" intelligence or educational attainment, based on a person's genome, when over 90% of the variance is left unexplained.
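What "a few percent of the variance" means for predicting an individual can be shown with a short calculation. The sketch below assumes a simple linear predictor, for which the typical prediction error shrinks only by a factor of sqrt(1 - R^2) relative to guessing the population average. The two R^2 values (0.03 and 0.10) are illustrative choices meant to match "a few percent" and "over 90% unexplained", not figures taken from any study, and the function name is hypothetical.

```python
import math


def unexplained_sd_fraction(r_squared):
    """Spread of prediction errors relative to the spread of the trait,
    for a linear predictor explaining r_squared of the variance."""
    return math.sqrt(1.0 - r_squared)


for r_squared in (0.03, 0.10):
    correlation = math.sqrt(r_squared)
    remaining = unexplained_sd_fraction(r_squared)
    print(f"R^2={r_squared:.2f}  r={correlation:.2f}  "
          f"error SD = {remaining:.2f} x trait SD")
```

Even at 10% of variance explained, the typical error of an individual prediction is still about 95% as large as it would be when simply guessing the population average for everyone.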
Thirdly, it is increasingly realized that such polygenic scores may actually represent residual population structure that has not been correctly removed34. In other words, they may be a consequence of faulty methodology, rather than of actual functional influences. My schematic example, above, clearly demonstrates that. Helsinki-Turku differences in DNA are passive consequences of history and genealogy. Helsinki-Turku differences in educational attainment are consequences of, for example, centers of higher education and tech industry being concentrated in the capital, Helsinki. The association between DNA and education attainment is completely non-causal, resulting from patterns of migration and uneven distribution of opportunities for higher education and employment for the highly educated. While this schematic example may be highly contrived, passive consequences of migration seem also to be behind real-world polygenic scores for education attainment35. In other words, rather than revealing a true functional role of DNA, these polygenic scores are, again, simply a statistical artefact of population structure. Polygenic scores for education attainment, risk of schizophrenia36, and other behavioral and psychological traits, are no different from my contrived "genetic basis" of "risk of living in Turku" or of "risk of eating Karelian pies".
Some studies acknowledge these fundamental methodological problems – for example, admitting that actual causes for the associations and correlations cannot be discerned37. But whenever they can get away with it, many researchers do push for strong38 functional interpretations of their results, paying little attention to the complicating effects of population structure and historical aspects of DNA variation. For example, a study from 201739 found that polygenic scores for education attainment decrease over the course of the twentieth century. The authors even present their "findings" in the context of the infamous eugenic fairytale of decaying intelligence within modern populations40. But, in fact, these researchers did not really demonstrate a trend in time. They analyzed data from different age groups, and only showed that average polygenic scores are lower in younger age groups. They concocted an explanation based on natural selection against education attainment and intelligence41.
But a far simpler explanation is through population structure and improvements in availability of public education during the 20th century. In fact, the same study shows that actual education attainment (rather than "polygenic scores") has improved greatly over the course of the twentieth century. When education becomes more widely available, and less concentrated in particular geographic locations and in more-or-less hereditary social classes, the statistical association between particular DNA sequences and high education attainment becomes less and less pronounced. Population structure itself has also changed dramatically through the twentieth century (for example, due to urbanization42, or cataclysms, such as the Second World War43). Large urban centers, in particular, have become increasingly heterogeneous, as people became more mobile44. So it is in fact the statistical association itself between genetic markers and education attainment that gradually becomes weaker (again, my karelian pies example). That 2017 study simply recorded a statistical artefact of its own faulty methodology45.
You may rightfully laugh at my hypothetical researcher aiming to prove the "genetic basis" of consuming Karelian pies, or of living in Helsinki, instead of in Turku. These examples are deservedly laughable, because they seek causality where there is none. This is post-hoc reasoning that can be invoked to "prove" practically anything. But then, twin researcher Robert Plomin, in his "Blueprint" manifesto46, would similarly have us believe that where we live is determined by our DNA47. In Plomin's world, human populations are the clean theoretical constructs of genetics textbooks – randomly mating, completely mixed, without structure or geographic variation, and unchanging from one generation to the next. If you believe that, then any correlation between DNA markers and complex human traits can only be interpreted as unidirectional causality from DNA to trait48. But real-world human populations are anything but that.
Clearly, the genetic composition of a nation of immigrants, like America, will be very much different than that of Italian mountain villages, where people have paired almost exclusively among themselves for many generations. Similarly, the genetic composition of a large city, to which people immigrate for education and work, is going to be different than that of rural areas. To try to explain the causes of individual differences in social status, income, intelligence and education through differences in DNA that are passive consequences of immigration, assortative mating and other independent population processes, is to completely reverse the arrow of causality. Stephen Jay Gould has called that "confusing bookkeeping with causality" 49. But behavioral genetics has always played on that ignorance of causes that comes from just recording correlations 50.
In conclusion, the validity and usefulness of polygenic scores for human behavioral and cognitive traits are still very much in doubt. Any association between such polygenic scores and differences in "intelligence" or education attainment can be better explained by geography and history than by actual functional genetic influences. In addition, polygenic scores explain very little of the variation in educational attainment. For now, most so-called evidence for the "genetic basis" of intelligence differences comes from twin studies and estimates of heritability – the focus of Part Two.