Race is not just a social construct

I have frequently heard people insist that “race is just a social construct”, that it has no genetic basis, that it has no statistical relevance, and so on and so forth. This is clearly wrong, as others have pointed out repeatedly, but people keep repeating it for some reason. To save myself and others time next time around, here is a compilation of the facts, evidence, expert opinion, and more that ought to settle the issue for most fair-minded people who are not overly ideologically blinkered.

In no particular order….

Expert opinion

Jerry Coyne on race: 

I do think that human races exist in the sense that biologists apply the term to animals, though I don’t think the genetic differences between those races are profound, nor do I think there is a finite and easily delimitable number of human races.  Let me give my view as responses to a series of questions.  I discuss much of this in chapter 8 of WEIT.

What are races?

In my own field of evolutionary biology, races of animals (also called “subspecies” or “ecotypes”) are morphologically distinguishable populations that live in allopatry (i.e. are geographically separated).  There is no firm criterion on how much morphological difference it takes to delimit a race.  Races of mice, for example, are described solely on the basis of difference in coat color, which could involve only one or two genes.

Under that criterion, are there human races?

Yes.  As we all know, there are morphologically different groups of people who live in different areas, though those differences are blurring due to recent innovations in transportation that have led to more admixture between human groups.

How many human races are there?

That’s pretty much unanswerable, because human variation is nested in groups, for their ancestry, which is based on evolutionary differences, is nested in groups.  So, for example, one could delimit “Caucasians” as a race, but within that group there are genetically different and morphologically different subgroups, including Finns, southern Europeans, Bedouins, and the like.  The number of human races delimited by biologists has ranged from three to over thirty.

How different are the races genetically?

Not very different.  As has been known for a while, DNA and other genetic analyses have shown that most of the variation in the human species occurs within a given human ethnic group, and only a small fraction between different races. That means that on average, there is more genetic difference between individuals within a race than there is between races themselves. Nevertheless, there are some genes (including the genes for morphological differences such as body shape, facial features, skin pigmentation, hair texture, and the like) that have not yet been subject to DNA sequencing, and if one looked only at those genes, one would obviously find more genetic differences. But since the delimitation of races has historically depended not on the degree of underlying genetic differences but only on the existence of some genetic difference that causes morphological difference, the genetic similarity of races does not mean that they don’t exist.


Why do these differences exist?

The short answer is, of course, evolution.  The groups exist because human populations have an evolutionary history, and, like different species themselves, that ancestry leads to clustering and branching, though humans have a lot of genetic interchange between the branches!

But what evolutionary forces caused the differentiation?  It’s undoubtedly a combination of natural selection (especially for the morphological traits) and genetic drift, which will both lead to the accumulation of genetic differences between isolated populations.  What I want to emphasize is that even for the morphological differences between human “races,” we have virtually no understanding of how evolution produced them.  It’s pretty likely that skin pigmentation resulted from natural selection operating differently in different places, but even there we’re not sure why (the classic story involved selection for protection against melanoma-inducing sunlight in lower latitudes, and selection for lighter pigmentation at higher latitudes to allow production of vitamin D in the skin; but this has been called into question by some workers).

As for things like differences in hair texture, eye shape, and nose shape, we have no idea.  Genetic drift is one explanation, but I suspect, given the profound differences between regions, that some form of selection is involved.  In WEIT I float the idea that sexual selection may be responsible: mate preferences for certain appearances differed among regions, leading to all those physical differences that distinguish groups.  But we have no evidence for this.  The advantage of this hypothesis is that sexual selection operates quickly, and could have differentiated populations in only 50,000 years or so, and it also operates largely on external appearance, explaining why the genes for morphology show much more differentiation among populations than random samples of microsatellite genes, whose function we don’t know.
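Coyne's point that most genetic variation lies within groups, with only a small fraction between them, is usually expressed by partitioning the variation into between-group and within-group components. The sketch below is a minimal illustration under explicit assumptions: a single biallelic locus, two equal-sized hypothetical populations, and made-up allele frequencies. It uses Wright's F_ST, which is one standard way to do the partition. It is not necessarily the exact statistic used in the studies Coyne summarizes.

```python
def fst_two_pops(p1: float, p2: float) -> float:
    """Wright's F_ST for one biallelic locus in two equal-sized populations.

    p1 and p2 are the frequencies of the same allele in each population.
    Returns the between-population share of the total variation at the locus;
    the within-population share is 1 - F_ST.
    """
    p_bar = (p1 + p2) / 2                      # allele frequency in the pooled population
    # Variance of the allele frequency across the two populations.
    var_between = ((p1 - p_bar) ** 2 + (p2 - p_bar) ** 2) / 2
    # Total variance of carrying the allele (a Bernoulli variable) in the pooled population.
    var_total = p_bar * (1 - p_bar)
    if var_total == 0:
        return 0.0                             # monomorphic locus: no variation to partition
    return var_between / var_total


# Hypothetical allele frequencies: 60% in population A, 40% in population B.
between = fst_two_pops(0.6, 0.4)
print(f"between populations: {between:.0%}, within populations: {1 - between:.0%}")
```

With these made-up frequencies, the allele is half again as common in A as in B. Even so, the printed line reports only 4% of the variation at the locus between the populations and 96% within them.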

Richard Dawkins on race:

“Race” is not a clearly defined word. “Species” is different. There really is an agreed way to decide whether two animals belong in the same species: can they interbreed? The interbreeding criterion gives the species a unique status in the hierarchy of taxonomic levels. Above the species level, a genus is just a group of species whose members are pretty similar to each other. No objective criterion exists to determine how similar they have to be, and the same is true of all the higher levels: family, order, class, phylum and the various “sub-” or “super-” names that intervene between them. Below the species level, “race” and “sub-species” are used interchangeably and, again, no objective criterion exists that would enable us to decide whether two people should be considered part of the same race or not, nor to decide how many races there are. And of course there is the added complication, absent above the species level, that races interbreed, so there are lots of people of mixed race.

The interbreeding criterion works pretty well, and it delivers an unequivocal verdict on humans and their supposed races. All living human races interbreed with one another. We are all members of the same species, and no reputable biologist would say any different. But let me call your attention to an interesting, perhaps even slightly disturbing, fact. While we happily interbreed with each other, producing a continuous spectrum of inter-races, we are reluctant to give up our divisive racial language. Wouldn’t you expect that if all intermediates are on constant display, the urge to classify people as one or the other of two extremes would wither away, smothered by the absurdity of the attempt, which is continually manifested everywhere we look? But this is not what happens, and perhaps that very fact is revealing.

It is genuinely true that, if you measure the total variation in the human species and then partition it into a between-race component and a within-race component, the between-race component is a very small fraction of the total. Only a small admixture of extra variation distinguishes races from each other. That is all correct. What is not correct is the inference that race is therefore a meaningless concept. This point has been clearly made by the distinguished Cambridge geneticist AWF Edwards in a recent paper called “Human genetic diversity: Lewontin’s fallacy.” RC Lewontin is an equally distinguished Cambridge (Mass) geneticist, known for the strength of his political convictions and his weakness for dragging them into science at every opportunity. Lewontin’s view of race has become near-universal orthodoxy in scientific circles. He wrote, in a famous paper of 1972: “It is clear that our perception of relatively large differences between human races and subgroups, as compared to the variation within these groups, is indeed a biased perception and that, based on randomly chosen genetic differences, human races and populations are remarkably similar to each other, with the largest part by far of human variation being accounted for by the differences between individuals.”

This is exactly the point I accepted above, not surprisingly, since what I wrote was largely based on Lewontin. But see how Lewontin goes on: “Human racial classification is of no social value and is positively destructive of social and human relations. Since such racial classification is now seen to be of virtually no genetic or taxonomic significance either, no justification can be offered for its continuance.”

We can all happily agree that human racial classification is of no social value and is positively destructive of social and human relations. That is one reason why I object to ticking boxes in forms and why I object to positive discrimination in job selection. But that doesn’t mean that race is of “virtually no genetic or taxonomic significance.” This is Edwards’s point, and he reasons as follows. However small the racial partition of the total variation may be, if such racial characteristics as there are are highly correlated with other racial characteristics, they are by definition informative, and therefore of taxonomic significance.

“Informative” means something quite precise. An informative statement is one that tells you something you didn’t know before. The information content of a statement is measured as reduction in prior uncertainty. If I tell you that Evelyn is male, you immediately know a whole lot of things about him. Your prior uncertainty about the shape of his genitals is reduced. You now know facts you didn’t know before about his chromosomes, his hormones and other aspects of his biochemistry, and there is a quantitative reduction in your prior uncertainty about the depth of his voice, and the distribution of his facial hair and of his body fat and musculature. Contrary to Victorian prejudices, your prior uncertainty about Evelyn’s general intelligence, or ability to learn, remains unchanged by the news about his sex. Your prior uncertainty about his ability to lift weights or excel at most sports is quantitatively reduced, but only quantitatively. Plenty of females can beat plenty of males at any sport, although the best males can normally beat the best females. Your ability to bet on Evelyn’s running speed, say, or the power of his tennis serve, has been slightly raised by my telling you his sex, but it has not reached certainty.

Now to the question of race. If I tell you Suzy is Chinese, how much is your prior uncertainty reduced? You now are pretty certain that her hair is straight and black (or was black), that her eyes have an epicanthic fold, and one or two other things about her. If I tell you Colin is “black,” this does not, as we have seen, tell you he is black. Nevertheless, it is not uninformative. The high interobserver correlation suggests that there is a constellation of characteristics that most people recognise, such that the statement “Colin is black” really does reduce prior uncertainty about Colin. It works the other way around to some extent. If I tell you Carl is an Olympic sprinting champion, your prior uncertainty about his “race” is, as a matter of statistical fact, reduced. Indeed, you can have a fairly confident bet that he is “black.”

We got into this discussion through wondering whether the concept of race was, or had ever been, an information-rich way to classify people. How might we apply the criterion of interobserver correlation to judging the question? Well, suppose we took full-face photographs of 20 randomly chosen natives of each of the following countries: Japan, Uganda, Iceland, Sri Lanka, Papua New Guinea and Egypt. If we presented 120 people with all 120 photographs, my guess is that every single one of them would achieve 100 per cent success in sorting them into six different categories. What is more, if we told them the names of the six countries involved, all 120 subjects, if they were reasonably well educated, would correctly assign all 120 photographs to the correct countries. I haven’t done the experiment, but I am confident that you will agree with me on what the result would be. It may seem unscientific of me not to bother to do the experiment. But my confidence that you, being human, will agree without doing the experiment is the very point I am trying to make. If the experiment were to be done, I do not think Lewontin would expect any other result than the one I have predicted. Yet an opposite prediction would seem to follow from his statement that racial classification has virtually no taxonomic or genetic significance.

In short, I think Edwards is right and Lewontin wrong. Nevertheless, I strongly support Lewontin’s statement that racial classification can be actively destructive of social and human relations – especially when people use racial classification as a way of treating people differently, whether through negative or positive discrimination. To tie a racial label to somebody is informative in the sense that it tells you more than one thing about them. It might reduce your uncertainty about the colour of their hair, the colour of their skin, the straightness of their hair, the shape of their eye, the shape of their nose and how tall they are. But there is no reason to suppose that it tells you anything about how well qualified they are for a job. And even in the unlikely event that it did reduce your uncertainty about their likely suitability for some particular job, it would still be wicked to use racial labels as a basis for discrimination when hiring somebody. Choose on the basis of ability, and if, having done so, you end up with an all-black sprinting team, so be it. You have not practised racial discrimination in arriving at this conclusion.
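Dawkins's definition of "informative" describes information content as a reduction in prior uncertainty, which is Shannon's definition. The minimal sketch below measures that reduction in bits. The probabilities are made up, and the function names are illustrative only.

```python
import math


def entropy_bits(probs: list[float]) -> float:
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def uncertainty_reduction(prior: list[float], posterior: list[float]) -> float:
    """Bits of uncertainty removed when `prior` is updated to `posterior`."""
    return entropy_bits(prior) - entropy_bits(posterior)


# Hypothetical binary trait: 50% of the whole population has it (prior),
# but 90% of people in the group you have just been told about do (posterior).
print(uncertainty_reduction([0.5, 0.5], [0.9, 0.1]))
# News that leaves the distribution unchanged removes no uncertainty at all.
print(uncertainty_reduction([0.5, 0.5], [0.5, 0.5]))
```

The first case removes about 0.53 of the 1 bit of prior uncertainty. The second removes none, which matches Dawkins's case of news that leaves prior uncertainty "unchanged". Strictly, the expected information carried by a label is the mutual information, meaning this reduction averaged over all the labels you might be told. The sketch computes it for one particular piece of news.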

Tishkoff and Kidd on race:

Some argue that there is no such thing as ‘race’ or that it is biologically meaningless. Yet the lay person will ridicule that position as nonsense, because people from different parts of the world look different, whereas people from the same part of the world tend to look similar. The popular concept of five races corresponds well to both geographic regions (Africa, Europe, East Asia, Oceania and the Americas) and bureaucratic definitions …. In this review, we focus on the biogeographical distribution of genetic variation and we address the question of whether or not populations cluster according to this popular concept of ‘race’. We show that racial classifications are inadequate descriptors of the distribution of genetic variation in our species.

Although the amount of genetic diversity between populations is relatively small compared with the amount of genetic diversity within populations, populations usually cluster by geographic region based on genetic distance (Fig. 4). Rosenberg et al. analyzed 377 microsatellites genotyped in 52 global populations using a clustering algorithm (STRUCTURE) to assign individuals to subgroups (clusters) that have distinctive allele frequencies. They could distinguish five main clusters of individuals that corresponded to broad geographic regions (Africa, Middle East and Europe, Asia, Oceania, Americas). They identified a sixth cluster specific to a Pakistani population, which probably reflects high levels of inbreeding and genetic drift in that group. Without reference to sampling location, individuals from the same predefined population nearly always shared membership in one of the five main clusters.

There were some exceptions, however, for populations from geographically intermediate regions (e.g., Central Asia, the Middle East), in which individuals had partial membership in multiple clusters, especially those of flanking geographic regions, indicating a continuous gradient of variation among some regions. Thus, although the main clusters correlate with the common concept of ‘races’ (as expected, because populations from different parts of the world have larger differences in allele frequencies than populations from the same region of the world), the analyses by STRUCTURE do not support discrete boundaries between races. Had there been a more geographically continuous sampling (e.g., from regions such as Ethiopia), there would probably be an even more continuous gradient of genetic variation across all geographic regions. This and other studies indicate that one can assign individual ancestry to continent of origin with high accuracy using a large enough number of polymorphic markers (>60) and that “self-reported population ancestry likely provides a suitable proxy for genetic ancestry”. Accuracy of assigning ancestry can be even higher using ancestry-informative markers, markers with very different allele frequencies in populations from different regions, or functional variants that may be under differential selection. The accuracy of assigning ancestry decreases for populations from intermediate geographic regions such as Central Asia, the Middle East, Ethiopia or South Asia, and for individuals of mixed ancestry.

The emerging picture is that populations do, generally, cluster by broad geographic regions that correspond with common racial classification (Africa, Europe, Asia, Oceania, Americas). This is not surprising as the distribution of variation seen today is primarily the result of the history of human expansion out of Africa, the pathways of expansion through Eurasia, subsequent demographic expansions of populations into Oceania and the Americas and local and long-range migrations. A general pattern of isolation by distance has allowed drift to accumulate in spite of some damping due to local migrations. The pattern laid down by the initial expansion of modern humans out of Africa is detectable using Y-chromosome, mtDNA and autosomal markers. Selection in response to region-specific factors has enhanced the differences at some loci, and recent migrations and demic expansions have added complexity to the pattern. But ‘races’ are neither homogeneous nor distinct for most genetic variation.
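Edwards's argument is that many individually weak allele-frequency differences add up to a reliable classification. Tishkoff and Kidd make a related observation: ancestry can be assigned accurately once enough markers are used. A small simulation can illustrate both points. This is a minimal sketch, not a model of real data. It assumes two hypothetical populations, independent loci, and the same small frequency difference (55% vs. 45%) at every locus. The function name and parameters are illustrative.

```python
import random


def assignment_accuracy(n_loci: int, p_a: float = 0.55, p_b: float = 0.45,
                        n_individuals: int = 1000, seed: int = 1) -> float:
    """Share of simulated individuals assigned to their true population.

    Hypothetical set-up: two populations, A and B, and n_loci independent biallelic
    loci. At every locus the allele has frequency p_a in A and p_b in B, so each
    locus on its own carries very little information (F_ST = 0.01 with the defaults).
    Individuals are assigned to whichever population makes their genotypes more likely.
    """
    # The shortcut below is only valid when p_b = 1 - p_a and p_a > p_b.
    assert abs(p_a + p_b - 1) < 1e-12 and p_a > p_b
    rng = random.Random(seed)
    correct = 0
    for i in range(n_individuals):
        from_a = i % 2 == 0                      # half the sample from each population
        p = p_a if from_a else p_b
        # Total copies of the allele carried across all loci (two copies per locus).
        copies = sum(rng.random() < p for _ in range(2 * n_loci))
        # With p_b = 1 - p_a, the log-likelihood ratio for A vs. B is
        # (2 * copies - 2 * n_loci) * log(p_a / p_b), so A is more likely exactly
        # when the individual carries more than n_loci copies.
        if copies == n_loci:
            guess_a = rng.random() < 0.5         # tie: the data favour neither population
        else:
            guess_a = copies > n_loci
        correct += guess_a == from_a
    return correct / n_individuals


for n_loci in (1, 10, 100, 1000):
    print(n_loci, assignment_accuracy(n_loci))
```

With one locus, accuracy should be about 55%, barely better than guessing. It should then climb to roughly 67%, 92% and essentially 100% at 10, 100 and 1,000 loci. The loci are independent within each population. The correlation Edwards points to appears only in the pooled sample, because every locus is shifted in the same direction. Real ancestry inference (for example with STRUCTURE) works with messier data and unequal differences between loci, so this shows the principle only.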

Neil Risch et al. on race:

In our view, much of this discussion [of race] does not derive from an objective scientific perspective. This is understandable, given both historic and current inequities based on perceived racial or ethnic identities, both in the US and around the world, and the resulting sensitivities in such debates. Nonetheless, we demonstrate here that from both an objective and scientific (genetic and epidemiologic) perspective there is great validity in racial/ethnic self-categorizations, both from the research and public policy points of view.

Probably the best way to examine the issue of genetic subgrouping is through the lens of human evolution. If the human population mated at random, there would be no issue of genetic subgrouping because the chance of any individual carrying a specific gene variant would be evenly distributed around the world. For a variety of reasons, however, including geography, sociology and culture, humans have not and do not currently mate randomly, either on a global level or within countries such as the US. A clearer picture of human evolution has emerged from numerous studies over the past decade using a variety of genetic markers and involving indigenous populations from around the world. In summary, populations outside Africa derive from one or more migration events out of Africa within the last 100,000 years [5,6,7,8,9,10,11]. The greatest genetic variation occurs within Africans, with variation outside Africa representing either a subset of African diversity or newly arisen variants. Genetic differentiation between individuals depends on the degree and duration of separation of their ancestors. Geographic isolation and in-breeding (endogamy) due to social and/or cultural forces over extended time periods create and enhance genetic differentiation, while migration and inter-mating reduce it.

With this as background, it is not surprising that numerous human population genetic studies have come to the identical conclusion – that genetic differentiation is greatest when defined on a continental basis. The results are the same irrespective of the type of genetic markers employed, be they classical systems [5], restriction fragment length polymorphisms (RFLPs) [6], microsatellites [7,8,9,10,11], or single nucleotide polymorphisms (SNPs) [12]. For example, studying 14 indigenous populations from 5 continents with 30 microsatellite loci, Bowcock et al. [7] observed that the 14 populations clustered into the five continental groups, as depicted in Figure 1. The African branch included three sub-Saharan populations, CAR pygmies, Zaire pygmies, and the Lisongo; the Caucasian branch included Northern Europeans and Northern Italians; the Pacific Islander branch included Melanesians, New Guineans and Australians; the East Asian branch included Chinese, Japanese and Cambodians; and the Native American branch included Mayans from Mexico and the Surui and Karitiana from the Amazon basin. The identical diagram has since been derived by others, using a similar or greater number of microsatellite markers and individuals [8,9]. More recently, a survey of 3,899 SNPs in 313 genes based on US populations (Caucasians, African-Americans, Asians and Hispanics) once again provided distinct and non-overlapping clustering of the Caucasian, African-American and Asian samples [12]: “The results confirmed the integrity of the self-described ancestry of these individuals”. Hispanics, who represent a recently admixed group between Native American, Caucasian and African, did not form a distinct subgroup, but clustered variously with the other groups.

A previous cluster analysis based on a much smaller number of SNPs led to a similar conclusion: “A tree relating 144 individuals from 12 human groups of Africa, Asia, Europe and Oceania, inferred from an average of 75 DNA polymorphisms/individual, is remarkable in that most individuals cluster with other members of their regional group” [13]. Effectively, these population genetic studies have recapitulated the classical definition of races based on continental ancestry – namely African, Caucasian (Europe and Middle East), Asian, Pacific Islander (for example, Australian, New Guinean and Melanesian), and Native American.

The terms race, ethnicity and ancestry are often used interchangeably, but some have also drawn distinctions. For the purpose of this article, we define racial groups on the basis of the primary continent of origin, as discussed above (with some modifications described below). Ethnicity is a self-defined construct that may be based on geographic, social, cultural and religious grounds. It has potential meaning from the genetic perspective, provided it defines an endogamous group that can be differentiated from other such groups. Ancestry refers to the race/ethnicity of an individual’s ancestors, whatever the individual’s current affiliation. From the genetic perspective, the important concept is mating patterns, and the degree to which racially or ethnically defined groups remain endogamous.

The continental definitions of race and ancestry need some modification, because it is clear that migrations have blurred the strict continental boundaries. For example, individuals currently living in South Africa, although currently Africans, have very different ancestry, race and ethnicity depending on the ancestry of their forbears (for example from Europe or Asia) and the degree to which they have remained endogamous. For our purposes here, on the basis of numerous population genetic surveys, we categorize Africans as those with primary ancestry in sub-Saharan Africa; this group includes African Americans and Afro-Caribbeans. Caucasians include those with ancestry in Europe and West Asia, including the Indian subcontinent and Middle East; North Africans typically also are included in this group as their ancestry derives largely from the Middle East rather than sub-Saharan Africa. ‘Asians’ are those from eastern Asia including China, Indochina, Japan, the Philippines and Siberia. By contrast, Pacific Islanders are those with indigenous ancestry from Australia, Papua New Guinea, Melanesia and Micronesia, as well as other Pacific Island groups further east. Native Americans are those that have indigenous ancestry in North and South America. Populations that exist at the boundaries of these continental divisions are sometimes the most difficult to categorize simply. For example, east African groups, such as Ethiopians and Somalis, have great genetic resemblance to Caucasians and are clearly intermediate between sub-Saharan Africans and Caucasians [5]. The existence of such intermediate groups should not, however, overshadow the fact that the greatest genetic structure that exists in the human population occurs at the racial level.

Most recently, Wilson et al. [2] studied 354 individuals from 8 populations deriving from Africa (Bantus, Afro-Caribbeans and Ethiopians), Europe/Mideast (Norwegians, Ashkenazi Jews and Armenians), Asia (Chinese) and Pacific Islands (Papua New Guineans). Their study was based on cluster analysis using 39 microsatellite loci. Consistent with previous studies, they obtained evidence of four clusters representing the major continental (racial) divisions described above as African, Caucasian, Asian, and Pacific Islander. The one population in their analysis that was seemingly not clearly classified on continental grounds was the Ethiopians, who clustered more into the Caucasian group. But it is known that African populations with close contact with Middle East populations, including Ethiopians and North Africans, have had significant admixture from Middle Eastern (Caucasian) groups, and are thus more closely related to Caucasians [14]. Furthermore, the analysis by Wilson et al. [2] did not detect subgroups within the four major racial clusters (for example, it did not separate the Norwegians, Ashkenazi Jews and Armenians among the Caucasian cluster), despite known genetic differences among them. The reason is clearly that these differences are not as great as those between races and are insufficient, with the amount of data provided, to distinguish these subgroups.

Illustrations of (genetic) population structure at the broad geographic (“race”) level

Tishkoff race PCA


Tishkoff - least squares clustering


European, African American, African populations PCA


Composition of the 3 major (self-identified) US racial/ethnic populations: ternary plots by continental ancestry

Composition of (self-identified) US Latino/Hispanic subpopulations

Probability of identifying as African American (reporting race as black in the US) by percentage of sub-Saharan African ancestry

Histogram of African ancestry amongst self-identified African Americans (blacks)

US race/ethnic groups: histogram by proportion of African ancestry

Note: logarithmic scale on the y-axis. Very few self-identified non-Hispanic whites have significant levels of African ancestry.

US race/ethnic groups: histogram by proportion of American Indian (Native American) ancestry

Note: log y-axis again. Many Latinos have substantial Native American ancestry, whereas very few self-identified non-Hispanic whites and blacks do.

Genetic ancestry by state amongst self-reported blacks

Frequency of self-reported whites with >1% or >2% African ancestry

Note: most of the small number of self-identified whites with even modest levels of African ancestry are (still) found in the South.

Share of self-identified race/ethnicity by proportion of Native American ancestry (y-axis) and proportion of African ancestry (x-axis)

Note: the patterns here are clear. Most people near (0, 0) are “white”, most people with significant African ancestry are “black”, and most people with significant Native American ancestry are “Latino”. Self-identification is less predictable at high levels of admixture, but relatively few people fall there, and even this uncertainty is patterned: it tracks the degree of admixture and is found mostly among Latinos.

Note: average African ancestry amongst self-identified whites is small everywhere (less than 0.7%, and usually much less), but it is higher in historically black/Southern states.

Note: in most states, self-identified blacks have, on average, between 70% and 80% African ancestry.

Source: for most of the figures/charts above, a study based on the extensive 23andMe DNA database.

On the accuracy of predicting self-identified race/ethnicity (SIRE) from genetic clustering within US and Taiwan populations

Genetic Structure, Self-Identified Race/Ethnicity, and Confounding in Case-Control Association Studies:

We have analyzed genetic data for 326 microsatellite markers that were typed uniformly in a large multiethnic population-based sample of individuals as part of a study of the genetics of hypertension (Family Blood Pressure Program). Subjects identified themselves as belonging to one of four major racial/ethnic groups (white, African American, East Asian, and Hispanic) and were recruited from 15 different geographic locales within the United States and Taiwan. Genetic cluster analysis of the microsatellite markers produced four major clusters, which showed near-perfect correspondence with the four self-reported race/ethnicity categories. Of 3,636 subjects of varying race/ethnicity, only 5 (0.14%) showed genetic cluster membership different from their self-identified race/ethnicity. On the other hand, we detected only modest genetic differentiation between different current geographic locales within each race/ethnicity group. Thus, ancient geographic ancestry, which is highly correlated with self-identified race/ethnicity—as opposed to current residence—is the major determinant of genetic structure in the U.S. population. Implications of this genetic structure for case-control association studies are discussed.

SIRE vs. genetic clustering

How well genetics predicts SIRE, compared with other well-accepted categories:

Scientists always disagree! A lot of the problem is terminology. I’m not even sure what race means, people use it in many different ways.

In our own studies, to avoid coming up with our own definition of race, we tend to use the definition others have employed, for example, the US census definition of race. There is also the concept of the major geographical structuring that exists in human populations—continental divisions—which has led to genetic differentiation. But if you expect absolute precision in any of these definitions, you can undermine any definitional system. Any category you come up with is going to be imperfect, but that doesn’t preclude you from using it or the fact that it has utility.

We talk about the prejudicial aspect of this. If you demand that kind of accuracy, then one could make the same arguments about sex and age!

You’ll like this. In a recent study, when we looked at the correlation between genetic structure [based on microsatellite markers] versus self-description, we found 99.9% concordance between the two. We actually had a higher discordance rate between self-reported sex and markers on the X chromosome! So you could argue that sex is also a problematic category. And there are differences between sex and gender; self-identification may not be correlated with biology perfectly. And there is sexism. And you can talk about age the same way. A person’s chronological age does not correspond perfectly with his biological age for a variety of reasons, both inherited and non-inherited. Perhaps just using someone’s actual birth year is not a very good way of measuring age. Does that mean we should throw it out? No. Also, there is ageism—prejudice related to age in our society. A lot of these arguments, which have a political or social aspect to them, can be made about all categories, not just the race/ethnicity one.

On the utility of using SIRE or continental genetic clustering to predict important risks/outcomes

Type II diabetes by percentage African-Admixture:

The odds ratio for diabetes comparing participants in the highest vs. lowest tertile of African ancestry was 1.33 (95% confidence interval 1.13–1.55), after adjustment for age, sex, study, body mass index (BMI), and SES. Admixture scans identified two potential loci for diabetes at 12p13.31 (LOD = 4.0) and 13q14.3 (Z score = 4.5, P = 6.6 × 10⁻⁶). In conclusion, genetic ancestry has a significant association with type 2 diabetes above and beyond its association with non-genetic risk factors for type 2 diabetes in African Americans, but no single gene with a major effect is sufficient to explain a large portion of the observed population difference in risk of diabetes.
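The study's 1.33 is an adjusted odds ratio from a regression model. To show where an odds ratio and its 95% confidence interval come from in the simplest case, here is a sketch of the unadjusted Wald calculation for a 2×2 table. The counts are made up for illustration and are not taken from the study.

```python
import math

def odds_ratio_wald(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval for a 2x2 table:
        a = exposed cases,   b = exposed non-cases,
        c = unexposed cases, d = unexposed non-cases.
    z = 1.96 gives an approximate 95% interval."""
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Wald interval")
    or_ = (a * d) / (b * c)
    # Standard error of log(OR).
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi

# Hypothetical counts, e.g. highest vs. lowest ancestry tertile by diabetes status.
print(odds_ratio_wald(20, 80, 10, 90))
```

With these toy counts the interval just barely includes 1. Adjusting for covariates such as BMI and SES, as the study did, requires a logistic regression rather than a single 2×2 table.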


Obesity risks (& known genetic factors) in African Americans


Percent European Ancestry Was Inversely Associated with BMI among African Americans

The relationship between BMI and percentage of European ancestry is shown in Figure 1. BMI was inversely correlated with European ancestry as estimated from autosomes, an effect that was weak (ρ = −0.042) but statistically significant (P = 1.6×10⁻⁷) given the large sample size. It was also significantly correlated with European ancestry as estimated from the X chromosome (ρ = −0.046, P = 1.2×10⁻⁸).
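The phrase "weak but statistically significant given the large sample size" is easy to check numerically. The sketch below uses the standard large-sample test for a correlation coefficient, t = r·√((n−2)/(1−r²)), and approximates t with a standard normal. The same approximation is commonly applied to Spearman's ρ. The sample sizes are hypothetical and are not the study's.

```python
import math

def correlation_p_value(r, n):
    """Approximate two-sided p-value for a correlation r from n observations.
    Uses t = r * sqrt((n - 2) / (1 - r^2)) with a normal approximation to the
    t distribution, which is reasonable for large n."""
    t = r * math.sqrt((n - 2) / (1 - r * r))
    # Two-sided standard normal tail area: 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
    return math.erfc(abs(t) / math.sqrt(2))

# The same tiny correlation at increasing (hypothetical) sample sizes.
for n in (100, 1_000, 20_000):
    print(n, correlation_p_value(-0.042, n))
```

The effect size stays the same. Only the certainty that it differs from zero grows with n.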

Asthma Risks by African-Ancestry

After filtering for self-reported ancestry and genotype data quality, samples from 1,117 self-reported African-American individuals from New York and Baltimore (394 cases, 481 controls), and Chicago (321 cases followed for asthma exacerbations) were analyzed. Genetic ancestry was estimated based on ancestry informative markers (AIMs) selected for being highly divergent among European and West African populations (95 AIMs for New York and Baltimore, and 66 independent AIMs for Chicago). Among case-control samples, the mean African ancestry was significantly higher in asthmatics than in non-asthmatics (82.0±14.0% vs. 77.8±18.1%, mean difference 4.2% [95% confidence interval (CI): 2.0–6.4], p<0.0001). This association remained significant after adjusting for potential confounders (odds ratio: 4.55, 95% CI: 1.69–12.29, p = 0.003). African ancestry failed to show an association with asthma exacerbations (p = 0.965) using a model based on longitudinal data of the number of exacerbations followed over 1.5 years.
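To make "ancestry estimated from AIMs" concrete, here is a simplified sketch of a maximum-likelihood estimate of one person's African ancestry proportion q under a two-population model. It assumes the allele frequencies in both reference populations are known and that markers are unlinked. It is not necessarily the software or model the study used, and the simulated frequencies and genotypes are hypothetical.

```python
import math
import random

def admixture_log_likelihood(q, genotypes, freq_afr, freq_eur):
    """Log-likelihood (up to a constant binomial term) of African ancestry
    proportion q. genotypes[i] counts copies (0, 1 or 2) of the reference
    allele at AIM i; freq_afr[i] and freq_eur[i] are that allele's
    frequencies in the two ancestral reference populations."""
    ll = 0.0
    for g, pa, pe in zip(genotypes, freq_afr, freq_eur):
        p = q * pa + (1 - q) * pe          # this person's expected allele frequency
        p = min(max(p, 1e-9), 1 - 1e-9)    # guard against log(0)
        ll += g * math.log(p) + (2 - g) * math.log(1 - p)
    return ll

def estimate_admixture(genotypes, freq_afr, freq_eur, steps=100):
    """Grid-search maximum-likelihood estimate of q on [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda q: admixture_log_likelihood(q, genotypes, freq_afr, freq_eur))

def simulate_person(q, freq_afr, freq_eur, rng):
    """Hypothetical simulation: each allele copy comes from the African
    reference population with probability q, else from the European one."""
    genotypes = []
    for pa, pe in zip(freq_afr, freq_eur):
        g = 0
        for _ in range(2):  # two allele copies per locus
            p = pa if rng.random() < q else pe
            g += 1 if rng.random() < p else 0
        genotypes.append(g)
    return genotypes

rng = random.Random(1)
n_aims = 500  # hypothetical; the study used 95 or 66 AIMs
freq_afr = [rng.uniform(0.7, 0.95) for _ in range(n_aims)]
freq_eur = [rng.uniform(0.05, 0.3) for _ in range(n_aims)]
person = simulate_person(0.8, freq_afr, freq_eur, rng)
print(estimate_admixture(person, freq_afr, freq_eur))  # expected to land near 0.8
```

Highly divergent AIMs are chosen because a large frequency gap between the reference populations makes each marker more informative about q.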

Asthma Odds Ratio

BMI and waist-to-hip ratio in women of different ethnic groups by admixture

There was a significant positive association between body mass index (BMI) and African admixture when BMI was considered as a continuous variable, and age, education, physical activity, parity, family income and smoking were included covariates (p < 10⁻⁴). A dichotomous model (upper and lower BMI quartiles) showed that African admixture was associated with a high odds ratio [OR = 3.27 (for 100% admixture compared to 0% admixture), 95% confidence interval (CI) 2.08–5.15]. For HA [Hispanic-American women] there was no association between BMI and admixture. In contrast, when waist to hip ratio (WHR) was used as a measure of adipose distribution, there was no significant association between WHR and admixture in AFA [African-American women] but there was a strong association in HA (p < 10⁻⁴; OR Amerindian admixture = 5.93, CI = 3.52–9.97).
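An odds ratio quoted "for 100% admixture compared to 0% admixture" is easier to interpret when rescaled to realistic differences between individuals. The sketch below assumes admixture enters the logistic model linearly on the log-odds scale, which the abstract does not state explicitly. Under that assumption, OR(Δ) = exp(β·Δ) = OR_full^Δ. The function name is hypothetical.

```python
import math

def or_for_difference(or_full, delta):
    """Rescale an odds ratio reported for a 0 -> 1 change in admixture
    (0% -> 100%) to a change of `delta` (a fraction), assuming a model
    that is linear in admixture on the log-odds scale."""
    beta = math.log(or_full)   # log-odds change per full 0% -> 100% change
    return math.exp(beta * delta)

# OR = 3.27 from the abstract, rescaled to a 10-percentage-point difference.
print(round(or_for_difference(3.27, 0.10), 3))  # 1.126
```

The headline OR therefore compares two hypothetical extremes. Between two women whose admixture differs by 10 points, the implied odds ratio is much smaller.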


There are large differences by race/ethnicity in the rates of many diseases in the United States.



It is quite likely that many of these other risks are also substantially the result of different genetic risk factors (both above and below a non-hispanic white baseline).

There are other observable differences in phenotypes besides skin color and other presumably purely superficial traits

Although white and black adults in the United States have the same average stature, when education, income and other variables are controlled, the body proportions of the two groups are different. Krogman found that for the same height, blacks living in Philadelphia, USA had shorter trunks and longer extremities than whites, especially the lower leg and forearm. Hamill et al. found that this was also true for a national sample of black and white youths 12 to 17 years old, and it is the case for adults 20–49 years old measured for the NHANES III survey, 1988–1994. A genomic contribution to the body proportion differences between blacks and whites seems likely, as the blacks tend to have more sub-Sahara African genomic origins than the whites.

Race height limb ratios

Bone density by admixture

This should not be surprising because evolution never stopped and we adapted to different environments

There is substantial evidence that human evolution actually accelerated, and that it did so in ways related to differences in population size and to technological/agricultural advancements.

Fig. 1.

Fig. 2.

These relatively recent changes can be very significant

Some African Pygmy groups are fully 6 standard deviations below average height today and this is surely mostly genetic

Pygmy height admixture

Height is believed to be a complex trait involving thousands of genes, so it is not unreasonable to propose that other complex traits may likewise have been shaped by adaptation to the environment. However, the fact that some small and mostly isolated hunter-gatherer groups, on a continent known for its large genetic diversity, carry such variants does not imply that we should expect to find the same genes in the larger sub-Saharan African populations (e.g., Bantu), and even less so in African-American populations of known African ancestry composition.

Pygmy STRUCTURE

Generally speaking, the racial/continental populations tend to cluster closer together, implying similar, even if not quite homogeneous, genotypes. This reflects gene flow and the fact that most traits aren't as heavily selected for as the Pygmy traits are believed to be (they are presumed to protect against pathogens or something along those lines).

STRUCTURE analysis with Africa focus

Continental ancestry/racial groups are not homogeneous, but that does not mean that there are not significant statistical relationships along “racial” lines that apply to the vast majority of individuals alive today

We know, for instance, that there is population structure according to ancestral European heritage. If we run European DNA data through PCA (principal component analysis), we find patterns that correspond closely to geographic distance and orientation. In some cases it is even possible to identify the ancestral area within a few kilometers.

Europe PCA
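For readers unfamiliar with how PCA turns genotypes into a "map", here is a dependency-free sketch on simulated data. It is not the European dataset, and the population frequencies are hypothetical. Genotypes are coded as allele counts (0, 1, 2), each SNP column is centred, and the first principal component is found by power iteration on the individual-by-individual matrix. Real analyses use dedicated tools and usually also standardize each SNP by its allele frequency, which this sketch skips.

```python
import math
import random

def simulate_genotypes(n_per_pop, n_snps, rng):
    """Hypothetical data: two populations whose allele frequencies drift
    apart from a shared ancestral frequency at every SNP. Each genotype is
    the count (0, 1 or 2) of one allele."""
    ancestral = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]
    rows, labels = [], []
    for pop in (0, 1):
        # Population-specific frequencies, clamped away from 0 and 1.
        freqs = [min(max(p + rng.uniform(-0.2, 0.2), 0.01), 0.99) for p in ancestral]
        for _ in range(n_per_pop):
            rows.append([sum(rng.random() < f for _ in range(2)) for f in freqs])
            labels.append(pop)
    return rows, labels

def first_pc_scores(rows, iters=200, seed=0):
    """Scores (up to scale) on the first principal component, via power
    iteration on the Gram matrix of column-centred genotypes."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    x = [[r[j] - means[j] for j in range(m)] for r in rows]
    gram = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)] for i in range(n)]
    # Random start: after column centring, the all-ones vector lies in the
    # Gram matrix's null space, so it cannot be used as a starting point.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(g * vk for g, vk in zip(row, v)) for row in gram]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    return v

rng = random.Random(42)
rows, labels = simulate_genotypes(30, 500, rng)
scores = first_pc_scores(rows)
for pop in (0, 1):
    pop_scores = [s for s, l in zip(scores, labels) if l == pop]
    print(pop, round(sum(pop_scores) / len(pop_scores), 3))
```

The two simulated populations end up on opposite sides of PC1, even though no population labels were given to the algorithm.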

These differences, however, are usually tiny by comparison to groups of different continental ancestry.  If you analyze a more diverse population these differences fade out rapidly.

For example, India vs. other reference populations (including Europeans)
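One standard way to put a number on "tiny by comparison" is F_ST, the share of total genetic variation that lies between populations rather than within them. Below is a sketch for two equally weighted populations at biallelic loci, using F_ST = (H_T − H_S) / H_T summed over loci. It is a simple Nei-style estimator without sample-size corrections, and the allele frequencies are hypothetical and not taken from the figures above.

```python
def fst_two_pops(freqs_a, freqs_b):
    """Multi-locus F_ST for two equally weighted populations at biallelic
    loci, computed as sum(H_T - H_S) / sum(H_T) over loci.
    freqs_a[i] and freqs_b[i] are the frequencies of one allele at locus i."""
    num = den = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        p_bar = (pa + pb) / 2
        h_t = 2 * p_bar * (1 - p_bar)                       # pooled (total) heterozygosity
        h_s = (2 * pa * (1 - pa) + 2 * pb * (1 - pb)) / 2   # mean within-population heterozygosity
        num += h_t - h_s
        den += h_t
    return num / den if den else 0.0

print(fst_two_pops([0.45], [0.55]))  # small frequency gap: approximately 0.01
print(fst_two_pops([0.1], [0.9]))    # large frequency gap: approximately 0.64
```

Nearby subpopulations typically differ in allele frequency like the first case, so their F_ST is small relative to comparisons across more distant populations.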


Sometimes we can observe geographic patterns in phenotypes. For instance, Northern Europeans tend to be taller than Southern Europeans, a pattern that appears to be largely genetic now that ample nutrition, modern medicine, and the like minimize how much modern environmental variation is apt to matter.

Northern vs Southern Europe height

We can even identify probable reasons behind this that are still operating today.

Nevertheless, it is not nonsensical to talk about average European height given what we know about these genetic relationships, history, etc. Some subgroups are likely genetically taller than others, and we may be able to make more accurate predictions using detailed ancestry in some cases (e.g., Dutch vs. Southern Italians), but that does not change the expectation that 100 twenty-year-old males drawn at random from the European population will, on average, be taller than 100 drawn from the broader East Asian population (including those living in the West).
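The "100 drawn at random" argument is about sample means, and sample means are much more predictable than individuals. The sketch below assumes both groups are normally distributed with the same SD and are sampled independently. Then the difference of two sample means of size n has SD = sd·√(2/n). The heights are hypothetical placeholders, not real population figures.

```python
import math

def prob_mean_exceeds(mu_a, mu_b, sd, n):
    """P(mean of n draws from group A > mean of n draws from group B),
    assuming both groups are normal with the same SD and independent samples."""
    se_diff = sd * math.sqrt(2 / n)            # SD of the difference of the two sample means
    z = (mu_a - mu_b) / se_diff
    return 0.5 * math.erfc(-z / math.sqrt(2))  # standard normal CDF evaluated at z

# Hypothetical: a 2 cm gap in means, 7 cm SD in both groups.
for n in (1, 10, 100):
    print(n, round(prob_mean_exceeds(176, 174, 7, n), 3))
```

With single individuals the taller-on-average group wins only a little more than half the time. With 100 per group it wins almost always.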

Even if the continental/racial boundaries were drawn poorly (which they are not), that does not imply that there cannot be real statistical differences in the sets

For illustration purposes:

Country Sets example

My crudely drawn sets are intentionally silly and contain heterogeneous “stuff”, but we would nevertheless expect to find proportionally more, say, English speakers in the area contained in dumb set A than in dumb set B. Just because the “taxonomy” here is dumb or “socially defined” doesn't imply that the subsets, so defined, are identical in every respect we care about (even if most of the differences are subtle).

It is pretty apparent that the taxonomy complaint has much to do with normative objections that have little to nothing to do with the taxonomy per se. We might change or redraw these groupings somewhat, or call them something besides “race”, but if the underlying issue is that the relevant subpopulations A and B differ in some quantitative respect, relabeling doesn't change anything. In other words, even if we dispense with the dumb sets entirely or somehow prove that Brazil really belongs in set B, the fact remains that the US is not identical to Brazil. People can still argue that the US has a different latitude than Brazil [even if people object to characterizing these continents on the grounds that these coordinates are “arbitrary”, “value-laden”, etc., that doesn't imply that they're not referencing something of some consequence to someone].

Besides the utility of race in medicine, it's worth bearing in mind that we track a lot of data by this “socially defined” race/ethnicity. So even if there were arguably better ways to do it, I do not think it unreasonable to use the same unit of analysis to account for the genetic AND environmental causes that underlie the observed differences in outcomes. Until or unless we stop fretting about these sorts of disparities in outcomes, or at least move them to a more granular unit of analysis socially, we should be very interested in analyzing genetic patterns that occur within these same socially defined groups.

Small systematic differences can be socially important


Group A and Group B overlap significantly in the above illustration. They have identical standard deviations, and their means differ by just 10 points (1/3 of a standard deviation). We can argue that there is more difference within the groups than between them and that these sorts of statistical differences aren't perceptible to the average human. We might also argue that they usually do not matter. Nevertheless, if we look at the odds ratio of B as compared to A (the green line), we can clearly see that at higher threshold values (e.g., 320) there are marked differences between the groups. These differing probabilities can be “good” (e.g., success in elite sprinting), “bad” (e.g., heart disease) or purely a matter of taste/normative values, but either way they can be predicted with some accuracy, and they may be caused primarily by the fact that group A has a slightly higher frequency of particular alleles than group B.
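The threshold effect in the illustration follows directly from two overlapping normal curves. Below is a sketch that assumes equal SDs of 30 and means 10 points apart, as described. The means themselves (200 and 210) and the thresholds are hypothetical choices. For each threshold it computes the risk ratio and odds ratio of exceeding it for the higher-mean group relative to the lower-mean group.

```python
import math

def upper_tail(x, mean, sd):
    """P(X > x) for X ~ Normal(mean, sd)."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2)))

def tail_comparison(threshold, mean_hi, mean_lo, sd):
    """Risk ratio and odds ratio of exceeding `threshold` for the
    higher-mean group relative to the lower-mean group."""
    p_hi = upper_tail(threshold, mean_hi, sd)
    p_lo = upper_tail(threshold, mean_lo, sd)
    risk_ratio = p_hi / p_lo
    odds_ratio = (p_hi / (1 - p_hi)) / (p_lo / (1 - p_lo))
    return risk_ratio, odds_ratio

# Hypothetical means 10 points (1/3 SD) apart; thresholds at 0, 1, 2 and 3 SDs
# above the lower group's mean.
for t in (200, 230, 260, 290):
    rr, or_ = tail_comparison(t, mean_hi=210, mean_lo=200, sd=30)
    print(t, round(rr, 2), round(or_, 2))
```

Near the middle of the distributions the groups are nearly interchangeable. In this example, three SDs out, the higher-mean group is close to three times as likely to exceed the threshold.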

Hypothesizing genetic causes for persistent statistical differences in phenotype or related outcomes is not inherently unreasonable. In some cases, such differences have essentially already been shown to be substantially genetic (e.g., type II diabetes risk). There is no good reason to presume that ancestral populations which evolved in significantly different environments, and which we know to be substantially genetically differentiated, are functionally identical in all things. Yet strangely we continue to labor under the assumption that any disparity, good or bad, no matter how rare, must be caused by something in the environment. Why is that?

