A (very) brief exploratory analysis of NLSY97 height data

The NLSY97 data provides height information for both biological parents and students at yearly intervals, so I thought I’d take a look at that data.

parent_height_avg_squares

It looks very much like heritability increases with age here (unsurprisingly). Around age 13 the slope of student height on parents’ average height is relatively flat, especially for boys, but it slowly increases, and by age 21 (give or take) there is a rather stark contrast.
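The quantity behind “the slope” can be computed separately for each age as an ordinary least-squares slope of child height on mid-parent height. In classical quantitative genetics, the offspring on mid-parent slope for adult, sex-adjusted heights is the textbook estimator of narrow-sense heritability (h²). For juveniles it only shows how closely current height tracks parental height. This is a minimal sketch: the `(age, midparent, child)` tuple layout and the function names are hypothetical, not the NLSY97 variable names.

```python
from collections import defaultdict


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def slope_by_age(records):
    """Slope of child height on mid-parent height, computed separately per age.

    records: iterable of (age, midparent_height, child_height) tuples
    (a hypothetical layout; heights in inches).
    """
    by_age = defaultdict(lambda: ([], []))
    for age, midparent, child in records:
        xs, ys = by_age[age]
        xs.append(midparent)
        ys.append(child)
    return {age: ols_slope(xs, ys) for age, (xs, ys) in sorted(by_age.items())}
```

A steeper slope at 21 than at 13 is the pattern described above.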

nhw_height_by_age

male_height_by_re

correlation_by_age

(Note: I’m not filtering out people who reported in one round but not in another, so some volatility is to be expected, not to mention that some people may round their height up or down somewhat randomly 🙂)
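You can remove that source of volatility by restricting the sample to a balanced panel, keeping only respondents with a reported height in every round being compared. This is a minimal sketch. The nested-dict layout `{respondent_id: {round: height}}` and the function name are hypothetical, and a missing report is represented as `None` or an absent key.

```python
def balanced_panel(heights, rounds):
    """Keep only respondents with a non-missing height in every requested round.

    heights: {respondent_id: {round: height_in_inches or None}} (hypothetical layout)
    rounds:  the survey rounds that must all be present
    """
    return {
        rid: by_round
        for rid, by_round in heights.items()
        if all(by_round.get(r) is not None for r in rounds)
    }
```

The trade-off is a smaller n, and possibly a less representative sample if attrition is related to height or SES.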

If we were to compare the percentile ranks of students, especially boys, at age 14 and at age 20, it’s a sure bet that the children with taller parents will, on average, end up taller than their rankings at age 14 might lead you to believe. In other words, not only will there be some ordinal reshuffling, but that reshuffling will tend to be somewhat systematic with respect to parental height. Although height and IQ are obviously different phenotypes, we do know that IQ heritability also increases with age.
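That comparison means computing each student’s percentile rank within the sample at both ages and taking the difference. The claim above predicts that this change in rank is positively related to parental height. This is a minimal sketch with hypothetical function names. Ties get the average of their positions, and ranks are scaled to 0 to 100.

```python
def percentile_ranks(values):
    """Percentile rank (0-100) of each value within the list, averaging ranks for ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2  # average 0-based position of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n = len(values)
    return [100.0 * r / (n - 1) if n > 1 else 50.0 for r in ranks]


def rank_change(height_14, height_20):
    """Change in percentile rank from age 14 to age 20 for each student (same order)."""
    before = percentile_ranks(height_14)
    after = percentile_ranks(height_20)
    return [a - b for a, b in zip(after, before)]
```

You could then correlate the resulting rank changes with parents’ average height.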

This has some relevance for the observed growth in achievement gaps between earlier test scores and later test scores (even though the measures tend to be strongly correlated).

(This effect may not be entirely genetic, but genetics is apt to be a substantial part of it.)


Below are some plots of male height by parents’ average height, grouped by the parents’ 1997 income tercile:

nhw_height_by_parents_grouped_by_pincome3

male_height_by_parents_grouped_by_pincome3

(Err, that is 1997 income, not 2007!)


Below I set up a linear model to predict (juvenile) height at each age. The predictors are (1) age in years as a categorical variable (growth is non-linear with age), interacted with sex (boys and girls have different growth trajectories), (2) mother’s and father’s heights in inches, and (3) race/ethnicity (some groups are a fraction of an inch taller or shorter than you’d expect). The goal is to try to suss out any independent income effect.
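Written out, one reading of that specification is below. The notation is mine, and I’m assuming it matches the model behind the plots rather than quoting its code. Here alpha(a, s) is a separate intercept for each age-in-years × sex cell, mother and father are the parents’ heights in inches, gamma(r) is a race/ethnicity offset relative to a reference group, and epsilon is the residual.

```latex
\text{height}_{i,a} = \alpha_{a,\,s(i)} + \beta_{m}\,\text{mother}_i + \beta_{f}\,\text{father}_i + \gamma_{r(i)} + \varepsilon_{i,a}
```

Under this reading, parental income is not a predictor. Any independent income effect should therefore appear as residuals (observed minus predicted height) that differ systematically by income group.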

all_pred_height_with_interaction

Or just at age 21 if you don’t feel like scrolling….

pred_height_age_21

height_red_by_gender

overpred_by_parents_income

(s/2007/1997/)

Parent income appears to have no independent (causal) effect on adult height once we account for parents’ heights and race/ethnicity. At least, there is nothing even close to statistically significant for the vast majority of the distribution. The children of lower-income parents grow up to be significantly shorter on average, but those relative differences are almost perfectly aligned with relative differences in their parents’ heights. Unless you believe there is a nearly perfect correlation between adult income and height (there isn’t), it’s very difficult to credit the environmental explanation for systematic differences in adult height between income groups in the US in recent decades.
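The “over-predicted by parents’ income” comparison can be read as the average residual (observed minus model-predicted height) within each parental-income tercile. I’m assuming that is roughly how those plots were built; the source doesn’t spell it out. This is a minimal sketch with hypothetical function names. It uses `statistics.quantiles` (Python 3.8+) for the two tercile cut points.

```python
from collections import defaultdict
from statistics import mean, quantiles


def income_tercile(income, cuts):
    """Map an income to tercile 1, 2, or 3 given the two cut points."""
    low, high = cuts
    if income <= low:
        return 1
    if income <= high:
        return 2
    return 3


def mean_residual_by_tercile(incomes, observed, predicted):
    """Average (observed - predicted) height within each parental-income tercile.

    Values near zero in every tercile mean income adds little once the model's
    predictors (parents' heights, race/ethnicity, age x sex) are accounted for.
    """
    cuts = quantiles(incomes, n=3)  # two cut points splitting incomes into thirds
    residuals = defaultdict(list)
    for inc, obs, pred in zip(incomes, observed, predicted):
        residuals[income_tercile(inc, cuts)].append(obs - pred)
    return {t: mean(r) for t, r in sorted(residuals.items())}
```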


Much the same holds for other markers of SES, like parental education levels (which are even more strongly correlated with adult height than income is).

pred_by_ed_b4

pred_height_by_ped_b4_age22to24

overpred_by_parents_education_unrestricted

overpred_by_parents_education_by_ph_quintiles

Across the vast majority of the parental height distribution, there is no evidence of a general tendency for the children of better-educated parents to be taller than you’d predict from their parents’ heights (and, at the margins, race/ethnicity). Such a tendency is what you might expect if you thought that, say, better-educated parents tend to provide more nurturing environments, which should in theory produce marginally taller children. There is arguably some difference at the tails, but it’s mostly not statistically significant.
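A quick way to judge “not statistically significant” for a group’s mean residual is to check whether an approximate 95% confidence interval excludes zero. This is a minimal sketch using a normal approximation (z = 1.96), with hypothetical function names. For the small groups at the tails, a t-based interval would be more appropriate. The sketch also ignores that the residuals come from a fitted model and that many groups are being compared at once.

```python
from math import sqrt
from statistics import mean, stdev


def mean_with_ci(residuals, z=1.96):
    """Mean residual with an approximate (normal-theory) confidence interval."""
    m = mean(residuals)
    se = stdev(residuals) / sqrt(len(residuals))  # standard error of the mean
    return m, m - z * se, m + z * se


def is_significant(residuals, z=1.96):
    """True if the approximate confidence interval for the mean excludes zero."""
    _, lo, hi = mean_with_ci(residuals, z)
    return lo > 0 or hi < 0
```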

That being said, there might be a slight regression-to-the-mean effect going on in the right tail. Among those with very tall parents, the high-SES children are more likely to have parents who are genotypically tall (there clearly are differences in the means, and they’re clearly mostly genetic). The alternative is parents who were much taller than average due relatively more to the random effects of various stochastic processes.
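The regression-to-the-mean argument can be made concrete with the simple additive prediction used in quantitative genetics. A child’s expected deviation from its group mean is h² times the (sex-adjusted) mid-parent deviation from that same group mean. Two equally tall sets of parents therefore have different expected children if their groups’ means differ. This is a minimal sketch. The heritability and group means are illustrative assumptions, not estimates from this data, and the function name is hypothetical.

```python
def expected_child_height(midparent, group_mean, h2):
    """Expected offspring height under a simple additive model.

    The child's expected deviation from the group mean is h2 times the
    mid-parent deviation, so exceptionally tall parents "regress" toward
    their own group's mean rather than the overall population mean.
    """
    return group_mean + h2 * (midparent - group_mean)


# Hypothetical values for illustration only, NOT estimates from NLSY97:
H2 = 0.8              # assumed narrow-sense heritability of height
HIGH_SES_MEAN = 70.0  # assumed mean height (inches) in a higher-SES group
LOW_SES_MEAN = 69.0   # assumed mean height (inches) in a lower-SES group

# The same very tall mid-parent height (74 in) in both groups:
high = expected_child_height(74.0, HIGH_SES_MEAN, H2)  # about 73.2
low = expected_child_height(74.0, LOW_SES_MEAN, H2)    # about 73.0
```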


Interestingly, Asians/Pacific Islanders (as an overly broad group) and Hispanics don’t seem to gain appreciably more ground over their parents than whites do in this data set (at least not when viewed by parent height). Asians and Hispanics are on average significantly shorter in the US, and it certainly doesn’t look like this cohort is any taller than you’d expect based on mid-parent height and overall growth trends (including those of whites). This rather tends to contradict the idea, proffered by some, that dietary differences are responsible for most or all of the observed group differences in height in first-world countries.

male_height_by_re_key

female_height_by_mom_height male_height_by_fathers_height

That being said, it is pretty clear that all major groups in this cohort are significantly taller than their parents on average.

race_eth_parent_height_delta_both_sexes

gender_parent_height_delta_by_raceeth

Asians and (white) Hispanics seem to have the largest gains, although the differences are arguably not statistically significant. This may seem to contradict my earlier statement and the scatter plots/regression lines. Keep in mind, though, that most of them have parents on the left (shorter) side of the parent height distribution, where the average gains are larger for all groups. Relative to children of non-Hispanic white parents of similar heights, their gains seem to be somewhat smaller (and almost certainly not larger). Averaged over all parent heights, however, that concentration skews their average growth upward. So there might be some catch-up at the very bottom, but it’s of a similar (or lesser) magnitude to that of white children of parents of similar height (which still won’t put them on par height-wise).
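The composition effect described above is easy to see with made-up numbers. Suppose group B gains less than group A within every parent-height bin, but most of B’s parents fall in the short bin, where gains are larger for everyone. Then B’s overall average gain can still come out higher. The shares and gains below are hypothetical, chosen only to illustrate the arithmetic, and are not taken from the NLSY97 data.

```python
def average_gain(shares, gains):
    """Weighted average gain: sum over parent-height bins of (share of group) * (gain in bin)."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("bin shares must sum to 1")
    return sum(shares[b] * gains[b] for b in shares)


# Hypothetical child-over-parent gains (inches) by parent-height bin.
# Group B gains LESS than group A within both bins...
group_a = {"shares": {"short": 0.2, "tall": 0.8}, "gains": {"short": 3.0, "tall": 1.0}}
# ...but most of group B's parents are in the short bin.
group_b = {"shares": {"short": 0.8, "tall": 0.2}, "gains": {"short": 2.5, "tall": 0.8}}

avg_a = average_gain(group_a["shares"], group_a["gains"])  # 0.2*3.0 + 0.8*1.0 = 1.4
avg_b = average_gain(group_b["shares"], group_b["gains"])  # 0.8*2.5 + 0.2*0.8 = 2.16
```

The comparison within bins is the one that answers “do they gain more than children of similar-height parents?”. The pooled average mostly reflects where each group’s parents sit in the height distribution.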

My view is that the vast majority of people in this country born in the last 30 years are fairly close to their maximum potential height due to abundant nutrition, minimal childhood disease, etc. This includes most of our relatively low-income groups (see above). So most systematic differences in height between groups are likely largely genetic. Growth has leveled off in other highly developed Western countries as well (including the Netherlands).


height_by_IQ

As reported elsewhere, height and IQ are positively correlated. The correlation is about 0.1 for non-Hispanic whites (a bit more for females than males), with height measured at age 21. These are juvenile IQ scores, too; my guess is that the correlation with adult scores would be a bit stronger. However, height is also correlated with parents’ education levels and parents’ income. (I haven’t paid much attention to the literature on that topic. I’m not sure whether this is supposed to reflect some kind of population structure, assortative mating, or something else, like pleiotropy….)
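The figure quoted above is a Pearson correlation. For scale, r = 0.1 means height accounts for only about 1% of the variance in IQ (r² = 0.01). Below is a minimal sketch of the computation, for example between age-21 height and juvenile IQ within one group. The function name is mine; on Python 3.10+, `statistics.correlation` computes the same quantity.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)
```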

height_by_parents_avg_year_education height_by_parents_income

nhw_male_height_by_med nhw_height_by_fed

male_height_by_ped_b3 nhw_male_height_by_ped_b4

male_height_by_parent_income_decile

(Note: This plot is sorted by average height and includes all racial/ethnic groups, so height is not as neatly ordered by income alone as it might appear at first glance. With a larger n it probably would be, though.)

male_height_by_raceeth

female_height_by_raceeth

mean_height_by_asvab_primary_ethnicity

Although I don’t put a lot of stock in self-reported European ethnicities within the US (at least for most people whose families have been here for several generations), and the N here is rather small, here is the data as reported by the students on the ASVAB tests. (I’ll compare this later to the other reported ethnicities.)

femaleh_asvab_eth1 femaleh_asvab_eth2 maleh_asvab_eth2 maleh_asvab_eth1


5 thoughts on “A (very) brief exploratory analysis of NLSY97 height data”

  1. ggplot2 ❤

    You probably shouldn't be doing all those subgroup analyses with ethnorace. Sample sizes are too small for looking at the small effects you are looking at. (Even if facets are neat.)

    • I also think self-reported ethnicity data is often troublesome for most white Americans, since most of us are a mix of a bunch of different European groups and are often hazy about the details. In other words, the #1 or #2 reported group is only weakly correlated with actual genetic ancestry, especially outside of broad groupings or special cases like N. Euro vs. S. Euro vs. Ashkenazi.
