Exploring ELS 2002 data

The NYTimes turned me on to a new data source in a recent article on college graduation rates by SES. The article implied that college graduation rates are better predicted by “wealth” than by the students’ test scores (10th grade ELS scores taken in 2002).


Being both curious about the underlying data and somewhat skeptical of the particular claims (or, at least, their interpretation), I decided to investigate for myself. Having done so, I can tell you that it’s a pretty rich data set. Unfortunately, a few key data points (e.g., SAT scores, HS GPA, etc.) are censored or rounded/binned to protect anonymity, but there is still a lot of interesting data there to analyze.


 

Update (6/6/15):

As I noted in my follow-up post on economic mobility, I realized that the publicly accessible file actually does provide 9th-12th grade high school GPA as a non-continuous variable. I have updated this post in a few places to reflect this new information!


First point

Parents’ educational attainment is a much better predictor of both test scores and the child’s subsequent educational attainment than economic measures are.

(Bachelor’s degree) attainment rate by test scores, grouped by parents’ income levels

scatter_with_errorbars_by_income_levels

Attainment rate by test scores, grouped by SES levels

scatter_with_errorbars_by_SES_bins

Attainment rate by test scores, grouped by parents’ average years of education in 2-year increments

scatter_with_errorbars_by_parents_avg_years_ed

Attainment rate by test scores, grouped by parents’ highest education level (e.g., HS dropout – PhD/MD/JD)

scatter_with_errorbars_by_parents_highest_education_level

Attainment rate by test scores, grouped by average combined years of education of mother and father

comparison_of_college_rate_by_test_score

Something I noticed in a prior analysis of NAEP scores is that the parents’ education levels appear to have an additive effect on test scores and that the sex of the better-educated parent seems to have no (consistent) systematic effect. In other words, the parents’ highest educational attainment wasn’t as good a predictor as the averaged educational attainment of both parents.

I was able to reproduce this in the ELS data set (note: the n is small for some of the more unlikely pairings, e.g., PhD w/ HS dropout. I should probably add confidence intervals here, but oh well!).

Microsoft Excel (1)

Microsoft Excel
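For the small-n pairings mentioned above, a confidence interval on each cell's rate would show how much to trust it. Below is a minimal sketch using the Wilson score interval. This is my suggestion, not something the original analysis did, and the counts are made up purely for illustration.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

# Hypothetical cells: the same observed 60% rate with very different cell sizes.
small_cell = wilson_interval(6, 10)      # e.g., a rare parent pairing
large_cell = wilson_interval(600, 1000)  # e.g., a common parent pairing

print("n=10:   %.3f to %.3f" % small_cell)
print("n=1000: %.3f to %.3f" % large_cell)
```

The interval for the n=10 cell is several times wider than the one for n=1000, which is the caveat that applies to the rare pairings in these tables.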


Second point

The cited SES measure includes many other correlates besides just income.

NLS-72/HS&B/NELS:88-comparable composite variable constructed from parent questionnaire data when available and student substitutions when not. SES is based on five equally weighted, standardized components: fathers/guardians education (F1FATHED), mothers/guardians education (F1MOTHED), family income (BYINCOME), fathers/guardians occupation (F1OCCUF), and mothers/guardians occupation (F1OCCUM). Each of these five composite variables that served as inputs to SES1 and SES2 were imputed if missing. See those variable descriptions for further detail. An occupation prestige value was determined for F1OCCUM and F1OCCUF.

The 1961 Duncan index was used for determining the occupation prestige values for the SES1 variable. The parent questionnaire included parent occupation codes determined by the respondents themselves. If the occupation texts were left uncoded or if the student supplied the parent occupation text, project staff subsequently coded those occupations if possible. Missing occupations were imputed. Parent education was based on parent report, or student report if parent data missing, imputed otherwise. Income was based on parent questionnaire information, imputed otherwise.

In other words, (1) income is only ~20% of their SES measure, (2) wealth is not included, and (3) the composite score is heavily weighted towards parents’ education levels (through education directly and through occupational prestige). This is a good thing as far as predicting IQ, educational attainment, etc. is concerned, but we ought to be clear that it’s not just, or even mainly, about economics (despite the correlation).
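To make the "~20%" figure concrete, here is a rough sketch of an equally weighted composite of five standardized components. The column names come from the codebook text quoted above. The toy values are invented, and the real SES1/SES2 construction also involves imputation and prestige coding that this sketch ignores.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize a list of numbers to mean 0, SD 1 (population SD)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def equal_weight_composite(components):
    """components: dict of name -> list of values, one per student."""
    standardized = {name: zscores(vals) for name, vals in components.items()}
    n_students = len(next(iter(components.values())))
    return [mean([standardized[name][i] for name in standardized])
            for i in range(n_students)]

# Hypothetical values for four students (not real ELS records).
toy = {
    "F1FATHED": [2, 4, 6, 7],      # father's/guardian's education
    "F1MOTHED": [2, 3, 6, 6],      # mother's/guardian's education
    "BYINCOME": [5, 9, 8, 12],     # family income category
    "F1OCCUF": [30, 45, 60, 75],   # father's occupational prestige
    "F1OCCUM": [25, 50, 55, 70],   # mother's occupational prestige
}

ses = equal_weight_composite(toy)
income_weight = 1 / len(toy)     # income is one of five components
education_weight = 2 / len(toy)  # two direct education components

print([round(s, 2) for s in ses])
print(income_weight, education_weight)
```

With equal weights, income carries one fifth of the composite, while the two education components carry two fifths before even counting occupational prestige.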

Income alone isn’t that strong a predictor of test scores:

white_comp_boxplot_by_income

It gets considerably better with these other factors included.

white_comp_boxplot_by_ses025_units

Parents’ highest education level (single) is reasonably good.

white_comp_boxplot_by_parents_highest_education_level

But, as I mentioned above, we can do quite a bit better by averaging both parents’ education levels together into a composite.

white_comp_boxplot_by_average_parents_education
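The difference between the two predictors is easy to see in a small sketch. The years-of-education values below are a hypothetical coding that I chose for illustration, not the ELS category codes.

```python
def parent_education_summaries(mother_years, father_years):
    """Return (highest, average) years of education for one pair of parents."""
    highest = max(mother_years, father_years)
    average = (mother_years + father_years) / 2
    return highest, average

# Hypothetical pairings, coded as approximate years of schooling.
pairings = {
    "PhD + HS dropout": (20, 10),
    "PhD + PhD": (20, 20),
    "BA + BA": (16, 16),
}

for label, (mother, father) in pairings.items():
    highest, average = parent_education_summaries(mother, father)
    print(f"{label:18s} highest={highest:2d} average={average:4.1f}")
```

The "highest" measure puts the first two households in the same group, while the average separates them. That lost information is one plausible reason the averaged composite sorts test scores better.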

Of course, these educational credential categories are fairly crude and the data is not continuous, so the SES index probably does a somewhat better job by incorporating some of this other data (e.g., occupational prestige is apt to further sort individuals by cognitive and other “social capital” factors). Nonetheless, we can clearly get pretty close to it with parental education alone today (perhaps less so in the future with credential inflation?).


Third point

There clearly are systematic differences in attainment that cannot be explained by these individual test scores alone.

White attainment rate by test scores, grouped by test score levels and income levels

bs_attainment_scatter_by_income_levels_white_only

White attainment rate by test scores, grouped by (3) income levels and test scores

scatter_by_binned_income

However, these differences tend to narrow at the highest 10th grade test scores, and they are mostly explained by differences in initial enrollment (or lack thereof). The actual graduation rate differences (i.e., excluding HS dropouts and people who didn’t try for anything beyond an associate’s degree) are fairly modest, especially at around 1 SD above the national average (the national mean for all racial/ethnic groups is 50 with an SD of 10, so north of 60). It’s also worth pointing out that even top scorers at the highest income levels (200K+) graduate at a rate significantly below 100%.

White graduation rate by test scores, grouped by (binned) test score and income levels
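The distinction between attainment and graduation rates can be written as attainment = (share who attempt a bachelor's) x (graduation rate among those who attempt). Here is a minimal sketch with invented counts for two hypothetical income groups. The numbers are not taken from ELS.

```python
def rates(records):
    """records: list of (attempted, graduated) booleans, one per student.

    Returns (attainment, attempt_rate, graduation_rate_given_attempt).
    """
    n = len(records)
    attempted = sum(1 for a, g in records if a)
    graduated = sum(1 for a, g in records if a and g)
    attainment = graduated / n
    attempt_rate = attempted / n
    grad_rate = graduated / attempted if attempted else float("nan")
    return attainment, attempt_rate, grad_rate

# Hypothetical groups of 100 students each at the same test score level.
lower_income = [(True, True)] * 40 + [(True, False)] * 10 + [(False, False)] * 50
higher_income = [(True, True)] * 68 + [(True, False)] * 12 + [(False, False)] * 20

low = rates(lower_income)
high = rates(higher_income)

print("attainment gap:      %.2f" % (high[0] - low[0]))
print("attempt-rate gap:    %.2f" % (high[1] - low[1]))
print("graduation-rate gap: %.2f" % (high[2] - low[2]))
```

In this toy example most of the attainment gap comes from who attempts a bachelor's degree at all, and the gap among those who attempt is much smaller. That is the pattern described above.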

graduation_rate_by_income

White graduation rate by test scores, grouped by (3) income levels and test scores

graduation_rate_by_income_bin


Fourth point

The differences by parental education level are larger than the income differences.

White attainment rate by test score (grouped by average of parents years of education)

graduation_rate_by_avg_ed_detail

White graduation rate by test scores, grouped by parents’ education levels (binned in 2-year increments)
scatter_by_binned_average_ed_level

White attainment rate by test score, grouped by test scores and parental education levels (avg years)
attainment_rate_by_binned_ed_level


 

Fifth point

There are clear differences in educational attainment rates by race. Asians at essentially all points in the score distribution attend college at higher rates than whites (and especially Latinos). I suspect this is driven by the fact that Asians place a very high priority on educational attainment and that they are apt to post higher grades than you’d expect based on their test scores alone, due (in large part) to extra effort. That gives them better admissions probabilities and makes them less likely to fail out due to poor preparation, incomplete assignments, etc. (too bad they don’t expose continuous GPA or SAT scores in the public data!).

Attainment rate by test scores, grouped by race and test scores

attainment_rate_by_race

Graduation rate by test scores, grouped by race and test scores

graduation_rate_by_race

This tends to support the notion that there are other important differences, beyond purely economic considerations, driving these within-group SES differences. For instance, even holding these test scores constant, high SES whites may be more likely to post better grades and to place a higher average priority on getting into good baccalaureate programs than low SES whites, independent of economic considerations per se, for reasons that have to do with genetics, social capital, cultural differences, and the like. I would bet that the children of bright but relatively low-paid academics do much “better” on these sorts of measures than their income or “wealth” would suggest. (No, I don’t think it’s entirely genetic, but just because it’s not entirely genetic doesn’t mean that it’s something we can plausibly fix through policy the way we might fix straightforward economic concerns.)


 

Sixth point (new: using HS GPA data)

We can predict bachelor’s degree attainment rates quite well using (1) ELS:2002 composite test scores, (2) HS GPA groups, and (3) parental education levels.

 

bach_attainment_rate_by_model_with_hsgpa_3pincome

 

 

attainment_pred_by_race

 

attainment_pred_whites_by_sex

 

attainment_pred_blacks_by_sex

 

 

 

To show what I mean by “over” or “under” achievement, I produced some histograms of HS GPA by test scores for different groups (by race, by sex, by parental education levels). To save space, I restricted this to scores from a bit less than 0.5 SD below average and up (roughly, broadly “college ready” territory) and binned them in 4-point increments (about 0.4 SD, the national mean being about 50).
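The binning described above can be sketched as follows. The exact lower cutoff is not stated, so the floor of 46 (0.4 SD below a mean of 50 with an SD of 10) is my assumption. The function name is also hypothetical.

```python
import math

def score_bin(score, width=4.0, floor=46.0):
    """Return the lower edge of the score's bin, or None if below the floor.

    floor=46.0 is an assumed cutoff ("a bit less than 0.5 SD below average");
    width=4.0 matches the 4-point increments (about 0.4 SD when SD = 10).
    """
    if score < floor:
        return None
    return floor + width * math.floor((score - floor) / width)

for s in (45.9, 46.0, 49.99, 50.0, 63.2):
    print(s, "->", score_bin(s))
```

Each student then falls into one score bin, and the HS GPA distribution can be tabulated within each bin for each group.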

hsgpa_faceted_by_parent_education

 

hsgpa_histogram_by_race

hsgpa_hist_by_sex

Groups that clearly skew towards higher or lower GPA levels than predicted (e.g., Asians, females, students of more highly educated parents) are only crudely adjusted for by HS GPA in the model, since it’s not a continuous variable and information is lost within those broad bins (!!).

 

HS GPA by test score, grouped by parental income levels

hsgpa_scatter_by_income_groups

HS GPA by test score, grouped by highest parental education levels

hsgpa_scatter_by_phed

HS GPA by test score, grouped by sex

hsgpa_scatter_for_sex

HS GPA by test score, grouped by race/ethnicity

hsgpa_scatter_for_race

 

[Note: These give a decent sense for the expected HS GPA for each group at a given test score for most of the distribution.  However, due to range restriction and the fact that it’s not continuous, it’s probably not a great reflection of reality as it approaches either end of the HS GPA axis.]

 


Some other, somewhat unrelated, observations

1: Asians/Pacific Islanders, curiously, don’t outperform whites on the composite score (math + reading), unlike on the SATs, California/STAR tests, and numerous other measures. The W-B gap is about what is to be expected, i.e., 0.96 SD in the sampled groups. (Note: I made no use of the population weights, so these numbers could be skewed somewhat by over/under-sampling of particular subgroups.) According to the technical writeup on ELS:2002, it was designed to be more of a practical out-of-classroom skills test, more akin to the PISA, so this probably skews it more in the direction of IQ tests than of achievement tests loaded with classroom material.

race/ethnic        mean    standard deviation
hispanic           45.5     9.6
asian              52.2    10.2
black              44.5     8.5
multi_racial       50.6     9.6
white              53.3     9.2
american_indian    45.6     7.9
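As a quick check on the 0.96 SD figure, the gap can be recomputed from the rounded table values. The post does not say which standard deviation was used as the denominator. Dividing by the white SD reproduces 0.96, so that is my guess. The other two denominators are shown only for comparison.

```python
import math

# Rounded values copied from the table above (unweighted sample statistics).
white_mean, white_sd = 53.3, 9.2
black_mean, black_sd = 44.5, 8.5

raw_gap = white_mean - black_mean

# Assumed denominator: the white group's SD.
gap_white_sd = raw_gap / white_sd

# Alternative: a simple pooled SD that ignores group sizes.
pooled_sd = math.sqrt((white_sd ** 2 + black_sd ** 2) / 2)
gap_pooled = raw_gap / pooled_sd

# Alternative: the national SD of 10 mentioned earlier in the post.
gap_national = raw_gap / 10.0

print(round(gap_white_sd, 2), round(gap_pooled, 2), round(gap_national, 2))
```

The three choices give somewhat different answers, so the denominator matters when comparing this gap with figures from other tests.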

beanplot_female_composite_by_race beanplot_male_composite_by_race

2: Box plots of scores by various correlates

Scores by race and parents’ income level

barchart_race_by_income_level

Scores by race and parents’ education level

boxplot_race_comp_by_education_level

Scores by race and parents’ SES (binned)

boxplot_race_comp_score_by_ses_binned

Scores by race and parents’ SES (continuous measure)

scatter_composite_scores_by_ses_continuous

SES distribution by race (by their calculations)

ses_distribution_by_race


 

Some bits relating to economic mobility

2010 white male earnings by 2002 (10th grade) scores (not conditioned on academics or work experience)

white_male_earnings_no_conditions

2010 white male earnings by parents’ SES (not conditioned on academics or work experience)

white_male_earnings_by_parents_SES

There is clearly a correlation with both measures, but neither is particularly well correlated with earnings in this (young) age group. That said, even here you can clearly see that test scores are a better predictor than parents’ income (or even broader measures of SES, which, in turn, are better correlated with education than with income).

splom_white_males

splom_asian_males



splom_hispanic_males splom_black_males

splom_all_males

splom_all_females

[Added in two additional columns for white males, earnings contingent on full-time work and bach. attainment contingent on attempting it, i.e., graduation rates]

splom_white_male_conditional

splom_white_females_with_conditionals


Box plots for white male earnings by various measures

By parents’ income level

boxplot_earnings_vs_parents_income_level

By parents’ socio-economic status

boxplot_earnings_vs_parents_ses_05_bin

By parents’ highest education level

boxplot_earnings_vs_parents_highest_education_level

 By 10th grade composite test score

boxplot_earnings_vs_test_score_binned
