Predicting health care expenditures in OECD data with non-linear model specification

In my prior post, where I argued at length that US health care expenditures are reasonably well explained by Actual Individual Consumption (AIC) and that GDP is an inferior predictor, I pointed out toward the end that the linear specification I used likely overstates US residuals significantly, both because there is good evidence of non-linearity and because the US sits far out on the consumption frontier.

This non-linearity can be seen pretty clearly if you look at the 2011 data derived from the World Bank (for AIC) and WHO (for HCE).

In per capita terms
In percentage terms
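
As a rough illustration of why the specification matters, here is a minimal Python sketch on made-up numbers (not the World Bank/WHO data). It assumes, purely for illustration, that spending follows a power law in consumption with an elasticity of 1.6, and then compares the residual of the highest-consumption observation under a linear fit and under a log-log fit. The `ols` helper and every value below are hypothetical.

```python
import math

def ols(xs, ys):
    """Return (intercept, slope) of a simple least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up consumption levels; the last one stands in for a frontier country.
aic = [5000, 10000, 15000, 20000, 25000, 30000, 40000]
# Assumed convex relationship (elasticity 1.6), chosen only for illustration.
hce = [0.0001 * x ** 1.6 for x in aic]

# Linear specification: HCE = a + b * AIC
a_lin, b_lin = ols(aic, hce)
pred_lin = a_lin + b_lin * aic[-1]
resid_lin_pct = 100 * (hce[-1] - pred_lin) / pred_lin

# Log-log specification: log(HCE) = a + b * log(AIC)
a_log, b_log = ols([math.log(x) for x in aic], [math.log(y) for y in hce])
pred_log = math.exp(a_log + b_log * math.log(aic[-1]))
resid_log_pct = 100 * (hce[-1] - pred_log) / pred_log

print(f"linear fit residual for top observation: {resid_lin_pct:+.1f}%")
print(f"log-log fit residual for top observation: {resid_log_pct:+.1f}%")
```

Because the toy data are generated from a power law, the log-log fit recovers it essentially exactly, while the straight line leaves the top observation above trend even though nothing about it is unusual.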

Since some people may (1) doubt the accuracy of these statistics outside of the few highly developed countries, (2) imagine that these poorer countries are somehow qualitatively different in a way that’s not well correlated with their level of economic development, or (3) be particularly reluctant to accept non-linearity as a potential partial explanation for the US here, I thought I’d approach this from a somewhat different angle.


High US health care spending is quite well explained by its high material standard of living

About two years ago I wrote a long blog post arguing that the United States is not an outlier in healthcare expenditures per capita. Following renewed interest from a recent Marginal Revolution link and some criticism from a few people on various comment threads, I thought I’d take the time to update the evidence, address some areas of criticism, and muster yet more lines of evidence to support my argument. This post should largely make the earlier post obsolete, but I will keep the earlier post up for posterity and to retain data/information that won’t necessarily be perfectly duplicated in this post.

There exist several popular plots like these that people use to make the argument that the United States spends vastly more than it should for its level of wealth.





These plots and the arguments that usually go with them give the strong impression that the US spends about twice as much as it should. However, these are misleading for several reasons, namely:

  1. GDP is a substantially weaker proxy for “wealth” and a substantially weaker predictor of health care expenditures than other available measures.
  2. In reality, the US is much wealthier than the other countries in these plots.
  3. The arbitrary selection of a handful of countries tends to hide the problems with GDP in this context and, oddly enough, simultaneously downplay the strength of the relationship between wealth and health care spending.
  4. Comparing these two quantities with a linear scale tends to substantially overstate the apparent magnitude of the residuals from trend amongst the richer economies when what we’re implicitly concerned with is the percentage spent on healthcare.
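
Point 4 can be illustrated with a small worked example. The sketch below uses two hypothetical countries (the figures are invented, not taken from any dataset) that both spend 20% more than a trend line predicts, and compares the gap in absolute terms with the gap on a log scale.

```python
import math

# Hypothetical trend-predicted spending per capita for two countries.
predicted = {"poorer": 1000.0, "richer": 5000.0}
# Assume both spend 20% more than the trend predicts.
actual = {name: 1.2 * value for name, value in predicted.items()}

absolute_gap = {}
log_gap = {}
for name in predicted:
    # Gap as it appears on a linear axis (currency units).
    absolute_gap[name] = actual[name] - predicted[name]
    # Gap as it appears on a log axis (proportional deviation).
    log_gap[name] = math.log(actual[name]) - math.log(predicted[name])
    print(f"{name}: absolute gap = {absolute_gap[name]:.0f}, "
          f"log gap = {log_gap[name]:.3f}")
```

The proportional deviation is identical for both countries, but on a linear axis the richer country's gap looks five times larger, which is the visual distortion described in the list.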

When properly analyzed with better data and closer attention to detail, it becomes quite clear that US healthcare spending is not astronomically high for a country of its wealth. Below I will lay out these arguments in much greater detail and provide data, plots, and some statistical analysis to prove my point.


My response to the NYTimes article on school districts, test scores, and income.

On April 29th the New York Times posted a nominally data-driven article on school districts, test scores, and socioeconomic status. Though it contained some useful data, the analysis was terribly misleading and it excluded a tremendous amount of pertinent information. Many progressives took the article as proof that the system is “rigged”.


The NYTimes did not help matters by conflating the measures of socioeconomic status (SES) with income.  Although every one of their plots used a composite SES measure on the x-axis, the article itself and various annotations give a strong impression that money/income/wealth are the primary drivers of this:


The words income, economic, wealth, money, rich, poor, and other related words were littered liberally throughout the article. Not a single mention was made of other predictors or even of the composition of the SES index they used, save for an easy-to-miss footnote at the end of the article.

The SES measure they used was defined in the SEDA archive as:

the first principal component factor score of the following measures: median income, percent with a bachelor’s degree or higher, poverty rate, SNAP rate, single mother headed household rate, and unemployment rate

[emphasis mine]

These non-economic dimensions actually exert significant influence on the correlations, are not directly tied to income/wealth/etc., and show marked racial/ethnic differences even at the same income level (e.g., single motherhood rates are much higher in the black community at any given level of income).
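
To show how a first-principal-component index blends economic and non-economic measures, here is a minimal sketch on invented numbers for six hypothetical districts (not SEDA data). It uses only four of the six listed measures and a plain principal component score computed by power iteration, which may differ from the exact factor-score procedure SEDA used.

```python
import math

def standardize(column):
    """Rescale a column to mean 0 and (population) standard deviation 1."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / sd for v in column]

# Made-up values for six hypothetical districts (NOT SEDA data).
measures = {
    "median_income_k": [30, 40, 50, 60, 80, 100],
    "pct_bachelors": [10, 15, 20, 30, 45, 60],
    "poverty_rate": [30, 22, 18, 12, 8, 5],
    "single_mother_rate": [40, 35, 28, 20, 15, 10],
}
names = list(measures)
z = [standardize(measures[name]) for name in names]
n_obs = len(z[0])
n_vars = len(names)

# Correlation matrix of the standardized measures.
corr = [[sum(a * b for a, b in zip(zi, zj)) / n_obs for zj in z] for zi in z]

# Power iteration: repeatedly multiply by the matrix and renormalize to
# approximate the leading eigenvector (the first principal component).
loadings = [1.0] + [0.0] * (n_vars - 1)
for _ in range(200):
    product = [sum(row[j] * loadings[j] for j in range(n_vars)) for row in corr]
    norm = math.sqrt(sum(v * v for v in product))
    loadings = [v / norm for v in product]

# The sign of a component is arbitrary; orient it so income loads positively.
if loadings[0] < 0:
    loadings = [-v for v in loadings]

# Composite score per district: weighted sum of the standardized measures.
scores = [sum(loadings[k] * z[k][i] for k in range(n_vars)) for i in range(n_obs)]

for name, weight in zip(names, loadings):
    print(f"{name}: loading {weight:+.2f}")
print("district scores:", [round(s, 2) for s in scores])
```

Every measure receives a weight in the composite, so a district's score moves with its single-mother rate or share of college graduates as well as with its income.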

Rather than focus too much on what is wrong with this specific article (this sort of article is practically a genre unto itself these days), I will instead systematically address misperceptions here and attempt to shed more light on the nature and underlying causes of these patterns. I will argue that (1) these gaps are mostly genetic, (2) they generally have little to do with systematic differences in parental economics, (3) they have even less to do with the school systems themselves, and (4) these patterns are not unique to the US.

I will use some data from the Stanford Education Data Archive (SEDA), the same used by the NYTimes article, to help make some of my points but, unlike some of my other blog posts, I will try to cover each point in just enough depth to convey the gist of it (linking out for more in-depth analysis for those who are interested in the particular point). I will also bring together a good deal of supporting evidence that is buried in different academic articles, government databases, think-tank research, and so on.


No, the SAT doesn’t just “measure income”

I started this post to refute some specific arguments, but I changed my mind midstream and decided to add a lot more material than I initially envisioned. This is best viewed as being akin to a FAQ (Frequently Refuted Objections – FRO?) relating to the standardized tests and their use in higher education.

SAT and income are not perfectly correlated

The SAT is certainly modestly correlated with parental income, but it is simply not true that the SAT is nothing more than a measure of family income.

I will briefly plot the 2011 SAT reading scores by income level to illustrate that the r² is considerably less than one.

There is significant overlap across the entire income distribution:

Box and whisker plot of simulated test scores
Simulated distribution by actual test taker counts
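
The kind of simulation described above can be sketched as follows. This is only an illustration under stated assumptions: the bracket means, the common standard deviation, and the equal bracket sizes are invented (they are not College Board figures), scores are drawn from an unbounded normal distribution rather than the real 200 to 800 scale, and `pearson_r` is a hypothetical helper.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical bracket means and a common spread (NOT College Board figures).
bracket_means = [460, 485, 510, 535, 560]
score_sd = 100
takers_per_bracket = 5000

brackets, scores = [], []
for bracket, mean in enumerate(bracket_means):
    for _ in range(takers_per_bracket):
        brackets.append(bracket)
        scores.append(random.gauss(mean, score_sd))

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r_squared = pearson_r(brackets, scores) ** 2

# Overlap check: share of lowest-bracket takers above the top bracket's mean.
lowest = scores[:takers_per_bracket]
share_above_top_mean = sum(s > bracket_means[-1] for s in lowest) / len(lowest)

print(f"r^2 between income bracket and score: {r_squared:.3f}")
print(f"lowest-bracket share above top-bracket mean: {share_above_top_mean:.1%}")
```

Under these assumed parameters, a steady rise in bracket means coexists with a low r² and a sizeable share of lowest-bracket scores above the top bracket's average, which is the overlap the plots are meant to convey.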


On the relationship between negative home owner equity and racial demographics

There are a large and growing number of popular media articles alleging racial discrimination in the mortgage market. It is simply assumed that if lenders are less willing to extend credit to blacks or to make loans in “black neighborhoods” as often, or on terms as favorable, as they do for whites or “white neighborhoods”, this is apt to be explained by explicit racism or (subconscious) bias. These naive arguments persist despite tremendous evidence that there are observable and unobservable differences that have profound effects on credit risk.

I will briefly describe some of this evidence before making my own modest contribution using data from and the US census. You can click here if you are familiar with this literature already and wish to skip ahead to my analysis.

1- Blacks have much lower credit scores (e.g., FICO) 



The difference between the white and black means is about 1 standard deviation.


Racial differences in homicide rates are poorly explained by economics

There are large racial differences in homicide rates in the United States. The FBI and other government organizations are not always forthcoming with detailed data, but you can quite readily estimate the rates (approximately) with the victimization/mortality data from the CDC and other sources (most crimes being committed intra-racially).


The black homicide rate is about 10 times higher than the white rate. It has been this way for quite some time (i.e., even as the rates have changed the differences themselves have remained fairly stable).


Similar patterns can be found elsewhere, but I find the homicide statistic useful and interesting for many different reasons, namely:


On a recent Brookings blog post hyping economic inequality as a predictor of well-being

I recently came across a Brookings Institution blog post regarding the relationship between economic inequality and various measures of well-being.

Setting aside the lack of statistical controls and probable confounds, the practical significance implied by their own plots was actually rather trifling.


Small update to prior post on school suspension rates, comparing racial/ethnic differences with national aggregates

In my last post I displayed a plot showing a striking correlation between single-motherhood rates and out-of-school suspension rates across racial/ethnic groups, using national averages.


I am well aware that aggregating linearly correlated variables will tend to produce (much) stronger correlations than you’d see with more granular data (e.g., state, county, family, individual, etc.). On the other hand, I am familiar enough with these statistics to know that you will see substantially weaker correlations here with other common predictors. Hispanics/Latinos, for instance, tend to be worse off than blacks by many economic measures, and are rarely appreciably better off, and yet their discipline problems are much less severe (even, interestingly, less than those of whites in California controlling for median family income). Likewise, the distance between Asians and non-Hispanic whites tends to be modest on economic dimensions, but the Asian suspension rate is roughly half the non-Hispanic white average.
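
The aggregation effect mentioned at the start of the paragraph above can be demonstrated with a short simulation. This is a sketch on synthetic data only: the five group levels, the noise level, and the group sizes are arbitrary assumptions, and `pearson_r` is a hypothetical helper.

```python
import math
import random

random.seed(1)  # fixed seed so the run is repeatable

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

group_levels = [0, 1, 2, 3, 4]  # made-up shift shared by x and y in each group
noise_sd = 5.0                  # individual-level noise, large relative to shifts
n_per_group = 2000

all_x, all_y, group_mean_x, group_mean_y = [], [], [], []
for level in group_levels:
    gx = [level + random.gauss(0, noise_sd) for _ in range(n_per_group)]
    gy = [level + random.gauss(0, noise_sd) for _ in range(n_per_group)]
    all_x.extend(gx)
    all_y.extend(gy)
    group_mean_x.append(sum(gx) / n_per_group)
    group_mean_y.append(sum(gy) / n_per_group)

r_individual = pearson_r(all_x, all_y)
r_group = pearson_r(group_mean_x, group_mean_y)

print(f"individual-level correlation: {r_individual:.2f}")
print(f"correlation of group means:   {r_group:.2f}")
```

Averaging within groups cancels most of the individual noise, so the correlation between group means comes out far stronger than the correlation across individuals even though the underlying relationship is the same.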

For the benefit of others, I decided to generate some plots of predictors aggregated at a national level for comparison’s sake (note: I reversed the x-axis to keep the graphic relationship the same where necessary).


On the relationship between school suspensions, race, single-motherhood, and more.

As a follow-up to my prior post on single-motherhood and mobility and in response to various assertions of discrimination against blacks in the school system, I decided to take a data-driven look into the relationship between race and school suspension rates.

There is, of course, ample evidence that discipline rates vary dramatically between racial/ethnic groups.


Blacks get suspended at vastly disproportionate rates whereas “asians” (census/OMB definition), on the other hand, are about half as likely as whites are to get suspended. Contrary to conventional wisdom, though, this pattern tends to be pretty consistent nationwide and the south is not notably “worse” with respect to disparities here.


On Philip Cohen’s knee-jerk response to Chetty’s “causal mobility” data and its association with single-motherhood

Philip Cohen, a sociologist who blogs at Family Inequality, recently argued, in response to the proposition that single-motherhood is strongly associated with economic mobility, that the single-motherhood effect is “entirely in the % black effect”.

While I do not necessarily disagree with the notion that racial demographics are strong predictors (albeit probably for different reasons than he does) and I do not necessarily believe that the single-motherhood association is (mostly) causal, his strong language is clearly at odds with the data.  In fact, his statements are not even well supported by his own stats.
