How accurate are BLS forecasts?

5/08/2015 01:43:00 PM
As happens on the first Friday of every month, the BLS released new jobs data, sending Twitter abuzz. Most news outlets reported that, according to the BLS's initial estimate, the economy added a net 223,000 jobs in April compared to March this year. 223,000 is not a lot, but not that bad either. And most journalists will leave it at that, never following up.

But skilled journalists understand that these numbers are soft, and report not just the headline estimate, but also revisions to previous months' estimates. From the BLS press release:
"The change in total nonfarm payroll employment for February was revised from +264,000 to +266,000, and the change for March was revised from +126,000 to +85,000. With these revisions, employment gains in February and March combined were 39,000 lower than previously reported."
Generally, in addition to releasing the initial estimate for the previous month, the BLS also revises its estimates for the two months preceding that, so that every month's estimates are eventually revised three times. How big are these revisions?
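For the two months in the release quoted above, the size of each revision is simple arithmetic. Here is a minimal Python sketch that checks it, using only the figures from the quote (in thousands of jobs). The names `revisions` and `revision_changes` are my own, for illustration only.

```python
# Figures quoted in the BLS release, in thousands of jobs:
# month -> (previously reported change, revised change)
revisions = {
    "February": (264, 266),
    "March": (126, 85),
}

def revision_changes(revisions):
    """Return each month's revision (revised minus previously reported)."""
    return {
        month: revised - previous
        for month, (previous, revised) in revisions.items()
    }

changes = revision_changes(revisions)
net_change = sum(changes.values())

for month, change in changes.items():
    print(f"{month}: {change:+d} thousand")
print(f"Combined: {net_change:+d} thousand")
```

The combined figure comes out to 39 thousand lower, matching the release, and nearly all of it is the March revision.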

I did some data gathering through ALFRED. I assembled a dataset with both the first estimates and final revisions for all the months, and compared the two:
Histogram of the amount by which the BLS initially underestimated employment growth.
The median error was -6,500 workers, with an interquartile range of -157,500 to 171,000 workers. Note that the error here is the final revision minus the initial estimate, so a negative median means that the BLS is slightly optimistic about US employment growth on average, overestimating it 52 percent of the time.
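The summary statistics above can be sketched in a few lines of Python. This is not my R script, just an illustration of the calculation: the function name `summarize_errors` and the sample numbers are hypothetical, and the quartile interpolation rule (`method="inclusive"`) is an assumption that may differ from what other software uses by default.

```python
from statistics import median, quantiles

def summarize_errors(initial, final):
    """Summarize final-minus-initial errors for paired monthly estimates."""
    errors = [f - i for i, f in zip(initial, final)]
    # Quartiles of the error distribution; the interpolation rule is an assumption.
    q1, _, q3 = quantiles(errors, n=4, method="inclusive")
    # A negative error means the initial estimate was too high.
    share_over = sum(e < 0 for e in errors) / len(errors)
    return {
        "median": median(errors),
        "iqr": (q1, q3),
        "share_overestimated": share_over,
    }

# Hypothetical values in thousands of jobs, NOT the real ALFRED data.
initial = [223, 126, 264, 201, 329, 150, 95, 180]
final = [200, 85, 266, 230, 310, 160, 60, 175]

summary = summarize_errors(initial, final)
print(summary)
```

With the real data, `initial` would hold each month's first-release figure and `final` the last available revision for the same month.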

The most relevant question, however, is whether the initial estimate is a helpful policy guide in real time. To test that, I compared the BLS's initial estimates to an automated ARIMA model, with the specification chosen by the Bayesian information criterion, that uses historical data that would have been available at the time to forecast three months ahead. Thus the forecast model uses only final revisions available at the time to estimate each month's jobs figure. The resulting median error was -10,730 with an interquartile range of -147,300 to 126,300.
Amount by which automated ARIMA forecast underestimates employment.
As with the BLS forecast, the automated ARIMA forecast slightly overestimates employment growth most of the time, with the forecast exceeding the final numbers about 52 percent of the time.
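To make the forecasting exercise concrete, here is a deliberately simplified Python sketch of the idea. It is a stand-in, not the automated ARIMA procedure I actually ran: instead of searching over ARIMA specifications, it chooses only between a mean-only model and an AR(1) model by BIC, then forecasts three steps ahead from an expanding window. All function names are hypothetical, and `series` is assumed to hold final-revision values in date order.

```python
import math

def bic(rss, n, k):
    """Gaussian BIC up to an additive constant: n*ln(rss/n) + k*ln(n)."""
    return n * math.log(max(rss, 1e-12) / n) + k * math.log(n)

def forecast(history, horizon=3):
    """Choose between a mean-only model and AR(1) by BIC, then forecast
    `horizon` steps past the end of `history`."""
    x, z = history[:-1], history[1:]  # lagged values and their targets
    n = len(z)
    mx, mz = sum(x) / n, sum(z) / n

    # Mean-only model: one parameter.
    rss0 = sum((b - mz) ** 2 for b in z)

    # AR(1) with intercept, z = c + phi * x, by ordinary least squares.
    sxx = sum((a - mx) ** 2 for a in x)
    phi = sum((a - mx) * (b - mz) for a, b in zip(x, z)) / sxx if sxx > 0 else 0.0
    c = mz - phi * mx
    rss1 = sum((b - c - phi * a) ** 2 for a, b in zip(x, z))

    if bic(rss1, n, 2) < bic(rss0, n, 1):
        value = history[-1]
        for _ in range(horizon):  # iterate the AR(1) recursion forward
            value = c + phi * value
        return value
    return mz

def rolling_forecasts(series, min_history=12, horizon=3):
    """Forecast series[t] using only observations through t - horizon."""
    out = []
    for t in range(min_history + horizon - 1, len(series)):
        history = series[: t - horizon + 1]
        out.append((t, forecast(history, horizon)))
    return out
```

The forecast errors would then be `series[t] - prediction` for each pair returned by `rolling_forecasts`, which keeps the same sign convention as the BLS comparison (actual minus estimate). The key design point is the slicing in `rolling_forecasts`: each forecast sees only data that ends three months before the month being predicted.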

These distributions are basically identical. Sure, you can spot slight differences in a couple of the moments, and sure, the BLS data is ever-so-slightly more accurate on average, but by and large these estimates are pretty much identical. So, in the end, the famed jobs day numbers that cause such a stir on Twitter the first Friday of every month probably don't tell us anything we didn't already know.

I didn't have time to examine the numbers in more detail. Is there more of a difference between the two methods at critical points in time such as when we are heading into recession? I assign these kinds of questions to the reader as homework. You can get my data here, my R script here, and this is what I used to extract historical initial estimates and match them to eventual revisions (program written in C#, requires the EPPlus library here).