Friday, December 26, 2014

Forecast Bias Revisited

It was a week ago now that Fairbanks experienced its first day this winter with an official (midnight-to-midnight) high temperature below zero.  This was 32 days later than the long-term median of November 17 and was also the second latest on record after 2002 (December 21).  The chilly conditions also ended a 31-day sequence of days that saw a higher maximum temperature than indicated by the NWS forecast from the previous morning.  So far in December, the NWS day-1 high temperature forecasts are averaging 7.7 °F too cold at Fairbanks airport.  We previously discussed this puzzling cold bias in the NWS temperature forecasts here and here.

To look at this issue a little more closely, I obtained the archive of MOS (Model Output Statistics) computer forecasts since 2000; these are purely statistical forecasts with no subjective modification by a human forecaster.  The MOS forecasts often serve as a first guess for a human forecast, and are undoubtedly a key component of the NWS forecast process.  As the chart below shows, the NWS day-1 high temperature forecasts follow the MOS numbers quite closely, although the NWS temperatures average 2.7 °F higher than MOS.  Note that the diagonal line shows the 1:1 (perfect match) line, not the best-fit line.

The systematic difference between the two sets of forecasts reveals that the day-1 MOS bias is even greater than the NWS bias, i.e. the NWS forecasters are removing part of the cold bias from the MOS values.  However, there is a nuance here that I realized when reading the MOS documentation.  The MOS high temperature forecasts are valid for the "daytime" period 7am - 7pm local standard time, and this means that it is not appropriate to use the midnight-to-midnight maximum temperatures for verification.  On average (2000-2014), the Fairbanks midnight-to-midnight maximum is 2.6 °F higher than the 7am-7pm maximum in December through February, because the 24-hour period has several hours either side of the "daytime" period in which the temperature can be higher (and this often happens with little solar heating in deep winter).
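As a rough sketch of the verification-window issue, the snippet below computes both maxima from hourly observations. It assumes the observations are available as (local standard time, temperature) pairs and that the "daytime" window includes both the 7am and 7pm readings. The function name `daily_maxima` and the sample temperatures are hypothetical, made up to illustrate the effect, and are not actual Fairbanks data.

```python
from datetime import datetime

def daily_maxima(hourly, start_hour=7, end_hour=19):
    """Return {date: (calendar_day_max, daytime_max)}.

    hourly: iterable of (datetime in local standard time, temp_F).
    The daytime window is assumed to include both end hours (7am and 7pm).
    """
    result = {}
    for when, temp in hourly:
        full, daytime = result.get(when.date(), (None, None))
        full = temp if full is None else max(full, temp)
        if start_hour <= when.hour <= end_hour:
            daytime = temp if daytime is None else max(daytime, temp)
        result[when.date()] = (full, daytime)
    return result

# Hypothetical deep-winter day: warmest just after midnight, then cooling,
# with only a weak afternoon recovery.
temps = [-2, -4, -6, -8, -10, -12, -14, -15, -15, -14, -13, -12,
         -11, -10, -10, -11, -12, -13, -14, -15, -16, -17, -18, -18]
hourly = [(datetime(2014, 12, 19, hour), t) for hour, t in enumerate(temps)]

maxima = daily_maxima(hourly)
calendar_max, daytime_max = maxima[datetime(2014, 12, 19).date()]
print(calendar_max, daytime_max)  # -2 -10
```

In this made-up case a forecast of -10 °F would verify perfectly against the 7am-7pm maximum but would look 8 °F too cold against the midnight-to-midnight maximum.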

The bias of the MOS forecasts each winter since 2000-2001 is shown in the chart below, both for the midnight-to-midnight verification and the (correct) 7am-7pm verification.  The red diamonds indicate the mean winter temperature anomaly, which we expect to show some inverse relation to the bias.  We see that the MOS forecasts were actually too warm back in the winters of 2003-4 through 2005-6, but since then the cold bias has been persistent even when using the correct "daytime" verification.
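For anyone wanting to reproduce this kind of chart, here is a minimal sketch of the per-winter bias calculation. It assumes "winter" means December through February, that bias is forecast minus observed (so negative means too cold), and that the data come as (date, forecast high, observed high) records. The function names and the sample numbers are hypothetical.

```python
from collections import defaultdict
from datetime import date

def winter_label(d):
    """Label Dec-Feb dates by winter, e.g. Dec 2014 and Jan 2015 -> '2014-15'."""
    if d.month == 12:
        start = d.year
    elif d.month in (1, 2):
        start = d.year - 1
    else:
        return None  # outside the assumed Dec-Feb winter
    return f"{start}-{(start + 1) % 100:02d}"

def mean_bias_by_winter(records):
    """records: iterable of (date, forecast_high_F, observed_high_F).

    Returns {winter_label: mean(forecast - observed)}.
    """
    errors = defaultdict(list)
    for d, forecast, observed in records:
        label = winter_label(d)
        if label is not None:
            errors[label].append(forecast - observed)
    return {label: sum(e) / len(e) for label, e in errors.items()}

# Hypothetical records; the March entry is ignored.
records = [
    (date(2014, 12, 20), -10, -4),
    (date(2014, 12, 21), -8, -2),
    (date(2015, 1, 5), 0, 3),
    (date(2015, 3, 1), 10, 10),
]
bias = mean_bias_by_winter(records)
print(bias)  # {'2014-15': -5.0}
```

Running the same function twice, once with midnight-to-midnight observed highs and once with 7am-7pm observed highs, would give the two series being compared.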

An interesting conclusion we can draw from the recent MOS cold bias is that recent winters have seen a different statistical relationship between the airport temperatures and the GFS model predictors than during the MOS development (fitting) period.  The MOS regression equations for temperature were last updated in March 2010, but this seems to have made things worse as the cold bias was quite notable in the subsequent winter despite colder than normal temperatures.  As it happens, the next MOS update is scheduled to occur on January 14, 2015, in tandem with a major upgrade to the GFS model itself; so it will be very interesting to see if the MOS forecast bias is reduced at that time.  If it is, then it's likely that the NWS forecasts will also see sudden improvement.
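To illustrate how a change in the statistical relationship turns into a forecast bias, here is a toy sketch with a single predictor. The real MOS equations use multiple predictors, so this is a deliberate simplification, and every number below is invented. A regression is fitted on a "development" sample, and then the same equation is applied to a later sample in which observations run 5 °F warmer for the same model predictor.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def mean_bias(forecasts, observations):
    """Mean of forecast minus observed; negative means too cold."""
    return sum(f - o for f, o in zip(forecasts, observations)) / len(forecasts)

# Hypothetical development period: observed = 2 + 0.5 * predictor.
predictor = [-30, -20, -10, 0, 10]
observed_dev = [2 + 0.5 * p for p in predictor]
intercept, slope = fit_line(predictor, observed_dev)

# Hypothetical later period: same predictor values, observations 5 °F warmer.
observed_later = [o + 5 for o in observed_dev]
forecasts = [intercept + slope * p for p in predictor]

print(round(mean_bias(forecasts, observed_dev), 6))    # 0.0
print(round(mean_bias(forecasts, observed_later), 6))  # -5.0
```

The equation is unbiased on the data it was fitted to, but it inherits a cold bias as soon as the predictor-to-observation relationship shifts; refitting on recent data is what would remove it.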


  1. If I understand the MOS correctly, it's simply taking the model runs over a period of time and creating a statistical model. Therefore, it really has little basis in reality except for the initial weather inputs into the model. So the MOS is dependent on the models in the end, right?

    You also mention the GFS model. Whenever I read the forecast discussion, they always talk about other models too. Are those included in the MOS?

    The GFS usually runs at an 18-mile resolution. During the winter, you can have a 20-degree change across those 18 miles, and the area just east of the university, next to the slough, is consistently 5-10 degrees colder than the airport. So how much stock should be put in the models in general? I'm not saying we shouldn't use them, only that a human correction is needed. You just need someone who knows the area well to make that correction.

    1. Eric, MOS is a multiple linear regression between model predictors and the historical station observations (temp, wind, cloud, precip, etc). So the MOS forecasts are "tuned" to the station in question - such as PAFA - and should have no bias at that point, assuming no change in instrumentation or observation characteristics. The underlying model (e.g. GFS) resolution is irrelevant because the equations are predicting for the station itself (that's what's nice about it). That's also what makes it surprising that such a large bias has developed in recent years.

      There is also a gridded MOS product that covers a grid of points, but I've been looking at the traditional MOS that is valid for individual stations.

      There is a separate MOS product created from the NAM model. It would be interesting to examine those forecasts to see if they have the same problem.

  2. A fascinating study, Richard! Since the airport temperature is usually a little colder than the rest of Fairbanks, I wonder if the NWS point forecast for Fairbanks International Airport is actually tailored for the main part of the city instead?

    Re: Eric's comment, the GFS is downscaled all the way to 6 km, and I wonder if the MOS statistics use the 6 km data (or the even lower NAM HIRES) when running their regressions.

    1. Brian, I believe the NWS forecasts are produced on grids, so the airport forecast is just the value from the grid box in which the airport is located. The website presentation leads me to believe it is a box of about 4x4 miles and excludes downtown; but there would be large (actual) variations even within this area. I suppose that the box value is heavily influenced by the MOS forecast for the airport - or perhaps the gridded MOS is used as the first guess - but I don't know what kind of adjustments are applied.

      Perhaps Rick will chime in on this.

      Presumably the cold bias would be larger if we compared the airport forecasts to the observations at a warmer location in the city.

  3. Not too many years ago I went up to the NWS office for a visit. Just to meet and greet, and get a brief glimpse of what the job involved. Curious is all. They had converted to the multi-zone forecasting and that was a challenge.

    I asked if the forecasters ever looked back to see how close their predictions were to what actually happened. The fair reply was that some part of the agency did some of that, but there was no time for them locally to revisit and examine trends of errors (or perhaps what's called forecast bias here).

    Might still be like that but whatever they offer is appreciated.