Monday, October 20, 2014

Fairbanks Forecast Performance

For some time I've been meaning to take a look at the long-term performance of the National Weather Service temperature forecasts for Fairbanks, with one question particularly in mind: do the forecasts show enough variance at the end of the short-term forecast period, i.e. 5-7 days in the future?

The question is motivated by the idea that sometimes the computer models indicate a pronounced temperature anomaly from about a week in advance, but the early NWS forecasts for the same time show only a small departure from normal.  A recent example was seen in the early October cold spell, when the ECMWF and GFS deterministic forecasts of September 29 both showed a notable cold anomaly in place by October 5, but the NWS forecast for the high temperature on October 5 was 38 °F, only 3.6 °F below normal.  In this case the forecast dropped as the date approached and confidence grew, and the observed high temperature was 31 °F.  However, there are many cases when the computer forecasts are badly wrong from 7 days out, and so it is entirely justifiable for the official forecast to show only a small anomaly at longer lead times.  Indeed, it would be most undesirable for the raw model forecast to be reflected in the official outlook, because the numbers would often swing wildly from day to day.  The question is: does the NWS have the right balance?

It's possible to answer this question using a history of NWS forecasts that I have collected for Fairbanks airport since November 2011.  First, here is the basic "skill" of the forecasts for lead times of 1-6 days, i.e. the forecasts for "tomorrow" through "6 days from now".  Averaged over all seasons, the mean error of the high and low temperature forecasts is similar, rising from just over 4 °F to nearly 8 °F over the six days.  Not surprisingly, the errors are much larger in winter, but it is interesting to see that the winter low temperature forecasts improve more significantly at shorter lead times, whereas the winter high temperature forecast error remains over 7 °F even for "tomorrow".
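
For the record, here's a minimal sketch of the error calculation; the file name and column layout are assumptions of mine for illustration, not the actual format of the archive:

```python
import pandas as pd

# Assumed layout: one row per forecast, with columns valid_date,
# lead_days (1-7), var ("high" or "low"), forecast, and observed.
df = pd.read_csv("fairbanks_forecasts.csv", parse_dates=["valid_date"])
df["error"] = df["forecast"] - df["observed"]

# Mean absolute error by variable and lead time, all seasons combined.
mae = df["error"].abs().groupby([df["var"], df["lead_days"]]).mean()
print(mae.unstack("lead_days").round(1))

# Winter subset (taken here as November-March) for comparison.
winter = df[df["valid_date"].dt.month.isin([11, 12, 1, 2, 3])]
winter_mae = winter["error"].abs().groupby([winter["var"], winter["lead_days"]]).mean()
print(winter_mae.unstack("lead_days").round(1))
```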


Here's a similarly formatted chart showing the bias of the forecasts, i.e. the mean difference between the forecast and the observed temperatures.  Negative values indicate that the forecasts were too cold on average.  We see that the winter high temperature forecasts have been several degrees too cold on average over the past 3 years, even at shorter lead times, but the bias is much smaller for the low temperatures.  It would be interesting to investigate this further in search of a possible explanation.
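
The bias is the same calculation without the absolute value; continuing the sketch above:

```python
# Bias: mean (forecast - observed); negative means the forecasts ran too cold.
bias = df["error"].groupby([df["var"], df["lead_days"]]).mean()
print(bias.unstack("lead_days").round(1))
```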



Let's now consider the scaling of the temperature forecasts.  I've examined this by calculating the mean absolute error (MAE) that would result if the NWS forecast anomaly (departure from normal) were multiplied by factors ranging from 0 to 2.  At the low end of this range the forecasts would deviate little from climatology (at zero they would simply show the normal value each day), while at the high end they would show greater departures from normal than they currently do.  The chart below shows the results of this experiment for day 7 temperature forecasts from all seasons of the year.
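
Here's roughly how the experiment works, continuing the sketch above and assuming the archive also carries a "normal" column with the climatological value for each date:

```python
import numpy as np

# Day 7 forecasts only (split by var to reproduce separate high/low curves).
d7 = df[df["lead_days"] == 7]
anomaly = d7["forecast"] - d7["normal"]

# Rescale the forecast anomaly by a factor k from 0 to 2 and recompute the
# MAE: k = 0 collapses the forecast to climatology, k = 1 leaves it as
# issued, and k > 1 amplifies the departure from normal.
for k in np.arange(0.0, 2.01, 0.1):
    scaled = d7["normal"] + k * anomaly
    mae_k = (scaled - d7["observed"]).abs().mean()
    print(f"k = {k:.1f}  MAE = {mae_k:.2f} F")
```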


The data from the last 3 years show that (on average through the year) the high temperature forecasts are perfectly scaled at day 7, i.e. there is no way to improve the MAE by uniformly reducing or increasing the forecast anomaly.  We conclude that the NWS shows just the right amount of variance on average in the day 7 high temperature forecasts; this is not to say that we can't improve on any given forecast using additional information, but we can't reduce the error by simply adjusting the departure from normal across the board.

The day 7 low temperature forecasts are not quite optimally scaled, according to these results, as the NWS shows marginally too much variance.  In other words, the forecasts would be very slightly better if they showed smaller departures from normal.

There is one other aspect of the problem that interests me, and that is whether we can show that the forecast variance is too small when the computer models show a large anomaly (as opposed to an anomaly of any size) and/or when the computer models agree with each other.  I'll return to this idea in a subsequent post.

11 comments:

  1. I've also wondered how well the forecasts compare to the actual values. The temps running higher than forecast is interesting, since you'd expect the differences to average out to near zero if they were just noise. But there does appear to be a bias.

    How are NWS forecasts created? What models are used and how much human intuition is at play? How much are the current conditions used to seed the future?

    Could you also clarify what the minimized day 7 variance means? I can't quite get what you are looking at.

    1. Eric, the winter bias is interesting and I'll have to look at it in more detail - e.g. does it arise from infrequent bad forecasts, or is it a day-in, day-out occurrence?

      I can't speak to the exact details of the NWS forecast process, but I believe it is fairly automated now, so that the statistically post-processed computer model output provides the first guess as a gridded field. I believe the human forecaster then has a chance to tweak and adjust the grids based on the latest observations, experience, and so on. The GFS and NAM models are heavily used in general by the NWS, but lots of other tools and models are available.

      Concerning the variance: I attempted to minimize the MAE by adjusting the magnitude of the forecast anomaly. For example, if the NWS predicts 30F compared to a normal of 35F, this is an anomaly of -5F. My experiment tested whether the overall error would shrink if the anomaly were scaled down to, say, -3F or up to -7F. If the anomaly were reduced to zero, the forecast would always call for exactly normal, and the error would be the same as that of climatology.

  2. Great analysis, Richard. I have run a similar analysis in the past and found a few interesting things. First, the forecasted temperature trends toward climatology the farther out you get. A forecaster is probably more comfortable going 15° above normal on Day 1 than he/she is on Day 7.

    Also, the skill deteriorates with each day out from Day 1. For example, on Day 1 in Anchorage, a climatology forecast was off by an average of 5.3°F while the NWS forecast was off by 2.0°F. Pretty good. For Day 6, however, the NWS forecast was off by 4.0°F - only a modest improvement over the 5.3° climatology forecast.

    Finally, a persistence forecast always beat a climatology forecast, and the NWS forecast always beat persistence.

    1. Brian, it's good to know that you find the same things. These trends will be universal in a properly constructed forecasting system. The farther out in time you go, the closer you should hedge to climatology, because the skill is smaller; this is the way to minimize the MAE. If there's no skill, you should just forecast "normal" every day.

      I didn't look at persistence but it would be an interesting exercise to see how good it is at various points around Alaska.
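
      A quick sketch of how the two baselines could be computed, for anyone who wants to try (the file and column names are placeholders, and a complete daily series of observed highs is assumed):

      ```python
      import pandas as pd

      # Placeholder file/columns: daily observed highs plus the climatological
      # normal for each date; the series must have no missing days for the
      # shift-based persistence forecast to line up correctly.
      obs = pd.read_csv("anchorage_daily.csv", parse_dates=["date"]).set_index("date")

      lead = 6  # days ahead

      # Climatology baseline: forecast the normal temperature for the valid date.
      clim_mae = (obs["normal"] - obs["observed"]).abs().mean()

      # Persistence baseline: forecast that the temperature `lead` days from
      # now will equal today's observation.
      persist_mae = (obs["observed"].shift(lead) - obs["observed"]).abs().mean()

      print(f"climatology MAE = {clim_mae:.1f} F, persistence MAE = {persist_mae:.1f} F")
      ```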

  3. It is interesting, Richard. As a non-forecaster (beyond a colder or warmer trend) I'd ask how the local NWS's predictions would compare with popular commercial products given the same scrutiny.

    Gary

    1. Come to think of it, I've noticed that commercial products like wunderground always seem off by 5 degrees or more, especially in the winter. Given that they are model runs only, this would fit with what is found above. Accuweather once called for -55 four days ahead of a day that actually came in at -40.

      It also makes me think that the microclimate is why the airport always runs hot; those models only have so much spatial resolution.

    2. Good question, Gary. Unfortunately I don't have the data to examine other forecast sources, but I suspect some are quite poor. I doubt any would come close to NWS in terms of skill, because the local insight of the human forecasters will be significant in a data-sparse region like Alaska; and it's unlikely that any national commercial outfit would pay for human scrutiny of the daily forecasts for Alaska.

    3. Yes, I understand. Plus you have lots to do besides working even more.

      I dug up this link but haven't had the time to read all of it. It applies to a different height than the surface. I'll try to look around more in the next few days for similar products.

      http://www.emc.ncep.noaa.gov/gmb/STATS_vsdb/longterm/

      Gary

    4. On p. 53 there's a 2013 T2M discussion for Alaska.

      Gary

  4. Good stuff. Maybe it's beyond the scope of this blog, but I'm interested in the auto-generated point forecasts for a specified latitude/longitude. Are these based on PRISM model output? Is there any way to assess their performance?

  5. Thanks for the question. I don't know all the inputs that go into the gridded forecasts, but one important predictor is the gridded MOS that has been developed in recent years. There are some references at the bottom of this page:

    http://www.nws.noaa.gov/mdl/synop/gmos.php

    As far as I'm aware, the PRISM technique is only used for interpolating climate data, i.e. long-term normals, to a high-resolution grid. I've not heard of it being used for making forecasts, although I think some of the lessons learned from PRISM have been used in developing gridded MOS.

    I suppose the gridded forecasts can only be rigorously evaluated at surface observing sites, of which there are few in Alaska. The MOS equations are derived for sites with long-term, stable observation histories, so performance might be relatively poor at remote locations.

    Rick (Thoman) would have a lot more insight on these matters.
