As a follow-up to Richard's excellent post from last week, I decided to recalculate the daily anomalies using the "new" NCDC formula for daily standard deviations that he and I discussed via email. The following charts for Fairbanks and Anchorage show the results of that reanalysis.
Here is how to interpret the charts, from bottom to top:
1) The red line shows the annual percentage of days on which the daily temperature anomaly was more than three standard deviations either above or below the mean; e.g., +3.2, -3.5, etc. Since it uses percentages, 2013 (a partial year) can be compared to other years on equal footing. Also note that the categories are non-overlapping.
2) The dark blue line shows the percentage of days where the daily temperature anomaly was between 2 and 3 standard deviations either above or below the mean; e.g., +2.1, +2.7, -2.4, etc. The dashed line is the expected value based upon the normal distribution.
3) The solid burgundy line shows the percentage of days where the daily temperature anomaly was between 1 and 2 standard deviations either above or below the mean; e.g., +1.5, +1.2, -1.6, etc. The dashed line is the expected value based upon the normal distribution.
4) The solid green line shows the percentage of days where the daily temperature anomaly was between 0 and 1 standard deviations either above or below the mean; e.g., +0.2, -0.9, -0.7, etc. The dashed line is the expected value based upon the normal distribution.
5) The solid orange line at the top shows the Chi-Squared goodness-of-fit value. It is the sum of the squared differences between the actual and expected values, each divided by the expected value, and is the standard goodness-of-fit test for categorized (grouped) data. A value of zero indicates that the distribution of anomalies exactly fits a normal distribution. Any Chi-Squared value less than 6.0 (with 2 degrees of freedom) indicates that that year's temperatures were consistent with a normal distribution at the 95 percent confidence level (5 percent significance level). The larger the value, the more anomalous (less normal) the distribution. It can also be thought of as a measure of distribution extremes. By this metric, Fairbanks in 2013 (through August 7th) has had the most extreme temperatures of any year on record (post-1930) by a large margin. In fact, even if the rest of 2013 is exactly normally distributed, it will still be the most extreme year on record. For Anchorage, 2013 is also in first place by a wide margin.
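For readers who want to reproduce the mechanics of items 1 to 5, here is a minimal Python sketch. It assumes each day's anomaly has already been standardized (expressed in SD units); the NCDC normals and the standard-deviation formula themselves are not reproduced, and all function names are hypothetical. The dashed "expected" lines correspond to the two-sided normal probabilities for each |z| band (about 68.3%, 27.2%, 4.3% and 0.27%), and the statistic is the Pearson form, summing (O - E)^2 / E over the four bands using day counts. For 2 degrees of freedom the chi-squared tail probability is exp(-x/2), so the 5 percent cutoff is -2 ln 0.05, about 5.99, which is the 6.0 used above.

```python
import math

# Non-overlapping |z| bands used in the charts: 0-1, 1-2, 2-3 and >3 SD.
BIN_EDGES = [0.0, 1.0, 2.0, 3.0, math.inf]


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_fractions() -> list[float]:
    """Two-sided probability that |Z| falls in each band for Z ~ N(0, 1)."""
    fracs = []
    for lo, hi in zip(BIN_EDGES[:-1], BIN_EDGES[1:]):
        upper = 1.0 if math.isinf(hi) else normal_cdf(hi)
        fracs.append(2.0 * (upper - normal_cdf(lo)))
    return fracs


def bin_counts(anomalies: list[float]) -> list[int]:
    """Count standardized anomalies per |z| band; exactly 1.0 goes to the 1-2 SD band."""
    counts = [0] * (len(BIN_EDGES) - 1)
    for z in anomalies:
        for i, (lo, hi) in enumerate(zip(BIN_EDGES[:-1], BIN_EDGES[1:])):
            if lo <= abs(z) < hi:
                counts[i] += 1
                break
    return counts


def chi_squared(observed: list[float], expected: list[float]) -> float:
    """Pearson statistic: sum over categories of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def yearly_chi_squared(anomalies: list[float]) -> float:
    """Compare one year's (possibly partial) day counts with normal expectations."""
    n = len(anomalies)
    expected = [n * f for f in expected_fractions()]
    return chi_squared(bin_counts(anomalies), expected)


def critical_value_df2(alpha: float = 0.05) -> float:
    """For 2 degrees of freedom, P(X > x) = exp(-x / 2), so x = -2 ln(alpha)."""
    return -2.0 * math.log(alpha)


if __name__ == "__main__":
    # Example anomalies quoted in items 1-4 above.
    sample = [0.2, -0.9, -0.7, 1.5, 1.2, -1.6, 2.1, 2.7, -2.4, 3.2, -3.5]
    print(bin_counts(sample))  # [3, 3, 3, 2]
    print([round(100 * f, 2) for f in expected_fractions()])
    print(round(critical_value_df2(), 2))  # 5.99
```

The counts are kept as whole days rather than percentages because the Pearson statistic is defined on counts; the percentages in the charts are the same counts divided by the number of days in the year.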
The graphic below shows the calculation for Fairbanks in 2013.
It is worth noting that if additional categories are used to distinguish positive from negative anomalies, 2013 is still easily in first place overall for Fairbanks, but not by as much. If the rest of the year were "normal," 2013 would rank only 10th. 1993 is a good year to look at. That year, 11% of days were between -1 and -2 SD and 17% of days were between +1 and +2 SD. Taken as separate categories, each differs fairly significantly from the expected value of 13.6%. However, when they are combined, the total (28%) matches the expected value (27.2%) almost perfectly. Of course, that knife cuts both ways.
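The 1993 example can be checked with a few lines of Python. This sketch converts the quoted percentages to approximate day counts (assuming a 365-day year, since 1993 was not a leap year, and accepting the rounding in the quoted 11% and 17%) and compares the chi-squared contribution of the two one-sided 1-2 SD categories with that of the single combined category.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


DAYS = 365  # 1993 was not a leap year

# One-sided expected share of days between +1 and +2 SD (and, by symmetry,
# between -2 and -1 SD): about 13.6%.
p_one_side = normal_cdf(2.0) - normal_cdf(1.0)
expected_each = DAYS * p_one_side

# Shares quoted in the post for 1993, converted to approximate day counts.
observed_neg = 0.11 * DAYS  # between -1 and -2 SD
observed_pos = 0.17 * DAYS  # between +1 and +2 SD


def term(observed: float, expected: float) -> float:
    """One category's Pearson contribution, (O - E)^2 / E."""
    return (observed - expected) ** 2 / expected


split = term(observed_neg, expected_each) + term(observed_pos, expected_each)
combined = term(observed_neg + observed_pos, 2 * expected_each)

print(f"expected share per side: {100 * p_one_side:.1f}%")
print(f"split contribution:      {split:.2f}")
print(f"combined contribution:   {combined:.2f}")
```

With these inputs the split categories contribute roughly 4.9 to the statistic, while the combined category contributes only about 0.09: the cold and warm departures cancel when they share a bin, which is exactly why adding sign-specific categories changes the rankings.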
Here is a good explanation of Chi-Squared: http://www.stat.yale.edu/Courses/1997-98/101/chigf.htm
Here is a good site to manually enter numbers and check the Chi-Squared value: http://www.quantpsy.org/chisq/chisq.htm
Brian,
Very nice charts. Both cities have seen nearly 50 percent of days outside the +/-1 SD range!
I'm a little confused about the Chi-squared statistic, because I have always thought it should be normalized by the expected value in each category. If the "over 3SD" frequency were exactly normal (i.e., the surplus days were moved into the other categories), the Chi-squared value ought to drop considerably, but your calculation would show relatively little change. Perhaps you can direct me to a reference for the type of statistical test you are doing.
Actually, I did make a mistake in the Chi-squared calculation and will update the graphic during lunch. Darn!
As for the methodology, the test is supposed to measure how far away you are from the expected values, so the data are not supposed to be normalized. The goal of the test is to see IF the data are normal.
Brian,
Thanks for the links and the update. The new calculation matches what I expected, with by far the highest contribution to the score coming from the higher anomaly categories.
Very nice work. It would be kind of fun to run through the GHCN and see what the highest annual chi-square value ever recorded is, though admittedly that would be a pretty arcane statistic!
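Richard's thought experiment about normalization can be checked numerically. The sketch below uses a made-up (hypothetical) year, not real Fairbanks or Anchorage data, and compares a plain sum of squared differences with the Pearson form, which divides each squared difference by the expected count. It then makes the >3 SD band exactly normal by moving its surplus days into the 0-1 SD band.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_fractions() -> list[float]:
    """Two-sided normal shares for the 0-1, 1-2, 2-3 and >3 SD bands."""
    edges = [0.0, 1.0, 2.0, 3.0]
    fracs = [2.0 * (normal_cdf(b) - normal_cdf(a)) for a, b in zip(edges, edges[1:])]
    fracs.append(2.0 * (1.0 - normal_cdf(3.0)))
    return fracs


def unnormalized(obs: list[float], exp: list[float]) -> float:
    """Sum of squared differences with no division by the expected count."""
    return sum((o - e) ** 2 for o, e in zip(obs, exp))


def pearson(obs: list[float], exp: list[float]) -> float:
    """Pearson chi-squared: each squared difference divided by its expected count."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


N = 365
expected = [N * f for f in expected_fractions()]
# HYPOTHETICAL shares of days in the 0-1, 1-2, 2-3 and >3 SD bands.
observed = [N * s for s in (0.55, 0.35, 0.08, 0.02)]

# Make the >3 SD band exactly normal; move its surplus days into the 0-1 SD band.
surplus = observed[3] - expected[3]
adjusted = [observed[0] + surplus, observed[1], observed[2], expected[3]]

for name, stat in (("unnormalized", unnormalized), ("Pearson", pearson)):
    before, after = stat(observed, expected), stat(adjusted, expected)
    print(f"{name:>12}: {before:8.1f} -> {after:8.1f} ({100 * (1 - after / before):.0f}% drop)")
```

With these invented shares the unnormalized sum falls by only about 18%, while the Pearson statistic falls by about 61%. The difference comes from the tiny expected share of the >3 SD band (about 0.27%): dividing by it is what lets a handful of extreme days dominate the score, consistent with Richard's observation that the higher anomaly categories contribute most.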
Thanks Richard. Unfortunately the number of steps needed to do the analysis is far greater than my ability to automate the process. I would like to replicate this for a more maritime climate like Juneau and an Arctic climate like Barrow as time permits. There seems to be a lot of research on tracking extremes (records) but not so much on annual or seasonal variability. Of course I only spent a few minutes looking it up. Maybe we should come up with a good index of variability and write a paper about it for Alaska.
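On automation: once daily normals and standard deviations are available, the per-year scoring is mechanical. The following is a rough Python sketch, assuming a station's record arrives as (date, temperature) pairs and that `normal_for` and `sd_for` are hypothetical lookups returning the daily normal and SD for a given day (the NCDC formula itself is not reproduced here). Reading GHCN-Daily files, handling missing days, and deciding how incomplete a year may be are deliberately left out.

```python
import math
from collections import defaultdict
from collections.abc import Callable, Iterable
from datetime import date

EDGES = [0.0, 1.0, 2.0, 3.0, math.inf]  # |z| bands: 0-1, 1-2, 2-3, >3 SD


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Two-sided normal probability for each |z| band.
FRACTIONS = [
    2.0 * ((1.0 if math.isinf(hi) else normal_cdf(hi)) - normal_cdf(lo))
    for lo, hi in zip(EDGES[:-1], EDGES[1:])
]


def chi_squared_for(anomalies: list[float]) -> float:
    """Pearson chi-squared of one year's |z| band counts against normal expectations."""
    counts = [0] * len(FRACTIONS)
    for z in anomalies:
        for i, (lo, hi) in enumerate(zip(EDGES[:-1], EDGES[1:])):
            if lo <= abs(z) < hi:
                counts[i] += 1
                break
    n = len(anomalies)
    return sum((c - n * f) ** 2 / (n * f) for c, f in zip(counts, FRACTIONS))


def rank_years(
    daily: Iterable[tuple[date, float]],
    normal_for: Callable[[date], float],  # hypothetical daily-normal lookup
    sd_for: Callable[[date], float],  # hypothetical daily-SD lookup
) -> list[tuple[int, float]]:
    """Standardize each day, group by year, and rank years by chi-squared (largest first)."""
    by_year: dict[int, list[float]] = defaultdict(list)
    for day, temp in daily:
        by_year[day.year].append((temp - normal_for(day)) / sd_for(day))
    scores = {year: chi_squared_for(z) for year, z in by_year.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

One design caveat: for identical proportions, the Pearson statistic grows in proportion to the number of days, so a partial year and a full year are not on quite the same footing when their scores are compared directly.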