Sunday, August 20, 2017

Climate Normals in a Changing Environment

Hi, Rick T. here. One of the things that interests me is how people adjust to a changing climate. Anecdotally, it was vaguely humorous to me last winter to see how quickly many people had incorporated three consecutive mild winters into a perception of a "new normal". This was underlying the many comments I heard about how cold the winter of 2016-17 was in Alaska. Of course, through the multi-decade lens, it wasn't a notably cold winter (though parts of the state were, by any measure, cold in March). So that got me to thinking: given that many climate variables in Alaska are changing, how can we provide estimates of "normal" and associated variability that take into account the ongoing changes?

One approach I've been toying with to make these kinds of estimates is quantile regression. Quantile regression is something of a cousin to the more familiar least-squares regression, but is computationally more tedious, so it was not much utilized until the advent of modern computing. Nowadays, it's trivially simple to use on the kinds of climate datasets that I mostly work with, that is, point-based time series.

So the first question you ask: what is a quantile? Quantiles are, to quote Wikipedia, "…cutpoints dividing the range of a probability distribution into contiguous intervals…". A quantile can be defined for any probability level between zero and one. So, the 0.5 quantile divides a distribution into two equal parts: half the values are above and half the values are below. You've heard of this: it's better known as the median. A quantile of 0.843 also divides a distribution into two parts: it is the value for which 84.3% of the distribution is below and 15.7% is above. Quantile regression is a method to estimate the quantile values of a dataset when one variable is (possibly) dependent on one or more other variables.

The second question you ask: why would you want to use quantile regression? There are a couple of reasons. First, quantile regression is not nearly as sensitive to outliers as ordinary linear regression, which in effect models the mean. Second, and most significantly for my purposes here, quantile regression allows us to generate estimates not only of the central values of a distribution, e.g. mean or median, but also of how other aspects of the distribution are (possibly) changing.
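
For readers who want to see the mechanics, here is a minimal sketch of linear quantile regression for a single predictor. It is not the code behind the plots in this post. It relies on the fact that there is always a best-fitting quantile line passing through at least two data points, so it simply tries every pair and keeps the line with the smallest "pinball" (check) loss. The function names and the synthetic series are made up for illustration; for real work a packaged quantile regression routine is the better choice.

```python
import random

def pinball_loss(residual, tau):
    """Check loss: points above the line cost tau, points below cost 1 - tau."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual

def fit_quantile_line(x, y, tau):
    """Brute-force linear quantile regression for one predictor.

    Tries the line through every pair of points and keeps the one with the
    smallest total pinball loss. Fine for ~100 points, too slow for big data.
    """
    best = None
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            if x[i] == x[j]:
                continue  # same x value: no finite slope through this pair
            slope = (y[j] - y[i]) / (x[j] - x[i])
            intercept = y[i] - slope * x[i]
            loss = sum(pinball_loss(yk - (intercept + slope * xk), tau)
                       for xk, yk in zip(x, y))
            if best is None or loss < best[0]:
                best = (loss, intercept, slope)
    return best[1], best[2]

# Synthetic series (NOT real climate data): 60 "years" with an upward trend.
rng = random.Random(42)
years = list(range(1957, 2017))
values = [0.05 * (yr - 1957) + rng.gauss(0.0, 1.0) for yr in years]

# Fit the median and two outer quantiles.
fits = {tau: fit_quantile_line(years, values, tau) for tau in (0.159, 0.5, 0.841)}

def fraction_below(tau):
    """Share of observations falling below the fitted tau-quantile line."""
    intercept, slope = fits[tau]
    return sum(v < intercept + slope * yr for yr, v in zip(years, values)) / len(years)
```

By construction, the share of points below each fitted line should come out close to the requested quantile, which `fraction_below` lets you check.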

As an example of this approach, below is a plot of some climate data that you are probably familiar with: spring breakup dates of the Tanana River at Nenana (for this version I've used "fractional dates", which incorporate the time of breakup, though that does not matter to this analysis). There is no statistically significant trend through the 1960s, so I constrain the quantile regression to have zero slope in that period. The purple line is the segmented median (0.50 quantile) date of breakup; in this case the regression models the dependence of breakup date on the year (i.e. the trend). The green-shaded area represents the area between the 0.333 and 0.666 quantiles. So, this plot should partition the breakup dates into three (roughly) equal categories: one-third below the green shading (significantly earlier than normal), one-third inside the green shading (near normal) and one-third above (significantly later than normal). From this, it's easy to see that breakup dates during the first days of May in the mid-20th century were solidly in the "significantly earlier than normal" category, but the same dates are now in the "significantly later than normal" category.
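
The "flat, then sloped" construction and the three-way classification can be sketched as follows. One common way to get a segmented fit is to regress on a hinge term that is zero before the breakpoint. The breakpoint year and the tercile line parameters below are made-up placeholders chosen only to illustrate the logic; they are not fitted to the Nenana record, and the function names are hypothetical.

```python
def hinge(year, breakpoint):
    """Years elapsed since the breakpoint; zero before it, so the line stays flat."""
    return max(0, year - breakpoint)

def segmented_line(year, level, slope, breakpoint):
    """Flat at `level` up to the breakpoint, then changing by `slope` per year."""
    return level + slope * hinge(year, breakpoint)

def tercile_category(date, year, lower, upper, breakpoint):
    """Classify a breakup date (day of year) against the two tercile lines.

    `lower` and `upper` are (level, slope) pairs for the 0.333 and 0.666
    quantile lines respectively.
    """
    low = segmented_line(year, lower[0], lower[1], breakpoint)
    high = segmented_line(year, upper[0], upper[1], breakpoint)
    if date < low:
        return "earlier than normal"
    if date > high:
        return "later than normal"
    return "near normal"

# Hypothetical placeholder values, NOT estimates from the actual data.
BREAKPOINT = 1965          # assumed year where the trend begins
LOWER = (124.0, -0.15)     # 0.333 quantile: day-of-year level, days per year
UPPER = (129.0, -0.15)     # 0.666 quantile: day-of-year level, days per year

MAY_2 = 122                # day of year in a non-leap year
then = tercile_category(MAY_2, 1950, LOWER, UPPER, BREAKPOINT)
now = tercile_category(MAY_2, 2017, LOWER, UPPER, BREAKPOINT)
```

With these placeholder numbers the same calendar date lands in the "earlier than normal" category in 1950 and in the "later than normal" category in 2017, which is the kind of shift described above.
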
Below is another example. Here I've plotted the Alaska-wide January through March average temperature from the NCEI Climate Divisions data set. In this case there is no strong evidence for a change in the regression slope that would be better fit with a segmented analysis. In this plot, the purple line is again the regression of the median (0.50 quantile), but the shaded area in this case represents one standard deviation (if the season average temperatures are normally distributed) either side of the mean (approximated by the 0.159 and 0.841 quantiles). You'll notice that the median and +1 standard deviation estimates have increased more than 3°F since 1925. However, the -1 standard deviation estimate has not changed at all. This suggests that late winter temperatures have become more variable: "cold" late winters are about as cold as they were 90 plus years ago, but the warmest late winters are now significantly warmer than back in the Roaring Twenties. How can that be?
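
As a quick check on where 0.159 and 0.841 come from: under a normal distribution they are the cumulative probabilities at one standard deviation below and above the mean. A small sketch using Python's standard library:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

lower_q = std_normal.cdf(-1.0)   # probability of falling below mean - 1 SD
upper_q = std_normal.cdf(1.0)    # probability of falling below mean + 1 SD
inside = upper_q - lower_q       # share of values within one SD of the mean

print(round(lower_q, 3), round(upper_q, 3), round(inside, 3))
```

This only holds to the extent that the seasonal averages really are close to normally distributed, which is the same caveat noted in the text.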

Well, in part it's a feature of my analysis. The estimated slope of the 0.159 quantile (the bottom of the shaded area) is about the same as that of the median. However, at the 90% confidence level, the confidence interval for the 0.159 quantile slope includes zero (for all you P-value fans, in this case this is the same as saying there is insufficient support to reject the null hypothesis of "no trend"). The 90% confidence interval does not include zero for the median or the 0.841 quantile. My convention is: if there is not robust statistical support for a non-zero trend, plot it as zero. More important than any convention, is there something interesting going on physically? I would suggest that yes, there is. The late winter season has seen no long-term change in the larger regional-scale cryosphere variables, i.e. late winter sea ice extent in the Bering Sea shows lots of inter-annual variability but no trend, and snow cover extent is near the seasonal maximum with no trend at high latitudes. This means that, given the appropriate weather pattern, it can still be cold. Since cryosphere changes are evidently not at play, ocean temperatures and increasing greenhouse gas forcing are the obvious suspects that would support increased warmth but at this point still allow the cold "tail" to hang on.
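
The plotting convention described above is simple enough to write down. In this sketch the slope estimate and its confidence bounds are assumed to come from whatever quantile regression package you use; the function name and the example numbers are hypothetical, not values from the analysis in this post.

```python
def plotted_slope(estimate, ci_low, ci_high):
    """Return the slope to plot: zero unless the confidence interval excludes zero."""
    if ci_low <= 0.0 <= ci_high:
        return 0.0       # no robust support for a non-zero trend
    return estimate      # interval excludes zero: plot the estimated trend

# Made-up numbers purely to exercise the rule (slope per year, 90% bounds).
low_quantile_slope = plotted_slope(0.030, -0.010, 0.070)   # interval includes zero
median_slope = plotted_slope(0.035, 0.010, 0.060)          # interval excludes zero
```

Note that a slope plotted as zero under this rule is a statement about statistical support, not evidence that the underlying trend is actually zero.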

The quantile regression I've presented here allows us to make reasonable estimates of the current distribution of some climate variables in the face of change. This simple linear approach is not likely to be sufficient in the future. For instance, in looking at the Tanana at Nenana breakup dates, I suspect that we are starting to (or will soon) butt up against astronomical constraints on how early breakup can be given expected terrestrial climate forcing in the next century; e.g. a solar noon sun angle of 30º above the horizon (Nenana on April 1) can only do so much heating. In that scenario, we'll need to employ non-linear techniques. But that's a topic for another day.
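
The sun angle quoted above can be checked with a back-of-the-envelope calculation: solar noon elevation is roughly 90° minus latitude plus the solar declination. The sketch below uses a common cosine approximation for declination and an approximate latitude for Nenana of 64.56°N; both are assumptions made here, and the result is only good to within a degree or so.

```python
import math

def solar_declination_deg(day_of_year):
    """Approximate solar declination in degrees (simple cosine formula)."""
    return -23.44 * math.cos(math.radians(360.0 / 365.0 * (day_of_year + 10)))

def solar_noon_elevation_deg(latitude_deg, day_of_year):
    """Sun's elevation at solar noon for a northern site, ignoring refraction."""
    return 90.0 - latitude_deg + solar_declination_deg(day_of_year)

NENANA_LATITUDE = 64.56   # degrees north, approximate (assumed here)
APRIL_1 = 91              # day of year in a non-leap year

elevation = solar_noon_elevation_deg(NENANA_LATITUDE, APRIL_1)
print(f"Approximate solar noon elevation: {elevation:.1f} degrees")
```

The approximation gives a value a little under 30°, consistent with the round figure used in the text.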

Updated to respond to Richard's comments and questions of Aug 21.
Here's a plot of the quantile regression slope at 0.05 increments and the associated confidence intervals (90% level) for the Alaska statewide late winter (JFM) temperatures (data plotted above). In this case both tails show wider confidence intervals than most of the middle, which I would expect. One wonders, though, what's going on at the 0.60 and 0.65 quantiles.
Here is some data with a more problematic structure. This is over a century of first autumn freeze dates at the Experiment Farm at UAF. I've included the segmented median and the "near normal" category (0.333 to 0.666 quantiles):
Here the "problem" is the cluster of very late dates between 2001 and 2011. Below, the quantile regression slope and confidence intervals seem reasonable until the very high end. Notice that the interval at the 0.95 quantile is narrower than the others above the 75th percentile. I don't think this is realistic, and it must be due to that cluster of very late (top ten) dates.
If we push it out even further and make it even more fine-grained (quantiles 0.02 to 0.98 every 0.01), more artifacts emerge, such as the occasional spikes in the bounds and the impossibly small confidence interval above the 95th percentile. For me the moral of this story is that it's important to do this exploratory review first, especially if the focus is on the far extremes of the distribution, where other tools are potentially better suited.


  1. Very interesting, Rick. I've used quantile regression in a few contexts but this discussion is helpful.

    I'm curious as to how quickly the confidence intervals widen as you run the regression farther out in the tails; for example, presumably tercile regression estimates are quite stable but decile estimates much less so (e.g. if you remove one or more outliers). I wonder if any general statements can be made about this for a typical ~100 year climate record.

    Have you tried this approach on any highly non-Gaussian data?

    1. Richard, I've added a section with some more info on the quantile regression and the confidence intervals. Based on my limited experience, analyses of the tails are strongly dependent on the detailed structure of the distribution.

    2. Rick, the additional plots are interesting and revealing - thanks. It's curious that both of these examples have higher uncertainty on the low end than on the high end; I don't suppose that can be generally true.

      I imagine that one would need to see a "significant" slope in these quantile-dependent regression slopes to be able to claim that asymmetric changes have occurred in the opposite tails. In the JFM example, while the confidence interval of the lower quantiles crosses zero, it is also wider than at higher quantiles, so the slope could also plausibly be greater than the median, no?

    3. Richard, I believe you're quite right. More basically, I should have tested for changing variance (e.g. an F-test on independent subsets of the data or a Breusch-Pagan test of the full data). Either way, there is no evidence to support the notion of changing variance. Thanks for the corrective to my thinking.

  2. "This was underlying the many comments I heard about how cold the winter of 2016-17 was in Alaska. Of course, through the multi-decade lens, it wasn't notably cold for the winter (through parts of the state were, by any measure, cold in March). So that got me to thinking: given that many climate variables in Alaska are changing, how can we provide estimates of "normal" and associated variability that takes into account the ongoing changes?"

    I suggest a new measure that devolves from statistics...BTUs of fuel consumed to heat a dwelling per winter by location. Ours was up in the valley floor. Perhaps at elevation above the valley inversion it was not?

    Stats are fun but $ measures.


    1. You bet Gary, dollars are where the rubber meets the road, so to speak. BTUs of fuel consumed is strongly dependent on all kinds of things besides temperature: How well insulated is the building? What is the target temperature for indoor heat? How often and for how long are doors kept open? Degree days are one common way to capture the outside temperature part. Otherwise, your mileage may vary.

    2. Thanks for this analysis Rick. Good stuff.

      I'll dig out our heating bills for the last few years. Same house, same Toyo stoves, same internal temps. I'm just curious what the changes were in annual (May-May) gallons consumed. But yes the degree days are best.

      I suspect it wasn't from deep cold as much as a prolonged stretch at moderately low temps last winter, but...?


    3. Winter 2016-17 was certainly colder than the previous three in Fairbanks-land (yeah). But not notably cold in a longer (multi-decade) view. March though was definitely one to write home about.