Hi, Rick T. here. One of the things that interests me is how people adjust to a changing climate. Anecdotally, it was vaguely humorous to me last winter to see how quickly many people had incorporated three consecutive mild winters into a perception of a "new normal". This was underlying many of the comments I heard about how cold the winter of 2016-17 was in Alaska. Of course, through the multi-decade lens, it wasn't a notably cold winter (though parts of the state were, by any measure, cold in March). So that got me to thinking: given that many climate variables in Alaska are changing, how can we provide estimates of "normal" and the associated variability that take the ongoing changes into account?
One approach I've been toying with for making these kinds of estimates is the use of
quantile regression. Quantile regression is something of a cousin to the more familiar least-squares regression, but it is computationally more tedious, so it was not much used until the advent of modern computing. Nowadays, it's trivially simple to apply to the kinds of climate datasets that I mostly work with, that is, point-based time series. So the first question you ask: what is a quantile? A quantile is, to quote
Wikipedia, "…cutpoints dividing the range of a probability distribution into contiguous intervals…". Quantiles can take any value between zero and one. So, the 0.5 quantile divides a distribution into two equal parts: half the values are above and half are below. You've heard of this one: it's better known as the median. A quantile of 0.843 likewise divides a distribution into two parts: 84.3% of the distribution lies below that value and 15.7% above. Quantile regression is a method to estimate the quantile values of a dataset when one variable is (possibly) dependent on one or more other variables. The second question you ask: why would you want to use quantile regression? There are a couple of reasons. First, quantile regression is not nearly as sensitive to outliers as ordinary linear regression, which in effect models the mean. Second, and most significantly for my purposes here, quantile regression lets us estimate not only how the central values of a distribution (e.g. the mean or median) are changing, but also how other aspects of the distribution are (possibly) changing.
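For the concretely minded, here's a minimal sketch of how a quantile regression can be fit, using Python with entirely synthetic data (the made-up "temperature" series and the bare-bones optimizer are illustrative only, not any of the datasets or software discussed here). The whole trick is the asymmetric "pinball" (check) loss: residuals above the line are weighted by q and residuals below by 1 − q, so the line that minimizes the total loss is the conditional q-th quantile:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
# Made-up "late winter temperature" series with a known warming trend.
year = np.arange(1925, 2025).astype(float)
temp = 10.0 + 0.03 * (year - 1925) + rng.normal(0.0, 2.0, year.size)

def pinball_loss(params, x, y, q):
    """Check loss: residuals above the line weighted q, below weighted 1 - q."""
    intercept, slope = params
    r = y - (intercept + slope * x)
    return np.sum(np.where(r >= 0, q * r, (q - 1.0) * r))

def quantile_fit(x, y, q):
    """Fit intercept and slope for quantile q (x is centered for stability)."""
    xc = x - x.mean()
    res = minimize(pinball_loss, x0=[np.quantile(y, q), 0.0],
                   args=(xc, y, q), method="Nelder-Mead")
    return res.x  # intercept (at the mean year) and slope (per year)

a50, b50 = quantile_fit(year, temp, 0.50)   # the median line
a84, b84 = quantile_fit(year, temp, 0.841)  # roughly +1 standard deviation
```

A handy sanity check: roughly a fraction q of the observations should fall below the fitted line. For real work I'd reach for a purpose-built routine (e.g. R's quantreg package or statsmodels' QuantReg), but the loss function above is the whole idea.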
As an example of this approach, below is a plot of some climate data that you are probably familiar with: spring breakup dates of the Tanana River at Nenana (for this version I've used "fractional dates", which incorporate the time of day of breakup, though that does not matter for this analysis). There is no statistically significant trend through the 1960s, so I constrain the quantile regression to have zero slope in that period. The purple line is the segmented median (0.50 quantile) breakup date regressed on year (i.e. the trend). The green-shaded area represents the region between the 0.333 and 0.666 quantiles. So, this plot should partition the breakup dates into three (roughly) equal categories: one-third below the green shading (significantly early breakups), one-third inside the green shading (near normal) and one-third above (significantly later than normal). From this, it's easy to see that breakup dates in the first days of May were solidly in the "significantly earlier than normal" category in the mid-20th century, but the same dates are now in the "significantly later than normal" category.
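A simple way to build that zero-slope-then-trend structure into a fit is a "hinge" predictor that is zero before a knot year and grows linearly after it. Here's a minimal sketch in Python on synthetic data — the 1965 knot, the made-up breakup dates, and the bare-bones pinball-loss fit are all illustrative assumptions, not the actual Nenana record or the procedure behind the plot above:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
# Made-up breakup day-of-year series: flat until a knot year, trending
# earlier afterward. The 1965 knot and all values are illustrative only.
year = np.arange(1917, 2024).astype(float)
knot = 1965.0
signal = np.where(year < knot, 126.0, 126.0 - 0.15 * (year - knot))
doy = signal + rng.normal(0.0, 5.0, year.size)

def hinge(x, knot):
    """Zero before the knot, (x - knot) after: forces a zero early slope."""
    return np.maximum(0.0, x - knot)

def pinball_loss(params, h, y, q):
    intercept, slope = params
    r = y - (intercept + slope * h)
    return np.sum(np.where(r >= 0, q * r, (q - 1.0) * r))

def fit_segmented(year, y, q, knot):
    """Quantile fit of y on the hinge term only (flat, then linear)."""
    h = hinge(year, knot)
    res = minimize(pinball_loss, x0=[np.quantile(y, q), 0.0],
                   args=(h, y, q), method="Nelder-Mead")
    return res.x

a, b = fit_segmented(year, doy, 0.50, knot)    # segmented median
lo = fit_segmented(year, doy, 0.333, knot)     # bottom of "near normal"
hi = fit_segmented(year, doy, 0.666, knot)     # top of "near normal"
```

The intercept is the pre-knot level and the slope only acts after the knot, which is exactly the segmented shape in the plot.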
Below is another example. Here I've plotted the Alaska-wide January through March average temperature from the
NCEI Climate Divisions data set. In this case there is no strong evidence of a change in slope that would justify a segmented analysis. In this plot, the purple line is again the regression of the median (0.50 quantile), but the shaded area this time represents one standard deviation either side of the mean, approximated by the 0.159 and 0.841 quantiles (exact if the seasonal average temperatures are normally distributed). You'll notice that the median and +1 standard deviation estimates have increased by more than 3°F since 1925. However, the -1 standard deviation estimate has not changed at all. This suggests that late winter
temperatures have become more variable: "cold" late winters are about as
cold as they were 90 plus years ago, but the warmest late winters are now
significantly warmer than back in the Roaring Twenties. How can that be?
Well, in part it's a feature of my analysis. The estimated slope of the 0.159 quantile (the bottom of the shaded area) is about the same as the median's. However, the 90% confidence interval for that slope crosses zero (for all you P-value fans, this is the same as saying there is insufficient support to reject the null hypothesis of "no trend"). The 90% confidence intervals do not cross zero for the median or the 0.841 quantile. My convention is: if there is no robust statistical support for a non-zero trend, plot it as zero. More important than any convention: is there something interesting
going on physically? I would suggest that yes, there is. The late winter season has seen no long-term change in the larger regional-scale cryosphere variables: late winter sea ice extent in the Bering Sea shows lots of inter-annual variability but no trend, and snow cover extent at high latitudes is near its seasonal maximum, also with no trend. This means that given the appropriate weather
pattern it can still be cold. Since cryosphere changes are evidently not at play, ocean temperatures and
increasing greenhouse gas forcing are the obvious suspects that would support increased warmth but at this point still allow the cold "tail" to hang on.
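As an aside, the "plot it as zero" convention is easy to implement with a pairs bootstrap: refit the quantile slope on resampled years, take the middle 90% of the resampled slopes as a confidence interval, and zero out the trend if that interval straddles zero. This sketch uses synthetic data and a generic bootstrap, not the exact procedure behind the plots above:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
# Made-up series with a genuine trend, to exercise the decision rule.
year = np.arange(1925, 2025).astype(float)
temp = 10.0 + 0.03 * (year - 1925) + rng.normal(0.0, 1.0, year.size)

def quantile_slope(x, y, q):
    """Slope of a linear quantile fit via the pinball (check) loss."""
    xc = x - x.mean()
    def loss(p):
        r = y - (p[0] + p[1] * xc)
        return np.sum(np.where(r >= 0, q * r, (q - 1.0) * r))
    return minimize(loss, x0=[np.quantile(y, q), 0.0],
                    method="Nelder-Mead").x[1]

def trend_or_zero(x, y, q, n_boot=200, level=90.0):
    """Pairs bootstrap: return the fitted slope, or 0.0 if the CI crosses zero."""
    slopes = []
    for _ in range(n_boot):
        i = rng.integers(0, x.size, x.size)       # resample (year, value) pairs
        slopes.append(quantile_slope(x[i], y[i], q))
    lo, hi = np.percentile(slopes, [(100.0 - level) / 2, (100.0 + level) / 2])
    b = quantile_slope(x, y, q)
    return b if (lo > 0 or hi < 0) else 0.0

b_med = trend_or_zero(year, temp, 0.50)
```

The percentile bootstrap is only one of several ways to get these intervals (quantreg and statsmodels offer analytic alternatives), but it makes the zero-crossing logic explicit.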
The quantile regression I've presented here allows us to make reasonable estimates of the current distribution of some climate variables in the face of change. This simple linear approach is not likely to be sufficient in the future, though. For instance, looking at the Tanana at Nenana breakup dates, I suspect that we are starting to (or soon will) butt up against astronomical constraints on how early breakup can be, given the expected terrestrial climate forcing over the next century; e.g. a solar noon sun angle of 30° above the horizon (Nenana on April 1) can only do so much heating. In that scenario, we'll need to employ non-linear techniques. But that's a topic for another day.
___________________________
Updated to respond to Richard's comments and questions of Aug 21.
Here's a plot of the quantile regression slope at 0.05 increments and the associated confidence intervals (90% level) for the Alaska statewide late winter (JFM) temperatures (data plotted above). In this case both tails show a wider spread in the confidence intervals than most of the middle, which I would expect. One wonders, though, what's going on at the 0.60 and 0.65 quantiles.
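Generating a plot like that is just a matter of looping the fit over a grid of quantiles. Here are the mechanics in Python on synthetic (made-up) data, with a bare-bones pinball-loss fit again standing in for a proper quantile regression routine:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
# Made-up series with the same 0.03 deg/year trend at every quantile.
year = np.arange(1925, 2025).astype(float)
temp = 10.0 + 0.03 * (year - 1925) + rng.normal(0.0, 2.0, year.size)

def quantile_slope(x, y, q):
    """Slope of a linear quantile fit via the pinball (check) loss."""
    xc = x - x.mean()
    def loss(p):
        r = y - (p[0] + p[1] * xc)
        return np.sum(np.where(r >= 0, q * r, (q - 1.0) * r))
    return minimize(loss, x0=[np.quantile(y, q), 0.0],
                    method="Nelder-Mead").x[1]

# Scan the quantiles 0.05, 0.10, ..., 0.95, as in the plot.
qs = np.arange(0.05, 0.96, 0.05)
slopes = np.array([quantile_slope(year, temp, q) for q in qs])
# Homoscedastic data should give slopes hovering near 0.03 at every
# quantile; real data with changing spread would show structure here.
```

Pairing each slope with a bootstrap or analytic confidence interval then gives the slope-versus-quantile plot with its error bounds.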
Here is some data with a more problematic structure. This is over a century of first autumn freeze dates at the Experiment Farm at UAF. I've included the segmented median and the "near normal" category (0.333 to 0.666 quantiles):
Here the "problem" is the cluster of very late dates between 2001 and 2011. Below, the quantile regression slopes and confidence intervals seem reasonable until the very high end. Notice that the spread at the 0.95 quantile is narrower than at the other quantiles above 0.75. I don't think this is realistic, and it must be due to that cluster of very late (top ten) dates.
If we push it out even further and make it even more fine-grained (quantiles 0.02 to 0.98 every 0.01), more artifacts emerge, such as occasional spikes in the bounds, and then the impossibly small confidence interval above the 0.95 quantile. For me the moral of this story is that it's important to do this kind of exploratory review first, especially if the focus is on the far extremes of the distributions, where other tools may be better suited.