One approach I've been toying with to make these kinds of estimates is quantile regression. Quantile regression is something of a cousin to the more familiar least-squares regression, but it is computationally more tedious, so it was not much used until the advent of modern computing. Nowadays, it's trivially simple to use on the kinds of climate datasets that I mostly work with, that is, point-based time series.
So the first question you ask: what is a quantile? Quantiles are, to quote Wikipedia, "…cutpoints dividing the range of a probability distribution into contiguous intervals…". A quantile can be defined for any probability level between zero and one. So, the 0.5 quantile divides a distribution into two equal parts: half the values are above and half the values are below. You've heard of this: it's better known as the median. The 0.843 quantile likewise divides a distribution into two parts: it is the value for which 84.3% of the distribution is below and 15.7% is above. Quantile regression is a method to estimate the quantile values of a dataset when one variable is (possibly) dependent on one or more other variables.
The second question you ask: why would you want to use quantile regression? There are a couple of reasons. First, quantile regression is not nearly as sensitive to outliers as ordinary least-squares regression, which in effect models the mean. Second, and most significantly for my purposes here, quantile regression lets us estimate not only the central value of a distribution (the median, in place of the mean) but also how other aspects of the distribution are (possibly) changing.
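To make the idea of a quantile concrete, here is a minimal sketch in Python. It relies on the fact that the q quantile of a sample is the value that minimizes the so-called check (or "pinball") loss, which is the same quantity that quantile regression minimizes. The function names and the numbers are made up purely for illustration; this is not code from any analysis in this post.

```python
import statistics


def pinball_loss(q, value, data):
    """Average check (pinball) loss of a candidate quantile value."""
    total = 0.0
    for y in data:
        diff = y - value
        # Points above the candidate are weighted by q, points below by (1 - q).
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(data)


def empirical_quantile(q, data):
    """Return the data value that minimizes the pinball loss for level q."""
    return min(sorted(data), key=lambda v: pinball_loss(q, v, data))


# Made-up sample, for illustration only.
sample = [3, 7, 8, 12, 13, 14, 18, 21, 25]
print("median (0.5 quantile):", empirical_quantile(0.5, sample))
print("0.843 quantile:", empirical_quantile(0.843, sample))

# Replace the largest value with a wild outlier: the median does not move,
# while the mean is dragged far upward.
with_outlier = sample[:-1] + [250]
print("median with outlier:", empirical_quantile(0.5, with_outlier))
print("mean without / with outlier:",
      statistics.mean(sample), statistics.mean(with_outlier))
```

The last few lines illustrate the outlier point: a single extreme value changes the mean a great deal but leaves the median where it was.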
As an example of this approach, below is a plot of some climate data that you are probably familiar with: spring breakup dates of the Tanana River at Nenana. (For this version I've used "fractional dates", which incorporate the time of day of breakup; this does not matter to the analysis.) There is no statistically significant trend up into the 1960s, so I constrain the quantile regression to have zero slope in this time period. The purple line is the segmented median (0.50 quantile) date of breakup; the regression here models the dependence of breakup date on the year (i.e. the trend). The green-shaded area represents the area between the 0.333 and 0.666 quantiles. So, this plot should partition the breakup dates into three (roughly) equal categories: one-third below the green shading (significantly earlier than normal), one-third inside the green shading (near normal) and one-third above (significantly later than normal). From this, it's easy to see that breakup dates during the first days of May in the mid-20th century were solidly in the "significantly earlier than normal" category, but the same dates are now in the "significantly later than normal" category.
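For anyone who wants to experiment with this kind of segmented fit, here is a rough sketch in Python. To be clear, this is not the code behind the plot: the data are synthetic (random numbers, not the Nenana record), the 1970 break year is an arbitrary assumption, and all the names are hypothetical. The flat-then-sloping shape is obtained by regressing on a "hinge" variable that is zero before the break year. The fit itself is a brute-force search that uses the fact that, for a straight-line model, at least one optimal quantile-regression line passes through two of the data points; that is only practical for small datasets, and in real work a library routine (for example `QuantReg` in statsmodels) would be the more usual choice.

```python
import random

BREAK_YEAR = 1970  # assumed for illustration; not taken from the real analysis


def hinge(year):
    """Zero up to the break year, then the number of years since it."""
    return max(0.0, float(year - BREAK_YEAR))


def pinball(q, residuals):
    """Total check (pinball) loss of a set of residuals at level q."""
    return sum(q * r if r >= 0 else (q - 1) * r for r in residuals)


def fit_quantile_line(q, xs, ys):
    """Fit y = a + b*x at quantile level q by checking every pair of points."""
    best = None
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if xs[i] == xs[j]:
                continue  # two points with the same x do not define a slope
            b = (ys[j] - ys[i]) / (xs[j] - xs[i])
            a = ys[i] - b * xs[i]
            loss = pinball(q, [y - (a + b * x) for x, y in zip(xs, ys)])
            if best is None or loss < best[0]:
                best = (loss, a, b)
    return best[1], best[2]


# Synthetic "fractional day of year" series: flat, then trending earlier.
random.seed(42)
years = list(range(1917, 2021))
days = [125.0 - 0.2 * hinge(y) + random.gauss(0.0, 4.0) for y in years]
xs = [hinge(y) for y in years]

a_lo, b_lo = fit_quantile_line(1 / 3, xs, days)    # lower edge of "near normal"
a_med, b_med = fit_quantile_line(0.5, xs, days)    # segmented median
a_hi, b_hi = fit_quantile_line(2 / 3, xs, days)    # upper edge of "near normal"


def category(year, day):
    """Classify a breakup date relative to the fitted terciles for that year."""
    x = hinge(year)
    if day < a_lo + b_lo * x:
        return "early"
    if day > a_hi + b_hi * x:
        return "late"
    return "near normal"


counts = {"early": 0, "near normal": 0, "late": 0}
for y, d in zip(years, days):
    counts[category(y, d)] += 1
print("median slope after break (days/year):", round(b_med, 3))
print(counts)
```

The counts printed at the end should come out at roughly one-third each, which is the partition the shaded band in the plot is meant to show.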
The quantile regression I've presented here allows us to make reasonable estimates of the current distribution of some climate variables in the face of change. This simple linear approach is not likely to be sufficient in the future, though. For instance, in looking at the Tanana at Nenana breakup dates, I suspect that we are starting to (or will soon) butt up against astronomical constraints on how early breakup can be, given expected terrestrial climate forcing in the next century; e.g. a solar noon sun angle of 30° above the horizon (Nenana on April 1) can only do so much heating. In that scenario, we'll need to employ non-linear techniques. But that's a topic for another day.