First, some background to set the stage.
- Because I am looking at days with zero precipitation, not even a trace, I used data from 1947 onward. This period is entirely within the Weather Bureau/Weather Service era of 24-hour per day observations, and there is no missing daily precipitation data.
- The frequency of precipitation varies seasonally, e.g. May in Yakutat averages twice as many days without any precipitation as October, so we need to limit the analysis to this time of year. Therefore I confined the analysis to the early autumn (August through October, or ASO) season. I'm also assuming there is no trend in dry-day streaks (which is the case for the total number of dry days in ASO).
- For statistical analysis, the independence of events is often an important underlying assumption. So while it's easy to generate simple counts of consecutive days without precipitation, it took a bit more work to find the independent streaks. To illustrate this, a simple count reveals that there are two streaks of 19 days with zero precipitation during August through October, 1947 to 2018. However, both of these streaks are simply subsets of the 20-day streak from this past September (Sep 2-20 and Sep 3-21). So removing all the streaks that are simply subsets of longer ones, here's what we find for the counts of independent, non-overlapping streaks of specific lengths:
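As a rough illustration of the difference between a simple count and a count of independent streaks, here is a minimal Python sketch. It is not the code used for this post: the function names and the short precipitation series are made up, it assumes trace amounts have already been coded as non-zero values, and it ignores what to do with streaks that cross the start or end of the August through October window.

```python
from collections import Counter


def maximal_dry_streaks(precip):
    """Lengths of maximal runs of exactly-zero precipitation.

    Each dry spell is reported once, at its full length, so shorter
    windows inside a longer spell are never counted separately.
    """
    streaks = []
    run = 0
    for amount in precip:
        if amount == 0:
            run += 1
        else:
            if run:
                streaks.append(run)
            run = 0
    if run:  # series ended during a dry spell
        streaks.append(run)
    return streaks


def naive_window_count(precip, length):
    """Count every window of `length` consecutive dry days, overlaps included."""
    return sum(
        all(amount == 0 for amount in precip[i:i + length])
        for i in range(len(precip) - length + 1)
    )


# Made-up daily series (inches): one wet day, 20 dry days, then a short 2-day dry spell.
series = [0.1] + [0.0] * 20 + [0.3, 0.0, 0.0, 0.5]

print(naive_window_count(series, 19))        # the two overlapping 19-day windows
print(Counter(maximal_dry_streaks(series)))  # one 20-day streak and one 2-day streak
```

The simple count finds two 19-day windows inside the single 20-day spell, while the maximal-run version reports that spell once.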
There are a number of ways to potentially answer such questions, but the one I'll provide here involves our old friend, regression. But rather than linear regression (which obviously is not appropriate), I tried mathematical forms that have the potential to represent what we see in the plot above: large and rapid changes as we move from left to right along the x (horizontal) axis. Two commonly used forms for distributions of this shape are exponential and power law. In order to facilitate this analysis I first converted the raw count values into frequencies per year and then plotted the frequency on a log scale, which results in this:
This is the same information as in the first figure, just displayed in a different way. But it allows us to see immediately that an exponential fit is not likely to work so well. How do we know that? Well, with the y (vertical) axis plotted on a log scale, an exponential fit will appear as a straight line. Just eyeballing the tops of the bars, you can see that a straight line will fit pretty well for streaks of 11 days or less, but then fails to capture the handful of events of longer duration. For those, a power law fit works out better. Now, a well-established issue with power law fits is that often only part of a distribution (typically the right tail) is well represented by a power law. How does that work out in this case? I systematically fitted a power law using the observed frequency of all the streak lengths, i.e. 1 day to 25 days (everything above 20 is zero). Then I fitted a power law to streaks of two days or longer, then three days or longer, etc. The "winner" was the fit that had the lowest root mean squared error but still utilized most of the data (there are more sophisticated ways to do this that I have not had the time to implement, though in this case they will lead to the same answer). It turns out that the best fit was for runs of two or more days, and it looks like this:
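For readers who want to try the "fit, drop the shortest streaks, refit" procedure themselves, here is a minimal sketch in plain Python. Several things in it are assumptions rather than a description of what was actually done for this post: the power law is fitted as a least-squares straight line in log-log space (which has to skip zero frequencies), the function names are invented, and the frequencies are synthetic numbers built so that everything except 1-day streaks follows an exact power law.

```python
import math


def fit_power_law(lengths, freqs):
    """Fit freq ~ a * length**(-b) by least squares on log(freq) vs log(length).

    Zero frequencies are skipped because log(0) is undefined.
    """
    pts = [(math.log(x), math.log(f)) for x, f in zip(lengths, freqs) if f > 0]
    n = len(pts)
    mean_x = sum(p[0] for p in pts) / n
    mean_y = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in pts)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in pts)
    slope = sxy / sxx
    a = math.exp(mean_y - slope * mean_x)
    return a, -slope


def rmse(lengths, freqs, a, b):
    """Root mean squared error of the fitted curve, in frequency units."""
    sq = [(a * x ** (-b) - f) ** 2 for x, f in zip(lengths, freqs)]
    return math.sqrt(sum(sq) / len(sq))


def scan_min_length(lengths, freqs, max_start=5):
    """Refit using only streaks >= start, for start = 1 .. max_start."""
    results = {}
    for start in range(1, max_start + 1):
        kept = [(x, f) for x, f in zip(lengths, freqs) if x >= start]
        xs, fs = zip(*kept)
        a, b = fit_power_law(xs, fs)
        results[start] = (a, b, rmse(xs, fs, a, b))
    return results


# Synthetic frequencies per year (NOT the Yakutat data): exact power law
# 3 * length**-2, except that 1-day streaks are deliberately too rare.
lengths = list(range(1, 26))
freqs = [3.0 * x ** -2 for x in lengths]
freqs[0] = 1.0

results = scan_min_length(lengths, freqs)
for start, (a, b, err) in results.items():
    print(f"streaks >= {start}: a={a:.3f}, b={b:.3f}, RMSE={err:.4f}")
```

With these made-up numbers the error drops sharply once 1-day streaks are excluded. Note that the RMSE values are computed over different subsets of the data, so the final choice still involves the judgment call about using most of the data.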
So based on this analysis, the streak of 20 completely dry days in a row has only about a 0.7% chance of occurring in any particular August through October season. I've noted the return period as calculated from the fit on the graphic for selected streaks, though I don't really like to do that because it's easy to misinterpret. Why do it? People like to see it, and in principle it is perhaps a more intuitive way to express low likelihood events. But it is important to remember that a long return period is the inverse of a very small number, and so small changes in the fit result in big changes in the return period. So if I improve this analysis and come up with a probability for a 20-day streak of 0.9% in any given year, that's a small change from 0.7%, but the return period would drop by about 40 years, to 111.
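The sensitivity of the return period to small changes in probability is easy to check with a couple of lines of Python. This is just the arithmetic, using the rounded percentages quoted in the text; the function name is mine.

```python
def return_period(annual_probability):
    """Average number of seasons between events with the given per-season chance."""
    return 1.0 / annual_probability


for p in (0.007, 0.009):
    print(f"{p:.1%} per season -> about {return_period(p):.0f} years")
```

With the rounded 0.7% this gives roughly 143 years, so the exact size of the drop depends on the unrounded probability from the fit.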
So from this analysis, the 20 dry days in a row at Yakutat this September was likely a once in a lifetime event, at least if you're of mature years. After all, a 0.7% annual chance of occurrence means that, assuming no change and no year-to-year correlation, there is about a 30% chance that this will happen at least once in the next 50 years.
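The 50-year figure comes from the usual "at least once" calculation: the chance of no occurrence in any one season is 1 minus the seasonal probability, and with independent seasons those chances multiply. A small sketch, with a function name of my own choosing:

```python
def chance_at_least_once(annual_probability, years):
    """Probability of one or more occurrences in `years` independent seasons."""
    return 1.0 - (1.0 - annual_probability) ** years


# 0.7% per season over 50 seasons comes out just under 30%
print(f"{chance_at_least_once(0.007, 50):.1%}")
```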
Maybe Markov Chain analysis would help here?
-Bill
Thanks for this idea Bill. I'll think about that.