The chart below shows the history of June 10 forecasts, along with the observed September ice extent according to the National Snow and Ice Data Center (NSIDC). For the model, ice extent was defined as the area where sea ice concentration is at least 15%, which matches the NSIDC definition. Interestingly, the model forecasts have a low bias over the entire history; see the bottom of this post for a discussion of this. However, in recent years the forecasts have failed to capture the extent of the melt-out, i.e. the June forecasts have predicted much less change over time than has actually happened.
If we express the forecast and observed ice extent as departures from the 1982-2010 mean of each series, then we have a bias-corrected comparison (see below). The model failure in recent years really stands out, but it is also clear that there is some skill in predicting the year-to-year variability. This year's forecast is also really dramatic, because the model is predicting its highest extent since 1992, measured against its own forecast history. It seems highly unlikely that anything of this magnitude will actually happen, but it also seems likely that the model is capturing some kind of signal. Based on my experience in seasonal forecasting, it is usually worth paying attention when the models show large anomalies, although usually the timing, magnitude, or location of the predicted anomaly is not quite right.
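For readers who want to reproduce this kind of comparison, here is a minimal sketch of the bias correction: each series has its own base-period mean subtracted, so a constant offset between forecast and observations drops out. The function name `anomalies` and all of the numbers are made up for illustration (they are not the actual forecast or NSIDC values), and the toy example uses a three-year base period rather than 1982-2010 only to keep it short.

```python
from statistics import mean

def anomalies(series, base_start=1982, base_end=2010):
    """Return each year's departure from the series' own base-period mean.

    `series` maps year -> September ice extent (million km^2).
    """
    base = [v for yr, v in series.items() if base_start <= yr <= base_end]
    if not base:
        raise ValueError("no data falls inside the base period")
    base_mean = mean(base)
    return {yr: v - base_mean for yr, v in series.items()}

# Hypothetical values, for illustration only (not real data).
forecast = {2008: 4.0, 2009: 4.4, 2010: 4.2, 2011: 4.3}
observed = {2008: 5.0, 2009: 5.6, 2010: 5.3, 2011: 5.0}

# Toy base period of 2008-2010; the comparison in this post uses 1982-2010.
forecast_anom = anomalies(forecast, 2008, 2010)
observed_anom = anomalies(observed, 2008, 2010)

for yr in sorted(forecast):
    print(yr, round(forecast_anom[yr], 2), round(observed_anom[yr], 2))
```

Because each series is compared only to itself, the forecast's overall low bias no longer shows up; what remains is whether the two series move up and down together from year to year.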
Regarding the low bias in the model forecasts, at first glance this appears to be the opposite of the bias claimed by the model developers in their published article (as helpfully pointed out by Brian): "For the sea ice prediction, sea ice appears too thick and certainly too extensive in the spring and summer... The model shows a consistent high bias in its forecasts of September ice extent." The figure below is taken from the article and shows (in the lower left panel) that the June 15 forecasts produce ice concentration that is too high compared to the observations. However, ice concentration is not the same as ice extent, and it seems possible that the model could be producing ice that is too densely concentrated where it exists, while the total area with at least 15% concentration is still too small. I haven't yet obtained a history of observed sea ice concentration to be able to test this idea.
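To make the distinction concrete, here is a small sketch with two made-up sets of grid cells. It only shows that higher concentration and lower extent can coexist arithmetically; it says nothing about whether the model actually behaves this way. The function names, the concentrations, and the equal cell areas are all assumptions for illustration.

```python
def ice_extent(conc, cell_area, threshold=0.15):
    """Total area of grid cells whose ice concentration is at least `threshold`."""
    return sum(a for c, a in zip(conc, cell_area) if c >= threshold)

def ice_area(conc, cell_area):
    """Concentration-weighted area, i.e. the area actually covered by ice."""
    return sum(c * a for c, a in zip(conc, cell_area))

# Hypothetical six-cell grids; each cell has area 1.0 in arbitrary units.
cell_area = [1.0] * 6
model_conc = [0.95, 0.95, 0.95, 0.05, 0.05, 0.0]  # dense pack, thin margin
obs_conc = [0.60, 0.60, 0.60, 0.30, 0.30, 0.0]    # looser pack, wider margin

print("model extent:", ice_extent(model_conc, cell_area))
print("obs extent:  ", ice_extent(obs_conc, cell_area))
print("model area:  ", round(ice_area(model_conc, cell_area), 2))
print("obs area:    ", round(ice_area(obs_conc, cell_area), 2))
```

In this toy case the model-like grid has more ice overall (higher concentration-weighted area) but fewer cells above the 15% threshold, so its extent is smaller. Checking whether that is what happens in practice would require the observed concentration history mentioned above.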