NavList:
A Community Devoted to the Preservation and Practice of Celestial Navigation and Other Methods of Traditional Wayfinding
Re: Rejecting outliers
From: George Huxtable
Date: 2011 Jan 8, 14:10 -0000
I agree with (almost) every word of Gary LaPook's thoughtful posting. He sets two standard deviations from the mean as the maximum acceptable before excluding an observation as an outlier in a set of five or six. Personally, I would set that threshold somewhat higher, at say 2.5 SD. And his data set is so close to a straight line that drawing a straight line on a graph and then measuring offsets with dividers, though fine in principle, is somewhat crude; I would prefer to calculate those offsets numerically. But really, there's little to separate us on those matters.

However, I think there's a serious weakness in his adoption of a personal, historical value of 1.433' for the standard deviation of his measured altitudes, and in using twice that value to set an acceptability threshold for a particular day. That may indeed have been the standard deviation of his altitude observations from GPS values over the long term, but those observations will not all have been made under the same circumstances. Some may have been taken from smaller vessels in fair weather, others in rough conditions. Some may even have been taken on land. It may be fair to lump the rough with the smooth and arrive at a personal average, but what then needs to be considered are the circumstances of the moment, and how they differ from that average. If most had been taken from the deck of a craft of a few (or a few tens of) tons, how should that figure apply to observations from a vessel of many thousands of tons, with or without sail propulsion? And if observations were taken in stormy conditions, even aboard Royal Clipper, much greater scatter would be expected; no doubt many, perhaps most, of them would fail Gary's test and be excluded, even though they were not really outliers from the expected (wider) distribution.

To which Gary may well ask what value he should use instead.
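For readers who want to try the numerical alternative to graph and dividers, here is a minimal Python sketch, not Gary's or George's actual procedure. It assumes the change of altitude over the round is close enough to linear in time to be fitted by an ordinary least-squares straight line (the post only says the offsets should be calculated numerically; least squares is my assumption). Each sight's offset from that line is then compared with k times a standard deviation, either Gary's historical 1.433' or the scatter of the day's own offsets about the line. The function names `fit_line`, `offsets`, `day_scatter` and `suspects` are hypothetical, and the six sights in the example are invented for illustration, not taken from Gary's or Peter Fogg's data.

```python
# Sketch: flag possible outliers in a round of sextant altitudes.
# All function names and the sample data are hypothetical illustrations.
from math import sqrt


def fit_line(times, alts):
    """Least-squares straight line alt = a + b * t; returns (a, b)."""
    n = len(times)
    t_mean = sum(times) / n
    h_mean = sum(alts) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (h - h_mean) for t, h in zip(times, alts))
    b = sxy / sxx  # rate of change of altitude, arcmin per minute of time
    a = h_mean - b * t_mean  # fitted altitude at t = 0
    return a, b


def offsets(times, alts):
    """Each sight's offset from the fitted line (observed minus fitted), arcmin."""
    a, b = fit_line(times, alts)
    return [h - (a + b * t) for t, h in zip(times, alts)]


def day_scatter(res):
    """Standard deviation of the day's offsets about the fitted line.

    Divides by n - 2 because two quantities (a and b) were fitted to the data.
    """
    n = len(res)
    return sqrt(sum(r * r for r in res) / (n - 2))


def suspects(res, sigma, k):
    """0-based indices of sights whose |offset| exceeds k * sigma."""
    return [i for i, r in enumerate(res) if abs(r) > k * sigma]


if __name__ == "__main__":
    # Invented round of six sights: minutes after the first sight, altitude in arcmin.
    times = [0.0, 1.5, 3.0, 4.5, 6.0, 7.5]
    alts = [2400.2, 2403.1, 2406.4, 2413.0, 2411.9, 2415.3]

    res = offsets(times, alts)
    for i, r in enumerate(res):
        print(f"sight {i + 1}: offset {r:+.2f}'")

    historical_sd = 1.433  # Gary's long-term figure, in arcmin
    own_sd = day_scatter(res)
    print(f"day's own scatter: {own_sd:.2f}'")
    for k in (2.0, 2.5):
        hist = [i + 1 for i in suspects(res, historical_sd, k)]
        own = [i + 1 for i in suspects(res, own_sd, k)]
        print(f"k = {k}: sights flagged with historical SD {hist}, with day's SD {own}")
```

Note that the day's own figure is computed with any suspect sight included, which pulls it upward; with only six sights that is a further reason a larger set helps.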
And I suggest his best measure of scatter is the standard deviation of that day's observations about their mean value, after correcting them for time-dependence. The trouble is that a decent figure for that scatter calls for a reasonable number of observations, so that the result does not depend too heavily on chance. Six observations aren't really enough. Nine, as in Peter Fogg's example, is still somewhat short. The more, the better. But Gary's past experience of his own ability to measure altitudes should not be dismissed; it can be used to inform the choice of a suitable value for the scatter under that day's conditions. There may, indeed, still be a bit of 'magic' left in the navigator's art, in the making of such decisions.

George.

Contact George Huxtable at george{at}hux.me.uk or at +44 1865 820222 (from UK, 01865 820222) or at 1 Sandy Lane, Southmoor, Abingdon, Oxon OX13 5HX, UK.