NavList:
A Community Devoted to the Preservation and Practice of Celestial Navigation and Other Methods of Traditional Wayfinding
Re: Rejecting outliers: was: Kurtosis.
From: Fred Hebard
Date: 2011 Jan 1, 12:33 -0500
All of this discussion could be informed immensely by some data and associated analyses. Data talk.

On Dec 31, 2010, at 6:50 PM, Peter Fogg wrote:

> Geoffrey Kolbe wrote:
>
> I have to say that I share George's disquiet about the notion of rejecting outliers simply because they do not seem to fit with the other data.
>
> Perhaps it is that, like George, I have a background as an experimental physicist, and that the notion of rejecting some data simply because it does not sit neatly with the rest of the data is anathema.
>
> This puzzles me, Geoffrey, given the boundaries of the particular context we are discussing. You do understand that the calculated slope can be assumed to be a fact? (i.e., apart from being an approximation of an arc, and assuming the DR is reasonable).
>
> Therefore if you end up with a pattern of sights that more or less follows that slope, but one (or more) apparent outlier that obviously does not, what possible conclusion can be reached? Either the pattern is generally correct and the outlier an obvious indication of error, or the outlier is correct and the apparent pattern then must be entirely composed of erroneous data sets. It seems to me to be a common-sense choice between these alternatives.
>
> There is a third way, that of averaging. This accords weight to the apparent outlier(s) in proportion to population extent. If there are 2 outliers, both on the same side of the line, and a restricted population (which is pretty much a given) then significant error can result. Error that is easily avoided by use of the slope.
>
> Experimental data is usually messy and experience shows that a lot can be learned from consideration of the possible causes of outliers.
>
> I agree. Use of slope allows for and encourages this consideration. Averaging does not.
>
> The navigator's time would be better spent taking another round of sights to force better precision on the mean than applying a statistical eraser to doubtful data.
>
> You can't be serious. Firstly, remember that one of the most significant drawbacks to the use of celestial navigation in practice is the weather. Another is the limited extent of dawns and dusks available (only 2 per day!) which offer the great advantage of a multiple-body fix at much the same time, without introducing error through running forward or back a position.
>
> I suggest that the navigator's time would be better spent in analysing the sights he/she has, and applying this simple technique in order to reduce random error.
>
> Even if taking more sights is not practical, outliers should not be discarded unless a good reason presents itself as to why they should be discarded.
>
> Once the slope has been calculated and the pattern of sights compared with it, it is up to the individual navigator to make the decision about the best place to place the slope amongst the sights. As much or as little weight can be accorded to any apparent outliers as you like.
>
> Frank seems to think that some pre-determined numerical quantity can be applied to assist that decision. Goodo. Feel free to do this if anyone wants to. I doubt very much whether much of this will ever happen in practice; one of the big advantages of slope is its simplicity and relative ease of use.
>
> The other very obvious point is that without slope how would you be even aware of apparent outliers? Not through blind averaging, that's for sure. If you only take one sight and then reduce that then you have no idea of how good or bad that individual sight might be. Could be excellent. Could be an outlier. Could be anything at all.
>
> The consequence may be a rather more open cocked hat or a fix of somewhat looser precision than one would like.
> But better that than discarding "bad data" and risking a false sense of security from the resulting tight fix.
>
> Half right. The right part is that one should never assume that any fix is entirely free of error. However, remember that use of slope is only one-half of a two-pronged approach; the half of dealing with random error as best as can be practically done (until someone comes up with a better method, which is somewhat different to going to outlandish lengths to try to poke holes in this one, e.g., assuming a vastly wrong DR).
>
> The other half of the two-pronged approach is to assume the resulting position lines to be free of random error, leaving non-random or systematic error to be dealt with. This can be simply and effectively done by bisecting the angles of intersecting position lines, leading to a fix position where these bisecting lines meet at a point. It ain't perfect, but it can be reasonably expected to be a better fix, with reduced extents of both random and non-random error.
>
> Nothing magic after all, George, Geoffrey et al. Sorry. It's really only common sense powered via some simple drafting.
>
> I've just given up resisting the temptation to add this (you can think of this personal weakness as a kind of reverse New Year's resolution):
>
> Pourquoi faire simple lorsque, avec tellement peu plus d'effort, l'on peut faire compliquer...
> (Why keep it simple when, with so little extra effort, one can make it complicated...)
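
Since the thread asks for data, here is a minimal Python sketch of the comparison being argued over: a line of known slope placed through a short run of sights, versus a plain average of the same sights. It is only an illustration under stated assumptions, not anyone's actual procedure. The slope is taken as a given input (computed beforehand from the DR), the median is used as a numerical stand-in for placing the line by eye, the 1.5' threshold is arbitrary, and the sight values are invented for the example. The function names are hypothetical.

```python
from statistics import mean, median


def fit_fixed_slope(times_min, alts_arcmin, slope, threshold=1.5):
    """Place a line of known slope through the sights and flag distant ones.

    slope is in arcminutes per minute of time and is assumed to have been
    calculated beforehand; threshold (arcminutes) is an arbitrary choice.
    """
    # Remove the known trend; what remains should be constant apart from error.
    offsets = [h - slope * t for t, h in zip(times_min, alts_arcmin)]
    # Median as a stand-in for the navigator's judgment in placing the line.
    intercept = median(offsets)
    residuals = [o - intercept for o in offsets]
    outliers = [i for i, r in enumerate(residuals) if abs(r) > threshold]
    return intercept, residuals, outliers


def altitude_on_line(intercept, slope, t_min):
    """Altitude read off the fitted line at time t_min."""
    return intercept + slope * t_min


# Invented sights: times in minutes, altitudes in arcminutes.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
alts = [1800.3, 1809.8, 1825.0, 1830.1, 1839.7]  # third sight is about 5' high
slope = 10.0  # assumed known rate of change of altitude

intercept, residuals, outliers = fit_fixed_slope(times, alts, slope)
t_mid = mean(times)
print("flagged sights (index):", outliers)
print("altitude from line at mid-time:", altitude_on_line(intercept, slope, t_mid))
print("plain average of altitudes:", mean(alts))
```

With these invented numbers the third sight (index 2) is the only one flagged, and the plain average sits a little under 1' above the line at the mid-time, because the single high sight carries one-fifth of the weight. Whether to discard that sight, down-weight it, or keep it is exactly the judgment call the thread is debating; the sketch only makes the outlier visible.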