NavList:
A Community Devoted to the Preservation and Practice of Celestial Navigation and Other Methods of Traditional Wayfinding
Re: Rejecting outliers: was: Kurtosis.
From: Geoffrey Kolbe
Date: 2011 Jan 01, 09:41 +0000
Peter Fogg wrote:-
You do understand that the calculated slope can be assumed to be a fact? (That is, apart from its being an approximation of an arc, and assuming the DR is reasonable.)
Therefore if you end up with a pattern of sights that more or less follows that slope, but one (or more) apparent outlier that obviously does not, what possible conclusion can be reached? Either the pattern is generally correct and the outlier an obvious indication of error, or the outlier is correct and the apparent pattern then must be entirely composed of erroneous data sets. It seems to me to be a common-sense choice between these alternatives.
My quibble with using a calculated slope to fit to the data is a small one, Peter, and not an important one from a practical point of view. However, that is not the issue here.
What I do have a major problem with is your readiness to reject certain data points because they "obviously" do not fit neatly on or near the line with the rest of the data. For you, it seems, keeping those data points would quite ruin the whole look of the thing. So your solution? Rub them out. Problem solved.
I put the word "obviously" in quotes because it is your choice of word and, as George keeps complaining, you have yet to quantify how you make that decision. It is "obvious" to you, but it is not to me or to George, and we are not dummies. How far away does the data point have to be before it is "obviously" wrong? And if it is that far out, the important question is: why does it stick out so much?
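For anyone who wants to try the fixed-slope approach, here is a minimal Python sketch. It assumes each sight is a pair of time (minutes) and observed altitude (arcminutes), and that the calculated slope, the rate of change of altitude worked out from the DR, is exact over the round; the function names and sample numbers are hypothetical. With the slope fixed, the least-squares position of the line is just the mean of altitude minus slope × time. The sketch reports how far each sight lies from that line, in arcminutes and in standard deviations of the sample, but makes no rejection decision, since the threshold for "obviously" wrong is exactly what is in dispute.

```python
from statistics import mean, stdev


def fit_fixed_slope(times, alts, slope):
    """Least-squares offset for a line of known slope through (time, altitude) pairs.

    Hypothetical units: times in minutes, alts in arcminutes,
    slope in arcminutes per minute (assumed exact, from the DR).
    """
    # With the slope fixed, minimising sum((alt - slope*t - c)**2) over c
    # gives c = mean(alt - slope*t).
    offset = mean(a - slope * t for t, a in zip(times, alts))
    residuals = [a - (slope * t + offset) for t, a in zip(times, alts)]
    return offset, residuals


def in_std_devs(residuals):
    """Express each residual in units of the sample standard deviation.

    The scatter estimate includes the suspect sight itself; needs >= 2 sights.
    """
    s = stdev(residuals)
    return [r / s for r in residuals]


# Hypothetical round of five sights; the 0.9'/min slope is an assumed value.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
alts = [600.0, 601.1, 601.7, 605.5, 603.6]
offset, res = fit_fixed_slope(times, alts, slope=0.9)
for t, r, z in zip(times, res, in_std_devs(res)):
    print(f"t={t:4.1f} min  residual={r:+5.2f}'  ({z:+.2f} s.d.)")
```

The standard deviation here is estimated from the same few sights, suspect one included, so with a round of five it is itself only a rough figure.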
There is a third way, that of averaging. This accords weight to the apparent outlier(s) in proportion to their share of the population. If there are two outliers, both on the same side of the line, and a restricted population (which is pretty much a given), then significant error can result: error that is easily avoided by use of the slope.
The issue here (once again) is not the use of a calculated slope to fit to the data and so recognize outliers. The issue - look at the subject heading - is what to do with the outliers.
You seem far too ready to follow the line of reasoning that a separated data point -> error -> rejection of data point, without attempting to identify what that error might be. If you can identify what the "error" was, fine, reject the data point. But if you cannot identify the mistake or problem that gave rise to that separated data point, then you have no justification to reject it.
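The averaging point raised above can be checked in a few lines of Python. Under the same kind of assumptions (times in minutes, altitudes in arcminutes, a slope taken as exact, all numbers hypothetical), a fixed-slope least-squares line passes through the mean time and mean altitude, so at the mean time it gives the same altitude as plain averaging. Either way every sight carries weight 1/n, so two sights both in error by e on the same side shift the result by 2e/n.

```python
from statistics import mean


def fixed_slope_offset(times, alts, slope):
    # Least-squares offset for a line of known (assumed exact) slope.
    return mean(a - slope * t for t, a in zip(times, alts))


def altitude_at(t, times, alts, slope):
    """Altitude of the fitted fixed-slope line at time t."""
    return slope * t + fixed_slope_offset(times, alts, slope)


# Hypothetical clean round: five sights lying exactly on a 0.9'/min slope.
slope = 0.9
times = [0.0, 1.0, 2.0, 3.0, 4.0]
clean = [600.0 + slope * t for t in times]

# The same round with two sights both 3' high (same side of the line).
error = 3.0
dirty = clean[:]
dirty[1] += error
dirty[3] += error

t_mean = mean(times)
shift = altitude_at(t_mean, times, dirty, slope) - altitude_at(t_mean, times, clean, slope)
print(f"shift at mean time: {shift:.2f}'")  # 2 x 3' / 5 sights = 1.2'

# At the mean time the fitted line agrees with plain averaging of the altitudes.
assert abs(altitude_at(t_mean, times, dirty, slope) - mean(dirty)) < 1e-9
```

So whether the slope helps depends on what the navigator does with the two suspect sights; with equal weights, the fitted line and the average move together.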
I said
- The navigator's time would be better spent taking another round of sights to force better precision on the mean than applying a statistical eraser to doubtful data.
And you replied,
You can't be serious. Firstly, remember that one of the most significant drawbacks to the use of celestial navigation in practice is the weather. Another is the limited number of dawns and dusks available (only two per day!), which offer the great advantage of a multiple-body fix at much the same time, without introducing error through running a position forward or back.
Which is why I went on to say...
"Even if taking more sights is not practical, outliers should not be discarded unless a good reason presents itself as to why they should be discarded. The consequence may be a rather more open cocked hat or a fix of somewhat looser precision than one would like. But better that than discarding "bad data" and risk a false sense of security from the resulting tight fix."
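How much another round of sights tightens the mean can be put in numbers. Assuming the errors of individual sights are independent with the same standard deviation σ (an idealisation: a systematic error such as a wrong index correction does not average out), the standard error of the mean of n sights is σ/√n. The sketch uses a hypothetical σ of 1.0'.

```python
import math


def standard_error(sigma, n):
    """Standard error of the mean of n independent sights, each with s.d. sigma.

    Assumes independent, equally precise sights; systematic errors are not reduced.
    """
    return sigma / math.sqrt(n)


sigma = 1.0  # hypothetical single-sight s.d., arcminutes
for n in (3, 5, 10, 20):
    print(f"{n:2d} sights: +/-{standard_error(sigma, n):.2f}'")
```

Quadrupling the number of sights only halves the standard error, which is why a fix from a short round has to live with a looser precision rather than have it manufactured by deleting sights.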
To which you responded,
Once the slope has been calculated and the pattern of sights compared with it, it is up to the individual navigator to decide where best to place the slope amongst the sights. As much or as little weight can be accorded to any apparent outliers as you like.
Well, no, Peter. You just can't deal with outlier data points "as you like", on a whim. If the data as a whole are to have value, they must be treated systematically. That is what you seem to fail to grasp.
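One example of a stated, systematic rule for "how far is too far" is Chauvenet's criterion; it is named here only as an illustration, not as a method either correspondent proposed. Assuming normally distributed errors, a point is flagged if the expected number of points in a sample of n lying at least that far from the mean, n·P(|Z| ≥ z), is below one half. The rule only flags a candidate; it says nothing about the cause. The residuals below are hypothetical.

```python
import math
from statistics import mean, stdev


def chauvenet_flags(residuals):
    """Return True for each residual that Chauvenet's criterion would flag.

    Assumes normally distributed errors; the mean and s.d. are estimated
    from the sample itself, suspect points included (needs >= 2 values).
    """
    n = len(residuals)
    m = mean(residuals)
    s = stdev(residuals)
    flags = []
    for r in residuals:
        z = abs(r - m) / s
        # Two-sided tail probability P(|Z| >= z) for a standard normal.
        p = math.erfc(z / math.sqrt(2))
        flags.append(n * p < 0.5)
    return flags


residuals = [-0.58, -0.38, -0.68, 2.22, -0.58]  # hypothetical, arcminutes
print(chauvenet_flags(residuals))
```

With the s.d. estimated from the sample, no residual in a round of n sights can exceed (n - 1)/√n standard deviations, about 1.79 for five sights, so with a restricted population a rule like this has very little room to work in.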
And you concluded....
I've just given up resisting the temptation to add this (you can think of this personal weakness as a kind of reverse New Year's resolution):
Pourquoi faire simple lorsque, avec tellement peu plus d'effort, l'on peut faire compliqué... [Why do it simply when, with so little more effort, one can make it complicated...]
Oh dear. Why this passion for writing in French on an English-language forum, Peter? What are you trying to prove? Take your own advice: writing it in English would have taken up less space and would mean we can all benefit from your erudite wisdom...
Happy New Year
Geoffrey Kolbe