Welcome to the NavList Message Boards.

NavList:

A Community Devoted to the Preservation and Practice of Celestial Navigation and Other Methods of Traditional Wayfinding

    Rejecting outliers: was: Kurtosis.
    From: George Huxtable
    Date: 2010 Dec 31, 14:54 -0000

    The threadname is changed once again, from "kurtosis" (a mathematician's
    word far beyond the vocabulary of navigators, which displays Frank's
    erudition) to the more familiar "Rejecting outliers", which is what the
    discussion seems to be really about.
    
    I was trying to discover exactly what Peter Fogg himself was actually
    claiming his procedure could accomplish. Not what Frank Reed thought that
    it might accomplish, though those views may also be of some interest.
    
    And I used the word "magic" to describe that procedure, because nowhere,
    that I can recall, has Peter Fogg explained, in numerical terms that we
    might agree on (or otherwise) what his criteria are for accepting some
    observations and rejecting others. Which brought this response, from Frank-
    "Now come on, George. Magic?? I really believe that this attitude has made
    it nearly impossible for you to see something simple and useful."
    
    Oh? What is this "something simple and useful" that Frank believes my
    attitude has made it nearly impossible for me to see? Is it, I wonder, the
    virtue of plotting observations, to allow the practised eye to pick out
    oddnesses? Well, I'm all in favour of that, and can not recall any
    arguments I've made against it. As a one-time experimental physicist, such
    procedures have played a large part in my working life. And I see no reason
    why that should not apply to navigational procedures also. The human eye
    and brain can work together powerfully, and often provide a workable
    alternative to full mathematical analysis. What I have argued against are
    spurious claims that ascribe some exceptional qualities to those procedures
    that they do not, and cannot, possess.
    
    Now let's get on to the real nub of this discussion, the separation of
    "outliers" from what I will call useful data.
    
    I am aware that a Gaussian distribution is no more than a convenient
    approximation, representing observed scatter in measurements of many types,
    that seems to work well in practice. And there are many reasons why some
    observations might well lie outside an expected Gaussian error-band: they
    are commonly ascribed to some sort of "blunder". Such blunders can come in
    all sorts of unpredictable shapes and sizes, and it would seem impossible
    to predict any frequency-distribution for errors of that type. They would
    certainly corrupt any set of otherwise-valid measurements, and need to be
    detected and discarded, to the extent that is possible. That is the
    challenge that mariners face, to somehow distinguish the good from the bad.
    
    Frank writes-"In the real world, at least from every practical set of
    observations that I have seen, the probability of points "in the tails" of
    the distribution is much higher. For example, you might get a 3.6 minute
    of arc error one time out of a hundred observations or even one in fifty,
    or in other words, with hundreds of times greater frequency than the
    standard normal distribution would imply."
    
    Is that comment intended to apply just to sextant altitudes, or generally,
    to other fields of measurement as well, as seems to be implied? He appears
    to be challenging the very basis of error-theory, rooted as it is in the
    Gaussian distribution, which has provided a useful model for statisticians
    for many years. He is perfectly entitled to do so, but to be taken
    seriously will need to offer much firmer evidence than the anecdotal
    statements provided above.
    
    Frank then offers what he describes as "an easy way to model such
    observations" by combining two Gaussian distributions; one with a suggested
    standard deviation of 0.7', and another with a SD of 3.0'. How does anyone
    use such a "model"? What is it based on? Where do any "blunders" fit in?
    How were the parameters (0.7', 3.0', 80%) derived? Is it intended to
    represent real-life, perhaps Frank's own experience with measuring
    altitudes? Or has it just been imagined, dreamed up out of nothing? Facts,
    please.
    
    For one thing, it depends on whether all his fifty or a hundred
    observations have been made under comparable conditions. I imagine that
    most, or perhaps all, of Frank's were made from on land, but let me provide
    a maritime example which might well produce the sort of distribution he
    describes. Take a five-week ocean passage, in which benign weather has
    prevailed for four weeks of the five, resulting in a standard deviation in
    altitudes of 0.7'. But for one week it's been stormy, and over that week
    the SD has increased, to 3.0'. If we lump all observations together, over
    the five weeks, we will get a non-Gaussian distribution of the overall
    scatter. But that doesn't imply that on a calm day we are likely to see
    scatter in the region of 4'. In the same way, if we are to analyse a
    lifetime's experience of measuring altitudes, such measurements have to be
    assessed with some care, taking like with like.
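The five-week passage can be checked with a little arithmetic. What follows is only a sketch under stated assumptions: errors in each regime are taken to be zero-mean Gaussian, the same number of sights is taken each week (so the weights are 4 to 1), and the helper name `pooled_excess_kurtosis` is made up for this illustration.

```python
def pooled_excess_kurtosis(weights, sds):
    """Excess kurtosis of a zero-mean mixture of Gaussians.

    weights: relative share of observations in each regime (assumed)
    sds:     standard deviation of each regime, in arc-minutes
    """
    total = sum(weights)
    w = [x / total for x in weights]
    # Variance of the pooled data is the weighted mean of the variances.
    variance = sum(wi * s ** 2 for wi, s in zip(w, sds))
    # Fourth moment of a zero-mean Gaussian is 3 * sd**4.
    fourth = sum(wi * 3.0 * s ** 4 for wi, s in zip(w, sds))
    return fourth / variance ** 2 - 3.0


# Four calm weeks at 0.7' and one stormy week at 3.0', lumped together.
calm_only = pooled_excess_kurtosis([1], [0.7])
pooled = pooled_excess_kurtosis([4, 1], [0.7, 3.0])
print(f"calm weeks alone: excess kurtosis = {calm_only:.2f}")
print(f"all five weeks:   excess kurtosis = {pooled:.2f}")
```

Under these assumptions the calm weeks on their own show no excess kurtosis, while the lumped five weeks show a large positive value, even though each regime taken separately is purely Gaussian.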
    
    As for the "observations where something has gone wrong but not at a level
    that we immediately detect. They're the sort of observations that we might
    occasionally mark down with a question mark or maybe just have a "funny
    feeling" about but they're not the sorts of observations that you would
    immediately throw out": if there's an observation that you have a "funny
    feeling" about, or put a question mark against, the moment to discard it is
    there and then, at the time of the "funny feeling". Not wait to see if it
    fits in with your preconceptions or not, and then discard it if it doesn't.
    
    George.
    
     contact George Huxtable, at george{at}hux.me.uk
    or at +44 1865 820222 (from UK, 01865 820222)
    or at 1 Sandy Lane, Southmoor, Abingdon, Oxon OX13 5HX, UK.
    
    
    ----- Original Message -----
    From: "Frank Reed" 
    To: 
    Sent: Thursday, December 30, 2010 6:32 AM
    Subject: [NavList] Kurtosis WAS: errors in plotting and a possible/partial
    fix
    
    
    George H, you wrote:
    "Is Peter Fogg really claiming that he has a method which can reduce the
    error resulting from random scatter to less than simple averaging will do?"
    
    Yes. Of course, he is. SURELY that's obvious by now. And it's a simple
    method. It differs only slightly from the usual navigators' technique of
    omitting LOPs from a fix if they are too far out from a group of others.
    When you have a series of closely-spaced observations (well away from the
    meridian), the differences between the plotted observed altitudes and the
    line with the required slope is no more and no less than a plot of the
    intercepts of the sights. Of course any such method needs to be applied
    with some fixed a priori standards. Otherwise the temptation to fit the
    line will become too great.
    
    And George, you wrote:
    "If so, I can always produce sets of simulated data, which are affected
    only by computer-generated random scatter, on which he can try his magic,
    to substantiate that claim."
    
    Now come on, George. Magic?? I really believe that this attitude has made
    it nearly impossible for you to see something simple and useful.
    
    You also wrote:
    "I understood that his reason for declining such trials, when last offered,
    was that his procedures could not be expected to improve on such
    Gaussian scatter, but could only improve on non-Gaussian outliers. If I'm
    wrong about that, the offer remains open."
    
    Of course this is the issue. Gaussian distributions are only an approximate
    model of real observational error, excellent as a starting point, in fact a
    gold standard for a starting point, but only part of the story. What we
    have here is "kurtosis".
    
    Kurtosis (positive kurtosis, to be precise) is a ponderous name for a
    simple phenomenon in observations: you get more outliers than a pure
    Gaussian distribution would imply. And most people who have done
    observations with manual instruments are familiar with this phenomenon
    though they rarely have a name for it. For a navigation example, suppose
    you have a navigator who has a standard deviation of Sun altitude sights of
    0.9 minutes of arc. That's not an unreasonable number. It implies that
    roughly two-thirds of observations (actually 68%) are within 0.9 minutes of
    arc of the truth. But the standard normal distribution tails off very
    rapidly. This means that the odds of finding an observation at three or
    four standard deviations away from the truth are extremely low --by this
    THEORETICAL model of the error distribution. Specifically, the odds of an
    observation at 3 s.d. with an error of +/-2.7 minutes of arc, or more, are
    about 1-in-370 --for a Gaussian normal distribution. The odds of an
    observation at 4 s.d. with an error of +/-3.6 minutes of arc or more are
    about 1-in-16,000. That number implies that you could shoot Sun altitudes
    five times a day, every day of the year, for over eight years and still
    only have an even-money chance of seeing an observation with an error of
    3.6'. But that is not the reality of sextant observations. The normal
    distribution is a model with zero kurtosis. In the real world, at least
    from every practical set of observations that I have seen, the probability
    of points "in the tails" of the distribution is much higher. For example,
    you might get a 3.6 minute of arc error one time out of a hundred
    observations or even one in fifty, or in other words, with hundreds of
    times greater frequency than the standard normal distribution would imply.
    That's called "kurtosis" (for those who like even more arcane terminology,
    it is technically a "leptokurtic" distribution).
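The Gaussian odds quoted above can be reproduced in a few lines. This is a minimal sketch using only the Python standard library; the function name `two_sided_tail` is an illustrative choice, and the only input is the number of standard deviations.

```python
import math


def two_sided_tail(k):
    """Probability that a Gaussian error exceeds k standard deviations
    in either direction: P(|Z| > k) = erfc(k / sqrt(2))."""
    return math.erfc(k / math.sqrt(2.0))


SD = 0.9  # arc-minutes, the navigator's standard deviation from the text

for k in (3.0, 4.0):
    p = two_sided_tail(k)
    print(f"{k:.0f} s.d. ({k * SD:.1f}' or more): about 1 in {1.0 / p:,.0f}")
```

The two results should come out close to the 1-in-370 and 1-in-16,000 figures given in the paragraph.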
    
    If you want to model observations that have kurtosis, there is an easy way
    to do it, and it has a direct relationship with the origins of these
    "outliers" in the real world. Generate random variables as follows: with
    some probability f (e.g. 80%) take random numbers from a Gaussian normal
    distribution with a relatively small standard deviation. In the case here,
    we might take 80% of numbers from a normal distribution with standard
    deviation 0.7'. These correspond to normal "good" observations. For all
    other simulated observations (necessarily with probability 1-f, of course),
    take the observations from a Gaussian distribution with a significantly
    larger standard deviation, perhaps 3.0' in the case described here. These
    correspond to observations where something has gone wrong but not at a
    level that we immediately detect. They're the sort of observations that we
    might occasionally mark down with a question mark or maybe just have a
    "funny feeling" about but they're not the sorts of observations that you
    would immediately throw out. The random numbers you will get from this
    "mixed" simulation will generally resemble normally distributed numbers
    until you look more closely at the statistics, or until you employ some
    graphing technique like the very simple and efficient one that Peter Fogg
    has discussed many times. We can adopt a standard where we drop any
    observations greater than perhaps 2.5 s.d. from the sloping line, and we
    will get better results than a crude average of all points most of the
    time.
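The mixed simulation and the 2.5 s.d. cut can be sketched as below. Several things here are assumptions rather than anything stated in the post: each run is 9 sights of a quantity whose true error is zero, the median of the run stands in for the sloping line, the cut is fixed in advance at 2.5 times the 0.7' "good" standard deviation, and all function names are invented for the illustration. It shows what the model predicts; it says nothing about whether real sights follow the model.

```python
import random
import statistics

GOOD_SD, BAD_SD, F_GOOD = 0.7, 3.0, 0.80  # arc-minutes and f, from the text
CUTOFF = 2.5 * GOOD_SD                    # a priori rejection limit (assumed)


def simulate_sights(rng, n):
    """Draw n errors: with probability F_GOOD from the narrow Gaussian,
    otherwise from the wide one."""
    return [rng.gauss(0.0, GOOD_SD if rng.random() < F_GOOD else BAD_SD)
            for _ in range(n)]


def culled_mean(sights):
    """Mean after dropping sights further than CUTOFF from the median.
    The median is a stand-in for the plotted line (assumption)."""
    centre = statistics.median(sights)
    kept = [x for x in sights if abs(x - centre) <= CUTOFF]
    return statistics.fmean(kept)


def rms_errors(trials=5000, n=9, seed=1):
    """RMS error of the plain average and of the culled average."""
    rng = random.Random(seed)
    sq_plain = sq_culled = 0.0
    for _ in range(trials):
        sights = simulate_sights(rng, n)
        sq_plain += statistics.fmean(sights) ** 2
        sq_culled += culled_mean(sights) ** 2
    return (sq_plain / trials) ** 0.5, (sq_culled / trials) ** 0.5


plain, culled = rms_errors()
print(f"RMS error, plain average:  {plain:.2f}'")
print(f"RMS error, culled average: {culled:.2f}'")
```

With an odd number of sights the median is itself one of the sights, so at least one observation always survives the cut. Setting `F_GOOD = 1.0` turns the same script into a test on purely Gaussian scatter, which is the case the earlier offer of simulated data was about.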
    
    This isn't magic. It's good science. Whether it's useful for a navigator
    depends on many factors: the type of observations (altitudes? lunars?), the
    quality of the observation conditions (small boat? land observer?), the
    time and calculating resources available (is a calculated plot available?),
    and probably more. Of course, one could also argue that this was never used
    historically so if we're only interested in the history of a dead skill,
    it's irrelevant. If there's any life left in traditional navigation,
    there's every reason to seek modern methods of analysis. There's nothing
    wrong with trying to cull outliers in observational data when there is
    significant kurtosis.
    
    -FER
    
    

       