    Re: Rejecting outliers
    From: Peter Hakel
    Date: 2011 Jan 3, 11:37 -0800
    George,

    Yes, the weights are attached to each individual data point, reflecting how important that data point is allowed to be.  Imagine a large number of navigators, all making the same measurement simultaneously.  You calculate the mean and standard deviation of that data set.  Now you have your altitude (the mean) and the weight (1/standard deviation squared) for that given UT.  You repeat this for a set of UTs and now you are ready to perform the weighted least squares procedure.
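
    For concreteness, here is a minimal sketch in Python of that weighted
    least-squares step, with invented numbers standing in for the per-UT means
    and standard deviations (none of these values come from real sights):

        import numpy as np

        t = np.array([0.0, 30.0, 60.0, 90.0])      # UT offsets in seconds (invented)
        alt = np.array([31.2, 34.4, 37.5, 40.7])   # per-UT mean altitudes, arcmin (invented)
        sigma = np.array([0.5, 0.3, 0.4, 0.3])     # per-UT standard deviations (invented)
        w = 1.0 / sigma**2                         # Eq(1): weight = 1 / variance

        # Weighted normal equations for the line alt = m*t + b
        W, St, Sa = w.sum(), (w * t).sum(), (w * alt).sum()
        m = (W * (w * t * alt).sum() - St * Sa) / (W * (w * t * t).sum() - St**2)
        b = (Sa - m * St) / W
        print(f"slope = {m:.4f} arcmin/s, intercept = {b:.2f} arcmin")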

    In reality, however, you usually have only one navigator producing a single altitude at any given UT.  That altitude can serve as the mean, but the weights are unknown in this case.  This is where it gets heuristic; the true altitudes are expected to follow a linear trend during the short time interval of observation, and we assume that the navigator is good.  Thus there is no appreciable systematic error in the data, only small random errors.  The standard deviation is then replaced with the absolute value of the difference between the mean (observed altitude) at that UT and a computed value (at that UT) from the best available linear fit.  This is what I called Step 3.  Since the fits depend on the weights, you have to iterate.  The procedure is initialized with the ordinary least-squares fit, where all weights = 1.
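
    A minimal sketch of this loop in Python (an illustration of the idea, not
    the spreadsheet's actual code) might look like the following:

        import numpy as np

        def weighted_fit(t, alt, w):
            """Weighted linear least squares; returns slope m and intercept b."""
            W, St, Sa = w.sum(), (w * t).sum(), (w * alt).sum()
            m = (W * (w * t * alt).sum() - St * Sa) / (W * (w * t * t).sum() - St**2)
            return m, (Sa - m * St) / W

        def iterate_weights(t, alt, n_iter=10):
            w = np.ones_like(alt)              # initialize: ordinary least squares
            for _ in range(n_iter):
                m, b = weighted_fit(t, alt, w)
                diff = alt - (m * t + b)       # residual from current best fit
                w = 1.0 / diff**2              # Step 3: |diff| stands in for sigma
                # (blows up if diff == 0; see the "Scatter" cap discussed below)
            return weighted_fit(t, alt, w)     # final slope and intercept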

    Precomputing the slope requires DR information, which I chose to avoid using at the very beginning.  That admittedly makes it harder, because tracking a position is easier than (and preferable to!) establishing one.  It also, however, allows for the detection of an inaccurate DR, which also has value; e.g. after enduring a long storm during which keeping an accurate DR is difficult.  Furthermore, the computed slope should be corrected for vessel motion and the declination change (as it was in my processing of Gary LaPook's Venus data).  If those considerations give me a "hard time" sitting at a desk behind a computer, while in the end making it easier for someone on a boat, that is a useful trade-off in my opinion.  If the DR constraint on the slope is inaccurate for whatever reason, I am certainly interested in "diverging" away from it if sextant and chronometer data from an experienced navigator suggest that we should.  I decided to rely less on the DR (whose accuracy could become suspect because of the weather) and to trust more the set of measurements made under favorable conditions after the bad weather has passed.

    You interpreted the weighting scheme as a square-box modified by inverse-square shoulders.  Actually, the exact opposite is happening here; the weighting function STARTS as an inverse-square, which is then capped by a ceiling controlled by the "Scatter" parameter.  This is designed to prevent any single data point from taking over the fit, or (even worse) causing a division by zero, should any fit iteration pass exactly through it.  This problem is an unwelcome side effect of the way the weights are estimated in the absence of "real" statistics at each UT.  The job of the "Scatter" parameter is to enforce the notion that EVERY data point comes with some uncertainty, which realistically cannot be smaller than a certain value.  It overrides the heuristic Step 3's suggestion of huge (or infinite!) weights 1/diff^2 arising from very small (or zero) |diff|.
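
    In code form the capping is a one-liner (again a sketch of the idea, with
    scatter in the same angular units as diff):

        import numpy as np

        def capped_weight(diff, scatter):
            """Inverse-square weight with a ceiling of 1/scatter**2."""
            return 1.0 / np.maximum(diff**2, scatter**2)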

    Now we come to the "magic" of selecting this "Scatter" parameter.  One could conceivably automate the search for this value by programming the computer to find the "happy medium" from the distribution of weights; in effect, that is what I did by turning this knob manually.  This "Scatter" parameter expresses highly individual and difficult-to-quantify "variables" such as the quality of the sextant, the experience of the navigator, the weather conditions, even how the navigator feels about that particular set of sights; things that I cannot anticipate when writing the spreadsheet.  ("This Scatter value eliminates sight #2, but is that really fair?")  It is therefore not unreasonable, in my opinion, to leave the "Scatter" as a free parameter to be adjusted at the navigator's discretion on a case-by-case basis.  Note that I got good results both for Peter Fogg's data with Scatter = 2.5' and for Gary LaPook's data (acquired under presumably more favorable conditions) with Scatter = 0.1'.

    The aim of this spreadsheet is to aid the navigator in averaging his or her sights with a method that has some good statistical justification behind it.  From the couple of tests on real-life data I have done so far, it seems to me that it does this job well.


    Peter Hakel




    From: George Huxtable <george@hux.me.uk>
    To: NavList@fer3.com
    Sent: Mon, January 3, 2011 3:46:22 AM
    Subject: [NavList] Re: Rejecting outliers

    Thanks to Peter Hakel for providing an unlocked version of his spreadsheet:
    I can now read it all, though without understanding exactly what it's
    doing.

    Perhaps Peter can clarify for me a point about his earlier posting, on 31
    December, in which he wrote-

    Eq(1):      weight = 1 / variance = 1 / (standard deviation squared)

    It may only be a matter of words, but it seems to me that the weight has to
    be assessed individually for each member of the set. Isn't "standard
    deviation" a measure of the set as a whole, not just a single member?
    Shouldn't Eq(1) read something like-
    "= 1/ (deviation squared)", not
    "= 1/ (standard deviation squared)"?

    Otherwise, I fail to follow it.

    I think he is giving himself an unnecessarily hard time by allowing the
    slope to be a variable in his fitting routine. What's more, he is diverging
    from an important prior-constraint of the problem, which is that the true
    slope must represent an altitude change of 32' over a 5 minute period, and
    NOTHING ELSE WILL DO. To that extent, his analysis is inappropriate to the
    problem.

    Knowing that variation with time, we can eliminate time from the problem
    before we even start to tackle it, by subtracting from each altitude value
    an amount that increases linearly with time, with a slope of 32', from some
    arbitrary base-value, chosen for convenience.
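
    In code the detrending amounts to a one-line subtraction (a sketch with
    invented numbers; the 32' per 5 minutes, i.e. 6.4' per minute, is the only
    given):

        import numpy as np

        SLOPE = 32.0 / 5.0                          # arcminutes per minute, fixed a priori

        t = np.array([0.0, 0.5, 1.0, 1.5])          # times in minutes (invented)
        alt = np.array([30.9, 34.2, 37.4, 40.5])    # observed altitudes, arcmin (invented)

        detrended = alt - SLOPE * t                 # time now drops out of the problem
        print(detrended.mean())                     # simple mean-of-N base-value estimate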

    This then results in a set of nine simple numbers, of which the time and
    even the ordering is now unimportant. Peter's task then is to find some way
    of processing those numbers to determine a result that represents the true,
    unperturbed, initial value, better than a simple mean-of-9 does. In the
    case we're presented with, there's no evidence that the distribution is
    anything other than a simple Gaussian, which makes his task more difficult.
    If there were obvious "outliers", it could be more straightforward.

    Now for Peter's weighting function. In a simple least-squares analysis the
    same weight is given to each observation, so the weight factor is a
    constant = 1, whatever the deviation.

    If a limit is set outside which data are excluded, it becomes a square
    distribution around a best estimate of a central value: within the limit
    the weighting is taken as 1, and outside it, on either side of the centre
    beyond a specified deviation (for example, 3 standard deviations), the
    weighting is zero. It's somewhat unphysical and arbitrary, but at least
    the conditions can be clearly specified.
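
    Expressed as a sketch (my wording of the scheme, not Peter's spreadsheet):

        import numpy as np

        def square_box_weights(x, n_sigma=3.0):
            """Weight 1 within n_sigma standard deviations of the mean, 0 outside."""
            dev = np.abs(x - x.mean())
            return np.where(dev <= n_sigma * x.std(ddof=1), 1.0, 0.0)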

    Peter modifies a square-box weighting function, as above, to add an
    inverse-square fall-off beyond its shoulders. Those sharp shoulders also
    seem somewhat unphysical. What I would like to follow is how the half-width
    between those shoulders relates to the standard deviation, and to his
    "scatter" parameter.

    It seems that Peter wishes to leave it to the individual, to choose the
    scatter parameter that's most appropriate to a particular data set, after
    viewing some results. If I understand that right, he hasn't yet eliminated
    all the "magic" from the operation.

    George.


       