Re: [Rfcplusplus] Sunk cost + not about us

John C Klensin <john-ietf@jck.com> Tue, 10 July 2018 17:01 UTC

Date: Tue, 10 Jul 2018 13:01:32 -0400
From: John C Klensin <john-ietf@jck.com>
To: Richard Barnes <rlb@ipv.sx>
cc: rfcplusplus@ietf.org
Message-ID: <40A502D5A3E46151ADC19AF6@PSB>
In-Reply-To: <CAL02cgT5BtFnMHzxpAx7pV=AiRyzMQV3aON65kAPRnV9kFOgeg@mail.gmail.com>
References: <CAL02cgQbT8s0493SdbM7Gbw2ZiSV1kMHk+6=Z4BdC2Ky664CNg@mail.gmail.com> <d159dd1f-de0b-d6c5-6430-cd5577e266fd@joelhalpern.com> <dc8c30ee-8233-e5cc-3afd-4734c1af8b0b@gmail.com> <CAL02cgT5BtFnMHzxpAx7pV=AiRyzMQV3aON65kAPRnV9kFOgeg@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/rfcplusplus/wZXY0Wsno2u6du7Z8BkI7jEPhyE>
Subject: Re: [Rfcplusplus] Sunk cost + not about us
List-Id: For discussion of the RFC++ BoF proposal and related ideas <rfcplusplus.ietf.org>


--On Tuesday, July 10, 2018 10:34 -0400 Richard Barnes
<rlb@ipv.sx> wrote:

>...
> But in the real world, when you're trying to make a decision,
> you either work with dirty data or you collect better data.  I
> don't see anyone here doing any better survey work, so
> throwing out what data we have seems counterproductive.
>...

Richard, 

As someone who used to teach survey design, survey data
analysis, and threats to validity of such analyses, I beg to
disagree.  

To summarize a much longer tutorial: if one has data that are
known to reflect sample biases relative to the population of
interest (and some of this discussion leads me to believe we
don't even have consensus about what that population is), or
whose questions cannot be clearly and unambiguously
interpreted, show biases toward certain types of answers, or
are otherwise prone to confirmation bias, etc., then the
results are likely to be of little or no value at all.

That is very different from "dirty data".  "Dirty data", while
not a precise technical term, usually refers to data whose
biases (statistical or substantive) we understand well enough to
at least partially de-bias them or adjust for their inadequacies.
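
As a rough illustration of what "adjusting for" an understood
bias can look like, here is a minimal post-stratification sketch
in Python.  Everything in it is hypothetical: the function names,
the group labels, and the numbers are invented for the example,
and it assumes the population shares of each group are known
from some independent source.

```python
# Hypothetical sketch: group names and all numbers are invented.

def raw_mean(sample):
    """Unadjusted mean over every response, ignoring group membership."""
    values = [v for responses in sample.values() for v in responses]
    return sum(values) / len(values)


def poststratified_mean(sample, population_shares):
    """Weight each group's mean by its (assumed known) population share.

    sample: dict mapping group -> list of numeric responses
    population_shares: dict mapping group -> share of population (sums to 1)
    """
    total = 0.0
    for group, share in population_shares.items():
        responses = sample[group]
        if not responses:
            raise ValueError(f"no responses for group {group!r}")
        total += share * (sum(responses) / len(responses))
    return total


# Invented sample in which group "a" is heavily over-represented.
sample = {"a": [1, 1, 1, 1, 1, 1, 0, 0], "b": [0, 1]}
# Assumed known shares of each group in the population of interest.
shares = {"a": 0.5, "b": 0.5}

print(raw_mean(sample))                     # skewed toward group "a"
print(poststratified_mean(sample, shares))  # reweighted estimate
```

The adjustment is possible only because the shares are known.
If neither the population nor the direction of the bias can be
pinned down, there is nothing trustworthy to plug in for
`shares`, and no reweighting of this kind is available.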

Bad data in which the biases are impossible to evaluate, other
than knowing that they are present and potentially serious (as
distinct from "dirty data" as above), may not only be worse
than "no data" but may be quite a bit worse than someone
standing up and saying "this is true because I said so".  At
least in the latter case, one has some reasonable chance of
guessing at the expertise, prejudices, preconceptions, and
general reliability of the source and calibrating the assertion
accordingly.

     best,
       john