Re: [netmod] artwork folding: dual support modes?

Kent Watsen <kent+ietf@watsen.net> Sun, 24 March 2019 20:39 UTC

To: Martin Bjorklund <mbj@tail-f.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/netmod/48S2OM_B4W0yOGqXMbqAqyo63XU>

I’ve been thinking about Martin’s desire for the pretty single backslash approach.  I think the discussion should be about the probability of collisions (i.e., files that cannot be folded due to false positives).

Assuming 95 printable characters (127-32) plus ‘\n’, for a total of 96 characters, the unconditioned probability of a given n-character string occurring on a line is (1/96)^n.

Here are the 3 options being discussed (please check my math!):

1. The pretty double backslash approach in the current draft
       - folds on any column and supports indents

Scan the text content to ensure no existing lines already end with a backslash ('\') character when the subsequent line starts with a backslash ('\') character as the first non-space (' ') character.
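
The scan described above could look roughly like this. It is only a sketch: the function name is made up for illustration, and it assumes lines are separated by '\n' and that only space characters count as indentation.

```python
def has_double_backslash_collision(text: str) -> bool:
    """Return True if some line ends with a backslash and the next line's
    first non-space character is also a backslash (hypothetical helper)."""
    lines = text.split("\n")
    for current, following in zip(lines, lines[1:]):
        if current.endswith("\\") and following.lstrip(" ").startswith("\\"):
            return True
    return False


print(has_double_backslash_collision("abc\\\n   \\def"))  # True
print(has_double_backslash_collision("abc\\\ndef"))       # False
```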

    P(“\\\n[ ]*\\”)
      = sum of (1/96)^n for 3<=n<=69
      = ∑((1÷96)^x; x; 3; 69)
      = 0.0000011421783625730994152046783625731
      ~= 1 / 1,000,000
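
For anyone wanting to check the sum, here is a small sketch using exact fractions. It assumes the same uniform 96-character model as above; the names are mine, not from the draft.

```python
from fractions import Fraction

ALPHABET = 96  # 95 printable ASCII characters plus '\n', as assumed above


def p_double_backslash(max_n: int = 69) -> Fraction:
    """Sum of (1/96)^n for 3 <= n <= max_n, computed exactly."""
    return sum(Fraction(1, ALPHABET) ** n for n in range(3, max_n + 1))


p1 = p_double_backslash()
print(float(p1))      # about 1.142e-06
print(round(1 / p1))  # 875520
```

So the exact figure is nearer 1 / 875,000, which is still the same order of magnitude as 1 / 1,000,000.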

2. The not pretty single backslash approach in I-D -06
       - folds only on max-column with no support for indents

Scan the artwork to ensure no existing lines already end with a '\' character on the desired maximum column.
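
A rough sketch of this scan, assuming a max column of 69 (the value implied by the 68 in the math for this option) and '\n' line separators. The function name is hypothetical.

```python
def has_maxcol_backslash(text: str, maxcol: int = 69) -> bool:
    """Return True if any line is exactly maxcol characters long and its
    last character (the one on the max column) is a backslash."""
    return any(
        len(line) == maxcol and line.endswith("\\")
        for line in text.split("\n")
    )


print(has_maxcol_backslash("a" * 68 + "\\"))  # True
print(has_maxcol_backslash("a" * 67 + "\\"))  # False
```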

     P(“.\{$maxcol-1\}\\\n”)
       = P( (not ‘\n’ for $maxcol-1 chars), followed by a “\\\n”)
       = ((1−(1÷96))^68)×(1÷96)^2
       = 0.0000532376463396105463857306859461496
        ~= 1 / 20,000
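
The same number can be reproduced with a few lines of Python, again assuming the uniform 96-character model and a max column of 69 (names are illustrative only):

```python
ALPHABET = 96  # 95 printable ASCII characters plus '\n'
MAXCOL = 69    # assumed max column, matching the 68 used above


def p_option2(maxcol: int = MAXCOL) -> float:
    """P(maxcol-1 non-newline characters, then a backslash, then a newline)."""
    not_newline = 1 - 1 / ALPHABET
    return not_newline ** (maxcol - 1) * (1 / ALPHABET) ** 2


p2 = p_option2()
print(p2)  # about 5.32e-05
```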

3. Martin’s pretty single backslash approach
       - folds on any column and supports indents

Scan the artwork to ensure that no existing line already ends with a '\' character and that no white space character appears on the max column.
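
One possible reading of this scan, as a sketch only. It assumes a max column of 69, treats "white space" as a space character, and checks the max column only on lines that are long enough to need folding; those are my assumptions, and the function name is made up.

```python
def has_single_backslash_collision(text: str, maxcol: int = 69) -> bool:
    """Return True if any line ends with a backslash, or if a line longer
    than maxcol has a space character on the max column."""
    for line in text.split("\n"):
        if line.endswith("\\"):
            return True
        # Column maxcol (1-based) is index maxcol - 1 (0-based).
        if len(line) > maxcol and line[maxcol - 1] == " ":
            return True
    return False


print(has_single_backslash_collision("abc\\"))                    # True
print(has_single_backslash_collision("a" * 68 + " " + "b" * 10))  # True
print(has_single_backslash_collision("a" * 80))                   # False
```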

      P(“\\\n”) + P(“.\{$maxcol\} “)
        = (1/96)^2 + ((1−(1÷96))^69)×(1÷96)
        = 0.00516608334670744635108885960932866
        ~= 1 / 200
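
A quick check of this figure, under the same assumptions as before (uniform 96-character model, max column of 69; the names are mine):

```python
ALPHABET = 96  # 95 printable ASCII characters plus '\n'
MAXCOL = 69    # assumed max column


def p_option3(maxcol: int = MAXCOL) -> float:
    """P(backslash then newline) + P(maxcol non-newline characters, then a space)."""
    not_newline = 1 - 1 / ALPHABET
    p_trailing_backslash = (1 / ALPHABET) ** 2
    p_space_at_fold = not_newline ** maxcol * (1 / ALPHABET)
    return p_trailing_backslash + p_space_at_fold


p3 = p_option3()
print(p3)  # about 0.005166
```

The second term dominates: the chance of a space landing at the fold point is far larger than the chance of a line ending in a backslash.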

Note that each of these assumes a long line.  The fraction of long lines in a given piece of text is small, 1/100?  Thus, while option #3 is two orders of magnitude more likely than option #2, it may only be detected in 1 / 20,000 lines (1/200 x 1/100), and only in text samples that are themselves detected as needing to be folded (1/5?), so maybe one in a hundred thousand text samples?

Maybe an automated folding algorithm could try #3 and, only if it detects a violation of that option's precondition, switch to option #1?
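
To make the fallback idea concrete, here is a minimal sketch of a strategy chooser. Everything in it is an assumption for illustration: the function name, the returned labels, the max column of 69, and the exact form of the two precondition checks. It only picks a strategy; it does not fold anything.

```python
def choose_folding_strategy(text: str, maxcol: int = 69) -> str:
    """Pick a folding strategy label (hypothetical) for the given text."""
    lines = text.split("\n")

    # Nothing to do if no line exceeds the max column.
    if not any(len(line) > maxcol for line in lines):
        return "none"

    # Option 3 precondition: no line ends with a backslash, and no long
    # line has a space on the max column.
    option3_ok = not any(
        line.endswith("\\") or (len(line) > maxcol and line[maxcol - 1] == " ")
        for line in lines
    )
    if option3_ok:
        return "single-backslash"

    # Option 1 precondition: no line ending with a backslash is followed by
    # a line whose first non-space character is a backslash.
    option1_ok = not any(
        cur.endswith("\\") and nxt.lstrip(" ").startswith("\\")
        for cur, nxt in zip(lines, lines[1:])
    )
    if option1_ok:
        return "double-backslash"

    return "unfoldable"


print(choose_folding_strategy("a" * 80))                    # single-backslash
print(choose_folding_strategy("a" * 68 + " " + "b" * 10))  # double-backslash
```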

Kent // contributor