[netmod] perfect extraction, tabs, and long lines - oh my

Kent Watsen <kwatsen@juniper.net> Fri, 21 September 2018 18:16 UTC

From: Kent Watsen <kwatsen@juniper.net>
To: Robert Wilton <rwilton=40cisco.com@dmarc.ietf.org>, tom petch <ietfc@btconnect.com>, Bob Harold <rharolde@umich.edu>, "adrian@olddog.co.uk" <adrian@olddog.co.uk>
CC: "netmod@ietf.org" <netmod@ietf.org>
Date: Fri, 21 Sep 2018 18:16:48 +0000
Message-ID: <2EBB9A0D-66C3-4116-99A5-C6D4BD290695@juniper.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/netmod/VeH1J6SSYCgz8yBIWYLFzmRKoFI>
Subject: [netmod] perfect extraction, tabs, and long lines - oh my

[new subject line]

It is one thing for an editor to use tabs during the creation of text,
and another to publish text with an expectation that consumers will
render the tabs the same way.  Either the source editor converts tabs
to spaces, which is interoperable today, or it keeps the tabs and
publishes metadata in the text, using some TBD standard, enabling
consumers to use the same tab stops.
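
To illustrate why unconverted tabs are not interoperable, here is a
minimal sketch using Python's built-in str.expandtabs().  The sample
line and the two tab widths are made up for illustration; they do not
come from any draft.

```python
# Hypothetical one-line artwork containing two TAB characters.
line = "a\tbc\tdef"

# The same bytes rendered by two consumers with different tab stops.
as_8 = line.expandtabs(8)
as_4 = line.expandtabs(4)

print(repr(as_8))
print(repr(as_4))

# The renderings differ, so column alignment is not preserved unless
# the tab stops are agreed upon or the tabs are converted up front.
print(as_8 == as_4)
```

Once the producer has run the conversion, every consumer sees the same
columns, which is the "interoperable today" case.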

If there were a standard enabling the publishing of text including
tabs, it should work for all artwork, not just artwork that has been
folded.  This is similar to the discussion we had before about having
begin/end markers enabling perfect extractions, in that it is also
something that pertains to all artwork, not just artwork that has 
been folded.  

Thus, there are a total of three problems:
  P1: perfect extraction
  P2: tabs
  P3: long lines

Assuming all things were solved problems, and assuming that we always
want perfect extraction, the possible combinations for the occurrence
of the other two problems are:
  - no tabs or long lines
  - tabs, but no long lines
  - long lines, but no tabs
  - tabs and long lines

How are they ordered?  Clearly supporting perfect extractions has 
to be the outermost thing, but what about the other two?   Does it
matter?

Thinking about solutions:

 - the solution for long-lines is to use a header (not a footer)
   because it's believed important to prime readers *before* they
   read the text.

 - the solution for perfect-extraction could be either:
     - use both a header-and-footer marker (low tech)
     - or use either a header or a footer that encodes
       something like a "num lines" value into the 
       marker.  (note: footer-only okay since the marker
       is for programmatic processors, not the readers)

 - the solution for tabs could be to use either a header
   or a footer that encodes the tab-stop metadata. (note:
   footer-only okay since the marker is for programmatic
   processors, not the readers)
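
The "num lines" variant of perfect extraction could look roughly like
the sketch below.  The marker syntax ("=== ARTWORK lines=N ===") and
the function names wrap() and extract() are hypothetical, invented
here only to make the idea concrete; nothing in this thread defines
them.

```python
import re

# Hypothetical marker syntax, assumed for illustration only.
HEADER_RE = re.compile(r"^=== ARTWORK lines=(\d+) ===$")


def wrap(artwork):
    """Prefix artwork with a header marker that records its line count."""
    return ["=== ARTWORK lines=%d ===" % len(artwork)] + list(artwork)


def extract(document):
    """Return every artwork block in the document, exactly as embedded."""
    blocks = []
    i = 0
    while i < len(document):
        match = HEADER_RE.match(document[i])
        if match:
            count = int(match.group(1))
            block = document[i + 1 : i + 1 + count]
            if len(block) != count:
                raise ValueError("document ends before the artwork does")
            blocks.append(block)
            i += 1 + count  # skip the marker and the artwork lines
        else:
            i += 1
    return blocks


artwork = ["  indented", "", "trailing  "]
document = ["Some prose."] + wrap(artwork) + ["More prose."]
print(extract(document) == [artwork])
```

Because the count is in the marker, no end marker is needed and the
artwork may contain any text, including blank lines.  A footer-only
variant would scan backwards from the marker in the same way.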


If tabs were to be supported by the folding solution (note: it
doesn't make sense to talk about "folds being supported by the
tabbing solution"), then either:

  a) tabs are handled *before* folding, and the folding-solution 
     is aware of the tab-solution (i.e., it is able to process 
     the metadata).

      - everybody nods ;)

  b) the folding-solution is really a folding+tab solution, that is,
     it has a built-in way of handling tabs (i.e., encoding tab stop
     metadata) independent of how tabs are handled for text that has
     not been folded.

      - this may be technically possible, but we should avoid having
        two solutions to solve the tab problem.  We would be better
        off solving the tab-problem directly and then use (a).

  c) the folding-solution folds using the source tab stops, but does
     not itself encode metadata about the tab stops, assuming that
     there is a "promise" that the encoding of the metadata will
     occur in a wrapper layer around it.

      - this feels icky, but it seems viable and would possibly
        allow us to proceed with this draft without having to solve
        the tabbing problem now.
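
The ordering in (a), tabs first and folding second, can be sketched as
follows.  This is not the draft's folding algorithm: the function
names, the default width of 69, and the bare trailing-backslash
convention are assumptions for illustration, and the sketch assumes no
source line already ends with a backslash.

```python
def fold(lines, tab_size, width=69):
    """Expand tabs first, then fold any line longer than `width`."""
    out = []
    for line in lines:
        line = line.expandtabs(tab_size)  # step 1: the tab-solution
        while len(line) > width:          # step 2: the folding-solution
            out.append(line[: width - 1] + "\\")
            line = line[width - 1 :]
        out.append(line)
    return out


def unfold(lines):
    """Rejoin lines that end with the continuation backslash."""
    out = []
    pending = ""
    for line in lines:
        if line.endswith("\\"):
            pending += line[:-1]
        else:
            out.append(pending + line)
            pending = ""
    return out


source = ["\t" + "x" * 100, "short"]
folded = fold(source, tab_size=4, width=20)
print(max(len(line) for line in folded))
print(unfold(folded) == [line.expandtabs(4) for line in source])
```

Note that unfolding recovers the tab-expanded text, not the original
tabs.  Recovering the tabs themselves would need the tab-stop metadata
that (a) expects the tab-solution to carry.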


Options:

  1) RFC disallows TABS in both the source-input and folded-output.
     ***This is what we currently have***

  2) RFC disallows TABS only in the folded-output, per RFC 7991,
     leaving it to the folding-logic (the script) to decide if it
     wants to:
      a) disallow TABS in the source input (the current script does
         this)
      b) detect that TABS exist and prompt the user for TAB stop info
      c) detect TABS and query the environment for the current TAB
         stop info (but tab stops may differ between the shell, the
         text editor, or whatever was used to create the text, right?)

  3) RFC allows TABS, and solves it by depending on a tab-solution,
     as described by (a).

  4) RFC allows TABS, but does not solve it, as described by (c).
     This would probably NOT be allowed from a standardization
     perspective.
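
For option (2), the script-side choice between (2a) and (2b) could be
sketched like this.  The function name prepare_source() and its
tab_size parameter are hypothetical and are not taken from the current
script; (2c) is left out because there is no single portable way to
query the tab stops of whatever tool created the text.

```python
def prepare_source(lines, tab_size=None):
    """Return tab-free lines that are safe to hand to the folding step.

    With no tab_size, TABs are rejected (2a).  With a tab_size supplied
    by the user, TABs are expanded to spaces (2b).
    """
    if not any("\t" in line for line in lines):
        return list(lines)
    if tab_size is None:
        raise ValueError("source contains TABs; supply a tab size")
    return [line.expandtabs(tab_size) for line in lines]


print(prepare_source(["no tabs here"]))
print(prepare_source(["a\tb"], tab_size=4))
```

Either way the folded output contains no TABs, which is all that
option (2) requires of the RFC text itself.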


Moving to (2) would be easy and probably resolves most concerns
here.  

Moving to (3) is possible, but we would do so only to:

 - support non-IETF use cases

 - or pave the way for an rfc7991bis that could depend on the 
   solutions we define here.  

   That is, rfc7991bis could *allow* long-lines and tabs while
   `xml2rfc` applies the solutions being discussed here only
   for when exporting the "plain-text" format (other formats
   may have better ways to support perfect extractions and/or
   not care about long-lines or tabs).

   PS: as a corollary, realize that when we pre-textualize
       artwork for XML-based submissions, we are somewhat
       worsening the result for other output formats (not
       "plain-text").



Kent