Re: [netmod] perfect extraction, tabs, and long lines - oh my
Ladislav Lhotka <lhotka@nic.cz> Mon, 24 September 2018 09:29 UTC
From: Ladislav Lhotka <lhotka@nic.cz>
To: Kent Watsen <kwatsen@juniper.net>, Robert Wilton <rwilton=40cisco.com@dmarc.ietf.org>, tom petch <ietfc@btconnect.com>, Bob Harold <rharolde@umich.edu>, "adrian@olddog.co.uk" <adrian@olddog.co.uk>
Cc: "netmod@ietf.org" <netmod@ietf.org>
In-Reply-To: <2EBB9A0D-66C3-4116-99A5-C6D4BD290695@juniper.net>
References: <2EBB9A0D-66C3-4116-99A5-C6D4BD290695@juniper.net>
Mail-Followup-To: Kent Watsen <kwatsen@juniper.net>, Robert Wilton <rwilton=40cisco.com@dmarc.ietf.org>, tom petch <ietfc@btconnect.com>, Bob Harold <rharolde@umich.edu>, "adrian\@olddog.co.uk" <adrian@olddog.co.uk>, "netmod\@ietf.org" <netmod@ietf.org>
Date: Mon, 24 Sep 2018 11:28:59 +0200
Message-ID: <8736tz6y78.fsf@nic.cz>
MIME-Version: 1.0
Content-Type: text/plain
Archived-At: <https://mailarchive.ietf.org/arch/msg/netmod/65AD7XXtP9kecXadNqUxvjsDmB4>
Subject: Re: [netmod] perfect extraction, tabs, and long lines - oh my

Hi,

it is quite funny: we are using XML in an area that it wasn't really
designed for - representation of hierarchical data - but, on the other
hand, we *don't* use XML where it could effectively help us avoid
awkward formatting and extraction problems such as those discussed in
this thread.

Since the beginning, I've been writing YANG source in the YIN XML
notation (and most people find it weird). However, when one uses a
schema-aware editor such as nxml mode of emacs, writing YIN is really a
pleasant experience:

- no need to remember YANG syntax, all statements are autocompleted in
  a context-sensitive way

- no need to care about the prescribed order of substatements (for the
  pyang --ietf check)

- (mostly) no need to care about long lines and whitespace.

A schema-aware editor takes care about the first item and XSLT scripts
about the rest. All the tools are available in my GitHub skeleton
project

https://github.com/llhotka/YANG-I-D

And as for extracting YANG modules from I-D text: given that I-D
submission format is XML (xml2rfc), the most natural way for including
YANG modules in an I-D would be to use YIN directly instead of
<artwork>. Both formatting and extracting the module would then be
absolutely painless - it is a different XML namespace, so tools should
have no problems with it.

I suspect I won't attract many supporters but I couldn't help myself. It
bothers me that we have all the tools and technologies available but -
because the general opinion is that they are hard to use - we instead
struggle with brittle and tricky technicalities like those below. Isn't
it much harder after all?

Lada

Kent Watsen <kwatsen@juniper.net> writes:

> [new subject line]
>
> It is one thing for an editor to use tabs during the creation of text,
> and another to publish text with an expectation that consumers will
> render the tabs the same way. Either the source editor converts tabs
> to spaces, which is interoperable today, or keep the tabs while
> publishing metadata in the text, using some TBD standard, enabling
> consumers to use the same tab stops.
>
> If there were a standard enabling the publishing of text including
> tabs, it should work for all artwork, not just artwork that has been
> folded. This is similar to the discussion we had before about having
> begin/end markers enabling perfect extractions, in that it is also
> something that pertains to all artwork, not just artwork that has
> been folded.
>
> Thus, there are a total of three problems:
>   P1: perfect extraction
>   P2: tabs
>   P3: long lines
>
> Assuming all thing were solved problems, and assuming that we always
> want perfect extraction, the possible combinations for the occurrence
> of the other two problems are:
>   - no tabs or long lines
>   - tabs, but no long lines
>   - long lines, but no tabs
>   - tabs and long lines
>
> How are they ordered? Clearly supporting perfect extractions has
> to be the outermost thing, but what about the other two? Does it
> matter?
>
> Thinking about solutions:
>
>   - the solution for long-lines is to use a header (not a footer)
>     because it's believed important to prime readers *before* they
>     read the text.
>
>   - the solution for perfect-extraction could be either:
>       - use both a header-and-footer marker (low tech)
>       - or use either a header or a footer that encodes
>         something like a "num lines" value into the
>         marker. (note: footer-only okay since the marker
>         is for programmatic processors, not the readers)
>
>   - the solution for tabs could be to use either a header
>     or a footer that encodes the tab-stop metadata. (note:
>     footer-only okay since the marker is for programmatic
>     processors, not the readers)
>
>
> If tabs were to be supported by the folding solution (note: it
> doesn't make sense to talk about "folds being supporting by the
> tabbing solution"), then either:
>
>  a) tabs are handled *before* folding, and the folding-solution
>     is aware of the tab-solution (i.e., it is able to process
>     the metadata).
>
>       - everybody nods ;)
>
>  b) the folding-solution is really a folding+tab solution, that is,
>     it has a built-in way of handling tabs (i.e., encoding tab stop
>     metadata) independent of how tabs are handled for text that has
>     not been folded.
>
>       - this may be technically possible, but we should avoid having
>         two solutions to solve the tab problem. We would be better
>         off solving the tab-problem directly and then use (a).
>
>  c) the folding-solution folds using the source tab stops, but does
>     not itself encode metadata about the tab stops, assuming that
>     there is a "promise" that the encoding of the metadata will
>     occur in a wrapper layer around it.
>
>       - this feels icky, but it seems viable and, would possible
>         allow us to proceed with this draft without having to solve
>         the tabbing problem now.
>
>
> Options:
>
>  1) RFC disallows TABS in both the source-input and folded-output.
>     ***This is what we currently have***
>
>  2) RFC disallows TABS only in the folded-output, per RFC 7991,
>     leaving it to the folding-logic (the script) to decide if it
>     wants to:
>       a) disallow TABS in the source input (curr script does this)
>       b) detect TABS exist and prompt user for TAB stop info
>       c) detect TABS and query environment for cur TAB stop info
>          (but tab-stops may differ in the shell the text editor,
>          or whatever was used to create the text, right?)
>
>  3) RFC allows TABS, and solves it by depending on a tab-solution,
>     as described by (a).
>
>  4) RFC allows TABS, but does not solves it, as described by (c).
>     This would probably NOT be allowed from a standardization
>     perspective.
>
>
> Moving to (2) would be easy and probably resolves most concerns
> here.
>
> Moving to (3) is possible, but we would do so only to:
>
>   - support non-IETF use cases
>
>   - or pave the way for an rfc7991bis that could depend on the
>     solutions we define here.
>
>     That is, rfc7991bis could *allow* long-lines and tabs while
>     `xml2rfc` applies the solutions being discussed here only
>     for when exporting the "plain-text" format (other formats
>     may have better ways to support perfect extractions and/or
>     not care about long-lines or tabs).
>
>     PS: as a corollary, realize that when we pre-textualizing
>     artwork for XML-based submissions, we are somewhat
>     worsening the result for other output formats (not
>     "plain-text").
>
>
>
> Kent
>
>
> _______________________________________________
> netmod mailing list
> netmod@ietf.org
> https://www.ietf.org/mailman/listinfo/netmod

-- 
Ladislav Lhotka
Head, CZ.NIC Labs
PGP Key ID: 0xB8F92B08A9F76C67
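
To illustrate the point about namespaces made in the reply above, here
is a minimal sketch of how a tool might pull a YIN module out of I-D
source if the module were embedded directly in the XML. It rests on
assumptions that are not part of the message: xml2rfc does not define
such an embedding today, the draft skeleton and the module name
"example-mod" are made up, and the function name extract_yin_modules is
hypothetical. Only the YIN namespace URI is taken from the YANG
specification.

```python
import xml.etree.ElementTree as ET

# Namespace of YIN elements as defined by the YANG specification.
YIN_NS = "urn:ietf:params:xml:ns:yang:yin:1"

# Hypothetical I-D source: a YIN <module> embedded directly in the
# document instead of YANG text inside <artwork>. This embedding is an
# assumption made for illustration only.
DRAFT = """\
<rfc>
  <middle>
    <section title="YANG Module">
      <t>The module is defined below.</t>
      <module xmlns="urn:ietf:params:xml:ns:yang:yin:1" name="example-mod">
        <namespace uri="http://example.com/ns/example-mod"/>
        <prefix value="ex"/>
        <leaf name="hostname">
          <type name="string"/>
        </leaf>
      </module>
    </section>
  </middle>
</rfc>
"""


def extract_yin_modules(draft_xml):
    """Return all YIN <module> elements found anywhere in the draft."""
    root = ET.fromstring(draft_xml)
    # Matching on the qualified name means no begin/end markers are
    # needed, and tabs or line folding in the source play no role.
    return list(root.iter("{%s}module" % YIN_NS))


modules = extract_yin_modules(DRAFT)
for mod in modules:
    print(mod.get("name"))  # prints: example-mod
```

An extracted element could then be converted to compact YANG syntax by
a separate YIN-to-YANG step; that step is outside this sketch.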