Re: [Tools-arch] Document Production Methods

Jay Daley <jay@ietf.org> Tue, 01 September 2020 21:55 UTC


I’m not aware of markdown supporting semantic markup or anything more than basic layout markup (excluding embedded HTML).  Absent that I can’t see any alternative to XML for that stage.  That’s not to suggest that XML is the best for the other stages (it isn’t).

One important thing to note is that plain (standard?) markdown is far from sufficient to write an IETF document.  kramdown-rfc uses a mixture of markdown, yaml and custom extensions, while mmark adds the following list of extensions on top of an already extended markdown processor:

> 	• TOML titleblock.
> 	• Including other files.
> 	• More enumerated lists and task-lists.
> 	• Table and codeblock captions.
> 	• Quote attribution (quote "captions").
> 	• Table footers, header and block tables.
> 	• Subfigures.
> 	• Inline Attribute Lists.
> 	• Indices.
> 	• Citations.
> 	• Abstract/Preface/Notes sections.
> 	• Parts.
> 	• Asides.
> 	• Main-, middle- and backmatter divisions.
> 	• Math support.
> 	• Example lists.
> 	• HTML Comment parsing.
> 	• BCP14 (RFC2119) keyword detection.
> 	• Include raw XML references.
> 	• Abbreviations.
> 	• Super- and subscript.
> 	• Callouts in code blocks.

This suggests that what matters to the IETF first and foremost is the document schema (https://github.com/rfc-format/v3grammar), in which case it may be useful to think about a standard subset of the full schema that is all that is needed for stage 2, and a standard way of supporting that subset in any markdown+ tool.
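
To make the "standard subset" idea concrete, here is a minimal sketch of a check that reports which elements in a document fall outside an agreed subset. The element names are taken from the v3 vocabulary, but the choice of which ones belong in a stage 2 subset, and the names `STAGE2_SUBSET` and `elements_outside_subset`, are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical "stage 2" subset: the element names exist in the v3
# vocabulary, but which of them to allow is an assumption for illustration.
STAGE2_SUBSET = {
    "rfc", "front", "title", "abstract", "middle", "back",
    "section", "name", "t", "ul", "ol", "li", "sourcecode", "xref",
}


def elements_outside_subset(xml_text, allowed=STAGE2_SUBSET):
    """Return the sorted names of elements that are not in the allowed subset."""
    root = ET.fromstring(xml_text)
    used = {element.tag for element in root.iter()}
    return sorted(used - set(allowed))


SAMPLE = """<rfc>
  <front><title>Example</title></front>
  <middle>
    <section><name>Intro</name>
      <t>Clients <bcp14>MUST</bcp14> retry.</t>
      <aside><t>A side note.</t></aside>
    </section>
  </middle>
</rfc>"""

print(elements_outside_subset(SAMPLE))  # ['aside', 'bcp14']
```

A markdown+ tool that only ever emitted elements from such a list could be checked the same way, whatever its input syntax looks like.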

Jay


> On 1/09/2020, at 5:21 PM, Martin Thomson <mt@lowentropy.net> wrote:
> 
> That's a fair summary.
> 
> I would say however that you might be overstating the XML thing.  Those who designed the v3 format might believe that it is all needed, but I am less certain of the value of it all.  What is true is that this is what we have.
> 
> What I think is important to preserve across conversions is the text.  The dressings (boilerplate, use of <aside> rather than whatever formatting convention you had, adding <bcp14> tags, and all that mess) aren't what are being negotiated.  Edits of the language for clarity have value, but dressing it up in XML isn't likely to result in an artifact that has significantly more inherent value than the input.
> 
> As long as down-conversion retains text, and the nature of what is erased is understood, then we probably don't lose much by doing that.
> 
> On Tue, Sep 1, 2020, at 13:52, Jay Daley wrote:
>> If I take what you’ve written and rearrange it in a different way, you 
>> are describing three different stages of document authoring each of 
>> which has a different use case and each use case would be better 
>> supported by a different format.
>> 
>> # Stage one - initial authoring
>> This is the initial production of the document either by one person or 
>> a small group, which only requires basic revision tracking and 
>> comments.  As you say, that is better suited to Google Docs or 
>> something similar for fast, easy collaboration.
>> 
>> # Stage two - group text revision
>> This is the laborious process of multiple contributors working with 
>> changes across the document.  Again, as you say, that works best with 
>> something as close to plain text as possible, such as markdown, as 
>> markup only gets in the way and XML is so much markup.
>> 
>> # Stage three - pre-publication preparation
>> This consists mainly of filling out boilerplate sections (contributors 
>> etc), adding semantic annotation and layout control, with text changes 
>> focused on clarity/language etc not substance.  The moment we get into 
>> semantic annotation and layout control we need complex markup such as 
>> XML.  The boilerplate stuff and text cleanup could be done in stage 
>> two-point-five before the XML comes into it.
>> 
>> 
>> Automated format translation from one stage to the next higher stage is 
>> easy provided conventions are followed in both stages one and two, 
>> which is easy enough with templates.  
>> 
>> Translating a document back a stage (your XML to markdown convertor) is 
>> where more thought is required, both in terms of what processes this 
>> will be used for and on the technical side.  For example, is this 
>> intended to be a one-way translation so that any semantic and layout 
>> markup is lost, or is it meant to be a reversible process where the 
>> markdown is "extracted" from the XML, then edited and then "reinserted" 
>> into the XML?  A simple XML to markdown convertor is not hard [1], 
>> while the two-way version is more complex but still doable.
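
A rough sketch of the one-way case, assuming a tiny slice of the v3 vocabulary (`section`, `name`, `t`) and plain `#` headings on the markdown side. It keeps all text and returns a list of the tags it erased so that they can be reviewed by hand. The function names are hypothetical and this is not how any existing converter works.

```python
import xml.etree.ElementTree as ET


def inline_text(element, lost):
    """Flatten an element to plain text, recording each inline tag that is erased."""
    parts = [element.text or ""]
    for child in element:
        lost.append(child.tag)  # e.g. bcp14 or xref: text kept, tag dropped
        parts.append(inline_text(child, lost))
        parts.append(child.tail or "")
    return "".join(parts)


def section_to_markdown(section, depth, lines, lost):
    """Append markdown for one section (and its subsections) to lines."""
    for child in section:
        if child.tag == "name":
            lines += ["#" * depth + " " + inline_text(child, lost).strip(), ""]
        elif child.tag == "t":
            lines += [" ".join(inline_text(child, lost).split()), ""]
        elif child.tag == "section":
            section_to_markdown(child, depth + 1, lines, lost)
        else:
            # Unknown block: keep its text as a paragraph and log the tag.
            lost.append(child.tag)
            lines += [" ".join("".join(child.itertext()).split()), ""]


def xml_to_markdown(xml_text):
    """Return (markdown, erased tags) for the sections under <middle>."""
    root = ET.fromstring(xml_text)
    lines, lost = [], []
    for section in root.findall("./middle/section"):
        section_to_markdown(section, 1, lines, lost)
    return "\n".join(lines).strip() + "\n", lost


SAMPLE = (
    "<rfc><middle><section><name>Retries</name>"
    "<t>Clients <bcp14>MUST</bcp14> retry.</t>"
    "<aside><t>Backoff is advisable.</t></aside>"
    "</section></middle></rfc>"
)

markdown, lost = xml_to_markdown(SAMPLE)
print(markdown)
print(lost)  # ['bcp14', 'aside']
```

The reversible variant would need to keep that list, together with the position of each erased tag, in a form that can be merged back into the XML after editing, which is where the extra complexity comes from.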
>> 
>> Jay
>> 
>> [1]  In my current role I don’t do technology, but when I did, XML to 
>> […] conversion was something I did 
>> https://github.com/JayDaley/XML-to-JSON-in-XSLT
>> 
>> 
>>> On 31/08/2020, at 6:09 PM, Martin Thomson <mt@lowentropy.net> wrote:
>>> 
>>> There has been a bunch of discussion recently on the rfced-future list about the tools that people use to produce documents.  It seems like this is one of the major areas in which we could provide some input.  Here's my initial suggestion.
>>> 
>>> 
>>> # Production Methods
>>> 
>>> The point of this document is to examine how the tools we use to produce a document influence the style of collaboration we use, and to examine where there are gaps in the process that might be addressed with better tooling.
>>> 
>>> ## Straight to XML
>>> 
>>> This is how I used to operate.  It is a tolerable choice, but not a great one.  XML is fiddly to the point of being user-hostile.  Add to that the (recent) instability in the format and I would argue that this is not a good format to author documents in.
>>> 
>>> I understand that this is (now) how the RPC operates, so this is an important option to consider.  It is possible that, due to our choice of publication format, this remains important as the single place on which we can focus attention.
>>> 
>>> For specialized tooling, this might be the best format to concentrate on.  For instance, tools that validate code fragments could target the XML format and rely on conversion to XML from other formats.  This isn't perfect as, for instance, none(?) of the conversion tools preserve mappings to line numbers in source, so finding errors can be frustrating.  However, being able to target a single format for a tool can save on development effort.
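
As an illustration of a tool that targets only the XML, here is a minimal sketch that pulls JSON fragments out of `<sourcecode type="json">` elements and checks that each one parses. The function name `check_json_fragments` is hypothetical, and a real validator would cover more fragment types than JSON.

```python
import json
import xml.etree.ElementTree as ET


def check_json_fragments(xml_text):
    """Return (index, error message) for each JSON fragment that fails to parse."""
    root = ET.fromstring(xml_text)
    failures = []
    for index, block in enumerate(root.iter("sourcecode")):
        if block.get("type") != "json":
            continue  # other fragment types would need their own checkers
        try:
            json.loads("".join(block.itertext()))
        except json.JSONDecodeError as error:
            failures.append((index, error.msg))
    return failures


SAMPLE = """<rfc><middle><section>
  <sourcecode type="json">{"retries": 3}</sourcecode>
  <sourcecode type="json">{"retries": }</sourcecode>
  <sourcecode type="abnf">rule = "a" / "b"</sourcecode>
</section></middle></rfc>"""

for index, message in check_json_fragments(SAMPLE):
    print(f"sourcecode #{index}: {message}")
```

The sketch identifies a failing fragment only by its position among the `<sourcecode>` elements, because nothing in the XML maps back to a line in the original markdown. That is the frustration described above.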
>>> 
>>> ## Markdown
>>> 
>>> I'm aware of two variants here (kramdown-rfc2629 and mmark), both of which are easy to use and provide native support for all the important parts of a functional document.  There are some weak points, but they are quite minor.  The overall experience of authoring in markdown is excellent.
>>> 
>>> This is how I currently recommend that people work on documents.  You might not always *start* here (see below), but you should probably spend most of your time here.  I find that most documents can start and end here anyway.
>>> 
>>> The main open question I have is what level of support is provided once the collaborative portion of the process ends and the document moves to editing and publication.  To my mind, the ideal situation is that the original authoring format is retained and final edits are done to that format.  This would allow for working groups to pick up something that is close to the final publication form and produce a revision.
>>> 
>>> This is not the model the RPC currently adopts.  Only XML is accepted for editing.  If we want to allow for a single set of tools for things like spell checking, this might be the right answer.  However, maintaining continuity for editors is important, and editors that want markdown should be provided that option.  A converter from XML to markdown might be all that is needed.  Having perfect down-conversion is possible, but likely to be unworkable.  An imperfect conversion might be OK provided that there is no loss of semantic information, or - worst case - lost semantics could be logged for manual examination.
>>> 
>>> ## Microsoft Word/Google Docs
>>> 
>>> Producing content collaboratively using Word or Docs is pretty damned good.  Inline comments and suggestions are two features I rely on a massive amount in my day-to-day work.  Both tools do very well at this.
>>> 
>>> However, I don't think that we need a tool that has this level of fast iteration capability.  This style of interaction doesn't scale out very well, so I believe that it is only useful during early stages of document production.  That is, when there is a small group of people who are collaborating on an initial revision of a document.  In that case, the power that these tools provide is of great use.  Commentary and interactive editing are hugely useful at this stage of the process.  
>>> 
>>> You might also consider these tools for design teams or collaborating on small edits to an existing document.
>>> 
>>> What I have found in the past is that producing a first draft of a document using these tools is great.  Once more open collaboration is required, it is better to have a textual artifact in a revision control system (see RFC 8874).
>>> 
>>> Revision control and issue tracking in particular allow you to better control changes and - as the process evolves - give you much better visibility into what has happened and why.  Using a text-based format is a precondition of getting support from these tools.
>>> 
>>> For transfer into markdown, there are online tools that can convert a simple Word document (in .doc or .docx format) into markdown.  https://word2md.com uses https://github.com/benbalter/word-to-markdown, but there are others.  The output needs a little massaging to clean up, which could be mechanized if this happens often, but the changes are usually small.  I know of one case where an extension to Google Docs was written to do this conversion.  The first draft of RFC 8752 was produced using this method and it worked very well.
>>> 
>>> Change tracking is a feature I've seen used a lot in other bodies that exclusively use Word.  I don't think that this feature works especially well relative to the text-based diffs we routinely use.
>>> 
>>> ### Microsoft Word Template
>>> 
>>> Joe Touch maintains a Word template that produces text in the paged text form of yore.  I think that this results in a terrible process.  Even if it were able to produce XML, it does not produce an artifact that can be collaborated on easily.
>>> 
>>> Once you are beyond a small group of collaborators, Word is not as good as textual formats at managing collaboration.  Authoring in Word at best follows a collaboration model where you have a single editor (or, if you are especially lucky, a tight editor team) and a group of supplicants who can ask the editor to make changes.  I've seen this form of process in other groups (3GPP and OMA in particular) and it's not the sort of standards collaboration I would wish on anyone.  There is a strong tendency for editors to become gatekeepers anyway, but this reinforces the privilege and control of that position as every change needs to be touched by an editor.
>>> 
>>> People also need to have accounts with the online service in order to collaborate effectively, which has proven to be a real barrier in practice.
>>> 
>>> ## Other Tools
>>> 
>>> There is a long tail here, obviously.  The list of "others" includes nroff, outline.el, and others I am just not aware of.  I have no idea how widespread any of these tools are, but unless presented with evidence that suggests lots of use, I don't think we need to entertain these.
>>> 
>>> My suggestion is that those that can produce XML should do so promptly.  Those that can only produce text, like the Word template, might find a path to markdown easier.
>>> 
>>> # Recommendation
>>> 
>>> The overriding theme here is how easy it is to collaborate.  In the context of the IETF processes, that is to achieve consensus around the content of a single artifact.
>>> 
>>> I would argue that markdown is the best tool we currently have for this.  First-class support for markdown in our processes is something we should aim for.  Many of the utilities that support markdown will also support XML, and some people will produce XML, so a modest effort to support XML would be appreciated.
>>> 
>>> Because both are textual formats, we can use many of the same processes and tools that are available for source code.  This should be familiar to many of the people we hope will contribute.
>>> 
>>> For general tool support, targeting XML alone might be cheaper in terms of addressing the largest number of users.  But generic support for text-based formats would be ideal.
>>> 
>>> I am also going to deliberately suggest that no support is offered for other authoring formats.  We might respect the choice of individuals to use other formats for authoring, but those people cannot expect others to carry the cost of their choices.  That means providing encouragement to use the standard tools.
>>> 
>>> We might provide an on-ramp for people who want to do initial authoring in tools that are better suited to rapid collaboration.  But this should concentrate on as few options as possible.
>>> 
>>> # Potential New Tools
>>> 
>>> A tool that converts from XML to markdown would be a great addition to our toolset.  This would allow people to pick up XML documents that were produced from a variety of formats and restart collaboration in a format that is most accessible.
>>> 
>>> A tool that converts from text to XML or markdown would be the only other addition we might consider.  I don't know if one of these exists already, but it's fairly easy to script something.
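
For a sense of how small such a script could start out, here is a minimal sketch of the text to markdown direction. It assumes numbered headings that begin in the first column ("1.  Introduction", "1.1.  Terminology") and indented body text. It ignores page headers, footers, figures and lists, so it is a starting point rather than a usable converter, and `text_to_markdown` is a hypothetical name.

```python
import re

# Matches headings such as "1.  Introduction" or "2.1.  Terminology" that
# start in column one; this pattern is an assumption about the text layout.
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\.\s+(\S.*)$")


def text_to_markdown(text):
    """Turn numbered headings into '#' headings and unindent body lines."""
    out = []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            # "1" is depth 1, "1.1" is depth 2, and so on.
            depth = match.group(1).count(".") + 1
            out.append("#" * depth + " " + match.group(2).strip())
        else:
            out.append(line.strip())
    return "\n".join(out) + "\n"


SAMPLE = (
    "1.  Introduction\n"
    "\n"
    "   This document is an example.\n"
    "\n"
    "1.1.  Terminology\n"
    "\n"
    "   Terms are defined here.\n"
)

print(text_to_markdown(SAMPLE))
```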
>>> 
>>> Improving support for conversion from Word to markdown would be helpful for those who want to use Microsoft Word or Google Docs.  This might also help people migrate from the Word template.
>>> 
>>> -- 
>>> Tools-arch mailing list
>>> Tools-arch@ietf.org
>>> https://www.ietf.org/mailman/listinfo/tools-arch
>>> 
>> 
>> -- 
>> Jay Daley
>> IETF Executive Director
>> jay@ietf.org
>> 
> 

-- 
Jay Daley
IETF Executive Director
jay@ietf.org