[rfc-i] draft-iab-xml2rfc-latest, "5. Use of CDATA Structures and Escaping"

julian.reschke at gmx.de (Julian Reschke) Sat, 25 June 2016 09:36 UTC

From: julian.reschke at gmx.de (Julian Reschke)
Date: Sat, 25 Jun 2016 11:36:12 +0200
Subject: [rfc-i] draft-iab-xml2rfc-latest, "5. Use of CDATA Structures and Escaping"
Message-ID: <13475517-a799-a6c6-cee9-ca6237c48c6f@gmx.de>

<https://greenbytes.de/tech/webdav/draft-iab-xml2rfc-04.html#rfc.section.5>

There are a few minor problems with this section:

Title: the correct term is "CDATA Section", not "CDATA Structure"

"A common problem authors have with <artwork> and <sourcecode> elements 
is that the XML processor returns errors if the text in the artwork 
contains either the "&" or "<" character, or the string "]]>". To avoid 
these problems, the "&" and "<" characters may be escaped using the 
strings "&amp;" and "&lt;", respectively; the "]]>" string can be 
represented as "]]&gt;". Alternatively, they may be surrounded in a 
CDATA structure: "<![CDATA[]]>". For example:"

a) this applies to all elements, not only artwork and sourcecode

b) "XML processor returns errors" -> "the XML is no longer well-formed"

It might also be good to link to the XML spec section defining CDATA.
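
As an illustration of points a) and b) (not part of the draft), here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The helper name is_well_formed and the sample strings are hypothetical. It only shows that a conforming parser rejects a raw "&" or "<" in the text of any element, and that the escaped form decodes back to the intended characters.

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: str) -> bool:
    """Return True if the parser accepts doc (hypothetical helper)."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# A raw "&" or "<" is rejected in the text of any element,
# not only in <artwork> or <sourcecode>.
samples = {
    "raw ampersand in t": "<t>a & b</t>",
    "raw less-than in artwork": "<artwork>a < b</artwork>",
    "escaped": "<t>a &amp; b &lt; c ]]&gt; d</t>",
}
results = {name: is_well_formed(doc) for name, doc in samples.items()}

# The escaped form parses back to the intended characters.
decoded = ET.fromstring(samples["escaped"]).text

if __name__ == "__main__":
    for name, ok in results.items():
        print(f"{name}: {'accepted' if ok else 'rejected'}")
    print(decoded)
```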

> Desired output:
>    allowed-chars = "." | "," | "&" | "<" | ">" | "|"
>
> Using escaping:
> <sourcecode>
>    allowed-chars = "." | "," | "&amp;" | "&lt;" | "&gt;" | "|"
> </sourcecode>
>
> Using CDATA:
> <sourcecode>
> <![CDATA[   allowed-chars = "." | "," | "&" | "<" | ">" | "|"]]>
> </sourcecode>

I'd format this as three separate pieces of artwork.
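
For what it's worth, the two quoted forms can be checked for equivalence. The following is a small sketch, assuming Python's xml.etree.ElementTree as the parser; the constant names ESCAPED and CDATA are made up for the example. It parses both variants and compares the resulting text content.

```python
import xml.etree.ElementTree as ET

# The "Using escaping" variant from the quoted example.
ESCAPED = '''<sourcecode>
   allowed-chars = "." | "," | "&amp;" | "&lt;" | "&gt;" | "|"
</sourcecode>'''

# The "Using CDATA" variant from the quoted example.
CDATA = '''<sourcecode>
<![CDATA[   allowed-chars = "." | "," | "&" | "<" | ">" | "|"]]>
</sourcecode>'''

escaped_text = ET.fromstring(ESCAPED).text
cdata_text = ET.fromstring(CDATA).text

# Both variants carry the same character data once parsed.
same = escaped_text == cdata_text

if __name__ == "__main__":
    print(same)
    print(cdata_text)
```

The parser hands the application the same string in both cases, so the choice between the two is purely about how the source file reads.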

"Using CDATA is not a panacea, but it does help prevent having to use 
escapes in places where using using escapes can cause other problems, 
such as difficulty of inclusion from other documents."

a) It should mention that you can't use CDATA to escape "]]>".

b) I don't understand the last part of the sentence -- what is the 
problem with inclusion from other documents? Is this about copy and 
paste? If so, it would be better to say so explicitly.
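
To make point a) concrete: text containing "]]>" can still be carried in CDATA, but only by splitting the string across two adjacent CDATA sections. Below is a hedged Python sketch of that workaround. The function names to_cdata and parses are hypothetical, and xml.etree.ElementTree is just one parser to check the result with.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting each "]]>" across two sections.

    Hypothetical helper: "]]>" becomes "]]" at the end of one section
    and ">" at the start of the next.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def parses(doc: str) -> bool:
    """Return True if the parser accepts doc (hypothetical helper)."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


payload = "if (a[b[0]]> c && d < e) {}"

# Naive wrapping: the section ends at the first "]]>", and the raw "&&"
# that follows is then rejected by the parser.
naive_doc = "<sourcecode><![CDATA[" + payload + "]]></sourcecode>"
naive_ok = parses(naive_doc)

# Split wrapping: two adjacent sections that together carry the payload.
split_doc = "<sourcecode>" + to_cdata(payload) + "</sourcecode>"
round_trip = ET.fromstring(split_doc).text

if __name__ == "__main__":
    print(naive_ok)
    print(to_cdata(payload))
    print(round_trip == payload)
```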

I'm attaching a potential patch, which would make the section become:

> 5.  Use of CDATA Sections and Escaping
>
>    A common authoring problem is the use of either the "&" or "<"
>    character or the string "]]>" in text content, as they need escaping
>    in XML.  This is particularly relevant in elements designed to contain
>    computer code or simple diagrams, such as <artwork> or <sourcecode>.
>    To avoid these problems, the "&" and "<" characters can be escaped
>    using the strings "&amp;" and "&lt;", respectively; the "]]>" string
>    can be represented as "]]&gt;".  Alternatively, they can be
>    surrounded in a CDATA section: "<![CDATA[]]>" ([XML], Section 2.7).
>
>    For example:
>
>    Desired output:
>
>       allowed-chars = "." | "," | "&" | "<" | ">" | "|"
>
>    Using escaping:
>
>    <sourcecode>
>       allowed-chars = "." | "," | "&amp;" | "&lt;" | "&gt;" | "|"
>    </sourcecode>
>
>    Using CDATA:
>
>    <sourcecode>
>    <![CDATA[   allowed-chars = "." | "," | "&" | "<" | ">" | "|"]]>
>    </sourcecode>
>
>    Using CDATA is not a panacea (in particular, it doesn't allow
>    including "]]>" inside a single CDATA section), but it does help
>    prevent having to use escapes in places where using them can
>    cause other problems, such as difficulty of inclusion from other
>    documents.
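
The escaping rules in the proposed text can also be applied mechanically. As a rough sketch (not something the draft prescribes), Python's standard xml.sax.saxutils.escape replaces "&", "<" and ">" with the corresponding entity references; the variable names below are only for illustration.

```python
from xml.sax.saxutils import escape

# The "desired output" line from the example.
desired = 'allowed-chars = "." | "," | "&" | "<" | ">" | "|"'

# escape() rewrites "&", "<" and ">"; double quotes are left alone.
escaped = escape(desired)

# Because ">" is rewritten, "]]>" comes out as "]]&gt;".
escaped_cdata_end = escape("]]>")

if __name__ == "__main__":
    print(escaped)
    print(escaped_cdata_end)
```

Note that this escapes every ">", which is more than XML strictly requires but matches the "Using escaping" example.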

Best regards, Julian
-------------- next part --------------
Index: draft-iab-xml2rfc-latest.xml
===================================================================
--- draft-iab-xml2rfc-latest.xml	(Revision 2246)
+++ draft-iab-xml2rfc-latest.xml	(Arbeitskopie)
@@ -5731,34 +5731,42 @@
 </t>
 </section>
 
-<section title="Use of CDATA Structures and Escaping" anchor="cdata.and.escaping">
+<section title="Use of CDATA Sections and Escaping" anchor="cdata.and.escaping">
 
 <t>
-A common problem authors have with &lt;<x:ref>artwork</x:ref>&gt; and &lt;<x:ref>sourcecode</x:ref>&gt;
-elements is that the XML processor returns errors
-if the text in the artwork contains either the "&amp;" or "&lt;" character, or the string "]]&gt;".
-To avoid these problems, the "&amp;" and "&lt;" characters may be escaped using the strings
+A common authoring problem is the use of either the "&amp;" or "&lt;" character or the string "]]&gt;"
+in text content, as they need escaping in XML. This is particularly relevant
+in elements designed to contain computer code or simple diagrams, such as
+&lt;<x:ref>artwork</x:ref>&gt; or &lt;<x:ref>sourcecode</x:ref>&gt;.
+To avoid these problems, the "&amp;" and "&lt;" characters can be escaped using the strings
 "&amp;amp;" and "&amp;lt;", respectively; the "]]&gt;" string can be represented as "]]&amp;gt;".
-Alternatively, they may be surrounded in a CDATA structure: "&lt;![CDATA[]]&gt;".  For example:
-
-<figure><artwork>
-Desired output:
+Alternatively, they can be surrounded in a CDATA section: "&lt;![CDATA[]]&gt;" (<xref target="XML" x:sec="2.7" x:rel='#sec-cdata-sect' x:fmt=","/>).
+</t>
+<t>
+For example:
+</t>
+<figure><preamble>Desired output:</preamble>
+<artwork type="example">
    allowed-chars = "." | "," | "&amp;" | "&lt;" | "&gt;" | "|"
+</artwork></figure>
 
-Using escaping:
+<figure><preamble>Using escaping:</preamble>
+<artwork type="example">
 &lt;sourcecode&gt;
    allowed-chars = "." | "," | "&amp;amp;" | "&amp;lt;" | "&amp;gt;" | "|"
 &lt;/sourcecode&gt;
+</artwork></figure>
 
-Using CDATA:
+<figure><preamble>Using CDATA:</preamble>
+<artwork type="example">
 &lt;sourcecode&gt;
 &lt;![CDATA[   allowed-chars = "." | "," | "&amp;" | "&lt;" | "&gt;" | "|"]]&gt;
 &lt;/sourcecode&gt;
-
 </artwork></figure>
-
-Using CDATA is not a panacea, but it does help prevent having to use escapes in places where using
-using escapes can cause other problems, such as difficulty of inclusion from other documents.
+<t>
+Using CDATA is not a panacea (in particular, it doesn't allow including "]]&gt;" inside a single CDATA section),
+but it does help prevent having to use escapes in places where using
+them can cause other problems, such as difficulty of inclusion from other documents.
 </t>
 
 </section>