[rfc-i] Imbedded XML in PDF/A3

jhildebr at cisco.com (Joe Hildebrand (jhildebr)) Wed, 29 October 2014 20:17 UTC

From: "jhildebr at cisco.com"
Date: Wed, 29 Oct 2014 20:17:37 +0000
Subject: [rfc-i] Imbedded XML in PDF/A3
In-Reply-To: <544FE8ED.6050209@nostrum.com>
References: <544FE8ED.6050209@nostrum.com>
Message-ID: <9119AB44-335A-4CBD-9A69-43A5E74365DA@cisco.com>

On 10/28/14, 7:05 PM, "Robert Sparks" <rjsparks at nostrum.com> wrote:

>A comment both for the preservation draft and draft-hansen-rfc-use-of-pdf.
>
>When we embed the XML in the PDF, I suggest doing so without any additional
>encoding or compression. Make it such that you can get to the XML with cat,
>dd, or whatever filesystem recovery tool lets you extract a string of octets
>and shove them into something that will treat it as UTF-8. Visually finding
>the boundaries of the document if it's stored this way will not be difficult.

I'm working on doing this for HTML as well. Since the XML might include
comments (particularly while it's an I-D), and I want to embed this *in* a
comment, I was thinking about base64-encoding it. I do see the usefulness of
the XML being more easily discovered, however, so what about changing the
XML comment markers from "<!--" to "<!- -"? Most of the other hacks I've
thought of would require &lt;-escaping the XML, which would be worse than
base64.
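
A minimal sketch of the two options above, for comparison. It assumes
Python, an outer HTML comment as the wrapper, and hypothetical function
names that do not come from any draft or tool. Rewriting the closing marker
"-->" to "- ->" is also an assumption: the suggestion above only names the
opening marker, but an unmodified "-->" would end the outer comment early.

```python
import base64

OPEN, CLOSE = "<!--\n", "\n-->"


def embed_with_split_markers(xml: str) -> str:
    """Wrap XML in a comment, splitting inner comment markers with a space."""
    # "<!--" -> "<!- -" is the change proposed in the message.
    # "-->" -> "- ->" is an assumed counterpart for the closing marker.
    body = xml.replace("<!--", "<!- -").replace("-->", "- ->")
    return OPEN + body + CLOSE


def extract_split_markers(comment: str) -> str:
    """Undo embed_with_split_markers.

    Lossy if the original XML already contained a literal "<!- -" or "- ->".
    """
    body = comment[len(OPEN):-len(CLOSE)]
    return body.replace("<!- -", "<!--").replace("- ->", "-->")


def embed_with_base64(xml: str) -> str:
    """Wrap XML in a comment as base64 text (safe, but not human-readable)."""
    payload = base64.b64encode(xml.encode("utf-8")).decode("ascii")
    return OPEN + payload + CLOSE


sample = "<rfc><!-- draft note --><t>text</t></rfc>"
readable = embed_with_split_markers(sample)
opaque = embed_with_base64(sample)
```

With the split-marker form, the element text stays visible to cat or a
recovery tool, while the base64 form hides it completely. The trade-off is
that the split-marker round trip is only exact when the XML does not already
contain the substituted strings.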

-- 
Joe Hildebrand