Re: Comments on MIME/SGML
"Daniel W. Connolly" <connolly@hal.com> Wed, 09 March 1994 00:46 UTC
Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa13631; 8 Mar 94 19:46 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa13627; 8 Mar 94 19:46 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa27592; 8 Mar 94 19:46 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) id AA14573; Tue, 8 Mar 94 19:34:06 EST
Received: from hal.COM by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) id AA14569; Tue, 8 Mar 94 19:34:02 EST
Received: from ulua.hal.com by hal.com (4.1/SMI-4.1.1) id AA07389; Tue, 8 Mar 94 16:33:31 PST
Received: from localhost by ulua.hal.com (4.1/SMI-4.1.2) id AA05155; Tue, 8 Mar 94 18:24:43 CST
Message-Id: <9403090024.AA05155@ulua.hal.com>
To: Ed Levinson <elevinso@accurate.com>
Cc: Multiple Recipients of List <ietf-822@dimacs.rutgers.edu>, MIME/SGML discussion group <mime-sgml@infoods.mit.edu>
Subject: Re: Comments on MIME/SGML
In-Reply-To: Your message of "Mon, 07 Mar 1994 17:10:54 EST." <9403072210.AA02671@Accurate.COM>
Date: Tue, 08 Mar 1994 18:24:42 -0600
Sender: ietf-archive-request@IETF.CNRI.Reston.VA.US
From: "Daniel W. Connolly" <connolly@hal.com>
In message <9403072210.AA02671@Accurate.COM>, Ed Levinson writes:
>
>The essence of my proposal is to replace the "dtd" parameter with "prolog"
>and to require both prolog and instance.  The reason I suggest this
>approach is practical, various implementations treat these two document
>elements differently.

Hmmm... my reasons were chiefly practical too; they were based on
experience with the sgmls package. Could you give some background (or
pointers to materials I should read) about these "various
implementations" that treat the prologue and the instance differently?

>As to using text/sgml or application/sgml, I chose application to keep
>within expressed boundaries others in the MIME community have
>suggested.  Namely, that text be reserved for very simple things.

Formally speaking, it's a coin toss. I agree we should go with whatever
precedents are out there. application/sgml is fine. I just don't like
it when MH uses base64 encoding on my html body parts when I know most
of my audience can read html source -- perhaps I just need to learn to
use my tools better.

>The correspondences you provided I like, it may be easier to explain
>what is happening using your table.  I summarize it, with my own
>suggestions, below.
>
>Do you find my proposal acceptable?

It's acceptable, but I'm not sure it's optimal yet. Let's take another
whack at the SGML->MIME correspondence:

[I won't comment on the SDIF terms as I haven't read the SDIF standard.]

>    SGML:               MIME:
>    notation (type)     Content-Type:
>    SYSTEM identifier   Content-ID:
>    data entity         Body Part

I can't find the term "notation type" in the SGML standard. I have
found:

    4.75 data content notation: An application-specific interpretation
    of an element's data content, or of a non-SGML data entity, that
    usually extends or differs from the normal meaning of the document
    character set.

and

    4.213 notation identifier: An _external identifier_ that identifies
    a data content notation in a _notation declaration_.
    It can be a _public identifier_ if the notation is public, and, if
    not, a description or other information sufficient to invoke a
    program to interpret the notation.

Also, "data entity" is not a term from the standard. We could use:

    4.134 external entity: An entity whose text is not incorporated
    directly in an entity declaration; its system identifier and/or
    public identifier is specified instead.

When I look at this closely, there's some redundancy: in SGML, the
choice of notation is expressed in the ENTITY declaration along with
the "filename" info. In MIME, the content type is expressed in the
referenced body part. When using MIME/SGML, we have to put it in both
places.

Is the connection between SGML notation identifiers and the MIME
Content-Type syntax supposed to be explicit, or is there an implicit
correspondence between a MIME content type and an SGML data content
notation? For example, does the MIME content type show up explicitly
in the NOTATION declaration, like this:

    --8<
    Content-Type: application/postscript
    Content-ID: id1

    %!PS-Adobe...
    --8<
    Content-Type: application/sgml

    <!DOCTYPE T SYSTEM [
    <!NOTATION ps SYSTEM "application/postscript">
    <!ENTITY fig1 SYSTEM "id1" NDATA ps>
    ]>
    ...
    --8<--

or is it sufficient to write:

    --8<
    Content-Type: application/postscript
    Content-ID: id1

    %!PS-Adobe...
    --8<
    Content-Type: application/sgml

    <!DOCTYPE T SYSTEM [
    <!NOTATION ps PUBLIC "-//Adobe/PostScript" -- exact syntax? -->
    <!ENTITY fig1 SYSTEM "id1" NDATA ps>
    ]>
    ...
    --8<--

Hmmm... the implicit connection is probably more practical, but it
introduces redundancy and the chance for errors. The explicit mapping
causes the namespace of SYSTEM identifiers to include MIME
content-types. Blech.

>    marked up text      Application/SGML
>    document            Multipart/SGML

Up to here, we have been using terms from the standard. Your suggestion
to introduce the term "marked up text" is a departure from what seemed
like an otherwise elegant proposal.
It's still well-defined, but in application-specific terms rather than
in SGML standard terms. The question is whether practical
considerations sufficiently motivate the departure.

I suggested that we make the formal correspondence between the
following:

    SGML entity         body of Application/SGML body part

cf.:

    4.284 SGML entity: An entity whose characters are interpreted as
    markup or data in accordance with this International Standard.

The idea here is that MIME plays the role of entity manager, and MIME
body parts map 1-1 to SGML entities. The first production in the
standard is:

    [1] SGML document = SGML document entity
            (SGML subdocument entity |
             SGML text entity |
             non-SGML data entity)*

You can't split the prologue and the instance across SGML entities. But
you _can_ split the SGML document entity across system-specific
objects:

    NOTES
    1 This International Standard does not constrain the physical
      organization of the document within the data stream, message
      handling protocol, filesystem, etc., that contains it. In
      particular, separate entities could occur in the same physical
      object, a single entity could be divided between multiple
      objects, and the objects could occur in any order.

Using the example I originally sent, we had:

                                                      SGML term or
    Content-ID:                Contents               App convention
    <10024.761615492.3@ulua>   SGML document          App
    <10024.761615492.4@ulua>   external entity        SGML
    <10024.761615492.5@ulua>   SGML document entity   SGML
    <10024.761615492.6@ulua>   SGML text entity       SGML
    <10024.761615492.7@ulua>   SGML declaration       App

Your suggestion makes it look like:

    Content-ID:                Contents
    <10024.761615492.3@ulua>   SGML document          App
    <10024.761615492.4@ulua>   external entity        SGML
    <10024.761615492.5@ulua>   prolog                 App
    <10024.761615492.6@ulua>   external entity        SGML
    <10024.761615492.7@ulua>   declaration            App
    <10024.761615492.8@ulua>   instance               App

But in the end, it's not really critical that SGML text entities map
exactly to MIME body parts (even my proposal did app-specific stuff
with the SGML declaration).
[Hmmm... until you start talking about subdocument entities... I think
a concrete example of this is in order.]

The critical thing is how all this interacts with available (and
conceivable) tools. For example, with either of the above examples, I
could do

    mhn store cur

and get several files: 4.sgml, 5.sgml, 6.sgml, ... After I replace
system identifiers (SYSTEM "10024.761615492.6@ulua") with filenames
(SYSTEM "6.sgml") in those files, I could validate the document using:

    sgmls -s 7.sgml 5.sgml          # Connolly's version, or
    sgmls -s 7.sgml 5.sgml 8.sgml   # Levinson's version

Hmmm... about replacing system identifiers... this could be a _really_
tedious process. I wonder if we could get rid of this step somehow
(with something like the original Content-Reference stuff?).

Let's see... you could leave the SGML declaration body part alone. Then
you have to process the other parts in the order they will be presented
to the SGML parser... in fact, I think you have to parse them! Consider
the following pathological case:

    foo.sgml:
        <!DOCTYPE T [
        <!ELEMENT T - - ANY>
        <!ENTITY example SYSTEM "ex1.sgml">
        ]>
        <T>blah blah, for example:
        <![ RCDATA [ &example; ]]>
        </T>

    ex1.sgml:
        <!ENTITY foo SYSTEM "fake-file">

All the characters in ex1.sgml are data, even though they look like
markup, so an unpacker that rewrote SYSTEM identifiers textually would
corrupt it.

[AAARGH!!! My X server just died and emacs lost my last 3 hours' work
on this message!]

Quickly, before I forget:

* As it stands, the MIME/SGML packer/unpacker cannot be implemented as
  an SGML layer over MIME or as a MIME layer over SGML -- it must be a
  piece of software that understands both simultaneously (see the above
  entity usage).

  I suggest that instead of messing with the SYSTEM identifiers in the
  data stream, we do an external mapping.
  Using the above example, the packer would write:

    Content-Type: multipart/sgml; boundary="xxx";
        document="id2"; entity-map="id1"

    --xxx
    Content-Type: application/sgml-entity-map

    <id2> "foo.sgml"
    <id3> "ex1.sgml"
    --xxx
    Content-Type: application/sgml; name="foo.sgml"

    <!DOCTYPE T [
    <!ELEMENT T - - ANY>
    <!ENTITY example SYSTEM "ex1.sgml">
    ]>
    <T>blah blah, for example:
    <![ RCDATA [ &example; ]]>
    </T>
    --xxx
    Content-Type: application/sgml; name="ex1.sgml"

    <!ENTITY foo SYSTEM "fake-file">
    --xxx--

  For most cases, this makes the packer and unpacker trivial -- it
  works just like application/octet-stream. For cases where the
  sender's filenames can't be encoded in the MIME name parameter, or
  cases where the syntaxes of the sender's and receiver's filesystems
  are different, the entity-map provides sufficient information to make
  the necessary translation.

* The character set section of the MIME/SGML draft is overly brief and
  uses the nebulous term "ASCII." It should use the term US-ASCII,
  which is well-defined in the Internet community, and equate it to
  ISO-646-1983, which is the character set from the default SGML
  declaration. It should also give at least one complete example of
  using another character set (for example ISO-Latin-1 -- I tried for
  weeks to figure out how to spell that in SGML).

* We need examples of usage of subdocument entities. I think this is
  another factor that motivates the mapping of an SGML document entity
  onto a single MIME body part (the alternative is to represent an SGML
  subdocument entity as another multipart/sgml body part, then extract
  the prologue and instance body parts, and concatenate them together
  -- then you have the subdocument entity. Workable, but clumsy...)

* It's not clear how the single application/sgml body part works. The
  example given was:

    Content-Type: application/SGML;
        dtd="-//USA-DOD//DTD MIL-M-21742 911001//EN"

    <! ... an SGML instance >

  This implies an algorithm for producing an SGML document entity from
  a public identifier for a DTD and an instance. I don't quite see how
  to do this in general (what's the name of the DOCTYPE?).

Dan
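
The entity-map body part proposed in the message above can be read with
very little code. What follows is a minimal Python sketch, not a
definitive implementation. It assumes the entry format is exactly what
the single example shows (one entry per line, a Content-ID in angle
brackets followed by a double-quoted filename); the message does not
define the syntax any further. The names parse_entity_map, filename_for
and the rename hook are hypothetical and appear in no draft.

```python
import re

# Assumed entry format, read off the one example in the message:
#   <content-id> "sender-filename"
ENTRY = re.compile(r'^\s*<([^<>\s]+)>\s+"([^"]*)"\s*$')


def parse_entity_map(body):
    """Return {content_id: sender_filename} for an entity-map body part."""
    mapping = {}
    for lineno, line in enumerate(body.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        match = ENTRY.match(line)
        if match is None:
            raise ValueError(
                "line %d is not an entity-map entry: %r" % (lineno, line))
        mapping[match.group(1)] = match.group(2)
    return mapping


def filename_for(mapping, content_id, rename=lambda name: name):
    """Look up the sender's filename for a Content-ID.

    `rename` is a hypothetical hook that adapts the sender's name to the
    receiver's filesystem; by default the name is used unchanged.
    """
    return rename(mapping[content_id.strip("<>")])


if __name__ == "__main__":
    example = '<id2> "foo.sgml"\n<id3> "ex1.sgml"\n'
    entity_map = parse_entity_map(example)
    print(entity_map)
    print(filename_for(entity_map, "<id3>", rename=str.upper))
```

The point of the design is that the translation lives entirely in this
side table: the unpacker never has to look inside foo.sgml or ex1.sgml,
so the RCDATA case above cannot trip it up.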