regarding your comments on proposed media type text/troff' to Informational RFC

moore at cs.utk.edu (Keith Moore) Fri, 15 April 2005 04:49 UTC

From: "moore at cs.utk.edu"
Date: Fri, 15 Apr 2005 04:49:53 +0000
Subject: regarding your comments on proposed media type text/troff' to Informational RFC
In-Reply-To: <200504140945.11376.blilly@erols.com>
References: <046F43A8D79C794FA4733814869CDF0749CA01@dul1wnexmb01.vcorp.ad.vrsn.com> <200504140945.11376.blilly@erols.com>
Message-ID: <909d3447e8e0759aa48dd277600db077@cs.utk.edu>

[cc'ed to ietf-types list, which per RFC 2048 is where this document 
should be reviewed.]

Bruce,

I raise this concern both because this was something that was discussed 
extensively in the ietf-822 working group that created MIME, and also 
because we've seen far too many security holes result from giving the 
sender a way to choose what program to run on the recipient's system 
(usually by ignoring the content-type parameter and paying attention to 
the application-specific filename suffix).  Even if we don't think 
troff will be used much, it sets a bad precedent to allow a new 
content-type to specify something that ample experience shows to be 
harmful.

RFC 2046 (section 4.5.1) even includes the following language, which 
also appeared in RFCs 1341 and 1521:

>    To reduce the danger of transmitting rogue programs, it is strongly
>    recommended that implementations NOT implement a path-search
>    mechanism whereby an arbitrary program named in the Content-Type
>    parameter (e.g., an "interpreter=" parameter) is found and executed
>    using the message body as input.

Now, this is in a section related to the application/octet-stream 
content-type, but the principle is the same - if you let the sender 
choose what programs should be used to process the input, the 
content-type label is effectively ignored.  And the only difference 
between an "interpreter" parameter and a "process" parameter is that 
you're not only letting the sender specify the interpreter, you're 
letting him specify the entire command line!
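
To make the point concrete, here is a rough sketch (in Python) of what a 
recipient's software would see.  The header and its "process" value are 
invented for illustration and are not taken from the draft; the parsing 
uses only the standard email and shlex modules, and nothing is executed.

```python
import shlex
from email import message_from_string

# Hypothetical message; the "process" value is an invented example.
RAW = (
    "Content-Type: text/troff; "
    "process=\"tbl | sh -c 'echo attacker-chosen' | troff -ms\"\n"
    "\n"
    ".TL\nExample\n"
)


def sender_chosen_programs(raw_message):
    """Return the program named in each pipeline stage of "process".

    The split on "|" is deliberately naive (it ignores quoting), which
    is enough to show who controls the program names.
    """
    msg = message_from_string(raw_message)
    process = msg.get_param("process")  # None if the parameter is absent
    if process is None:
        return []
    stages = [shlex.split(stage) for stage in process.split("|")]
    return [argv[0] for argv in stages if argv]


programs = sender_chosen_programs(RAW)
print(programs)
```

Every program name in that list, and every argument after it, was 
chosen by the sender rather than by the recipient.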

I don't think it matters much if you say "don't directly execute the 
process parameter" because the very purpose of the process parameter is 
to tell the recipient exactly what to type, and to allow the sender to 
specify arbitrary preprocessing programs and arbitrary parameters to 
those programs.  Maybe recipients should know better than to type 
arbitrary commands specified by the sender, but there's ample 
experience to suggest that most recipients don't have that kind of 
expertise.

Nor am I persuaded by the "sender must specify the order of 
preprocessing" argument, as groff has options to specify which of 
several preprocessors to use, and it seems to work fine.  The Version 7 
UNIX versions of these tools (circa 1979) described in the "CSTR 49" 
document you cite had several limitations (many related to having to 
run in 64 KB of code space on a PDP-11/70) that don't apply to modern 
troff implementations.
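
As a rough sketch of the keyword alternative (in Python): the keyword 
names, the table, and the function below are my own illustration, not 
anything defined in the draft.  The flags are the usual groff options 
(-p, -e, -t, -R for pic, eqn, tbl, refer, and -m<name> for a macro 
package).  Nothing the sender supplies is ever used as a program name or 
looked up in PATH, and ordering is left to groff.

```python
# Fixed allowlists: sender keywords can only select entries from here.
PREPROCESSOR_FLAGS = {"pic": "-p", "eqn": "-e", "tbl": "-t", "refer": "-R"}
MACRO_FLAGS = {"ms": "-ms", "me": "-me", "mm": "-mm", "man": "-man"}


def groff_args(keywords, macros=None):
    """Translate sender-supplied keywords into a groff argument list.

    Unknown keywords are rejected rather than passed through, so the
    sender cannot smuggle in a command or an arbitrary option.
    """
    requested = set(keywords)
    unknown = requested - PREPROCESSOR_FLAGS.keys()
    if unknown:
        raise ValueError("unsupported keyword(s): %s" % sorted(unknown))
    if macros is not None and macros not in MACRO_FLAGS:
        raise ValueError("unsupported macro package: %r" % macros)

    args = ["groff"]
    # Emit flags in table order; the sender's ordering is ignored.
    for name, flag in PREPROCESSOR_FLAGS.items():
        if name in requested:
            args.append(flag)
    if macros is not None:
        args.append(MACRO_FLAGS[macros])
    return args


print(groff_args(["tbl", "eqn"], macros="ms"))
```

The function only builds an argument list; whether and how to run it 
remains the recipient's decision.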

-Keith


On Apr 14, 2005, at 9:45 AM, Bruce Lilly wrote:

>> From: Keith Moore [mailto:moore@cs.utk.edu]
>> Sent: Tuesday, April 12, 2005 4:14 PM
>> To: iesg@ietf.org
>> Cc: moore@cs.utk.edu
>> Subject: Re: Last Call: 'Media subtype registration for media type
>> text/troff' to Informational RFC
>
> Keith,
>
> Thanks for taking the time to review and comment on the draft.
>
> Before addressing your comments in detail, I would like to make some
> observations about troff source, text formatting, and MIME mechanisms:
>
> o some documents might be text only, others might include equations,
>   tables, graphs, line drawings, etc.  Complex documents might include
>   equations within tables (CSTR 49), equations within line drawings
>   (CSTR 114), and/or equations within graphs (CSTR 116), to name but a
>   few possible combinations.  Preprocessor order is dependent on such
>   document structure (as noted in CSTR 49); it is insufficient to know
>   which preprocessors to use, the order is critically important.
>
> o many preprocessors take command line arguments; one needs to know
>   not only which preprocessors to use, and in which order, but also
>   which command line arguments to supply to each preprocessor, and
>   in some cases the order of arguments matters.
>
> o troff provides a comment mechanism (as do most preprocessors); that
>   could be used (as noted in the draft) to convey human-readable
>   document processing instructions.
>
> o MIME provides for packaging related content (multipart/related
>   composite media type), so separate processing instructions (e.g.
>   in a Makefile) could be packaged with document source.
>
>> I recommend against publication of this document as an RFC,
>> unless and until it is revised to fix the following problem:
>>
>> section 4:
>>
>> the process parameter appears to be a security hole, as it would
>> allow the sender to specify commands to be executed on the recipient's
>> system.  while this is documented in the security considerations
>> section, it is unnecessary, and long experience with implementations
>> that provide similar capability (e.g. the ability to launch an
>> arbitrary application based on the file name suffix) indicates that
>> significant harm can come from allowing the sender of a message to
>> specify actions to be taken by the recipient's MUA.
>
> The intent of the "process" parameter is to provide a uniform mechanism
> (vs. ad-hoc source comments or packaging of separate instructions) for
> communicating formatting instructions from the human author to a human
> recipient.  Automated execution is strongly recommended against (I
> could of course make that a "MUST NOT", but I am skeptical of the
> value of RFC 2119 language in an Informational RFC and I am uncertain
> of the implications for non-MIME contexts (N.B. the registration
> procedure is being separated from MIME)).  Publication as a Standards
> Track RFC rather than Informational might address the RFC 2119
> language issue (I'm uncertain of what the procedural implications
> would be at this point).
>
>> furthermore the process parameter is unlikely to be used by the
>> recipient _unless_ its handling is automated, because most MUAs do
>> not make it easy to see the content-type parameters associated with
>> an attachment.  so the parameter as defined is more likely to do harm
>> than good, and it would set a bad precedent for other content-type
>> definitions.
>
> Handling should rarely, if ever, be fully automatic.  In this specific
> case, handling could involve presentation of the recommended processing
> command line to a user for review and modification, with execution of
> the possibly-modified command line only upon user approval.  That's
> what I had in mind.  I could add text to that effect as an example of
> appropriate use of the parameter, but I am wary of delving too far into
> implementation details, particularly as this is a media type
> registration and not a specification for particular types of human
> interfaces.
>
>> if anything I would make the process parameter a list of keywords:
>> pic, eqn, tbl, etc.  the keywords should not be treated as commands
>> by the recipient's system (and certainly should not be looked up in
>> the recipient's PATH), but rather as flags to be passed to a processor
>> such as groff that knows where to find these programs and in what
>> order they should be run.
>
> A mere list doesn't indicate ordering.  Keywords alone don't provide
> indication of (preprocessor/formatter) arguments.  Groff (specifically
> "grog"):
> a) isn't universally available
> b) guesses wrong. Example on the source for the draft under discussion:
>      marty:/data/information/RFCs/text-troff # grog dr*2.df
>      groff -p -ms draft-lilly-text-troff-02.df
>    That indicates pic but not dformat, and indicates the ms macro
>    package, which is incorrect.
> The author presumably knows the necessary processing steps; an expert
> recipient might be able to figure out processing steps. A standard
> mechanism for conveying knowledge from the author to recipient avoids
> trial-and-error guesswork, and that's the intent of the optional
> "process" parameter.  Because the parameter is optional, a sender
> always has the option of omitting it.
>
>> there is also a need to specify what macro package is used: e.g.
>> -ms, -me, etc.
>
> I chose command-line syntax for the process parameter as it is capable
> of concisely and unambiguously indicating the necessary information:
>
> o preprocessors, formatters, and postprocessors required
>
> o preprocessor, formatter, and postprocessor arguments
>
> o order of preprocessor invocation
>
>> actually the varied handling of troff (and TeX) documents is probably
>> the reason we never defined a content-type for either of these.
>
> Yes, the richness of the type is unique; I know of no other media types
> that are as flexible, and that poses some challenges for specification.
>
>> we certainly considered doing so at the time the MIME documents were
>> being written, and we used these file formats as examples of content
>> which, when interpreted on a recipient's system, could cause security
>> breaches.
>
> Almost all media types (including plain text, as noted in the draft
> w.r.t. control characters) have security issues.  I think the issues
> with troff source have been reasonably addressed in the draft; compare
> for example to the approved registration for type application/msword.
>
>> Keith
>
> Would clarification of the recommendation against automated processing
> and an example of appropriate handling of the "process" parameter
> (either with or w/o "MUST NOT") address your objections?
>
> Best regards,
>   Bruce Lilly