Re: Last Call comments on draft-strombergson-shf-04.txt

For the most part I second jhutz's opinion on this draft.
Additional comments inline...

Jeffrey Hutzelman wrote:

> My first overall thought is "why??"
> Not everything needs to be wrapped in XML, and in this case, it appears
> that there are few real benefits and a number of significant drawbacks.
> 
> It is difficult to tell from the document whether the authors actually
> intend this format as a substitute for programming-data formats like
> S-records, or as a format for transferring data dumps over the Internet.
> It has a number of drawbacks which would seem to make it unsuitable for
> the former case, and doesn't seem to offer much over a raw hex dump
> for the latter.  It would be helpful if the authors could clarify the
> intended application for this format.

It appears to me that this format is primarily useful as a storage 
format.  The authors state that the format is not secure enough by 
itself to ensure integrity when transferring between systems.  In a 
transport format there should be some means of ensuring that all blocks 
have been received in the correct order and that no additional blocks
were added.  The draft does not attempt to provide those features;
therefore I believe it is meant for storage.  This should be stated
clearly in the document.

Even for a storage format it would be nice for the authors to specify
a means for describing block order as well as a means for providing
a checksum over the entire dump.
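
As a rough sketch of what I have in mind (Python; the per-block
sequence number, the block count, and the whole-dump digest are my own
hypothetical additions and are not defined anywhere in the draft), a
reader could check order and completeness like this:

```python
import hashlib


def verify_dump(blocks, expected_count, expected_digest, algorithm="sha256"):
    """Check block order, block count, and a digest over the whole dump.

    `blocks` is an iterable of (seq, data) tuples in the order they were
    read.  `seq` is a hypothetical zero-based sequence number; the draft
    defines no such field.
    """
    digest = hashlib.new(algorithm)
    count = 0
    for seq, data in blocks:
        if seq != count:
            raise ValueError(f"block {seq} found where block {count} was expected")
        digest.update(data)
        count += 1
    if count != expected_count:
        raise ValueError(f"expected {expected_count} blocks, found {count}")
    if digest.hexdigest() != expected_digest:
        raise ValueError("whole-dump digest mismatch")
    return True
```

A missing, extra, or reordered block fails one of the three checks,
which is the property I would want from either a storage or a transport
format.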

> I would also advise the authors and other interested parties to examine
> draft-housley-cms-fw-wrap-09.txt, on which an IETF Last Call recently
> concluded.  It describes a method for securely transporting firmware
> images over the internet and directly to hardware devices.  While it is
> too complex to be suitable for direct programming of low-level devices,
> it is quite appropriate for delivery as far as the workstation, device
> programmer, or bootloader.  [Note that I have nothing to do with that
> document, other than having recently reviewed it]
> 
> 
> In any case, I see a number of problems, some of which are significant;
> 
> - This specification repeatedly uses the word "byte" to refer to an octet.
>  Further, it prohibits representation of data with word sizes which are
>  not multiples of 8 bits, claiming that such things are not used in
>  "practical present-day applications".  While byte sizes other than 8 bits
>  and word sizes which are not multiples of 8 bits have become extremely
>  uncommon in general-purpose computing devices, they are still used in
>  more special-purpose devices, and many of the low-level devices which
>  are within the stated scope of this document are programmed with data
>  which uses "odd" word sizes.

The document provides a definition for "byte" and then defines "octet" 
as a "byte" but doesn't use it after that.  I would replace all 
references to "byte" with "octet" and get rid of "byte" entirely.

The restriction of "word" sizes to multiples of an octet seems odd to
me as well, especially given that the limit on the size of a block is
specified in bits.

> - The introduction indicates an intent to provide an alternative to
>  formats used for "hexadecimal data" and particularly device programming
>  data, the de facto standard "S-record" format is mentioned by name.
>  However, it fails to capture a fundamental property of such formats,
>  which is that they are generally simple enough to send to a device or
>  programmer without further parsing.  The authors admit that an XML
>  parser is "not easily deployed in hardware devices", but suggest that
>  instead a workstation should be used to convert data from the specified
>  format into one the device can actually handle.
> 
>  If this is the expected use case, then I fail to see the advantage over
>  simply transporting the data over established file-transfer protocols
>  (FTP, HTTP) in a format which can be directly understood by the device.
>  Many devices can be programmed by sending the distributed image over an
>  RS-232 connection with no preprocessing; requiring a translation step
>  severely reduces the set of devices that can be used for this purpose.
>  For example, it makes it unlikely that I would be able to walk around
>  my machine room with a PDA, upgrading firmware in network devices or
>  RAID controllers.

One example that I came up with would be a driver update package in 
which the dump contained different versions of the same drivers for
different platforms, perhaps one for a 32-bit version of the OS and
another for the 64-bit version, only one of which would actually be
used.  In such an example, the dump is simply a storage medium and
the processing application would be selectively extracting from it 
independent data streams for eventual delivery to the device.

Unfortunately, I feel like I am searching for a target application which
should be spelled out in the document.

> - This specification REQUIREs the use of SHA-1, providing no means to
>  upgrade to an alternate hash in the future.  This lack of algorithm
>  agility is not very forward-looking.

The checksum is currently embedded in the header for each block.  The
problem I have with this is that it restricts the size of the block to
be something storable in memory and even assumes that the entire data 
block must be available prior to the generation of the header.  It is
very likely that the data being stored will be coming from a stream
source, and there may not be enough memory to hold it all before
writing to the dump media.

The checksum should be stored as a tag inside the block and the tag
should contain an attribute specifying which algorithm was used.

A similar checksum tag should be available to validate the entire dump.
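
To illustrate why a trailing tag helps, here is a minimal sketch
(Python; the <block> and <checksum> element names and the "algorithm"
attribute are assumptions of mine, not the draft's syntax) of a writer
that never holds more than one chunk of the block in memory:

```python
import hashlib
import io


def write_block(source, out, algorithm="sha256", chunk_size=4096):
    """Stream `source` into a block, emitting the checksum after the data.

    The element and attribute names here are hypothetical.  The point is
    only that the digest is updated chunk by chunk and written last, so
    the block never has to fit in memory.
    """
    digest = hashlib.new(algorithm)
    out.write("<block>\n")
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        out.write(chunk.hex() + "\n")
    out.write(f'<checksum algorithm="{algorithm}">{digest.hexdigest()}</checksum>\n')
    out.write("</block>\n")
    return digest.hexdigest()
```

Because the algorithm is named in the tag, a later revision could move
away from SHA-1 without changing the structure of the format.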

> - In section 4.1, you say "if the value is untrue...".  I suspect you
>  mean something like "if the value does not match...".  Further,
>  rather than leaving the behaviour in the case of an incorrect length
>  up to the implementation, it should be RECOMMENDED (RFC2119) that
>  implementations reject such files.
> 
> - In section 4.2, you require the start_address attribute to be
>  provided, even though it may not be meaningful in all cases.  This
>  attribute should be OPTIONAL.

I can see this format being used to store crash data from an application 
for later debugging.  In this case there may be blocks which contain 
stack information or register contents which are not memory addressable.

> - I don't believe 64 bits are required to represent word size.
>  In fact, I question whether it is necessary for this format to
>  represent word size at all.

I believe that word size may make sense for some types of blocks which 
would be stored in the dump file but it should not be REQUIRED.  I 
believe the most general applications would only be interested in octet 
streams.

> - The number of blocks is OPTIONAL, but the block length is REQUIRED.
>  Further, there is a per-block checksum but no overall checksum.
>  These properties would seem to suggest that the intent is to allow
>  stream-encoding by encoding an arbitrary number of relatively small
>  blocks.  This is fine, but lacking both a block count and an overall
>  checksum, there is no way to tell whether the entire dump was
>  transferred correctly.  I would suggest adding an overall-checksum
>  element, to be encoded after the last block (_not_ as an attribute).

If one purpose is to allow encoding an arbitrary number of small blocks, 
there should be some indication of whether order is important, whether 
blocks can be dropped, etc.

> - Why is the number of _bits_ in a block limited to 2^64-1?
>  This limitation seems unnecessary, given that everything else is
>  done in terms of octets.

Why bits if the word size is restricted to octets?  Why not just specify 
the number of words since words are already required?
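
Put another way, if words must be whole octets then a bit count carries
no information that a word count does not.  A small sketch (Python; the
function and parameter names are mine, for illustration only):

```python
def length_in_words(length_bits, word_size_octets):
    """Convert a block length in bits to a length in words.

    With words restricted to whole octets, any valid bit length must be
    an exact multiple of 8 * word_size_octets.
    """
    bits_per_word = 8 * word_size_octets
    words, leftover = divmod(length_bits, bits_per_word)
    if leftover:
        raise ValueError("bit length is not a whole number of words")
    return words
```

Every bit length that is not a multiple of the word size is simply an
error, so the extra precision buys nothing.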

> - The requirement that words inside a dump be represented in network
>  order is silly.  The contents of a dump are by their nature specific
>  to a particular device, and should be in whatever format is most
>  appropriate for that device.  Again, I question whether this format
>  should have any notion of "words" at all.

As one of the comments in the ID Tracker stated, the byte order 
representation for each block should be determined by the application.
Each block should have an attribute specifying the byte order used.
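
For example, assuming a hypothetical per-block byte order attribute
whose value is "big" or "little" (my naming, not the draft's), a reader
could recover the words of a block as follows (Python):

```python
def decode_words(data, word_size, byte_order):
    """Split `data` into unsigned words of `word_size` octets each.

    `byte_order` stands in for a hypothetical per-block attribute and
    must be "big" or "little".
    """
    if byte_order not in ("big", "little"):
        raise ValueError(f"unknown byte order: {byte_order!r}")
    if len(data) % word_size:
        raise ValueError("block length is not a multiple of the word size")
    return [
        int.from_bytes(data[i:i + word_size], byte_order)
        for i in range(0, len(data), word_size)
    ]
```

The same four octets 12 34 56 78 read as two 16-bit words give 0x1234
and 0x5678 in big-endian order but 0x3412 and 0x7856 in little-endian
order, which is why the block itself needs to say which one was used.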

My biggest concern is that this format is not general enough.  I fear
that, because the uses the authors were considering are not spelled out,
there are underlying assumptions embedded in the document which
will hamper its usefulness.

Jeffrey Altman
Secure Endpoints Inc.
