Re: Gen-ART LC Review of draft-bryan-metalinkhttp-19

Anthony Bryan <anthonybryan@gmail.com> Mon, 14 February 2011 22:43 UTC

Return-Path: <anthonybryan@gmail.com>
X-Original-To: ietf@core3.amsl.com
Delivered-To: ietf@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id 240753A6C72; Mon, 14 Feb 2011 14:43:16 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.149
X-Spam-Level:
X-Spam-Status: No, score=-2.149 tagged_above=-999 required=5 tests=[AWL=-1.450, BAYES_00=-2.599, J_CHICKENPOX_39=0.6, MANGLED_LIST=2.3, RCVD_IN_DNSWL_LOW=-1]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id HA6DOmnmcksG; Mon, 14 Feb 2011 14:43:14 -0800 (PST)
Received: from mail-ey0-f172.google.com (mail-ey0-f172.google.com [209.85.215.172]) by core3.amsl.com (Postfix) with ESMTP id 911F73A6A3D; Mon, 14 Feb 2011 14:43:13 -0800 (PST)
Received: by eyd10 with SMTP id 10so2871053eyd.31 for <multiple recipients>; Mon, 14 Feb 2011 14:43:36 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=OAEuQ+DkxKpnhgu0ANzChm5qETRbFCjBuFIIdGi8Lyk=; b=H7gt6BpUfpKnQRXbEGLPCtwWp0DRbEZ0gJlyG6BYgRLusLlLCaui2r1Yp0Omsi9TKu BDNou2TlGkvf3HBw7Fd3jApYe4H+KKmkEkLA86O6JfWR0ZLNJvEn3o1+ZP88rXXq/ZAi RDZ9eIOWC0y3B2oFlPCy7a1au6eezrhFZI5p0=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=Cy+eUkWdYIkrCFInDh5dB5MRQyU0pGZ1r2ULGQVxecFnvteGmo2L4Cj5MoLaYbUrcS cgrNOymkqcs+mWAanzEBqktyauoaIZB/xaTLJLpsFmcc8uZweXnHwYS7ENiP/eCSjMZU SBhJYex/STWQvbOBfDkAGOjFcqLODrfdAqcyc=
MIME-Version: 1.0
Received: by 10.213.29.210 with SMTP id r18mr4627184ebc.62.1297723414743; Mon, 14 Feb 2011 14:43:34 -0800 (PST)
Received: by 10.213.22.11 with HTTP; Mon, 14 Feb 2011 14:43:34 -0800 (PST)
In-Reply-To: <19DDC0DF-9ECF-499A-954E-17BABF6410B5@nostrum.com>
References: <19DDC0DF-9ECF-499A-954E-17BABF6410B5@nostrum.com>
Date: Mon, 14 Feb 2011 17:43:34 -0500
Message-ID: <AANLkTik3VNfsT4fY2wc8HpfidPVCshGSUW1py4Jzf==w@mail.gmail.com>
Subject: Re: Gen-ART LC Review of draft-bryan-metalinkhttp-19
From: Anthony Bryan <anthonybryan@gmail.com>
To: Ben Campbell <ben@nostrum.com>
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: quoted-printable
X-Mailman-Approved-At: Tue, 15 Feb 2011 09:25:52 -0800
Cc: alexey.melnikov@isode.com, General Area Review Team <gen-art@ietf.org>, The IETF <ietf@ietf.org>, draft-bryan-metalinkhttp.all@tools.ietf.org
X-BeenThere: ietf@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: IETF-Discussion <ietf.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/ietf>, <mailto:ietf-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/ietf>
List-Post: <mailto:ietf@ietf.org>
List-Help: <mailto:ietf-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/ietf>, <mailto:ietf-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 14 Feb 2011 22:43:16 -0000

hi Ben, thanks for taking the time to read & critique this :)
I apologize for it needing this exhaustive review.

the new version can be found at
http://tools.ietf.org/html/draft-bryan-metalinkhttp-20

with diff including your & others suggestions
http://tools.ietf.org/rfcdiff?url2=draft-bryan-metalinkhttp-20.txt

was there supposed to be a comment along with

-- section 2, 4th paragraph: "HTTP mirror servers SHOULD share the
same ETag policy as the originating Metalink server."
?

On Fri, Feb 11, 2011 at 6:46 PM, Ben Campbell <ben@nostrum.com> wrote:
> I am the assigned Gen-ART reviewer for this draft. For background on Gen-ART, please see the FAQ at <http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.
>
> Please resolve these comments along with any other Last Call comments you may receive.
>
> Document: draft-bryan-metalinkhttp-19
> Reviewer: Ben Campbell
> Review Date: 2011-02-11
> IETF LC End Date: 2011-02-11
>
> Summary:
>
> This document is on the right track for a proposed standard, but there are open issues and editorial issues that should be addressed prior to publication.
>
> --- Major issues:
>
> none.
>
> --- Minor issues:
>
> -- Section 1, paragraph 1, first sentence:
>
> Is this really intended as an alternative to Metalink/XML? It seems like more of a complementary approach than an alternative one.

well...you could use either, or both :)

"Metalink/HTTP is an alternative and complementary representation of
Metalink information,"...
?

> -- section 2, third paragraph:
>
> If they must have the same eTag policy, should this document not specify at least one mandatory-to-implement policy?

SHOULD share the same ETag policy.

I added:

"It is up to the administrator of the Metalink server to communicate
the details of the shared ETag policy to the administrators of the
mirror servers so that the mirror servers can be configured with the
same ETag policy."

> -- section 2, last paragraph: "Metalink clients MAY make use of digital signatures if they are offered."
>
> That seems too weak. We usually recommend that people check signatures on downloads when available. Wouldn't we want an automated system to treat this as at least a SHOULD?

I think this was written with the idea that most download clients
won't have the capability to check signatures, even if we would like
them to.

"Metalink clients SHOULD support checking digital signatures."

> -- section 3, 4th paragraph: "[[Some organizations have many mirrors. Only send a few mirrors, or only use the Link header fields if Want-Digest is used?]]"
>
> Open issue?

removed...

> -- section 3.3, 1st paragraph:"([draft-ietf-ftpext2-hash] allows for FTP mirrors to be coordinated and provide file hashes)."
>
> Do you mean to allow such FTP servers to be preferred?

removed too.

> -- section 4.1 "This is particularly useful for providing metadata such as cryptographic hashes of parts of a file, allowing a client to recover from partial errors (see Section 7.1.2)."
>
> Seems like this might warrant a SHOULD.

"Metalink servers SHOULD provide Metalink/XML files with partial file
hashes in Link header fields and Metalink clients SHOULD use them for
error recovery."

> -- section 5, last paragraph: "Metalink clients MAY support the use of OpenPGP signatures."
>
> MAY seems weak here. This means that the publisher of a file has no reasonable expectation that clients will check the signature.

changed to SHOULD.

> -- section 6, 3rd paragraph: "Digest: SHA-256=..."
>
> Example needs more context

added:
Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
Link: &lt;http://www2.example.com/example.ext&gt;; rel=duplicate


> -- section 7, 4th paragraph from end: "Downloads from mirrors that do not have the same file size as the Metalink server are considered unusable and the client can deal with it as it sees fit."
>
> What if the client sees fit to use it? Is there a normative  "client SHOULD NOT" implied here?

changed to

"Metalink clients SHOULD NOT use downloads from mirrors that do not
have the same file size as the Metalink server. "

> -- section 7.1.1, 1st paragraph:"the requirement is that merging of ranges from multiple responses must be verified"
>
> The requirement for what? Is this intended to be normative?

changed

> -- section 7.1.2, 2nd paragraph:"If no partial cryptographic hashes are available, then the client MUST fetch the complete object from other mirrors."
>
> Does this imply that the client also MUST fetch from other mirrors if the hashed are present but fail verification?

no, changed.

> -- section 8, last paragraph: "The size of chunks chosen by the client should be sufficiently large that the chunk request header fields and reponse header fields represent negligible overhead, and sufficiently large that they can be pipelined effectively without needing a very high rate of chunk requests."
>
> Can you provide guidance on what "sufficiently large", and  "negligible overhead" mean, either in terms of examples or real numbers?
>
> "Note that Range requests impose an overhead on servers and clients need to be aware of that and not abuse them."
>
> What constitutes abuse? Can you provide more specific guidance on when range should not be used, since all the other guidance seems to suggest using it?

removed section 8 except for this first sentence & added the next

"Note that Range requests impose an overhead on servers and clients
need to be aware of that and not abuse them. Metalink clients SHOULD
NOT make more than one concurrent Range request to each mirror server
that it downloads from."

> -- section 10.2:
>
> What can an implementor do to mitigate spoofing?

nothing, I'm aware of?

added:
As with all downloads, users should only download from trusted sources.

> -- section 10.4:
>
> This section should include a warning that clients only have a MAY level requirement to support signature verification.

changed to SHOULD

> "Metalinks should include digital signatures, as described in Section 5."
>
> Normative?

yes, changed to SHOULD.

> -- section 11.1, [BITTORRENT]
>
> Is bittorrent.org an acceptable SDO for the purposes of normative references? (I'm not saying it's not--just asking)

it was for RFC 5854.
http://tools.ietf.org/html/rfc5854#section-8.1

> [draft-ietf-ftpext2-hash]  -- is this a normative downref?

removed



> --- Nits/editorial comments:
>
> -- section 2, 4th paragraph: "HTTP mirror servers SHOULD share the same ETag policy as the originating Metalink server."

what's wrong here?

> -- section 2, 5th paragraph: "Metalink clients use the mirrors provided by a Metalink server with Link header fields [RFC5988]."
>
> Sentence seems ambiguous. Do you mean "_in_ Link header fields."

yes, changed.

> -- section 3, 2nd paragraph: "A brief Metalink server response with two mirrors only:"
>
> Sentence fragment

fixed

> -- section 3, last paragraph: "fieldss"
>
> Typo. (Note same typo repeats throughout)

fixed

> "Such a decision could be a hard-coded limit, a random selection, based on file size, or based on server load."
>
> I assume this is not meant to be exhaustive?

"As some organizations can have many mirrors, it is up to the
organization to configure the amount of Link header fields the
Metalink server will provide. Such a decision could be a random
selection or a hard-coded limit based on network proximity, file size,
server load, or other factors. "

> -- section 3.3, last paragraph:
>
> Paragraph is redundant with similar normative statements in section 2

removed from 3.3

> -- section 4.1: "as shown in Section 4."
>
> .. as shown in the example in section 4.

fixed

> -- section 6, 1st paragraph: "Metalink servers MUST provide Instance Digests in HTTP [RFC3230] for files they describe with mirrors via Link header fields."
>
> Redundant with previously mentioned normative requirement.

removed

> -- section 7, paragraph starting with: "If no range limit was given in the original request then work from the tail of the object (the first request is still running and will eventually catch up), otherwise continue after the range requested in the first request."
>
> Implied "you". Restate as "the client works from the tail" (Please avoid use of 2nd person pronouns in an RFC, implied or otherwise) (Note that this usage is sprinkled throughout )
>
> "If no Range was provided,"
>
> ... in the original request,...
>
> Also, be careful with the passive voice in this paragraph. Who undertakes HEAD requests first? Who undertakes Range-based requests?

rewritten

   If the first request was GET and no Range header field was sent and
   the client determines later that it will issue a Range request, then
   the client SHOULD close the first connection when it catches up with
   the other parallel ranged downloads of the same object.  This means
   the first connection was sacrificed.  Metalink clients can use a HEAD
   request first, if possible, so that the client can find out if there
   are any Link header fields, and then Range-based requests are
   undertaken to the mirror servers without sacrificing a first
   connection.

> -- section 7, paragraph 10: "If-Match conditions based on the ETag SHOULD be used..."
>
> Clients should use If-Match conditions...
>
> "If no indication of ETag syncronisation/ knowledge is given then If-Match should not be used, and optimally there will be an Instance Digest in the mirror response which we can use to detect a mismatch early, and if not then a mismatch won’t be detected until the completed object is verified."
>
> Is there any way other than the pref parameter to indicate this? If not, it would be better to say "if no pref parameter is present..."
>
> Is this sentence intended to be normative?
>
> Also, "... The client should not use..."
>
> Finally, the sentence is hard to follow--please break it up.

changed:

   Preferred mirrors have coordinated ETags, as described in
   Section 3.3, and Metalink clients SHOULD use If-Match conditions
   based on the ETag to quickly detect out-of-date mirrors by using the
   ETag from the Metalink server response.  Optimally, the mirror server
   will include an Instance Digest in the mirror response to the client
   GET request, which the client can also use to detect a mismatch
   early.  If the mirror did not include the pref parameter or an
   Instance Digest, then a mismatch can not be detected until the
   completed object is verified.

> -- section 7.1.1, general:
>
> This whole section switches to an almost stream-of-consciousness style. It has very long sentences with many comma separated clauses that are hard to follow. It really needs to be rewritten to be more readable and precise.

hahaha :)

this is my fault as a bad editor & a non-native English speaker co-author.

> -- section 7.1.1, paragraph 3:
>
> Paragraph seems self contradictory. How is guaranteeing correct content different than verifying integrity?

the server sends the correct content  but there can be errors during transfer.

maybe this is not explicit enough. might need to condense some of this down.

earlier we state:
Error detection requires Instance Digests to detect errors in transfer
after the transfers have completed.

> -- paragraph 4: "This guarantees that a mismatch will be detected by using only the synchronized ETag from a master server and mirror server, even alerted by the mirror servers themselves by responding with an error, preventing accidental merges of ranges from different versions of files with the same name."
>
> I can't parse this sentence.

 This guarantees that a mismatch will be detected by using only the
   shared ETag from a Metalink server and mirror server.  Mirror servers
   will respond with an error if ETags do not match, which will prevent
   accidental merges of ranges from different versions of files with the
   same name.

> "This even includes many malicious attacks where the data on the mirror has been replaced by some other file, but not all."
>
> Antecedent of "this" is ambiguous

removed

> -- 2nd to last paragraph:
>
> This is the first mention of e idea of slave and master mirror servers. Is this the same as preferred and normal mirrors? Please use consistent terms.

ok.

> -- section 7.1.2, 2nd paragraph: "If the object cryptographic hash does not match the Instance Digest then fetch the Metalink/XML if available, where partial file cryptographic hashes can be found, allowing detection of which server returned incorrect data."
>
> I can't parse this sentence.

   If the cryptographic hash of the object does not match the Instance
   Digest from the Metalink server, then the client SHOULD fetch the
   Metalink/XML (if available) that could contain partial file
   cryptographic hashes which will allow detection of which mirror
   server returned incorrect data.  Metalink clients SHOULD figure out
   what ranges of the downloaded data can be recovered and what needs to
   be fetched again.

> -- section 8, paragraph after bullet list:
>
> Please avoid first person pronouns in an RFC.
>
> "Based on laboratory experiments,..."
>
> Reference?

removed text

> "we suggest a good default number of simultaneous connections is probably four,"
>
> That's very weak language--can you remove "we suggest" and "probably"?

removed

> -- Section 9, 1st paragraph: "registration to the Link Relation Type registry"
>
> Can you provide the URL to the registry?

added URL to registry

> -- section 10.2 "In that case, this could deceive unaware downloaders that they are downloading a malicious or worthless file."
>
> Sentence does not make sense. Does the attacker with to deceive them into downloading a malicious file, or convince them that they are when they are not?

"In that case, this could deceive unaware downloaders into downloading
a malicious or worthless file."


-- 
(( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
  )) Easier, More Reliable, Self Healing Downloads