Re: [ftpext] draft-ietf-ftpext2-hash - partial hashes

Anthony Bryan <anthonybryan@gmail.com> Mon, 17 January 2011 08:56 UTC

In-Reply-To: <F15941D3C8A2D54D92B341C20CACDF2311976FEB98@exchange>
References: <F15941D3C8A2D54D92B341C20CACDF2311976FEB98@exchange>
Date: Mon, 17 Jan 2011 03:59:23 -0500
Message-ID: <AANLkTiktAfQuq_utOMXWS11zWiU6B=vzRPM3o7X_Sx9g@mail.gmail.com>
From: Anthony Bryan <anthonybryan@gmail.com>
To: Robert Oslin <rto@globalscape.com>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: quoted-printable
Cc: "ftpext@ietf.org" <ftpext@ietf.org>
Subject: Re: [ftpext] draft-ietf-ftpext2-hash - partial hashes

On Fri, Jan 14, 2011 at 6:32 PM, Robert Oslin <rto@globalscape.com> wrote:
> draft-ietf-ftpext2-hash open issues indicates: "Current version of the draft defines full file hashes, but not partial file hashes."
>
> Partial hashes are valuable and applicable to real-world scenarios and should be considered for inclusion in the HASH specification.
>
> Use Case:
>
>        User initiates download of multi-gigabyte file, such as ISO or similar.
>        Transaction is interrupted
>        [Later] User re-initiates download of same (supposedly) file
>
> At this point the client software must decide whether to resume the transfer or not. Filename is the first criterion observed, followed by size. However, these do not take into consideration a corrupted partial file (even with identical byte count), or that the remote file could have changed, especially given the difficulty in assessing time differences between client and host systems.
>
> By requesting the hash for the portion of the remote file matching the bytes for the partial local file, the client could determine whether the local file is indeed valid and partial and subsequently resume the transfer from the appropriate byte offset.
>
> A similar need occurs when the client has no prior knowledge of the transfer (no queue or cache mechanism) and a local file with the same name is identified. The client must then determine whether the local file is just a segment/part of the larger file located on the remote.
>
> Below is an example of overwrite logic performed today by our FTP client using hashes and size comparisons:
>
> If a user requests to download a file and a file with the same name exists locally, the client determines whether the file sizes are the same or the destination (local) size is smaller (indicating a possible partially transferred file). If the file sizes are the same, the client computes the hash for the entire file and asks the server to provide the hash for the corresponding remote file. The client then skips the transfer if the hashes are identical, or overwrites the file if they do not match. If the remote (source) file is larger, the client asks the server for a partial hash, up to the bytes that match the local (destination) file size. If the partial hash matches, the client resumes the transfer from the byte offset. If the hashes are different (e.g. the local partial file was corrupted or is not the same file), the client overwrites the file.
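
For reference, the overwrite logic quoted above could be sketched roughly as follows. This is only an illustration in Python, not CuteFTP's actual code: the names hash_prefix, decide and fetch_remote_hash are hypothetical, the byte range is assumed to be half-open [start, end), and the case where the local file is larger than the remote one is not covered by the description, so treating it as an overwrite is my assumption.

```python
import hashlib
from enum import Enum


class Action(Enum):
    SKIP = "skip"
    RESUME = "resume"
    OVERWRITE = "overwrite"


def hash_prefix(path, length, algorithm="sha256", chunk_size=65536):
    """Hash the first `length` bytes of a local file, reading in chunks."""
    digest = hashlib.new(algorithm)
    remaining = length
    with open(path, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:  # file is shorter than `length`
                break
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()


def decide(local_size, remote_size, local_hash, fetch_remote_hash):
    """Pick an action for a same-name local file.

    fetch_remote_hash(start, end) is a hypothetical callback that returns
    the server's hex digest for remote bytes [start, end).
    """
    if local_size > remote_size:
        # Assumption: this case is not described in the quoted text.
        return Action.OVERWRITE
    # Equal sizes: this covers the whole file. Smaller local file: this is
    # the partial hash over the bytes the client already has.
    remote_hash = fetch_remote_hash(0, local_size)
    if local_hash.lower() != remote_hash.lower():
        return Action.OVERWRITE
    return Action.SKIP if local_size == remote_size else Action.RESUME
```

A client would call hash_prefix() on the local file with its own size, pass the result to decide(), and resume from offset local_size when the answer is RESUME.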

Robert, thanks for joining us & for posting.

a very quick introduction for Robert would mention that he works on
CuteFTP and is the originator of the XCRC command.

I think everyone's been unanimous in that we want HASH to support
partial file hashes.

I think there are a few things to iron out.

1) how to (optionally) select the byte range to be hashed.

we proposed a new byte RANGe selection command,
http://tools.ietf.org/html/draft-bryan-ftp-range , which needed to be
fleshed out of course but would look something like this:

   C> RANG 802816 1000000
   S> 350 Byte range starting at 802816, ending at 1000000.
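
On the client side, issuing that command could look something like the sketch below. It only assumes what the example shows: two decimal offsets after RANG and a 350 intermediate reply. The helper names format_rang and is_range_accepted are made up for illustration, and whether the end offset is inclusive is still an open detail of the draft, so the code does not interpret it.

```python
def format_rang(start, end):
    """Build a RANG command line (FTP commands end with CRLF)."""
    if start < 0 or end < start:
        raise ValueError("need 0 <= start <= end")
    return f"RANG {start} {end}\r\n"


def is_range_accepted(reply_line):
    """True if the server answered with a 350 reply.

    Only the three-digit code is checked; the human-readable text after
    it is not parsed, since its wording is not fixed.
    """
    return reply_line[:3] == "350"
```

For example, format_rang(802816, 1000000) produces the command line shown above, and is_range_accepted() can be applied to the server's reply before sending HASH.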


2) how to show that partial hashes are supported, or if that's even
needed? add a "p" to the FEAT? or just use an error code if someone
tries to do a partial file hash and it's not allowed or unsupported
on the server?

      C> FEAT
      S> 211-Extensions supported:
      S>  ...
      S>  HASH SHA-256p;SHA-512p;SHA-1p*;MD5p
      S>  ...
      S> 211 END
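
If we went with the "p" marker, a client might read the HASH feature line along these lines. This is a sketch under assumptions: the trailing "p" is only the idea floated here, I'm reading the "*" as marking the currently selected algorithm, and parse_hash_feat is a made-up name. Note that a suffix like this would be ambiguous for any algorithm whose name itself ends in "p".

```python
def parse_hash_feat(line):
    """Parse a FEAT line such as ' HASH SHA-256p;SHA-512p;SHA-1p*;MD5p'.

    Returns {algorithm: {"partial": bool, "selected": bool}}.
    """
    keyword, _, rest = line.strip().partition(" ")
    if keyword.upper() != "HASH":
        raise ValueError("not a HASH feature line")
    algorithms = {}
    for token in rest.split(";"):
        token = token.strip()
        if not token:
            continue
        selected = token.endswith("*")  # assumed: currently selected
        if selected:
            token = token[:-1]
        partial = token.endswith("p")   # proposed partial-hash marker
        if partial:
            token = token[:-1]
        algorithms[token] = {"partial": partial, "selected": selected}
    return algorithms
```

With the FEAT line above, every algorithm would come back with partial set, and only SHA-1 with selected set.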

3) the server response needs to show that it's a partial file hash and
not a full file hash. it would probably be good to have the range in
there, and it could be mandatory, so that a full file hash would list
the start & end of the file.


   C> HASH filename.ext
   S> 213 SHA-256 f0ad929cd259957e160ea442eb80986b5f... filename.ext
      -802816 1000000

from Lothar:

S> 226 SHA-256 f0ad929cd259957e160ea442eb80986b5f... filename.ext\
 802816 1000000 ASCII transfer complete

from Sob:

Rather than inventing new custom reply format, wouldn't it be better
to adopt MLSx style? It's simple, readable, extensible, ...

E.g.:

  S> 213 Hash.SHA-256=f0ad929cd...;Range=802816-1000000; filename.ext
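
To get a feel for how the MLSx-style variant would be consumed, here is a rough parser for a reply of that shape. It assumes exactly the layout in Sob's example (semicolon-separated name=value facts, then "; " and the file name); the fact names Hash.<algorithm> and Range come from that example, and parse_hash_reply is a hypothetical helper, not part of any draft.

```python
def parse_hash_reply(line):
    """Parse '213 Hash.SHA-256=<digest>;Range=<start>-<end>; <filename>'."""
    code, _, rest = line.strip().partition(" ")
    # Facts contain no spaces, so the first '; ' ends the fact list.
    facts_text, sep, filename = rest.partition("; ")
    if not sep:
        raise ValueError("no '; ' separator before the file name")
    algorithm = digest = byte_range = None
    for fact in facts_text.split(";"):
        name, eq, value = fact.partition("=")
        if not eq:
            raise ValueError(f"malformed fact: {fact!r}")
        if name.lower().startswith("hash."):
            algorithm, digest = name[len("hash."):], value
        elif name.lower() == "range":
            start, _, end = value.partition("-")
            byte_range = (int(start), int(end))
        # Unknown facts are ignored, which is what keeps this extensible.
    return {
        "code": int(code),
        "algorithm": algorithm,
        "digest": digest,
        "range": byte_range,
        "filename": filename,
    }
```

One thing this makes visible is the extensibility argument: a client written this way keeps working if further facts are added later.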

(I intend to reply to the backlog of other hash messages shortly)
-- 
(( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
  )) Easier, More Reliable, Self Healing Downloads