Re: [ftpext] COMB command IETF draft proposal

John C Klensin <john-ietf@jck.com> Sat, 18 June 2011 02:31 UTC

--On Wednesday, June 08, 2011 11:14 -0500 Robert Oslin
<rto@globalscape.com> wrote:

>...
> Essentially an early draft to ratify the commonly used COMB
> command for support of multi-part (a.k.a accelerated) uploads.
> I welcome comments/questions from the community. Ideally
> redlining the document (or via comments in the doc). I will
> resubmit using formal IETF draft (in ASCII) once I've
> addressed all comments/questions in the MS Word version (hope
> that's ok and doesn't break and IETF rules).

As at least one other person has said, it doesn't break any
rules, but ASCII is a lot easier for many of us to deal with and
comment on, if only out of habit.

To put what I'm about to say in context, I usually take the
position of an FTP purist whose relationship with the protocol
goes all the way back to some of the original design meetings.
That makes me inclined to respond to a lot of proposals with
"that isn't FTP but an attempt to graft another protocol onto
it; why don't you use something else".   That doesn't quite
apply in this case, but...

A few comments about your spec that I don't think others have
made:


(1) It is seriously unclear and needs work to describe what you
intend.  For example, if you say "recombine a file from its
constituent parts" or "instruct the server to recombine the
previously uploaded file parts", it could be read as requiring

  STOR /f.p1
  STOR /f.p2
  STOR /f.p3
  STOR /f.p4
  COMB "/file.dat" "/f.p1" "/f.p2" "/f.p3" "/f.p4"

I think you intend that COMB cause the transfer and then the
combining function, but that is not what the document appears to
say.  Or, given

	"It is up to the user-FTP process to determine when, if,
	and how to split up a file into two or more parts,
	upload each part as a unique file over multiple parallel
	connections, retransmit one or more..."

maybe the above is exactly what you do intend (one could
substitute STOU, which would have some nice advantages), but, if
that were your intention, it is wholly missing from the spec.


(2) You need to be extremely careful with concepts like "file
path" and "directory path".  They are not defined in FTP the way
your description of COMB seems to assume.



(3) Your use of quotes is un-FTP-ish and your command makes
assumptions about the syntax of a name that is not local to the
working directory on the server-FTP.  I'd rather you didn't do
it at all, but, if you must, the specification has to identify
how you would express a file name on the server-FTP system that
was required to actually be, e.g., 
   file/".dat
Your U**x-ish assumptions would probably call for 
  COMB "/file\/\".dat" ...
or
  COMB "/file\/"".dat" ...
but the spec isn't clear and both are error-prone.

Note that the second is seriously pathological given your
"spaces not needed" rule.
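
To make the ambiguity concrete, here is a minimal Python sketch.  The
draft's quoting rules are not clear, so the tokenizer below is purely
hypothetical: the function name split_args, the backslash rule, and
the doubled_quote_is_literal switch are all assumptions made for
illustration, not anything taken from the draft.  It only shows that
the second form above splits differently depending on which reading a
server picks.

```python
def split_args(line, doubled_quote_is_literal):
    """Split a string of double-quoted arguments into a list.

    Hypothetical rules, not taken from the draft:
    - a backslash inside quotes makes the next character literal;
    - if doubled_quote_is_literal is True, "" inside quotes is one literal ";
    - whitespace between arguments is optional ("spaces not needed").
    """
    args = []
    i = 0
    n = len(line)
    while i < n:
        if line[i].isspace():
            i += 1
            continue
        if line[i] != '"':
            raise ValueError(f"expected opening quote at offset {i}")
        i += 1  # skip the opening quote
        current = []
        while True:
            if i >= n:
                raise ValueError("unterminated quoted argument")
            ch = line[i]
            if ch == "\\" and i + 1 < n:
                current.append(line[i + 1])  # escaped character, kept literally
                i += 2
            elif ch == '"':
                if doubled_quote_is_literal and i + 1 < n and line[i + 1] == '"':
                    current.append('"')  # "" read as one literal quote
                    i += 2
                else:
                    i += 1  # closing quote ends this argument
                    break
            else:
                current.append(ch)
                i += 1
        args.append("".join(current))
    return args


# The second form from the text: "/file\/"".dat"
second_form = '"/file\\/"".dat"'

as_one_name = split_args(second_form, doubled_quote_is_literal=True)
as_two_names = split_args(second_form, doubled_quote_is_literal=False)
```

Under the first reading the argument is the single name /file/".dat;
under the second, the same bytes are two adjacent arguments, /file/
and .dat, because no space is required between them.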


(4) Remembering that we originally intended FTP to be
asynchronous between the command and data streams, it may be too
late now, but the FTP-ish way to do this probably would have
been something like:

  CWD wherever-you-want-this-to-end-up
  BOMPT   (Begin-Odd-Multipart-Transfer)
  PASV  (or PORT, here and below)
  STOU local-name
  PASV
  STOU local-name
  PASV
  STOU local-name
  ...
  COMBine remote-name

The STOU commands would presumably all get 2yz replies
specifying the name used or failure codes permitting something
else to be done.  The server would be required to return those
STOU replies in order even if the transfers finished out of
order.

Note that this not only specifies completely separate and
parallel data connections (rather than maybe trying to share a
single TCP data stream), but, by using initiation and completion
commands and letting the server keep track of its own part
names, it completely eliminates the quotes and the need to hope
that the path syntax on the local or remote systems matches
whatever you've assumed.  By using STOU, it also eliminates both
the need to be sure that the part names chosen on the sender
machine are available on the receiver one and a small potential
security problem.  It also permits use of REST, etc., on the
individual streams if needed.

The use of STOU in that context would also permit an intelligent
FTP-server to use some type of pool (or scratch or reserved
temporary) space for the intermediate files, copying only the
final, reassembled, file into the target directory.  That would
eliminate several of the operational or security risks you
discuss.  Arranging to disassemble the file into pool space on
the client machine (or using the interleaved block approach
discussed below) would eliminate a number of others.
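
The requirement above that the server return the per-part replies in
command order, even when the transfers finish out of order, amounts to
a small hold-back buffer.  A minimal sketch in Python, assuming a
hypothetical OrderedReplies helper and made-up reply texts such as
"250 part-a" (neither comes from the draft or from any real server):

```python
class OrderedReplies:
    """Release per-part replies in the order the commands were issued."""

    def __init__(self):
        self._next_to_send = 0   # sequence number of the next reply owed
        self._issued = 0         # how many store commands have been accepted
        self._finished = {}      # sequence number -> reply text, held back

    def command_issued(self):
        """Record a new store command and return its sequence number."""
        seq = self._issued
        self._issued += 1
        return seq

    def transfer_finished(self, seq, reply):
        """Record a completed transfer; return the replies that may go out now."""
        self._finished[seq] = reply
        ready = []
        # Drain every reply that is now contiguous with what was already sent.
        while self._next_to_send in self._finished:
            ready.append(self._finished.pop(self._next_to_send))
            self._next_to_send += 1
        return ready


replies = OrderedReplies()
first, second, third = (replies.command_issued() for _ in range(3))

out_of_order = replies.transfer_finished(third, "250 part-c")   # held back
first_done = replies.transfer_finished(first, "250 part-a")     # sent at once
the_rest = replies.transfer_finished(second, "250 part-b")      # frees two replies
```

The third part finishes first but its reply is held until the first
and second have been answered, so the client can match replies to
commands purely by position.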

Of course, in a fully asynchronous environment, a sequence of
APPE commands would give a very close approximation of the
above.  And that is how what you are looking for now was
actually done for a while, long ago.

FWIW, the above model could work rather well in the download
direction, elegantly eliminating the need for odd tricks with
the checkpoint/restart mechanism.

All of that said, experience with a number of
distributed/parallel storage systems, including some distributed
database update models and RAID striping, would suggest that the
most transfer-efficient way to do what you are trying to
accomplish would be to use a completely different model:  treat
the file to be transferred as a sequence of blocks of specified
size, open a bunch of connections, push block 0 onto connection
0, block 1 onto connection 1, and continue with each block, M,
going out onto connection (M modulo number of connections).
That would not only permit more optimization but would permit
reassembly to start while the transfer was still in progress.
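
A minimal Python sketch of that interleaved-block model, with blocks
numbered from 0 so that block M goes to connection M modulo the number
of connections.  The function names stripe and reassemble, the
in-memory lists standing in for data connections, and the sample
payload are all assumptions made for illustration; nothing here is a
proposed wire format.

```python
def stripe(data, block_size, connections):
    """Deal fixed-size blocks round-robin onto `connections` queues.

    Returns one list per connection; each entry is (block_index, block_bytes).
    Blocks are numbered from 0, so block M goes to connection M % connections.
    """
    if block_size <= 0 or connections <= 0:
        raise ValueError("block_size and connections must be positive")
    queues = [[] for _ in range(connections)]
    for offset in range(0, len(data), block_size):
        index = offset // block_size
        queues[index % connections].append((index, data[offset:offset + block_size]))
    return queues


def reassemble(queues, block_size, total_length):
    """Write each block at index * block_size, in whatever order blocks arrive."""
    out = bytearray(total_length)
    for queue in queues:
        for index, block in queue:
            start = index * block_size
            out[start:start + len(block)] = block
    return bytes(out)


payload = bytes(range(256)) * 4 + b"tail"
queues = stripe(payload, block_size=100, connections=3)
# Feed the queues back in a different order to mimic out-of-order arrival.
restored = reassemble(list(reversed(queues)), block_size=100, total_length=len(payload))
```

Because every block has a fixed offset in the target file, the
receiver can place each one as soon as it arrives, which is what lets
reassembly overlap with the transfer.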



(5) Some of the material in 3.3 doesn't make any sense.  If the
server "SHOULD" or "MAY" send the specified responses, it seems
to me that it is obligatory on you to specify what happens if
the server chooses to do something else.  Presumably it is not open
season for anything you don't specify -- I'd predict severe
interoperability problems if a server responded to your 550 case
by returning, e.g., 987.
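
To illustrate why a code like 987 is a problem: a client that does not
recognize a specific reply code normally falls back on its first
digit, and RFC 959 defines only five reply classes.  A minimal Python
sketch, where the helper name classify_reply is an assumption made for
illustration:

```python
# The five reply classes defined by the first digit in RFC 959.
REPLY_CLASSES = {
    "1": "positive preliminary",
    "2": "positive completion",
    "3": "positive intermediate",
    "4": "transient negative completion",
    "5": "permanent negative completion",
}


def classify_reply(code):
    """Return the RFC 959 reply class for a three-digit code, or None."""
    text = str(code)
    if len(text) != 3 or not text.isdigit():
        return None
    return REPLY_CLASSES.get(text[0])


expected_failure = classify_reply(550)
unspecified = classify_reply(987)
```

A 550 falls into the permanent negative completion class, so even a
client that has never heard of COMB knows the command failed; a 987
falls into no class at all and the client has nothing to go on.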


(6) The first two paragraphs of your Security Considerations
section are a little bogus.  Providing COMB support only to
authorized users, etc., may protect against certain classes of
malicious attacks, but they provide absolutely no protection
against ignorance, stupidity, or carelessness, any of which can
be easily exploited by an attacker as well as creating risks of
equally-damaging accidents.

best,
   john