Re: [ftpext] draft-peterson-streamlined-ftp-command-extensions

"Mark P. Peterson" <mpp@rhinosoft.com> Wed, 24 November 2010 13:41 UTC

Hi Robert,

I'm not exactly sure how to reply to your message, so I will try without quoting, if that makes any sense.  I understand the problem in
your example; however, I think that the problem comes down to the implementation of a good FTP server vs. one that isn't so good.  In the
case of your server I can see a cache implementation that would solve the problem, reduce the amount of required disk space, and
provide the client with varying sizes of images as I propose with this command.  To solve the real-world problem you've outlined I
would:

* Add a limit in the server, either hard-coded or user configurable, to identify the maximum dimensions of a thumbnail, let's use 1k 
x 1k for an example.
* Design a thumbnail cache which identifies when an image was placed into the cache, when it was last accessed, the thumbnail 
dimensions, etc.
* When a THMB command arrives for a particular file, first identify if it's in the cache.  If it is in the cache, use the file from 
the cache and create a thumbnail image and provide it to the client.  If not in the cache, create a thumbnail of the maximum allowed 
size, add it to the cache, then return the requested image to the client.
* Maintain the cache.
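
The outline above can be sketched roughly as follows.  This is only an illustration under stated assumptions, not part of the draft: the names ThumbnailCache, MAX_DIM and evict_idle are hypothetical, and images are stood in for by (width, height) tuples so the sketch runs without an imaging library.

```python
import time
from dataclasses import dataclass, field

MAX_DIM = 1024  # hypothetical server limit (the "1k x 1k" in the outline)


@dataclass
class CacheEntry:
    image: tuple        # stand-in for pixel data: (width, height)
    created: float      # when the image was placed into the cache
    last_access: float  # when it was last used to serve a request


@dataclass
class ThumbnailCache:
    entries: dict = field(default_factory=dict)
    expensive_renders: int = 0  # counts full-size decodes, for illustration

    def _render_master(self, path):
        # Placeholder for the costly step: decode the original file and
        # scale it down to the maximum allowed thumbnail size.
        self.expensive_renders += 1
        return (MAX_DIM, MAX_DIM)

    def _downscale(self, image, width, height):
        # Placeholder for the cheap step: shrink the cached master image,
        # never exceeding the master's own dimensions.
        return (min(width, image[0]), min(height, image[1]))

    def get(self, path, width, height):
        now = time.time()
        entry = self.entries.get(path)
        if entry is None:
            # Not in the cache: build the maximum-size thumbnail once.
            entry = CacheEntry(self._render_master(path), now, now)
            self.entries[path] = entry
        entry.last_access = now
        return self._downscale(entry.image, width, height)

    def evict_idle(self, max_idle_seconds):
        # "Maintain the cache": drop entries that have not been used lately.
        cutoff = time.time() - max_idle_seconds
        stale = [p for p, e in self.entries.items() if e.last_access < cutoff]
        for p in stale:
            del self.entries[p]
        return len(stale)
```

With this shape, a 100 x 100 request followed by a 200 x 200 request for the same file triggers only one expensive render; a real server would also need locking and a persistent store.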

This simplistic outline does require an additional transformation to get the thumbnail image; however, the horsepower required to
perform this operation is now trivial, for example taking that 1k x 1k image down to 100 x 100.  If another request comes in for 200
x 200 for that same image, again the CPU usage is very small, the amount of data transferred remains low, and the client is not
limited to a fixed size.

My point with the above example is that we are talking about the implementation.  I realize implementations cannot be ignored, but in
your example an efficient, simple solution does exist that does not limit the client to a fixed thumbnail size.  The implementation I
propose above could be extended to support multiple cached image sizes rather than a single one, whatever is needed by the
server.  For example, if the server received a request for a 256 x 256 thumbnail, the server could decide to create a 512 x 512
cache image, then create the 256 x 256 sized image to send to the client.  Instead of maintaining a cache of specific sizes it could
use reasonable stratifications from which thumbnails can be created.
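
One possible reading of "reasonable stratifications" is a fixed ladder of cache sizes.  The sketch below is an assumption for illustration only: the tier values and the rule "smallest tier strictly larger than the request" are hypothetical, chosen so that a 256 x 256 request maps to a 512 x 512 cache image as in the example above.

```python
TIERS = (128, 256, 512, 1024)  # hypothetical cache sizes; largest = server maximum


def cache_tier(width, height, tiers=TIERS):
    """Return the smallest tier strictly larger than the requested size."""
    longest = max(width, height)
    for tier in tiers:
        if tier > longest:
            return tier
    # Requests at or above the largest tier are served from the largest tier.
    return tiers[-1]
```

With these tiers, 100 x 100 maps to 128, 256 x 256 to 512, and 640 x 480 to 1024, so the server keeps at most four cached images per file regardless of what sizes clients ask for.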

If the server decides it wants to attempt to pre-cache images it could, and this could be triggered by the first thumbnail image
requested in a directory.  Again, I think this would be an implementation detail left to the server author's discretion.

Regarding thumbnail types: I agree with you that thumbnails of videos, PDFs, Word documents, text files, and others are all reasonable
uses for this command.  I don't think my document prevents different file types from being supported; I think it's quite extensible.

Mark P. Peterson - President
http://www.RhinoSoft.com
Voice: +1(262) 560-9627
FAX: +1(262) 560-9628


--------------------------------------------------
From: "Robert McMurray" <robmcm@microsoft.com>
Sent: Tuesday, November 23, 2010 10:26 PM
To: "'Mark P. Peterson'" <mpp@rhinosoft.com>; <ftpext@ietf.org>
Subject: RE: [ftpext] draft-peterson-streamlined-ftp-command-extensions

Thanks, Mark.

> Mark wrote:
>
> That's the point of the command,
> though.  It's to reduce the amount of
> data that is required to travel over
> the wire.  The idea is to shorten
> delivery time to the client, letting
> the client decide what is best for
> the client, not the server.

I understood that to be the case, but looking back to Anthony's original question - he had asked which commands were more valuable 
to me than others. With that in mind, when I considered the THMB command from a server implementation point-of-view as it appears in 
the current draft, I thought that the current proposal was a little impractical, which would make me lean away from implementing it.

Here's a real-world scenario that illustrates why I think that the current THMB proposal breaks down quickly in a normal use 
scenario:

Let's say that I have a friend who works in astrophotography, and let's say that he's doing his doctoral work on the Crab Nebula. He 
books some time at one of the national observatories and takes several series of images of the Crab Nebula that are stitched 
together into 50 large composite images. (These have very large dimensions, 100K by 100K pixels, and each is over 100MB in size, 
which are the perfect candidates for thumbnails.) When he publishes his thesis for peer review, he drops the full-sized images in a 
directory on an FTP site that I manage where his colleagues can access them. Let's say that his thesis garners a modicum of interest 
on the day of its publication for review - perhaps just 10 users. Each of these users has an FTP client that supports the THMB 
command as it is currently designed, and each FTP client can use varying thumbnail dimensions. So here's what happens:

Client 1 has the following transaction:

Client01> THMB JPG 100 100 Crab_Nebula_01.jpg
Server00> 150 Starting thumbnail transfer for Crab_Nebula_01.jpg
-         //Note: 5K bytes are transferred over the data channel//
Server00> 226 Transfer complete.

Since the thumbnail was generated dynamically based on the client's input, one of my server's CPUs spikes while it loads the 100MB 
image into RAM, renders the thumbnail in the user-requested image format and dimensions, and dumps the image on the wire. (Because a 
client can request any image dimensions and format, my server implementation would probably not cache the thumbnail in memory, but 
I'll come back to that.) Still, transferring only 5K over the wire is a lot better than transferring 100MB, so we're in agreement 
that reducing the bandwidth is a good thing.

But I have 9 other clients that are hitting my FTP site and they issue FTP thumbnail requests like the following examples:

Client02> THMB PNG 100 100 Crab_Nebula_01.jpg
Client03> THMB GIF 100 100 Crab_Nebula_01.jpg
Client04> THMB JPG 200 200 Crab_Nebula_01.jpg
Client05> THMB PNG 200 200 Crab_Nebula_01.jpg
Client06> THMB GIF 150 150 Crab_Nebula_01.jpg
Client07> THMB PNG 640 480 Crab_Nebula_01.jpg
Client08> THMB GIF 320 240 Crab_Nebula_01.jpg
Client09> THMB JPG 150 150 Crab_Nebula_01.jpg
Client10> THMB JPG 320 240 Crab_Nebula_01.jpg

So my server has received 10 completely dissimilar thumbnail requests, and all of those requests are just for the first physical 
image. If each FTP client attempts to download just 20 thumbnails in order to show the first set of thumbnails in each graphical FTP 
client, I have to dynamically generate 200 different thumbnails. Since these are all 100MB images, I am reducing my bandwidth at the 
expense of the CPU and RAM resources that are required to process the THMB requests.
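
To make the arithmetic concrete, here is a small sketch that parses the ten example requests and counts how many distinct thumbnails they imply.  The command layout THMB <format> <width> <height> <path> is taken from the examples above rather than from the draft text, and the names ThumbRequest and parse_thmb are hypothetical.

```python
from typing import NamedTuple


class ThumbRequest(NamedTuple):
    fmt: str
    width: int
    height: int
    path: str


def parse_thmb(line):
    """Parse 'THMB <format> <width> <height> <path>' as used in the examples."""
    # maxsplit=4 keeps any spaces in the path intact.
    verb, fmt, width, height, path = line.split(" ", 4)
    if verb.upper() != "THMB":
        raise ValueError("not a THMB command: " + line)
    return ThumbRequest(fmt.upper(), int(width), int(height), path)


REQUESTS = [
    "THMB JPG 100 100 Crab_Nebula_01.jpg",
    "THMB PNG 100 100 Crab_Nebula_01.jpg",
    "THMB GIF 100 100 Crab_Nebula_01.jpg",
    "THMB JPG 200 200 Crab_Nebula_01.jpg",
    "THMB PNG 200 200 Crab_Nebula_01.jpg",
    "THMB GIF 150 150 Crab_Nebula_01.jpg",
    "THMB PNG 640 480 Crab_Nebula_01.jpg",
    "THMB GIF 320 240 Crab_Nebula_01.jpg",
    "THMB JPG 150 150 Crab_Nebula_01.jpg",
    "THMB JPG 320 240 Crab_Nebula_01.jpg",
]

# Each distinct (format, width, height, path) needs its own rendered thumbnail.
distinct = {parse_thmb(r) for r in REQUESTS}
renders_for_20_files = len(distinct) * 20
```

None of the ten requests share a format and size, so nothing can be reused between clients, and 20 files per client gives the 200 renders mentioned above.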

I mentioned earlier that I would probably not cache the thumbnails in RAM because the client can request any thumbnail dimensions, 
so let's say that my server implementation at least caches thumbnails to disk so I don't have to generate them dynamically for every 
request. Since each FTP client can request multiple image formats and any image dimensions, I may have to keep track of hundreds of 
thumbnails for each single physical image. That will quickly start to eat up disk space, so now I have to implement some form of 
garbage collection to clean up stale thumbnails. But even then, since my friend has 50 physical images and I configure my FTP server
implementation to only keep the 10 most-recent thumbnails per image around, I'm still managing 500 thumbnails for his original 50
physical images.

All in all I find the approach for letting only the client have full control very impractical, and that would prevent me from 
wanting to implement the THMB command as it is currently documented in the draft. But that being said, I like the idea of having a 
thumbnail command, just not the way that it's currently proposed. That's why I was suggesting some alternatives, and I'll expound a 
little on that.

In my last email I had suggested letting the server be a little more in control, and I had suggested that the server could 
optionally tell an FTP client that the client can't specify the image format. If I use the astrophotography scenario that I just 
gave, that means that I could have my friend pre-create the 50 thumbnails for the 50 physical images in some fashion where the FTP 
server implementation would pick them up. (I could use shadow folders, unique thumbnail naming, etc.) This means that there is no 
spike in CPU or RAM when the 10 FTP clients issue their THMB requests; it also means that I'm only managing 50 static thumbnails. So 
now the requests could be as simple as the following:

Client01> THMB Crab_Nebula_01.jpg
Server00> 150 Starting thumbnail transfer for Crab_Nebula_01.jpg
Server00> 226 Transfer complete.
Client02> THMB Crab_Nebula_01.jpg
Server00> 150 Starting thumbnail transfer for Crab_Nebula_01.jpg
Server00> 226 Transfer complete.
... etc ...

The reason why I had proposed using the OPTS command was to give the client the level of control that you mentioned in your reply, 
e.g. "letting the client decide what is best for the client, not the server." If you were to combine your client request concepts 
with elements of my OPTS-based suggestion, you could create a hybrid of the two approaches that might address all concerns. Here are 
some examples:

In this example, the client simply asks for the server's current thumbnail configuration:
C> OPTS THMB
S> 200 JPG 100 100

In this example, the client specifies a new thumbnail format:
C> OPTS THMB PNG
S> 200 PNG 100 100

In this example, the client specifies a new thumbnail format and dimensions:
C> OPTS THMB GIF 150 150
S> 200 GIF 150 150

I think that something more of a hybrid approach works better - the client can ask for a file format and dimensions, but the server 
can still say "no" to a client-specified file format and dimensions when it wants to, but still return a thumbnail when the client 
asks for it.

C> OPTS THMB PNG 100 100
S> 504 Specifying thumbnail properties is unsupported.

C> THMB widget.png
S> 150 JPG Starting thumbnail transfer for widget.png
S> 226 Transfer complete.

This makes it easier for the server implementation to have a pre-cached collection of thumbnails, especially when given the 
realistic scenario that I listed above. But if you omit the OPTS command and stick with using a single THMB command, when an FTP
client sends a request that your server doesn't want to fulfill for some reason, your only recourse is to fail the whole request.
Whereas, if you break the process into separate OPTS and THMB commands, you can fulfill a THMB request even if you reject the custom
parameters that the client had requested with an OPTS command.
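
A rough sketch of the per-session logic this hybrid implies is shown below.  It is an illustration only: the class ThumbSession, the SUPPORTED_FORMATS set, and the use of a 501 reply for malformed arguments are assumptions, while the 200, 504 and 150 reply texts follow the examples above.

```python
SUPPORTED_FORMATS = {"JPG", "PNG", "GIF"}  # assumption for this sketch


class ThumbSession:
    def __init__(self, allow_client_options, fmt="JPG", width=100, height=100):
        self.allow_client_options = allow_client_options
        self.fmt, self.width, self.height = fmt, width, height

    def opts_thmb(self, args=""):
        """Handle 'OPTS THMB [format [width height]]' and return the reply line."""
        parts = args.split()
        if parts:
            if not self.allow_client_options:
                # The server says "no", but THMB itself keeps working.
                return "504 Specifying thumbnail properties is unsupported."
            if len(parts) not in (1, 3) or parts[0].upper() not in SUPPORTED_FORMATS:
                return "501 Syntax error in parameters or arguments."
            if len(parts) == 3:
                if not (parts[1].isdigit() and parts[2].isdigit()):
                    return "501 Syntax error in parameters or arguments."
                self.width, self.height = int(parts[1]), int(parts[2])
            self.fmt = parts[0].upper()
        # With no arguments this simply reports the current configuration.
        return f"200 {self.fmt} {self.width} {self.height}"

    def thmb(self, path):
        """THMB is served with whatever settings the session currently has."""
        return f"150 {self.fmt} Starting thumbnail transfer for {path}"
```

The point of the split is visible in the last method: a rejected OPTS leaves the session defaults untouched, so a later THMB still returns a thumbnail.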

> Mark wrote:
>
> This could waste server-side
> resources, the client might not even
> make a request for thumbnail images.

This is true, but using the astrophotography scenario once again, I'd rather have 50 5K thumbnail files eating up a tiny fraction of 
disk space (which is a dirt cheap resource) rather than trying to generate dynamic thumbnails for 50 100MB images and eating CPU and 
RAM (which are expensive resources).

If my server implementation didn't pre-cache thumbnails, and I configured my server to only allow JPG format thumbnails at 100x100 
pixels, it would still be possible to implement some form of in-memory or dynamic on-disk caching for subsequent requests, because
now the list of variables has been reduced. For example:

Client01> THMB Crab_Nebula_01.jpg
-        //Note: the thumbnail was not//
-        //pre-cached, so the server  //
-        //creates it dynamically and //
-        //caches it to disk after it //
-        //sends the thumbnail to the //
-        //client                     //
Client02> THMB Crab_Nebula_01.jpg
-        //Note: the thumbnail was    //
-        //cached during the previous //
-        //request, so the server can //
-        //send the thumbnail to the  //
-        //client with no additional  //
-        //processing required        //
Client03> THMB Crab_Nebula_01.jpg
... etc ...
Client10> THMB Crab_Nebula_01.jpg

> Mark wrote:
>
> This isn't really the purpose of this
> command.  The purpose is to take an
> original image file and reduce it to
> a specified size, not to return the
> OS's icon representation for that
> particular file type.

Perhaps that was not the originally-intended purpose for this command, but then I think that you're limiting the usefulness of the 
command. My suggestion to return an icon may not have been the best example, but limiting the THMB command to just image files is 
not very useful, since there are other files that would yield beneficial results. For example, video files are typically much larger 
than images, so why shouldn't video files be able to have unique thumbnails? What about thumbnails for new image types that are 
introduced later, like SVG files? I would say that it's certainly possible that any server implementation could refuse to send a 
thumbnail for any file that it chooses, but why shouldn't a server implementation be able to return a thumbnail for any file? I am 
simply suggesting that limiting the functionality of the THMB command to just images reduces the overall value of the command.

All that being said, as I stated earlier, I like the idea of a THMB command, but at the moment I'm not fond of the current proposal. 
When I consider that implementers of FTP clients might read about the THMB command in an RFC and start creating FTP clients that can 
issue requests for thumbnails in any number of image formats and pixel dimensions, I start to back away from this command pretty 
quickly.

I admit that it would be different if I implemented my own graphical FTP client and my own FTP server, because I could control the 
THMB interoperability in a way of my choosing - for example, I could define a specific set of thumbnail dimensions and only use JPG 
format. But since I would only be implementing an FTP server, I don't like the way the odds are stacked. (In some ways this is like 
writing a function to strip whitespace from text files - if you generate all of your own text files, then you only have to 
anticipate what you've defined as whitespace. But when you're stripping whitespace from someone else's text files, you have to 
anticipate 0x20, 0x09, 0xA0, multiple character sets, what to do when you get character codes that you don't recognize, etc.)

Just the same, I'd like to see some form of thumbnail functionality if possible, because the scenario about astrophotography wasn't 
hypothetical - I actually have a friend that works with astrophotography who generates those types of huge image files; having an 
effective method to retrieve thumbnails for those large images would be great. But I don't think that the current draft offers an 
example of an effective method when I consider having to create dynamically-generated thumbnails for a near-infinite number of 
possible client request parameters.

Thanks again!

Robert McMurray
robmcm@microsoft.com