Re: [RFC 959] FTP in ASCII mode

"Sandeep Srivastava" <sandeep.kumar.srivastava@gmail.com> Tue, 21 February 2006 10:58 UTC

Message-ID: <802d52a30602210258h71a31c82hd557d04776c80608@mail.gmail.com>
Date: Tue, 21 Feb 2006 16:28:02 +0530
From: Sandeep Srivastava <sandeep.kumar.srivastava@gmail.com>
To: John C Klensin <john-ietf@jck.com>
In-Reply-To: <81FAA7F6A3A5323A86BBE227@p3.JCK.COM>
MIME-Version: 1.0
References: <43F9ED6A.5090901@peter-dambier.de> <30C473BA61A6E355540B54DE@p3.JCK.COM> <43FA82F1.9010401@necom830.hpcl.titech.ac.jp> <802d52a30602202323o27715984lecfe7d8fa835565f@mail.gmail.com> <81FAA7F6A3A5323A86BBE227@p3.JCK.COM>
Cc: ietf@ietf.org
Subject: Re: [RFC 959] FTP in ASCII mode
List-Id: IETF-Discussion <ietf.ietf.org>

Thanks John. Please see my response in-line...

On 2/21/06, John C Klensin <john-ietf@jck.com> wrote:
>
>
>
> --On Tuesday, 21 February, 2006 12:53 +0530 Sandeep Srivastava
> <sandeep.kumar.srivastava@gmail.com> wrote:
>
> > First of all thanks to everybody for the response.
> >
> > I knew that a FTP transfer in ASCII mode does EOL and EOF
> > conversions based on the OS of the receiving system.
>
> No, it doesn't.  That was part of the point.  It does no EOF
> conversions at all.   The command and data channels were
> separated for several reasons, but the desire to stay out of the
> EOF business was an important one.


Right. I understand the command/data channel part -- i.e. instead of sending
the EOF as data, it is sent as a command, and the receiver can then use
the OS-specific EOF. The overall effect for the end user is as if both EOL
and EOF are converted to the receiving OS defaults.


> And the server is required
> to convert whatever line-end convention it uses to CRLF, and any
> characters it uses to ASCII, and transmit that over the wire.


I don't understand this point very well. Does it mean that, as per the FTP
RFC, the server reads 8 bits at a time and sets the most significant bit to
zero (because ASCII is 7 bits) before transmitting in ASCII mode? If this
is the case, then how did a UTF-8 encoded file containing Devanagari
characters (i.e. bytes greater than 0x7F) get FTPed over (and back)
correctly in ASCII mode?

If not -- i.e. it does not set the MSB to zero -- then how does ASCII mode
differ from Binary mode?
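
To make the two possibilities concrete, here is a rough Python sketch. It is
only a model of the question, not how any particular FTP server is written:
the function names and the strip_high_bit flag are made up for illustration,
and the local line end is assumed to be LF.

```python
def to_network_text(data: bytes, strip_high_bit: bool = False) -> bytes:
    """Sender side: turn local LF line ends into CRLF; optionally force 7 bits."""
    # Normalize first so an existing CRLF is not turned into CR CR LF.
    wire = data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
    if strip_high_bit:
        # Hypothetical "strict ASCII" sender: clear the most significant bit.
        wire = bytes(b & 0x7F for b in wire)
    return wire


def from_network_text(wire: bytes) -> bytes:
    """Receiver side: turn CRLF back into the local LF line end."""
    return wire.replace(b"\r\n", b"\n")


# "namaste" in Devanagari followed by an ASCII line, encoded as UTF-8.
original = "\u0928\u092e\u0938\u094d\u0924\u0947\nFTP\n".encode("utf-8")

transparent = from_network_text(to_network_text(original))
strict = from_network_text(to_network_text(original, strip_high_bit=True))

print(transparent == original)                # 8-bit clean sender: round trip survives
print(strict == original)                     # MSB cleared: UTF-8 bytes are destroyed
print(to_network_text(original) == original)  # wire form differs from the raw (binary) bytes
```

In this model the only thing that separates the 8-bit clean ASCII-mode
sender from a Binary transfer is the line-end rewrite; the UTF-8 bytes above
0x7F are untouched unless the sender really does clear the high bit.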

Scenario:
I am using WS_FTP Pro as the client on my Windows 2000 machine, to FTP to
and back from a Solaris box (acting as FTP server).

Thanks,
Sandeep.


> If the client then converts from CRLF and ASCII to some local
> convention, that is its business, not that of the protocol.  In
> other words, there are, at most, conversions to and from CRLF
> and ASCII. There are no FTP-specified conversions based on the
> properties of the receiving system.

> > And I
> > very much expected my UTF-8 encoded file to get garbled when I
> > FTPed it in ASCII mode. But guess what, it was not garbled on
> > the receiving system. Maybe I was lucky, or maybe it's because
> > UTF-8 is backward compatible with ASCII. But then, as ASCII is
> > purely 7 bits, the FTP in ASCII mode should have corrupted the
> > UTF-8 encoded file, because UTF-8 is 8 bits.
>
> "Should have corrupted" is what I referred to as an ambiguity in
> my note.   First of all, because of the robustness principle,
> you can never guarantee that bad things will happen when they
> might -- proper implementation of protocols around her often
> argues for never trashing a string because one can or because a
> correct string wouldn't have the problem.
>
> So, in practice, if an FTP server was implemented on an ASCII
> system that used the "right justified in octets" model but with
> LF as line-end, the authors might well have said "the character
> codes don't need any conversion for ASCII mode, we just need to
> implement conversion to CRLF".  If they had done that, and UTF-8
> (or ISO 8859 Latin-1 or...) were added to the system, those CCSs
> would go through nicely in ASCII mode, with the right
> line-endings.  Substantially the same thing would occur, as
> Ohta-san points out, with many of the ISO 2022-based encodings
> of non-ASCII characters: completely safely with some of them and
> at least as safely as UTF-8 with the others although, as with
> UTF-8, the claim of strict ASCII would be technically false.
> Now that wouldn't happen with a system that was natively EBCDIC,
> or ASCII stored in seven bit chunks without padding, etc.: those
> systems would need to do real conversions to get to network
> ASCII and, if you thought you were getting UTF-8 over them, you
> would be in big trouble.
>
> > Moreover, in ASCII code page, code point 13=CR and code point
> > 10=LF, but that might not be the case in every other code
> > page. Hence the EOL conversion (in FTP ASCII mode) might
> > corrupt that text file if it is encoded using a non-ASCII
> > encoding. And what about handling the Unicode NewLine
> > characters? Anyway...
>
> Again, there is no conversion in the FTP protocol to a local
> character set, only conversion to (and, outside the protocol but
> common in client implementations, from) network ASCII with its
> CRLF line endings.
>
> > After reading all the wonderful replies, my conclusion is,
> > even though my FTP client/server handled the UTF-8 encoded
> > text file (which BTW contained Devanagari characters)
> > correctly, a text file encoded in an encoding other than
> > ASCII still runs the risk of being corrupted when FTPed in
> > ASCII mode. Therefore, always use ASCII mode to
> > transfer only ASCII encoded files, and Binary mode to transfer
> > non-ASCII encoded files.
>
> Yes, that is probably wise guidance.  However, if you transfer
> textual materials in binary (Image) mode, you also need to be
> sure that you have programs available on the receiving host to
> change line-end conventions from whatever the server uses
> internally to whatever the client system uses.
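
As an illustration of the kind of program meant here, a minimal Python
sketch follows. The function name is made up, and it assumes the file uses
an encoding in which CR and LF are the single bytes 0x0D and 0x0A (ASCII,
UTF-8, ISO 8859-x), so it is not suitable for UTF-16 or EBCDIC, and it only
knows the CR, LF and CRLF conventions.

```python
import os


def normalize_line_endings(
    data: bytes, newline: bytes = os.linesep.encode("ascii")
) -> bytes:
    """Rewrite CR, LF or CRLF line ends to the given convention."""
    # Collapse everything to LF first, so CRLF is not counted twice.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", newline)


# Mixed line ends, as might arrive after a Binary (Image) transfer.
received = b"one\ntwo\rthree\r\n"
print(normalize_line_endings(received, b"\r\n"))
print(normalize_line_endings(received, b"\n"))
```

The default target is whatever the receiving host's Python reports as
os.linesep; passing the newline explicitly, as in the two calls above, keeps
the result independent of the machine it runs on.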
>
> > I was wondering why isn't there something like a "Text" mode
> > for FTPing text files, which could handle text files encoded
> > using any encoding available in this world, and then, the FTP
> > client/server still does the EOL and EOF conversions properly?
>
> For starters, because it would require that every FTP server
> support at least the several thousand coded character sets in
> the world.  Even for end of line, there are significantly more
> different conventions than you seem to think there are.
> "Convert from whatever we use as text here to a single standard
> form, and then let the recipient sort out conversion from the
> standard form to its preferred local form" is much more
> plausible -- it requires the server to support one type of
> conversion, not thousands, and the client to support one type of
> conversion, not thousands.  In the early 1970s, the appropriate
> standard form for transmission was network ASCII (including both
> "right justified in eight bits" and CRLF).  Today, it is
> probably UTF-8 with CRLF (although I sympathize with Ohta-san's
> desire to be able to transmit 2022-based systems in canonical
> form) and I think we should be considering that TYPE.  But ideas
> about universal converters make both bad protocol design and bad
> implementations.
>
>     john
>
>
>