Re: [RFC 959] FTP in ASCII mode

John C Klensin <john-ietf@jck.com> Tue, 21 February 2006 08:34 UTC

Date: Tue, 21 Feb 2006 03:34:27 -0500
From: John C Klensin <john-ietf@jck.com>
To: Sandeep Srivastava <sandeep.kumar.srivastava@gmail.com>, ietf@ietf.org
Message-ID: <81FAA7F6A3A5323A86BBE227@p3.JCK.COM>
In-Reply-To: <802d52a30602202323o27715984lecfe7d8fa835565f@mail.gmail.com>
References: <43F9ED6A.5090901@peter-dambier.de> <30C473BA61A6E355540B54DE@p3.JCK.COM> <43FA82F1.9010401@necom830.hpcl.titech.ac.jp> <802d52a30602202323o27715984lecfe7d8fa835565f@mail.gmail.com>


--On Tuesday, 21 February, 2006 12:53 +0530 Sandeep Srivastava
<sandeep.kumar.srivastava@gmail.com> wrote:

> First of all thanks to everybody for the response.
> 
> I knew that a FTP transfer in ASCII mode does EOL and EOF
> conversions based on the OS of the receiving system.

No, it doesn't.  That was part of the point.  It does no EOF
conversions at all.   The command and data channels were
separated for several reasons, but the desire to stay out of the
EOF business was an important one.  And the server is required
to convert whatever line-end convention it uses to CRLF, and any
characters it uses to ASCII, and transmit that over the wire.
If the client then converts from CRLF and ASCII to some local
convention, that is its business, not that of the protocol.  In
other words, there are, at most, conversions to and from CRLF
and ASCII. There are no FTP-specified conversions based on the
properties of the receiving system.  
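
As an illustration only, the two conversions described above can be sketched in Python. The function names, and the assumption that the local line end is LF, are hypothetical and not taken from RFC 959. Note that nothing in the sketch marks or converts an end of file.

```python
def to_network_ascii(local_text: str, local_eol: str = "\n") -> bytes:
    """Server side: local text -> ASCII characters with CRLF line ends."""
    # Map the (assumed) local line end to CRLF, then encode strictly as
    # ASCII; any character outside ASCII raises UnicodeEncodeError.
    return local_text.replace(local_eol, "\r\n").encode("ascii")


def from_network_ascii(wire: bytes, local_eol: str = "\n") -> str:
    """Client side, outside the protocol: CRLF/ASCII -> local convention."""
    return wire.decode("ascii").replace("\r\n", local_eol)


wire = to_network_ascii("line one\nline two\n")
```

A client whose local convention is a bare CR would call from_network_ascii(wire, "\r"); the server never needs to know that.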

> And I
> very much expected my UTF-8 encoded file to get garbled when I
> FTPied it in ASCII mode. But guess what, it was not garbled on
> the receiving system. Maybe I was lucky, or maybe its because
> UTF-8 is backward compatible with ASCII. But then, as ASCII is
> purely 7-bits, the FTP in ASCII mode should have corrupted the
> UTF-8 encoded file, because UTF-8 is 8-bits.

"Should have corrupted" is what I referred to as an ambiguity in
my note.   First of all, because of the robustness principle,
you can never guarantee that bad things will happen when they
might -- proper implementation of protocols around here often
argues for never trashing a string because one can or because a
correct string wouldn't have the problem.

So, in practice, if an FTP server was implemented on an ASCII
system that used the "right justified in octets" model but with
LF as line-end, the authors might well have said "the character
codes don't need any conversion for ASCII mode, we just need to
implement conversion to CRLF".  If they had done that, and UTF-8
(or ISO 8859 Latin-1 or...) were added to the system, those CCSs
would go through nicely in ASCII mode, with the right
line-endings.  Substantially the same thing would occur, as
Ohta-san points out, with many of the ISO 2022-based encodings
of non-ASCII characters: completely safely with some of them and
at least as safely as UTF-8 with the others although, as with
UTF-8, the claim of strict ASCII would be technically false.
Now that wouldn't happen with a system that was natively EBCDIC,
or ASCII stored in seven bit chunks without padding, etc.: those
systems would need to do real conversions to get to network
ASCII and, if you thought you were getting UTF-8 over them, you
would be in big trouble.
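
A rough Python sketch of the two cases above, under stated assumptions: lf_to_crlf is a hypothetical stand-in for a server that only fixes line ends, and Python's cp037 codec stands in for a natively EBCDIC host. Neither is taken from any real FTP implementation.

```python
def lf_to_crlf(data: bytes) -> bytes:
    """Hypothetical pass-through ASCII mode: fix line ends, touch nothing else."""
    return data.replace(b"\n", b"\r\n")


# A short Devanagari word followed by a line end, stored as UTF-8.
original = "\u0928\u092e\u0938\u094d\u0924\u0947\n"
on_wire = lf_to_crlf(original.encode("utf-8"))

# The multi-byte sequences survive, because UTF-8 never uses the byte
# 0x0A inside a multi-byte character.
round_trip = on_wire.replace(b"\r\n", b"\n").decode("utf-8")

# An EBCDIC host stores even plain ASCII text with different byte values,
# so it has to transcode for real; its raw bytes are not valid UTF-8.
ebcdic_file = "hello\n".encode("cp037")
try:
    ebcdic_file.decode("utf-8")
    ebcdic_readable_as_utf8 = True
except UnicodeDecodeError:
    ebcdic_readable_as_utf8 = False
```

The first case works only by accident of the host's storage model, which is why the claim of strict ASCII remains technically false.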

> Moreover, in ASCII code page, code point 13=CR and code point
> 10=LF, but that might not be the case in every other code
> page. Hence the EOL conversion (in FTP ASCII mode) might
> corrupt that text file if it is encoded using a non-ASCII
> encoding. And what about handling the Unicode NewLine
> characters? Anyway...

Again, there is no conversion in the FTP protocol to a local
character set, only conversion to (and, outside the protocol but
common in client implementations, from) network ASCII with its
CRLF line endings.
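
The concern in the quoted paragraph does apply to an implementation that rewrites line ends on raw bytes without knowing the encoding. A small Python sketch, assuming a hypothetical byte-level converter and UTF-16-LE as the example of an encoding that is not ASCII-compatible:

```python
def naive_lf_to_crlf(data: bytes) -> bytes:
    """Byte-level LF -> CRLF, applied without regard to the encoding."""
    return data.replace(b"\n", b"\r\n")


# U+090A DEVANAGARI LETTER UU encodes in UTF-16-LE as the bytes 0A 09,
# so its first byte is indistinguishable from an ASCII line feed.
letter = "\u090a".encode("utf-16-le")
damaged = naive_lf_to_crlf(letter)
```

After the rewrite the two-byte character has become three bytes, and the file can no longer be decoded as UTF-16.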

> After reading all the wonderful replies, my conclusion is,
> even though my FTP client/server handled the UTF-8 encoded
> text file (which BTW contained Devanagri characters)
> correctly, there is a possibility that a text file, encoded in
> an encoding other than ASCII runs a risk of being corrupted
> when FTPied in ASCII mode. Therefore, always use ASCII mode to
> transfer only ASCII encoded files, and Binary mode to transfer
> non-ASCII encoded files.

Yes, that is probably wise guidance.  However, if you transfer
textual materials in binary (Image) mode, you also need to be
sure that you have programs available on the receiving host to
change line-end conventions from whatever the server uses
internally to whatever the client system uses.
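
A minimal sketch of such a program in Python. It assumes the file uses an ASCII-compatible encoding and one of only three stream conventions (CRLF, bare CR, bare LF); record-oriented and other conventions are not handled. The function name is hypothetical.

```python
import os
import re

# Match CRLF first, so it is not treated as a CR followed by an LF.
_ANY_EOL = re.compile(rb"\r\n|\r|\n")


def localize_line_ends(data: bytes,
                       local_eol: bytes = os.linesep.encode("ascii")) -> bytes:
    """Rewrite CRLF, CR, or LF line ends to the receiving host's convention."""
    # A function is used as the replacement so that the bytes of
    # local_eol are inserted literally.
    return _ANY_EOL.sub(lambda _match: local_eol, data)
```

For example, localize_line_ends(data, b"\n") run on a Unix client after an Image-mode transfer from a CRLF-based server yields LF line ends.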

> I was wondering why isn't there something like a "Text" mode
> for FTPing text files, which could handle text files encoded
> using any encoding available in this world, and then, the FTP
> client/server still does the EOL and EOF conversions properly?

For starters, because it would require that every FTP server
support at least the several thousand coded character sets in
the world.  Even for end of line, there are significantly more
different conventions than you seem to think there are.
"Convert from whatever we use as text here to a single standard
form, and then let the recipient sort out conversion from the
standard form to its preferred local form" is much more
plausible -- it requires the server to support one type of
conversion, not thousands, and the client to support one type of
conversion, not thousands.  In the early 1970s, the appropriate
standard form for transmission was network ASCII (including both
"right justified in eight bits" and CRLF).  Today, it is
probably UTF-8 with CRLF (although I sympathize with Ohta-san's
desire to be able to transmit 2022-based systems in canonical
form) and I think we should be considering a TYPE for that.  But ideas
about universal converters make both bad protocol design and bad
implementations.
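
The arithmetic behind "one type of conversion, not thousands" can be sketched as follows. The figure of 3000 local forms is only an illustrative assumption standing in for "several thousand", and the function names are hypothetical.

```python
def pairwise_converters(n_forms: int) -> int:
    """Directed converters needed if every form converts to every other."""
    return n_forms * (n_forms - 1)


def canonical_converters(n_forms: int) -> int:
    """Converters needed if each form converts only to and from one standard form."""
    return 2 * n_forms


N_FORMS = 3000  # assumed count of local text forms, for illustration only
universal = pairwise_converters(N_FORMS)
via_standard = canonical_converters(N_FORMS)
```

With these numbers the universal approach needs 8,997,000 converters in total, against 6,000 when everything goes through a single standard form, and each individual host implements only its own pair.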

    john


