Re: [RFC 959] FTP in ASCII mode

First of all, thanks to everybody for the responses.

I knew that an FTP transfer in ASCII mode does EOL and EOF conversions based
on the OS of the receiving system, and I fully expected my UTF-8 encoded
file to get garbled when I FTPed it in ASCII mode. But guess what: it was
not garbled on the receiving system. Maybe I was lucky, or maybe it's because
UTF-8 is backward compatible with ASCII. But then, since ASCII is purely
7-bit and UTF-8 uses all 8 bits of each byte, I would have expected FTP in
ASCII mode to corrupt the UTF-8 encoded file.
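
One possible explanation, sketched below in Python: every byte of a
multi-byte UTF-8 sequence is 0x80 or higher, so it can never be mistaken for
CR (0x0D) or LF (0x0A). The sketch assumes the client and server are 8-bit
clean and only rewrite line endings, which RFC 959 does not guarantee. The
sample text and the helper names to_wire/from_wire are made up for
illustration.

```python
# Sample text is an assumption; any non-ASCII UTF-8 text behaves the same way.
text = "नमस्ते\nदुनिया\n"
data = text.encode("utf-8")


def to_wire(local: bytes) -> bytes:
    """Sender side on a Unix-like host: local LF becomes CRLF on the wire."""
    return local.replace(b"\n", b"\r\n")


def from_wire(wire: bytes) -> bytes:
    """Receiver side on a Unix-like host: CRLF on the wire becomes local LF."""
    return wire.replace(b"\r\n", b"\n")


# 8-bit-clean implementation: only CR/LF bytes are touched.
round_trip = from_wire(to_wire(data))

# Strict 7-bit implementation: the high bit of every byte is cleared.
seven_bit = from_wire(bytes(b & 0x7F for b in to_wire(data)))

print(round_trip == data)  # True: the UTF-8 bytes survive
print(seven_bit == data)   # False: the multi-byte sequences are destroyed
```

So whether the file survives depends on whether the implementation strips
the eighth bit, not on the EOL conversion itself.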

Moreover, in the ASCII code page, code point 13 is CR and code point 10 is
LF, but that might not be the case in every other code page. Hence the EOL
conversion (in FTP ASCII mode) might corrupt a text file that is encoded
using a non-ASCII encoding. And what about handling the Unicode newline
characters? Anyway...
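
To make the concern concrete, here is a small Python sketch. It assumes a
naive implementation that rewrites the single byte 0x0A wherever it appears;
the encodings chosen (UTF-16LE and the EBCDIC code page cp037) are just
examples, and decodes_cleanly is a made-up helper.

```python
# How a line feed is represented in a few encodings.
lf_bytes = {name: "\n".encode(name)
            for name in ("ascii", "utf-8", "utf-16-le", "cp037")}
for name, raw in lf_bytes.items():
    print(name, raw.hex())


def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if data is valid in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# A byte-wise LF -> CRLF rewrite inserts one stray byte into UTF-16 text,
# which breaks the 2-byte alignment of everything after it.
utf16 = "a\nb".encode("utf-16-le")
mangled = utf16.replace(b"\n", b"\r\n")
print(decodes_cleanly(utf16, "utf-16-le"), decodes_cleanly(mangled, "utf-16-le"))

# Unicode line breaks such as U+2028 and U+0085 contain no CR or LF byte in
# UTF-8, so a CR/LF-only conversion never sees them.
unicode_breaks = "one\u2028two\u0085three"
encoded_breaks = unicode_breaks.encode("utf-8")
print(unicode_breaks.splitlines(), b"\n" in encoded_breaks)
```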

After reading all the wonderful replies, my conclusion is this: even though
my FTP client/server handled the UTF-8 encoded text file (which, BTW,
contained Devanagari characters) correctly, a text file encoded in anything
other than ASCII runs the risk of being corrupted when FTPed in ASCII mode.
Therefore, use ASCII mode only to transfer ASCII-encoded files, and use
Binary mode to transfer non-ASCII-encoded files.
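
As a rough illustration of that rule of thumb, a client could pick the
transfer type by checking whether the file is pure 7-bit. This is only a
sketch; ftp_type_for is a hypothetical helper, not part of any FTP library.

```python
def ftp_type_for(data: bytes) -> str:
    """Pick an FTP representation type for a file's contents.

    Returns "A" (ASCII) only when every byte is 7-bit, otherwise "I" (image,
    i.e. binary). This checks bytes, not the declared encoding.
    """
    return "A" if data.isascii() else "I"


print(ftp_type_for(b"plain text\n"))
print(ftp_type_for("नमस्ते\n".encode("utf-8")))
```

With Python's ftplib, "A" would correspond to storlines() and "I" to
storbinary(). Note that a 7-bit encoding such as ISO-2022-JP also passes
this check, since it uses no bytes above 0x7F.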

I was wondering why there isn't something like a "Text" mode for FTPing text
files, one that could handle text files encoded using any encoding available
in this world while the FTP client/server still does the EOL and EOF
conversions properly?
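
What I have in mind would look roughly like the Python sketch below: decode
with the known encoding, convert line endings at the character level, and
re-encode. The function convert_eol is hypothetical, and the sketch assumes
both ends somehow agree on the encoding, which FTP has no standard way to
signal for file contents.

```python
def convert_eol(data: bytes, encoding: str, target_eol: str) -> bytes:
    """Convert line endings in encoded text without corrupting the encoding.

    The bytes are decoded first, so CR and LF are matched as characters
    rather than as raw byte values.
    """
    text = data.decode(encoding)
    # Normalize CRLF and bare CR to LF, then apply the target convention.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.replace("\n", target_eol).encode(encoding)


unix_utf16 = "a\nb\n".encode("utf-16-le")
dos_utf16 = convert_eol(unix_utf16, "utf-16-le", "\r\n")
print(dos_utf16.decode("utf-16-le") == "a\r\nb\r\n")  # True
```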

Thanks,
Sandeep.

On 2/21/06, Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp> wrote:
>
> John C Klensin wrote:
>
> > Sandeep's question raises another interesting issue.  I just
> > went back and reread RFC 2640.   It does not seem to address the
> > "TYPE A" issue at all.  It does say (Section 2, paragraph 1)
> > "Clients and servers are, however, under no obligation to
> > perform any conversion on the contents of a file for operations
> > such as STOR or RETR", which I would take to imply that it
> > anticipates I18N FTP operations to be entirely binary ("TYPE I")
> > although that is not explicit.
>
> As for Japanese processing, UTF-8 is not visible by users and on
> the network, because UTF-8 is not only useless but also harmful.
>
> Instead, ISO-2022-JP, ShiftJIS and EUC are the major character sets.
> Some ftp implementations do assume (sometimes depending on environment
> variables) network character code ShiftJIS or EUC and perform appropriate
> conversions, which garbles UTF-8.
>
> On the other hand, if you use ISO-2022-JP, which is 7 bit pure and ASCII
> compatible (in a sense, it is pure ASCII), we can safely use ASCII mode
> of vanilla ftp and there is no confusion as long as we are in ASCII
> environment.
>
> Similar encodings can be profiled using ISO 2022 to obtain a fully
> internationalized, 7 bit pure, ASCII compatible character encoding.
>
> The only problem for RFC 2640 was that it does not need MIME for
> charset and 8bit extension that it makes it clear that MIME is
> useless.
>
> Note that long term state maintenance of full ISO 2022 is not
> more complex than that of UTF-8. Note also that carefully profiled
> ISO 2022, such as ISO-2022-JP, requires state maintenance a lot
> simpler than that of UTF-8.
>
> > Whether the characters in use are UTF-8 or not, we've still got
> > that issue with line-endings.
>
> Line-ending issues of any ISO 2022 based encoding are just as simple
> as those of ASCII.
>
>                                                         Masataka Ohta