[apps-discuss] Fwd: I-D Action: draft-klensin-ftpext-typeu-00.txt
"Martin J. Dürst" <duerst@it.aoyama.ac.jp> Mon, 02 April 2012 03:11 UTC
Message-ID: <4F7918C0.9020204@it.aoyama.ac.jp>
Date: Mon, 02 Apr 2012 12:10:56 +0900
From: "\"Martin J. Dürst\"" <duerst@it.aoyama.ac.jp>
Organization: Aoyama Gakuin University
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.9) Gecko/20100722 Eudora/3.0.4
MIME-Version: 1.0
To: John C Klensin <klensin@jck.com>
Content-Type: text/plain; charset="UTF-8"; format="flowed"
Content-Transfer-Encoding: 7bit
Cc: "apps-discuss@ietf.org" <apps-discuss@ietf.org>
Subject: [apps-discuss] Fwd: I-D Action: draft-klensin-ftpext-typeu-00.txt
Hello John, others,

Because there was a new version, I took a look, and I have a few comments that I don't want to withhold. *For my main comment, please see the end of this mail.*

The history section should be relegated to an appendix or removed. [I enjoyed reading [RFC0373]; it's very interesting to compare to what we have now. It's not completely possible for me to judge how much of it was visionary at the time, and how much common sense, maybe in other circles than the ARPAnet].

"So, with allowances for those line termination problems -- which have been a large issue in many cases -- Image ("binary") and ASCII transfers were almost equivalent and the TYPE command became less-used."

I'm not a frequent user of FTP anymore, but I think this observation is correct. I therefore strongly wonder why we would need this new TYPE. For more, please see below.

"several variations on UTF-16 (possibly with surrogate pairs)":

This is highly misleading. UTF-16 always includes the potential for surrogate pairs, by definition. [Of course, many actual data files don't include them, but that doesn't have to be called out.] The thing that doesn't allow surrogate pairs is called UCS-2 (ISO-10646-UCS-2 in http://www.iana.org/assignments/character-sets).

"When those files are transferred to another system with Image type, the result may be completely uninterpretable on the target system."

This is of course possible, in particular for executables and the like. But I think there is much less variety in formats, and much more versatility in tools, so this is less of an issue than it was, and even if it continues to be an issue, it's not something that can be solved with a simple type.

"by sending the data in a stream conformant to the Net-Unicode format specified in Section 3."

This is confusing, because Net-Unicode is defined in RFC 5198, not in section 3. Can probably be fixed with a small wording change.

"This section specifies a profile of Net-Unicode [RFC5198] for use with FTP TYPE U."

In German, there's a saying "Meister, die Arbeit ist fertig, soll ich sie gleich flicken." (Master, I completed my work, can I start to fix it?) It's used when something is broken from the moment it is created. It may not be the case for Net-Unicode, but a sentence like the above just exudes, to me, the feeling that the original definition may be broken, sorry.

MAIN COMMENT STARTS HERE

"Unicode characters must be transmitted in UTF-8 [RFC3629] as specified for Net-Unicode."

This brings me to my main point, which is that the proposal isn't really implementable. It assumes that, like in the old days, there is a single textual encoding per computer. This worked very well at a time when some computers were using (7-bit) US-ASCII, and others were using the basic version of EBCDIC (for those who might not know, there are lots of EBCDIC variants including double-byte variants for East Asia).

"However, migration to Unicode has reintroduced many of the old issues. When Unicode is used inside a system, it can be used with several different encodings (e.g., UTF-8 and several variations on UTF-16 (possibly with surrogate pairs), different assumptions about normalization (see "Terminology for Use in Internationalization" [i18n-terms] for more discussion) and even new variations on line termination conventions. When those files are transferred to another system with Image type, the result may be completely uninterpretable on the target system."

This mostly has it wrong. The issue of many different character encodings was around for a long time before Unicode. FTP hasn't done anything about this, and apparently has been fine (mostly because applications and users know how to deal with the problem: If you see garbage on screen, try another application or use another setting). Unicode is helping in that it greatly *reduces* variation, but in the meantime, it adds variation because it leads to a net increase of character encodings (even if there were only one encoding form of Unicode).

The fact that Unicode can come in different encoding forms and other variations can definitely be annoying, but a new TYPE is not a solution. Why? Because it's not like in the old days that one maker's products would use one encoding, and another would use another. There's overall probably more UTF-16, and more of the LE variety, on a Windows system than on a Linux or Mac system, but there's a lot of UTF-8 on all of them, and there's usually also quite a bit of legacy data (e.g. Shift_JIS or EUC-JP on a Japanese system) and on average, the OS doesn't have a clue about the encoding of the file.

Applications make guesses, and they often get it right. An FTP implementation could make guesses, but the problem is that the guess is too early; if it is wrong, then we get weird double encodings. That's different from a text editor making a guess; the user can try other encodings from a menu until the stuff is right, and the binary data isn't messed up.

The situation is even worse for normalization, in the sense that no OS has any clue about whether files are normalized or not.

The line ending issue is not as hopeless, in that it's easy to write a filter that converts all line endings e.g. to CRLF, even including the (quite rare in actual use) new Unicode ones. So the draft might make marginal sense if limited to line-ending issues for UTF-8 only. But then again, because that's an easy problem, there are lots of applications out there that deal with this already, including any serious text editor.

So overall, I don't think the Apps WG should spend time on this, unless we hear loud voices from actual FTP implementers that tell us that this is needed and will be implementable in a way that solves actual problems.

Regards, Martin.
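
A minimal Python sketch of two points made in the message: the line-ending filter that is described as easy to write, and the "guess too early" problem that produces double encodings. Everything in it is an assumption for illustration only. The function names `to_crlf` and `transcode_with_guess` are hypothetical and come from neither the draft nor any FTP implementation, and the set of line terminators handled (CRLF, CR, LF, U+0085, U+2028, U+2029) is one plausible reading of "the new Unicode ones".

```python
import re

# Assumed set of line terminators: CRLF, bare CR, bare LF, NEL (U+0085),
# LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029).
# CRLF is listed first so that it is matched as a single terminator.
_LINE_END = re.compile("\r\n|[\r\n\u0085\u2028\u2029]")


def to_crlf(text: str) -> str:
    """Convert every recognized line terminator in text to CRLF."""
    return _LINE_END.sub("\r\n", text)


def transcode_with_guess(raw: bytes, guessed_encoding: str) -> bytes:
    """Decode raw with a guessed encoding and re-encode it as UTF-8.

    This models a sender that has to guess the file's encoding before
    transfer. If the guess is wrong, the bytes that go out are damaged.
    """
    return raw.decode(guessed_encoding).encode("utf-8")


if __name__ == "__main__":
    # Line endings: mixed terminators all become CRLF.
    print(repr(to_crlf("a\nb\r\nc\rd\u2028e")))

    # Encoding guess: the file is already UTF-8, but the sender guesses
    # ISO-8859-1, so the non-ASCII character is encoded twice.
    original = "D\u00fcrst".encode("utf-8")
    damaged = transcode_with_guess(original, "iso-8859-1")
    print(original, damaged)
```

With a correct guess (`"utf-8"`) the second function returns the bytes unchanged. With the wrong guess the receiver gets well-formed UTF-8 that decodes to the wrong text, which is the contrast the message draws with a text editor, where the stored bytes stay intact while the user tries other encodings.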