draft-ietf-6man-uri-zoneid-02.txt

Stuart Cheshire <cheshire@apple.com> Tue, 17 July 2012 00:35 UTC

To: ipv6@ietf.org
Message-id: <221B8D89-0B8E-498A-9C8C-74CC3D305FD1@apple.com>
From: Stuart Cheshire <cheshire@apple.com>
Subject: draft-ietf-6man-uri-zoneid-02.txt
Date: Mon, 16 Jul 2012 17:34:47 -0700

I'm glad we're having this discussion about zone identifiers in IPv6  
address literals, but I'm very disturbed by the direction the  
discussion seems to be going.

To put it simply, today we don't have a problem, but publishing this  
document as it currently stands would create one.

I'll explain what I mean by that:

Escaping (e.g. "%" <-> "%25") or some other transformation (e.g. "%"  
<-> "-") is needed *only* when required to allow unambiguous parsing:  
when a delimiter character is also allowed as a data character, then  
escaping or some other data transformation is necessary to  
disambiguate data from delimiters.

Using escaping when it's *not* necessary is a bad idea, because  
escaping is not free. Escaping has a cost, in terms of CPU cycles, in  
terms of increased data size, and -- most importantly -- in terms of  
cognitive cost. When data has different forms in different contexts,  
users and developers have a hard time remembering what goes where.  
Attached are a couple of examples from my daily life. What is  
"Magic%20Bell"? What is "Gabbie&#39;s home"? In both cases here, internal  
data escaping has unintentionally leaked through into the user  
interface.
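
To make those two examples concrete, here is a minimal Python sketch (not
part of the original observations; the variable names are only illustrative)
showing which escaping layer each string leaked from and what the user was
presumably meant to see:

```python
import html
from urllib.parse import unquote

# The two strings exactly as they appeared in the user interface.
url_form = "Magic%20Bell"        # URI percent-encoding leaked through
html_form = "Gabbie&#39;s home"  # HTML character reference leaked through

# Undo each layer with the matching standard-library decoder.
decoded_url = unquote(url_form)
decoded_html = html.unescape(html_form)

print(decoded_url)   # the name with a plain space
print(decoded_html)  # the name with a plain apostrophe
```

Each string needs a different decoder, which is exactly the "what goes
where" burden described above.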


This cognitive cost can also have a security cost, as  
draft-ietf-6man-uri-zoneid-02.txt points out, when two strings that are textually  
different have the same meaning.

Since escaping has all these costs, escaping should only be used  
where it's actually needed.

And here, in IPv6 addresses in URIs, escaping is *not* actually  
needed to allow unambiguous parsing. Since it's not needed, why  
inflict this pain on ourselves?
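
As a rough illustration of why the parse is unambiguous, here is a minimal
Python sketch. It is an assumption-laden example, not code from any browser
or from the draft: the function name parse_ip_literal is hypothetical, and it
handles only the bracketed host, not a full URI. The point is that an IPv6
address contains only hex digits, ":" and ".", so the first "%" inside the
brackets can only be the zone separator, and "]" ends the literal.

```python
import ipaddress

def parse_ip_literal(host):
    """Split a bracketed literal such as "[fe80::a%en1]" into (address, zone).

    Hypothetical helper for illustration only.
    """
    if not (host.startswith("[") and host.endswith("]")):
        raise ValueError("not a bracketed IP literal")
    inner = host[1:-1]
    # No legal IPv6 address character is "%", so the first "%" is
    # unambiguously the start of the zone identifier.
    addr_text, sep, zone = inner.partition("%")
    address = ipaddress.IPv6Address(addr_text)  # raises ValueError if malformed
    if sep and not zone:
        raise ValueError("empty zone identifier")
    return address, (zone or None)

print(parse_ip_literal("[fe80::a%en1]"))
print(parse_ip_literal("[::1]"))
```

No unescaping step is needed anywhere in this parse: the delimiters "[",
"%" and "]" never collide with address data.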

The argument seems not to be one reasoning from computer science  
first principles (since there is in fact no need for escaping to  
allow unambiguous parsing) but instead one reasoning from what is  
sometimes called "RFC lawyering".

The document draft-ietf-6man-uri-zoneid-02.txt says as much:

    Some versions of some browsers accept the RFC 4007 syntax for scoped
    IPv6 addresses embedded in URIs, i.e., they have been coded to
    interpret the "%" sign according to RFC 4007 instead of RFC 3986.
    Clearly this approach is very convenient for users, although it
    formally breaches the syntax rules of RFC 3986.

The document concedes that using IPv6 addresses with a "%" zone  
identifier in URIs does actually work, and "is very convenient for  
users", and the only reason not to use it is that it "formally  
breaches the syntax rules of RFC 3986". We should fix RFC 3986 to  
match reality, rather than insisting that reality be constrained by a  
minor mistake in RFC 3986.

Moreover, a careful reading of RFC 3986 leaves it far from clear that  
using IPv6 addresses with a "%" zone identifier would actually breach  
the syntax rules. RFC 3986 states that percent-encoding should be  
done on a component-by-component basis, and for each component of a  
URI, escaping is done *only* as necessary for that specific  
component. See excerpts from RFC 3986 below. Emphasis (in caps) added  
by me:

1. The percent-encoding rules are per component, not global for the  
URI as a whole:

    A percent-encoding mechanism is used to represent a data octet
    IN A COMPONENT when that octet's corresponding character is
    outside the allowed set or is being used as a delimiter of, or
    within, THE COMPONENT.

2. Characters should not be percent-encoded where they are  
specifically allowed by the component in question:

    URI producing applications should percent-encode data octets that
    correspond to characters in the reserved set UNLESS THESE
    CHARACTERS ARE SPECIFICALLY ALLOWED BY THE URI SCHEME TO REPRESENT
    DATA IN THAT COMPONENT.

IPv6 literals *do* specifically allow percent signs, and so by this  
reading of RFC 3986 percent signs do not need to be (and should not  
be) escaped.

The document draft-ietf-6man-uri-zoneid-02.txt even concedes that  
this is the most user-friendly approach:

    The authors believe it is feasible, and very convenient for users, if
    browsers also allow (in addition to the formal URI syntax defined in
    this document) a syntax that will enable cut and paste.  For example:

      http://[fe80::a%en1]

    It seems that modern browsers can be adapted to parse this because it
    is inside of the "[" "]"'s.  This would permit the output of commands
    like ping6 -w ff02::1%en1 to be "cut and pasted" into a browser
    address bar.  Consequently this document recommends that browsers
    support this syntax in addition to the formal URI syntax defined above.

If we are advocating that web browsers accept this syntax (and that  
is indeed what I am myself advocating internally at Apple) then what  
benefit do we gain by specifying that one syntax is acceptable in  
some places and a different syntax is acceptable in other places?

Let's just publish a document which states that the currently-accepted  
and widely-used IPv6 "%zone" notation is also fine for use  
in URIs. This document would also in effect be clarifying RFC 3986 to  
state that "%" characters *only* need to be escaped in those URI  
components that actually use "%xx" escaping, and MUST NOT be escaped  
in URI components like IP-literals that are specified to not use  
"%xx" escaping.

That clears up the ambiguity in RFC 3986, and lets people continue  
using the IPv6 "%zone" notation they already understand.
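
The per-component rule proposed above could be sketched as follows. This is
a hedged Python illustration only: build_http_uri is a hypothetical helper,
the file name is made up, and it assumes the caller supplies an
already-valid IPv6 literal with optional zone. The host component is copied
verbatim, while the path component, which does use "%xx" escaping, gets its
"%" escaped as "%25".

```python
from urllib.parse import quote

def build_http_uri(ip_literal, path):
    """Assemble an http URI, escaping "%" only where "%xx" escaping applies.

    Hypothetical helper for illustration only.
    """
    # Host: an IP-literal in brackets does not use "%xx" escaping,
    # so the "%zone" suffix is emitted exactly as written.
    host = "[" + ip_literal + "]"
    # Path: this component does use "%xx" escaping, so a literal "%"
    # (and the space) must be percent-encoded.
    return "http://" + host + quote(path, safe="/")

uri = build_http_uri("fe80::a%en1", "/50% off.txt")
print(uri)
```

With this split, the address a user sees in ping6 output is the same
string that appears between the brackets in the URI.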

Stuart Cheshire