Re: Reopening RFC6874?

Andrew Cady <andy@cryptonomic.net> Tue, 22 June 2021 19:21 UTC

Date: Tue, 22 Jun 2021 15:21:28 -0400
From: Andrew Cady <andy@cryptonomic.net>
To: 6man <ipv6@ietf.org>
Subject: Re: Reopening RFC6874?

Hi folks,



                                       In the digital medium,
                                       the agency of the interactor
                                       is the most important design goal.
                                       --Janet H. Murray

# Introduction #

First of all I want to say that this is no niche purpose.

This is EARLY ADOPTION of the VERY BEST way to configure a LAN in the 21st
century.  The IPv6 way.

Why is it the VERY BEST?

It's about IP connectivity under the constraint of zero configuration:

 * fe80 connects two computers directly connected by an ethernet cable
 * fe80 connects two computers on the same Wi-Fi LAN
 * fe80 connects two computers linked via secure VPN over the internet
 * fe80 connects computers that don't support DHCPv4 (e.g., static IP)
 * fe80 connects computers that don't support DHCPv6
 * fe80 connects computers that don't support IPv4LL
 * fe80 connects computers that don't support IPv6 router advertisement
 * fe80 connects computers when you just don't know the "real" address
 * fe80 works without interruption when the ISP changes/nukes your prefix
 * fe80 devices can broadcast their hostnames for name resolution w/ LLMNR
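
To make the addressing concrete, here is a minimal Python sketch of what a zoned fe80 address looks like to software. It assumes Python 3.9 or later (where the standard `ipaddress` module accepts a `%zone` suffix); the helper name `describe` and the address `fe80::1%eth0` are made up for illustration.

```python
import ipaddress

def describe(text):
    """Split a zoned link-local literal into the parts software sees."""
    addr = ipaddress.IPv6Address(text)  # accepts "address%zone" on Python 3.9+
    return {
        "link_local": addr.is_link_local,                # True inside fe80::/10
        "zone": addr.scope_id,                           # text after '%', or None
        "bare": str(ipaddress.IPv6Address(int(addr))),   # address without the zone
    }

info = describe("fe80::1%eth0")
print(info)
```

The zone is the only part the host has to supply itself; everything else is fixed by the link.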

In fact, ONLY fe80 provides UNIVERSAL CONNECTIVITY between ALL devices.

It always just works and it never requires configuration on either side.

That's already great.  That's already worth getting behind.  That's already
worth making HTTP work.  But it goes far beyond this when you consider that
it is possible to use IPv6 prefix translation at the router to make all of
these fe80 addresses _also_ work on the wide internet!

That's where every simple LAN in every home can be opened or closed to the
entire internet at the flip of a switch on the Wi-Fi router.



The outcome we will establish here is this:

  Zero configuration:

    Every computer that can be plugged into a network is
    always accessible to web browsers on the local network.

  One-time router configuration:

    Every computer plugged into the configured network is
    always accessible to every web browser on
    the internet.



There's just one thing stopping it.  Port 80 don't work!

An IP link you can't connect a browser over just ain't no good.

We need to make port 80 work!




# The Problem #

The reason port 80 don't work is that the HTTP standard was broken.

The HTTP/1.1 standard expects the 'Host' string to be passed unchanged from user
to 'user agent' to server.[1]

That expectation is violated by the connectivity-destroying requirement to
mangle the 'Host' header string.

As I said on the Mozilla bug thread:



  The whole point of the Host: header is to give the server a piece of
  the client's perspective.

  So a rule that forbids sharing because it leaks the client's
  perspective is inapplicable in this case.



A more detailed explanation, also quoted from there, is duplicated at the bottom
of this message.[2]  The key conclusion to reach from that material is this:



  KEY FACT:

    The web server wants to know what the browser's location bar says.



I would even go further, and say that the web USER has a RIGHT to faithful
delivery of the complete request, unfiltered and unmangled. The "USER AGENT"
must respect the AGENCY OF THE USER.



# Percent Escaping #

Within the hostname part of the URL bar UI element the '%' should be treated
identically to the '[' and the ']':

 1. Never display '%25' to a user.

 2. Never require the user to input '%25' instead of inputting only '%'.

 3. When the user inputs '%25', it must always mean the ZoneID begins with '25',
    because that is what it always means in other contexts.

 4. Copy must place the string on the clipboard without '%25' so that other
    applications on the system can use it without modification.
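
The four rules above can be sketched in a few lines of Python. This is only an illustration of the proposed behavior, not what any browser does today; the function names `parse_typed_host` and `clipboard_text` are hypothetical.

```python
def parse_typed_host(text):
    """Split what the user typed into (address, zone); zone is None if absent.

    The first '%' is a delimiter. Nothing after it is percent-decoded,
    so "%25eth0" names a zone called "25eth0" (rule 3).
    """
    inner = text[1:-1] if text.startswith("[") and text.endswith("]") else text
    address, sep, zone = inner.partition("%")
    return address, (zone if sep else None)

def clipboard_text(address, zone):
    """Rebuild the literal for display or copy with a bare '%' (rules 1 and 4)."""
    return f"[{address}%{zone}]" if zone is not None else f"[{address}]"

typed = "[fe80::1%eth0]"
address, zone = parse_typed_host(typed)
print(address, zone, clipboard_text(address, zone))
```

Because nothing is ever decoded or encoded, what the user typed and what lands on the clipboard are the same string.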

On the web server when receiving a ZoneID in 'Host' field:

 1. The user's supplied 'Host' string must be faithfully reflected back to the
    user without requiring a new layer of percent-decoding in server
    applications that currently produce absolute links.

    For example, the default Apache greeting must not display '%25'. That
    greeting presently looks like this on one system:

      <address>Apache/2.4.46 (Debian) Server at ::1 Port 80</address>

    When '::1' is replaced by an 'fe80::' address carrying a ZoneID, there
    must not be any '%25'.

 2. The CGI $SERVER_NAME variable must not escape the '%' because it must contain
    a string that is a valid name or IP on the client's system.

These constraints mandate NOT escaping the '%' -- neither on the HTTP session
wire, nor in HTML strings.
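
As a rough sketch of the server side, assuming the 'Host' value arrives with a bare '%': the server can pull out a name and port and build absolute links without any decoding step. The helper names `split_host_header` and `absolute_link` are hypothetical, and the parsing is deliberately simplified.

```python
def split_host_header(value, default_port=80):
    """Return (server_name, port) from a Host header value, name left unmodified."""
    if value.startswith("["):
        end = value.index("]")                  # end of the bracketed IPv6 literal
        name, rest = value[:end + 1], value[end + 1:]
    else:
        name, _, tail = value.partition(":")
        rest = ":" + tail if tail else ""
    port = int(rest[1:]) if rest.startswith(":") and rest[1:] else default_port
    return name, port

def absolute_link(host_header, path):
    """Build an absolute link by echoing the Host value verbatim."""
    return "http://" + host_header + path

print(split_host_header("[fe80::1%eth0]:8080"))
print(absolute_link("[fe80::1%eth0]", "/index.html"))
```

The name returned here is the kind of string that would go into $SERVER_NAME: it is exactly what the client used, so the client can use it again.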

Logically, an escaped '%25' would mark the '%' as data rather than as a
delimiter, but here '%' is a syntactic delimiter.

RFC3986, which specifies URI syntax, has a list of reserved subcomponent
delimiter characters -- which does not include '%'.  But '%' is still reserved
"enough" because it is the escape character and because percent-decoding is done
after parsing into components and subcomponents, and because '[' begins an IPv6
literal which cannot contain percent-encoded values, per its BNF in that RFC.

In other words, RFC3986-conforming parsers will degrade to an error about an
invalid IPv6 address if they encounter a '%' -- they will never try to decode
it.

So it's safe and unambiguous to treat '%' as a subcomponent delimiter for IPv6
addresses, and leave it unescaped.
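
The argument can be checked at the character level. The sketch below builds the set of characters that the RFC 3986 grammar permits between '[' and ']' and shows that '%' is not among them. It is a character-class check only, not a full ABNF validator, and the names `IP_LITERAL_CHARS` and `first_illegal_char` are made up for this example.

```python
import string

# Character classes from the RFC 3986 grammar.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")

# Everything an IP-literal may contain between '[' and ']':
# IPv6address uses hex digits, ':' and '.'; IPvFuture uses
# unreserved / sub-delims / ':'. The first set is a subset of the second.
IP_LITERAL_CHARS = UNRESERVED | SUB_DELIMS | {":"}

def first_illegal_char(inner):
    """Return the first character RFC 3986 forbids inside [ ], or None."""
    for ch in inner:
        if ch not in IP_LITERAL_CHARS:
            return ch
    return None

print(first_illegal_char("fe80::1"))        # nothing forbidden
print(first_illegal_char("fe80::1%eth0"))   # stops at the '%'
```

Since no valid RFC 3986 IP-literal contains '%', giving it a meaning there cannot change how any existing valid URI is read.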




# Solution to Problem #

The way to fix port 80 is to abolish the name-mangling middle-man
forever.

Let the server know exactly what the browser's location bar says.
Let the user tell the location bar exactly what the user wants.
Make the unbroken user<->server bond a requirement of HTTP.




Making this work is worth it.

Thank you.








Footnotes
---------

1. Technically it allows a proxy to edit names, but even then only to fully
   qualify an unqualified name.  This loses a very small amount of information
   about user input: exactly 1 binary digit, reflecting whether the user
   qualified the name.  This literal one bit of loss perhaps inspired the
   disaster of losing an entire namespace.  Never again!  Not a single bit!

2. Brian E Carpenter explains that RFC6874 says ZoneID "is by definition
   meaningless outside the host."  But ZoneID is meaningful within the HTTP
   protocol:

     It is definitely alright to echo a local name back and forth over a TCP
     link, which is guaranteed to route such a message to a host where it is
     meaningful. So "meaningless by definition" just implies the definition is
     wrong.

     >   Example: if it's %eth0 and it happens to arrive in a remote host on the
     >   interface known there as %eth99

     To be clear: whether it's on a server or a client, the address is always
     the server's address from the client's perspective, and the interface
     specifier always contains the client's interface name. The server's
     interface name is never sent over the wire.

     The HTTP client does not accept a response arriving on any interface
     other than the one on which it made the request.

     (If somehow the ongoing TCP connection could be migrated between
     interfaces, even then, the link would merely be stale, not meaningless.)

     >   [I've written code dealing with IPv6 link-locals and it's quite a lot
     >   of bother to keep track of interface identifiers for incoming messages,
     >   but one thing you never, never rely on is the interface identifier used
     >   in the remote host.]

     The entire principle of HTTP is to rely on the remote host to tell you
     where to go.

     The server can send you to any IP it chooses.

     The server knows how to send back the same string that the client sent --
     when it wants to keep the client on the same link.

     The server chooses whether to send back the string you sent it. It can send
     any other string.

     There is no need or benefit on the client to censor the string and not
     supply the server with the exact name it actually used to access the
     server.

     The principle of HTTP/1.1 is that the client tells the server the name the
     client is using for it in this session. The server's name, as held on the
     server, is only supposed to be meaningful from the perspective of the
     client. For example, the server could be given a name for itself that is
     an IP address different from the one it is listening on, and that name
     would be the IP address that the client uses, from its perspective, to
     reach the server.

     If the client redirects IP 127.0.0.2 on their own machine to a Google IP,
     and visits that way, then the Google web server will be told that its name
     is "127.0.0.2" and Google will be able to send that IP back to the client
     if it chooses. And the user will be able to actually connect to Google
     through that meaningless (to Google) IP.

     Meaningfulness here, on both client and server, is defined from the
     client's perspective. So it's fine. It's meaningful.
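
The echo described in this footnote can be tried on loopback with Python's standard library. This is a minimal sketch under stated assumptions: the handler `EchoHostHandler`, the helper `fetch_location`, and the `/next` path are invented for the demonstration, and the server listens on 127.0.0.1 regardless of the name the client claims to have used.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class EchoHostHandler(BaseHTTPRequestHandler):
    """Redirect to an absolute URL built from the client's own Host string."""
    def do_GET(self):
        host = self.headers.get("Host", "")
        self.send_response(302)
        self.send_header("Location", f"http://{host}/next")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demonstration quiet

def fetch_location(client_side_name):
    """Send one request with the given Host value; return the Location header."""
    server = HTTPServer(("127.0.0.1", 0), EchoHostHandler)  # port 0: any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.putrequest("GET", "/", skip_host=True)   # we supply Host ourselves
        conn.putheader("Host", client_side_name)
        conn.endheaders()
        resp = conn.getresponse()
        location = resp.getheader("Location")
        resp.read()
        conn.close()
        return location
    finally:
        server.shutdown()
        server.server_close()

print(fetch_location("127.0.0.2"))
print(fetch_location("[fe80::1%eth0]"))
```

The server never interprets the name. It hands back the string it was given, and only the client needs that string to mean anything.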