Re: [I18nrp] Conservatism principle doesn't go far enough

Nico Williams <nico@cryptonector.com> Mon, 04 February 2019 20:52 UTC

Date: Mon, 04 Feb 2019 14:51:50 -0600
From: Nico Williams <nico@cryptonector.com>
To: Larry Masinter <LMM@acm.org>
Cc: "'Asmus Freytag (c)'" <asmusf@ix.netcom.com>, i18nrp@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18nrp/QDo-mlQidwmqHlVuhkxDSutGIzQ>

On Mon, Feb 04, 2019 at 12:10:15PM -0800, Larry Masinter wrote:
> Let me be a little more careful. The document 

Thanks.

> https://chromium.googlesource.com/chromium/src/+/master/docs/security/url_display_guidelines/url_display_guidelines.md

| In general, the web platform does not have limits on the length of
| URLs (although 2^31 is a common limit). Chrome limits URLs to a
| maximum length of 2MB for practical reasons and to avoid causing
| denial-of-service problems in inter-process communication.

Ha.  I've seen stacks that fail to handle more than 4KB or 8KB request
header sizes including the start-line.

| On most platforms, Chrome’s omnibox limits URL display to 32kB
| (kMaxURLDisplayChars) although a 1kB limit is used on VR platforms.

I'm not sure how I could understand a 32KB URI in a status bar.  Or even
1KB.  And we're not talking about my parents here.

That document seems reasonable otherwise :)

> seems to be primarily about one use case:  The user has somehow
> navigated to a site (through links, typing in, search, whatever), and

I think we really have just these cases:

 - the user followed a link
 - the user pasted a link
 - the user typed something in exactly
 - the user typed something in and selected a partial match from a
   search engine

The specific case matters because if the user types it in, the UA must
normalize and so on, but if it's followed then...

If the user followed or pasted a link, then that needs to be sanitized
to some degree, and if it's malicious, then simply fail to follow --
don't just follow and attempt to indicate the maliciousness through the
status bar.

If the user picked an incremental result while typing... we should treat
that a lot like following a link (even if we trust the search engine).

Following a link in a page of search results is still following a link,
so search in any context other than typing in the status bar is not
special.
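The case analysis above can be sketched roughly as follows.  This is
only an illustration: the names `Origin`, `Handling`, and `handling_for`
are hypothetical and not taken from any browser.

```python
from enum import Enum, auto


class Origin(Enum):
    """How the URI reached the UA (hypothetical names)."""
    FOLLOWED = auto()    # the user followed a link
    PASTED = auto()      # the user pasted a link
    TYPED = auto()       # the user typed something in exactly
    SUGGESTED = auto()   # the user picked an incremental search match


class Handling(Enum):
    NORMALIZE = auto()           # prep and normalize the typed input
    SANITIZE_OR_REFUSE = auto()  # sanitize; refuse to follow if malicious


def handling_for(origin: Origin) -> Handling:
    # Only exactly-typed input is treated as coming from the user.
    # Everything else, including search suggestions, is treated like
    # a followed link.
    if origin is Origin.TYPED:
        return Handling.NORMALIZE
    return Handling.SANITIZE_OR_REFUSE
```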

> is now looking at a web page; what should the UA show (in the address
> bar or any other security context) to validate “who is this coming
> from? Which site sent me this data? It says it’s my bank, did it come
> from them?”

The original sin here is the username:password@ optional prefix to the
authority component of the URI: it confuses many users (probably most).

So we should start by removing the username:password from the status /
origin bar!  (The Chromium doc says as much.)
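A minimal sketch of dropping the userinfo before display, using
Python's standard `urllib.parse`.  The function name
`strip_userinfo` is made up for illustration, and a real UA would do
considerably more than this.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_userinfo(url: str) -> str:
    """Return url with any username:password@ prefix removed."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:
        # hostname drops the brackets of an IPv6 literal; restore them.
        host = f"[{host}]"
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


# The part before "@" is userinfo, not the site being visited.
print(strip_userinfo("https://mybank.example@evil.example/login"))
# -> https://evil.example/login
```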

If the URI was typed in, prep it as usual and use it.

If a link was followed, no prep is needed, but perhaps we should apply
it anyways since it's idempotent, applying ToUnicode() and ToASCII()
again to normalize.
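To illustrate the idempotence, here is a small sketch using Python's
built-in `idna` codec.  Note the assumption: that codec implements the
older IDNA2003 ToASCII/ToUnicode operations, not IDNA2008, so it only
stands in for whatever prep a UA really applies.  The helper names
`to_ascii` and `prep` are invented for this example.

```python
def to_ascii(host: str) -> str:
    """ToASCII on each label (IDNA2003, via Python's idna codec)."""
    return host.encode("idna").decode("ascii")


def prep(host: str) -> str:
    """ToASCII followed by ToUnicode; safe to apply repeatedly."""
    return host.encode("idna").decode("idna")


typed = "bücher.example"
followed = "xn--bcher-kva.example"

# Both spellings end up as the same A-label form...
assert to_ascii(typed) == to_ascii(followed) == "xn--bcher-kva.example"
# ...and prepping again changes nothing.
assert prep(prep(followed)) == prep(followed) == "bücher.example"
```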

The linked doc's proposal for how to handle bi-di scripts and "sneaky"
codepoints and characters is a good start.

It would be useful to know what scripts the user can read, that way the
UA can be a bit more dramatic about URIs whose authorities contain
characters from scripts foreign to the user.
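A very rough sketch of that check.  Python's standard library does not
expose the Unicode Script property, so this uses the first word of each
character's Unicode name as a stand-in, which is an approximation only.
The names `scripts_in` and `foreign_scripts` are hypothetical.

```python
import unicodedata


def scripts_in(label: str) -> set:
    """Approximate the scripts used by the letters in label."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                # e.g. "CYRILLIC SMALL LETTER A" -> "CYRILLIC"
                scripts.add(name.split()[0])
    return scripts


def foreign_scripts(label: str, readable: set) -> set:
    """Scripts in label that the user is not known to read."""
    return scripts_in(label) - set(readable)


# U+0430 is CYRILLIC SMALL LETTER A, which looks like Latin "a".
print(foreign_scripts("\u0430pple", {"LATIN"}))
# -> {'CYRILLIC'}
```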

Zalgo detection would be nice :)
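One naive way to approach it: normalize to NFC and look for long runs
of combining marks.  The threshold of two marks is an arbitrary
assumption (some scripts legitimately stack marks), and the function
names are invented for this sketch.

```python
import unicodedata


def max_mark_run(text: str) -> int:
    """Longest run of combining marks left after NFC normalization."""
    longest = run = 0
    for ch in unicodedata.normalize("NFC", text):
        # Mn = nonspacing mark, Me = enclosing mark
        if unicodedata.category(ch) in ("Mn", "Me"):
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return longest


def looks_like_zalgo(text: str, max_marks: int = 2) -> bool:
    """Flag text that stacks more than max_marks marks on one base."""
    return max_mark_run(text) > max_marks
```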

> The spec is only about the visual display and doesn’t cover copy/paste
> or save-as-bookmark or other operations which work with a different
> form.

Copy/paste seems no different than following a link -- the UA won't know
the origin of the link, but we must not assume the user typed it in
somewhere.

> Insofar as
> 
> *	normalization doesn’t change the visual display (?)

Normalization to a canonical form (NFC or NFD) indeed does not change
the display.  It cannot.  That's part of the definition of Unicode
canonical equivalence.

(I suppose there may be implementations where form artifacts leak to the
rendering, but that would be a bug.)

> *	The URL that was actually used for DNS was normalized before use anyway
> 
> then at worst the advice to normalize is harmless. So I take back my
> critique.

It is harmless to normalize again and again.  Normalization is
idempotent.
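A quick check with Python's standard `unicodedata` module, assuming NFC
is the form in question (the helper name `nfc` is just shorthand for
this example):

```python
import unicodedata


def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


composed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT

# Different code point sequences, canonically equivalent.
assert composed != decomposed
assert nfc(composed) == nfc(decomposed)

# Normalizing again and again changes nothing.
assert nfc(nfc(decomposed)) == nfc(decomposed)
```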

> https://www.m3aawg.org/sites/default/files/m3aawg-unicode-best-practices-2016-02.pdf
> 
> is about different use cases, with similar but not identical advice
> and requirements.
> 
> The original point, though, stands: the “Conservatism” principle
> doesn’t go far enough in warning those registering domains to avoid
> those labels which downstream processors will reject or display
> poorly.

I mean, a registrant can test how browsers would handle their domains.

But browsers can change how they handle the registrant's domains...

> Is it possible to get any statistics on DNS requests that include
> unnormalized strings? And some hints about where they come from?

A-labels that decode into non-canonical Unicode are busted.  I have no
numbers about them.
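For anyone who wants to collect such numbers, a sketch of the check,
assuming "non-canonical" means "not in NFC".  It uses Python's raw
`punycode` codec so that no prep step silently repairs the label; the
function name `a_label_decodes_to_nfc` is made up.

```python
import unicodedata


def a_label_decodes_to_nfc(label: str) -> bool:
    """True if the A-label's Punycode payload decodes to NFC text."""
    if not label.lower().startswith("xn--"):
        raise ValueError("not an A-label")
    decoded = label[4:].encode("ascii").decode("punycode")
    return unicodedata.normalize("NFC", decoded) == decoded


# Build a deliberately busted label from decomposed "cafe" + U+0301.
busted = "xn--" + "cafe\u0301".encode("punycode").decode("ascii")

assert a_label_decodes_to_nfc("xn--bcher-kva")
assert not a_label_decodes_to_nfc(busted)
```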

Nico
--