Re: [saag] i18n requirements (was: Re: NF* (Re: PKCS#11 URI slot attributes & last call))

Nico Williams, Tue, 13 January 2015 22:36 UTC

Date: Tue, 13 Jan 2015 16:36:30 -0600
From: Nico Williams
To: John C Klensin
Subject: Re: [saag] i18n requirements (was: Re: NF* (Re: PKCS#11 URI slot attributes & last call))
Message-ID: <20150113223619.GD16323@localhost>
Cc: Peter Gutmann, "Salz, Rich", Pete Resnick, Jan Pechanec

On Mon, Jan 12, 2015 at 10:41:01PM -0500, John C Klensin wrote:
> --On Monday, January 12, 2015 18:08 -0600 Nico Williams wrote:
> > Well alright.  I'd love to see a set of guidelines for I18N
> > activities.
> So would we all.  RFC 2277 was supposed to provide some guidance
> but is now badly obsolete in many different ways, including
> exhibiting how little we knew about some things at the time.  We
> have, I hope, learned a lot, but see below.
> > When should we try to support Unicode, and when should we not?
> > Is it one of those "I know it when I see it" kinds of
> > guidelines?  That wouldn't be useful enough :(
> Let me suggest a general way of thinking about things -- maybe
> not quite a "guideline".  Especially for security-type
> protocols, make sure there is a substantive reason, presumably
> connected to users and user experience, for it to be necessary
> to go beyond ASCII.  I really do mean "necessary": if it is just
> a good idea in principle or a maybe-nice-to-have or "maybe
> someone will want this some day", skip it because adding i18n
> capabilities _will_ make correct and predictable implementations
> more difficult and _will_ increase the number and range of
> attack opportunities.   

Yes, I18N is all about UIs and the UX.

Clearly, if a character string isn't a UI element, and is never a
visible aspect of the UX, then it is a great candidate for being made
US-ASCII only.  Indeed, we *should* make all such strings US-ASCII only.

That much is obvious, and whether or not something is part of the UI is
an objective measure with relatively little room for doubt.

But there are UI elements that could reasonably be constrained to
US-ASCII (because the world over, people manage to deal with US-ASCII
character strings in various parts of their UIs).  The tricky part is
deciding what UI elements (or things leaking into them) qualify.

For example, a "manufacturer" name in PKCS#11 could reasonably be
constrained to US-ASCII only.  Right?  Well, a French manufacturer, say,
might object.

An interesting distinction here might be: name or identifier?
Identifiers (appearing in UIs) -> US-ASCII.  Names -> Unicode.
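A minimal sketch of that identifier/name split, assuming a hypothetical policy in which identifiers must be printable US-ASCII and names may be any Unicode text, stored in one canonical normalization form (NFC here). The functions `check_identifier` and `check_name` are illustrative only; they do not come from PKCS#11 or any IETF document.

```python
import unicodedata


def check_identifier(s: str) -> str:
    """Hypothetical rule: identifiers are printable US-ASCII (0x20-0x7E) only."""
    if not all(0x20 <= ord(c) <= 0x7E for c in s):
        # ascii() keeps the error message itself pure ASCII.
        raise ValueError(f"identifier must be printable US-ASCII: {ascii(s)}")
    return s


def check_name(s: str) -> str:
    """Hypothetical rule: names may be any Unicode text, normalized to NFC."""
    return unicodedata.normalize("NFC", s)


print(check_identifier("token-01"))  # token-01

# A decomposed name ('e' + U+0301) comes back in precomposed form.
print(check_name("Socie\u0301te\u0301") == "Soci\u00e9t\u00e9")  # True

try:
    check_identifier("Soci\u00e9t\u00e9")
except ValueError as e:
    print("rejected:", e)
```

The hard part, as noted above, is not the code but deciding which bucket a given field (a token label, a manufacturer name) belongs in.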

Token and object labels seem a lot like identifiers in the use cases I
expect.  But I can't be certain that they would never be expected to
contain names.

Manufacturer names really are names, no?

These are decisions we could make that might anger people who are not
participating here today.

> > Mind you, IIRC PKCS#11 didn't even say anything about ASCII
> > before. Token labels and such used to be fixed-sized octet
> > strings containing character data.  Jan can correct me if I'm
> > wrong.  I'm not sure even saying "ASCII-only" would
> > necessarily be safe in that case...
> And that reinforces my view that the real, underlying, problem
> here has to be fixed in PKCS#11, not in anything the IETF puts
> on top of it.  Only they can fix the problems; we can, at best,
> mitigate the damage.


But look, PKCS#11 has only a small number of character strings.
Mostly they will be looked up with equivalence semantics, and
form-insensitive Unicode string comparison will do for that (at the
expense of carrying the code for it), as will plain old octet string
comparison (because we can expect input methods to agree, by happy
accident, on the normalization form they output most of the time).
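To make the two comparison strategies concrete, here is a small sketch using Python's standard `unicodedata` module. It compares a string typed in precomposed form (U+00E9) with the same string in decomposed form ('e' followed by U+0301). Octet comparison of the UTF-8 encodings fails; comparing after normalizing both sides to a common form (NFC here; NFD would work equally well) succeeds. The helper names are hypothetical.

```python
import unicodedata


def octet_equal(a: str, b: str) -> bool:
    """Compare the raw UTF-8 encodings byte for byte."""
    return a.encode("utf-8") == b.encode("utf-8")


def form_insensitive_equal(a: str, b: str) -> bool:
    """Compare after normalizing both strings to NFC (no case folding)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "caf\u00e9"  # 'e with acute' as the single code point U+00E9
decomposed = "cafe\u0301"  # 'e' + U+0301 COMBINING ACUTE ACCENT

print(octet_equal(precomposed, decomposed))             # False
print(form_insensitive_equal(precomposed, decomposed))  # True
print(octet_equal(precomposed, "caf\u00e9"))            # True
```

The last line is the "happy accident" case: when both sides arrive in the same form, octet comparison gives the same answer as the normalizing comparison, without needing the normalization tables.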

I think Jan's text is fine.  I don't mean to belabor this thread.
I'm now only commenting on the more general matter of when we should be
happy to settle for less than the full I18N treatment.

> > Fortunately the OASIS PKCS11 TC has clarified that these are
> > UTF-8; unfortunately they left other I18N details out.
> It appears to me that what they have said puts their level of
> understanding of the various issues somewhat behind where we
> were when RFC 2277 was written in 1997.  

Yes, but as noted above, it's also fair to say that this is the sort of
case where a low-effort I18N approach ("say UTF-8; say nothing about
anything else") seems likely to be good enough for most implementors and
users.
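Taken literally, the "say UTF-8; say nothing about anything else" approach amounts to little more than the sketch below. It assumes a 32-byte field padded with trailing ASCII spaces and not NUL-terminated, which is how PKCS#11 lays out fields such as the token label; `LABEL_LEN`, `encode_label`, and `decode_label` are hypothetical names. Note what it deliberately leaves out: no normalization, no case folding, no checks for control or confusable characters.

```python
LABEL_LEN = 32  # assumed fixed field width, padded with ASCII spaces


def encode_label(label: str) -> bytes:
    """Encode a label as UTF-8 and pad it with spaces to LABEL_LEN bytes."""
    raw = label.encode("utf-8")
    if len(raw) > LABEL_LEN:
        # Truncating could split a multi-byte UTF-8 sequence, so refuse instead.
        raise ValueError(f"label is {len(raw)} bytes; limit is {LABEL_LEN}")
    return raw.ljust(LABEL_LEN, b" ")


def decode_label(field: bytes) -> str:
    """Strip trailing space padding and decode strictly as UTF-8."""
    if len(field) != LABEL_LEN:
        raise ValueError(f"expected {LABEL_LEN} bytes, got {len(field)}")
    # Invalid UTF-8 raises UnicodeDecodeError rather than being guessed at.
    return field.rstrip(b" ").decode("utf-8")


field = encode_label("Soci\u00e9t\u00e9 token")
print(len(field))                                        # 32
print(decode_label(field) == "Soci\u00e9t\u00e9 token")  # True
```

One side effect of blank padding: a label that genuinely ends in spaces cannot be told apart from its padding, so trailing spaces are lost on a round trip.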