Re: UUID version 6 proposal, initial feedback

Brad Peabody <bradgareth@gmail.com> Tue, 04 February 2020 10:04 UTC

Subject: Re: UUID version 6 proposal, initial feedback
To: "Theodore Y. Ts'o" <tytso@mit.edu>
Cc: IETF discussion list <ietf@ietf.org>

> For very specialized use cases (e.g., where 16 binary bytes is too
> long), I'm not sure it makes sense to create a fully general
> specification which is so abstract that it becomes hard to use, since
> you have to customize it for a particular application.
>
> Past a certain point, you might as well say that you want to
> standardize "b-trees" --- and that's something which belongs in an
> algorithms book, and not an I-D or an RFC.
>
In practical terms, my concern is that software from different projects 
needs to exchange UUIDs at various points, so the problem stops being 
general and becomes very specific. The discussion I linked earlier, 
https://github.com/uuidjs/uuid/issues/303#issuecomment-575992079, is 
all about how UUIDs work as primary keys in Postgres.

The text encoding is particularly relevant, since hex with dashes is 
too long to be practical for many use cases. If you have a database 
column of type 'UUID' (or some similar ID type) and issue an update 
statement, there needs to be a known text format, and it would be great 
if it were more compact. If it were standardized, various databases 
would eventually implement it. Sorting also matters; see below.
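
To put rough numbers on "too long", here is a minimal Python sketch that 
encodes the same 16 bytes three ways. The helper name text_forms is 
hypothetical, and stripping the padding is an assumption made for 
compactness, not something any spec mandates.

```python
import base64
import uuid


def text_forms(u: uuid.UUID) -> dict:
    """Return several text encodings of the same 16 raw bytes."""
    raw = u.bytes
    return {
        # Canonical form: 32 hex digits plus 4 dashes.
        "hex_dashed": str(u),
        # RFC 4648 base32, padding stripped (an assumption for compactness).
        "base32": base64.b32encode(raw).decode("ascii").rstrip("="),
        # RFC 4648 URL-safe base64, padding stripped.
        "base64url": base64.urlsafe_b64encode(raw).decode("ascii").rstrip("="),
    }


forms = text_forms(uuid.uuid4())
for name, text in forms.items():
    print(f"{name:10} {len(text):2} chars  {text}")
```

The dashed hex form is 36 characters, base32 is 26 and base64 is 22. 
Note that not every alphabet keeps the text sorting in the same order 
as the raw bytes, so the choice of alphabet matters if sorted text is a 
goal.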

But I do get what you mean about shorter IDs with a weaker uniqueness 
guarantee: a spec isn't needed just to generate a random opaque string 
that you're going to have to check against a database anyway to ensure 
you don't have a duplicate.

> Again, I'd urge you to consider that you should build on *top* of the
> standard UUID spec, since there are already implementations that will
> do the right thing as far as time-based and random-based UUID's (the
> latter will provide all the unguessability you need) and then just
> create a library which *transforms* the UUID into a convenient
> encoding form that makes it be convenient for key-indexing, or a more
> compact text encoding, etc.
>
> There are already very good implementations for UUID generation, and
> if you create something which re-invents the wheel at a spec level, it
> will cause people to reinvent the wheel (possibly badly) at the
> implementation level.

I think part of the issue here is that, since there is an existing UUID 
spec, people try to use it for things like database keys and then run 
into these rough edges:

- hex text format too long

- sorting is extra work for no benefit; having the raw bytes sort 
naturally in time sequence (for UUIDs with a timestamp) would seem to be 
a better way

- there is no official form of UUID that has a timestamp followed by 
random data: the spec says to use the MAC address of the machine 
(version 1), which can be a security issue, and version 4 is not 
time-ordered.
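
The sorting point can be shown with a short Python sketch. It builds two 
version 1 UUIDs from chosen timestamps using the RFC 4122 field layout, 
then re-packs the timestamp most-significant-first. The helper names, 
the fixed node value, and the re-packed layout (including the use of 6 
in the version nibble) are assumptions for illustration, not text from 
any spec.

```python
import uuid


def v1_from_timestamp(ts: int, clock_seq: int = 0,
                      node: int = 0x0123456789AB) -> uuid.UUID:
    """Build a version 1 UUID from a 60-bit timestamp (RFC 4122 layout)."""
    time_low = ts & 0xFFFFFFFF                        # low 32 bits come FIRST
    time_mid = (ts >> 32) & 0xFFFF
    time_hi_version = ((ts >> 48) & 0x0FFF) | (1 << 12)
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


def time_ordered_bytes(u: uuid.UUID) -> bytes:
    """Re-pack a v1 UUID with the timestamp high bits first.

    Hypothetical layout: 32 high bits, 16 middle bits, then a version
    nibble followed by the 12 low bits. The last 8 bytes are unchanged.
    """
    ts = u.time                                       # 60-bit timestamp
    time_high = (ts >> 28) & 0xFFFFFFFF
    time_mid = (ts >> 12) & 0xFFFF
    time_low_version = (ts & 0x0FFF) | (6 << 12)
    return (time_high.to_bytes(4, "big")
            + time_mid.to_bytes(2, "big")
            + time_low_version.to_bytes(2, "big")
            + u.bytes[8:])


earlier = v1_from_timestamp(0x0FFFFFFFF)
later = v1_from_timestamp(0x100000000)

# Version 1 raw bytes: the later UUID sorts BEFORE the earlier one.
print(later.bytes < earlier.bytes)
# Re-packed bytes: byte order now follows time order.
print(time_ordered_bytes(earlier) < time_ordered_bytes(later))
```

With the version 1 layout an index has to decode the timestamp before it 
can order the keys; with the re-packed layout a plain byte comparison is 
enough.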

So that's more info on the motivation and, at least from my perspective, 
the practical concerns that are driving this.

It would certainly be less of a change to just propose an update to the 
UUID spec that puts the timestamp in an order that sorts as raw bytes, 
and says feel free to use random data from a properly seeded CSPRNG 
instead of a MAC address if you're willing to accept X collision 
probability in exchange for not exposing your MAC address. It could 
also say that hex encoding is big, so feel free to use base32 or base64 
according to what your app needs. It would be great if these things had 
good names too, so that when someone implements this as a new database 
column type, it can be referred to with sane names that match the 
spec. Those points might be all it would take, and that was more or 
less my original intention with the "UUID version 6" concept.
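
The "random data instead of a MAC address" part can already be 
approximated with existing tooling. A minimal Python sketch, assuming 
the standard library's uuid.uuid1 with an explicit node; the helper 
names are hypothetical. RFC 4122 asks for the multicast bit to be set on 
a random node ID so that it cannot collide with a real hardware address.

```python
import secrets
import uuid


def random_node() -> int:
    """48 random bits from the OS CSPRNG, with the multicast bit set."""
    # The multicast bit is the least significant bit of the first octet,
    # i.e. bit 40 of the 48-bit node value.
    return secrets.randbits(48) | (1 << 40)


def uuid1_without_mac() -> uuid.UUID:
    """Version 1 UUID whose node and clock sequence are random."""
    return uuid.uuid1(node=random_node(), clock_seq=secrets.randbits(14))


u = uuid1_without_mac()
print(u, "node has multicast bit:", bool((u.node >> 40) & 1))
```

This hides the MAC address but leaves the version 1 byte order as it is, 
so it addresses only one of the points above.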

At the end of the day, I guess the main driving use case is as a 
database primary key (and being able to easily use that PK in the same 
human-readable form in things like URLs or documents), but the 
silliness above (the "rough edges") makes this difficult with the 
current specification.