[Cellar] Re: Andy Newton's Discuss on draft-ietf-cellar-tags-20: (with DISCUSS and COMMENT)

Andy Newton <andy@hxr.us> Mon, 12 January 2026 20:08 UTC

To: Steve Lhomme <slhomme@matroska.org>
CC: The IESG <iesg@ietf.org>, cellar-chairs@ietf.org, cellar@ietf.org, draft-ietf-cellar-tags@ietf.org, spencerdawkins.ietf@gmail.com
Archived-At: <https://mailarchive.ietf.org/arch/msg/cellar/Igw1k4kk1JpxJ1uh7EeJ-jXiMYI>


On 1/11/26 7:20 AM, Steve Lhomme wrote:
>> ### UTF-8 and Problematic Code Points.
>>
>> 196        Official TagName values MUST consist of UTF-8 capital letters,
>> 197        numbers and the underscore character '_'.
>>
>> In addition to Roman's DISCUSS on capitalized UTF-8, should the tags be
>> limited to exclude the problematic code points as described in RFC 9839?
> 
> Because there’s no proper definition of capital letters, I’m more inclined to limit it to the Latin capital letters. That basically turns the TagName element into an ASCII (“string” type) element rather than a UTF-8 element, without breaking backward compatibility. https://www.rfc-editor.org/rfc/rfc9559#name-tagname-element
> 
> As for the problematic code points, this seems to be a more general problem affecting all Matroska UTF-8 elements, or even EBML in general.

Ok. You sidestep the problematic code points if you limit it to upper-case Latin.
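Assuming the rule is narrowed as Steve proposes, to Latin capital letters, digits and the underscore, a TagName check becomes a plain ASCII character-class test and the RFC 9839 problematic code points cannot appear at all. A minimal Python sketch; the function name is hypothetical and the allowed set (`A`-`Z`, `0`-`9`, `_`) is my reading of the proposal, not the draft's final text:

```python
import re

# Assumed rule (per the proposal above, not final draft text):
# only Latin capital letters A-Z, digits 0-9 and '_' are allowed.
_OFFICIAL_TAGNAME = re.compile(r"[A-Z0-9_]+")


def is_valid_official_tagname(name: str) -> bool:
    """Return True if name is non-empty and uses only A-Z, 0-9 and '_'."""
    # fullmatch checks the whole string, so trailing newlines or
    # non-ASCII letters such as 'É' cause a rejection.
    return _OFFICIAL_TAGNAME.fullmatch(name) is not None
```

With this rule `ORIGINAL_MEDIA_TYPE` passes, while `Title` and `ÉDITEUR` are rejected.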

> 
>> And from your Security Considerations sections:
>>
>> 1671       Most of the time strings are kept as-is and don't pose a security
>> 1672       issue, apart from invalid UTF-8 values.  Implementations MUST
>> 1673       validate TagString inputs for UTF-8 correctness and reasonable length
>> 1674       before use, in accordance with the security considerations in
>> 1675       Section 10 of [RFC3629].
>>
>> I think you have to apply RFC 9839 to make a statement that UTF-8 values
>> don't pose a security risk.
> 
> Not being a UTF-8 or security expert, I’d rather stick with the Security Considerations of the RFC that defines UTF-8.

The problem is that you say the strings "don't pose a security issue", but RFC 9839 discusses why they can. IMO, you can't make that claim without accounting for problematic code points.
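For comparison, here is a rough sketch of what accounting for problematic code points could look like for TagString, layered on top of the RFC 3629 check: strict UTF-8 decoding (which rejects overlong forms, encoded surrogates and truncated sequences), a length cap, and a filter approximating the RFC 9839 "Unicode Assignables" subset (no legacy controls other than tab, LF and CR, no surrogates, no noncharacters). The range table is my reading of RFC 9839 and should be checked against its ABNF; `MAX_TAGSTRING_BYTES` is a hypothetical stand-in for "reasonable length", and the code assumes any EBML trailing null padding has already been stripped.

```python
# Hypothetical limit; the draft only asks for a "reasonable length".
MAX_TAGSTRING_BYTES = 64 * 1024


def _is_unicode_assignable(cp: int) -> bool:
    """Approximate the RFC 9839 "Unicode Assignables" subset."""
    if cp in (0x09, 0x0A, 0x0D):      # tab, LF, CR are the allowed controls
        return True
    if 0x20 <= cp <= 0x7E:            # printable ASCII, excludes DEL
        return True
    if 0xA0 <= cp <= 0xD7FF:          # skips C1 controls, stops before surrogates
        return True
    if 0xE000 <= cp <= 0xFDCF:        # stops before noncharacters U+FDD0..U+FDEF
        return True
    if 0xFDF0 <= cp <= 0xFFFD:        # stops before noncharacters U+FFFE, U+FFFF
        return True
    # Supplementary planes: exclude the last two code points of each plane.
    return 0x10000 <= cp <= 0x10FFFF and (cp & 0xFFFF) < 0xFFFE


def validate_tagstring(raw: bytes) -> str:
    """Decode a TagString payload, rejecting malformed or problematic input."""
    if len(raw) > MAX_TAGSTRING_BYTES:
        raise ValueError("TagString too long")
    # Strict decoding raises UnicodeDecodeError (a ValueError subclass)
    # for overlong forms, encoded surrogates and truncated sequences.
    text = raw.decode("utf-8", errors="strict")
    for ch in text:
        if not _is_unicode_assignable(ord(ch)):
            raise ValueError(f"problematic code point U+{ord(ch):04X}")
    return text
```

A stricter or looser subset (RFC 9839 also describes "Unicode Scalars" and "XML Characters") would only change `_is_unicode_assignable`.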

> 
>> ### Beginning Underscore
>>
>> 204        It is RECOMMENDED that tag names start with the underscore character
>> 205        '_' for non official tags that are not meant to be added to the list
>> 206        of official tags.
>>
>> Why is this RECOMMENDED? Can it be a MUST? If it is RECOMMENDED, what are the
>> ramifications of not following the recommendation?
> 
> Because there are tons of files out there with custom tag names (I mentioned FFmpeg and mkvtoolnix in another review) that don’t follow this rule. This rule was added more recently so that there is an official way to define private tags.

Ok. That makes sense. Thanks.
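Because the leading underscore is only RECOMMENDED and older files carry custom names without it, a reader cannot assume that a name without `_` is official. A hypothetical Python sketch that sorts names into three buckets instead of rejecting anything; the enum and bucket names are illustrative only:

```python
import re
from enum import Enum


class TagNameKind(Enum):
    OFFICIAL_STYLE = "official-style"  # matches the official charset; not necessarily registered
    PRIVATE = "private"                # leading '_': not meant for the official list
    LEGACY_CUSTOM = "legacy-custom"    # custom name predating the '_' convention


_OFFICIAL_CHARS = re.compile(r"[A-Z0-9_]+")


def classify_tagname(name: str) -> TagNameKind:
    """Classify a TagName without rejecting preexisting files."""
    if name.startswith("_"):
        return TagNameKind.PRIVATE
    if _OFFICIAL_CHARS.fullmatch(name):
        return TagNameKind.OFFICIAL_STYLE
    # Tolerate names written before the convention existed.
    return TagNameKind.LEGACY_CUSTOM
```

A writer following the recommendation would pick names in the PRIVATE bucket for anything not intended for registration.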

> 
>> ### TagString Formatting
>>
>> 215        Multiple items SHOULD NOT be stored as a list in a single TagString.
>> 216        If there is more than one tag value with the same name to be stored,
>> 217        it is RECOMMENDED to use separate SimpleTags with that name for each
>> 218        value.
>>
>> Can this be a MUST NOT? Why allow them to be stored as a list at all? And if
>> this advice is not followed, what happens? Is it that some software won't
>> interoperate? If that is the case, I think you are better off stating this as a
>> MUST NOT.
> 
> As mentioned in Éric’s review:
> This used to be a MUST NOT but was reverted in https://github.com/ietf-wg-cellar/matroska-specification/pull/1030
> after the AD review from Orie: https://mailarchive.ietf.org/arch/msg/cellar/4ebLFttRb_I8SFu5yMQSIDPuk2E/
> 
> Especially since INSTRUMENTS and KEYWORDS suggest they can group multiple values separated by a comma.

Ok. If this was previously discussed then I won't belabor it. 
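On the writing side, the recommended form is simply one SimpleTag per value. A minimal sketch, using a hypothetical `SimpleTag` dataclass as a stand-in for whatever tag model a muxer actually has (it is not the API of any real Matroska library):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleTag:
    """Hypothetical in-memory model of a SimpleTag (TagName + TagString)."""
    name: str
    string: str


def simple_tags_for(name: str, values: list[str]) -> list[SimpleTag]:
    """Emit one SimpleTag per value instead of joining values into one TagString."""
    return [SimpleTag(name, value) for value in values]
```

For example, two composers become two `COMPOSER` SimpleTags rather than a single `"Alice, Bob"` string.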

> 
> 
>> 220        Preexisting files may have used multiple values in the same TagString
>> 221        but given there is no defined delimiters they cannot be easily split
>> 222        into multiple elements.  INSTRUMENTS (Section 4.4) and KEYWORDS
>> 223        (Section 4.6) tags allow using a comma as a separator.  However, it
>> 224        is RECOMMENDED to use separate SimpleTags with each containing a
>> 225        single instrument or keyword value, respectively.
>>
>> Why are separate tags only RECOMMENDED? Can this be a MUST as well? If it is to
>> remain as a RECOMMENDED, what are the consequences of not following the
>> recommendation?
> 
> Precisely because of preexisting files. We cannot make them invalid just because now we want stricter rules.
> 
> There is no consequence to grouping or not grouping elements. Matroska Readers should be able to group values (or keep them split) in any case, as it has always been an option. As with a database, it’s better to use atomic values that can be referenced multiple times.

I see this was also part of the previously discussed item. Again, let's go with the current resolution.
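On the reading side, Steve's point is that a reader can present the same values whether they were stored grouped or split. A sketch of that normalization, again with a hypothetical `SimpleTag` model; it splits only INSTRUMENTS and KEYWORDS, the two tags for which the draft allows a comma separator, and trimming whitespace around items is my own assumption:

```python
from dataclasses import dataclass

# Only these tags allow ',' as a separator in the draft; splitting any
# other TagString would be guesswork since no general delimiter exists.
COMMA_SEPARATED_TAGS = {"INSTRUMENTS", "KEYWORDS"}


@dataclass(frozen=True)
class SimpleTag:
    """Hypothetical in-memory model of a SimpleTag (TagName + TagString)."""
    name: str
    string: str


def tag_values(tags: list[SimpleTag], name: str) -> list[str]:
    """Collect all values for name, whether stored split or comma-grouped."""
    values: list[str] = []
    for tag in tags:
        if tag.name != name:
            continue
        if name in COMMA_SEPARATED_TAGS:
            # Trim whitespace around each item and drop empty pieces.
            values.extend(p.strip() for p in tag.string.split(",") if p.strip())
        else:
            values.append(tag.string)
    return values
```

Given `INSTRUMENTS` stored once as `"guitar, bass"` and once as `"drums"`, the reader sees three instruments, while a `TITLE` containing a comma is left intact.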

-andy