[Cellar] Re: Roman Danyliw's Discuss on draft-ietf-cellar-tags-20: (with DISCUSS and COMMENT)

Steve Lhomme <slhomme@matroska.org> Sun, 25 January 2026 10:40 UTC

From: Steve Lhomme <slhomme@matroska.org>
Message-Id: <7EF5D742-BD20-419E-B3E6-892238C37B8E@matroska.org>
Content-Type: multipart/alternative; boundary="Apple-Mail=_AAFA56D5-E2FC-43EE-9AA6-B921D807A229"
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3826.700.81.1.4\))
Date: Sun, 25 Jan 2026 11:39:42 +0100
In-Reply-To: <4FEC9D00-AF08-4424-9323-8CCFF1C33C1D@matroska.org>
To: Roman Danyliw <rdd@cert.org>
References: <176763438366.3302121.11265797912471711436@dt-datatracker-5656579b89-p6k4r> <4FEC9D00-AF08-4424-9323-8CCFF1C33C1D@matroska.org>
CC: The IESG <iesg@ietf.org>, cellar-chairs@ietf.org, cellar@ietf.org, draft-ietf-cellar-tags@ietf.org, spencerdawkins.ietf@gmail.com
X-Mailman-Version: 3.3.9rc6
Precedence: list
Subject: [Cellar] Re: Roman Danyliw's Discuss on draft-ietf-cellar-tags-20: (with DISCUSS and COMMENT)
List-Id: Codec Encoding for LossLess Archiving and Realtime transmission <cellar.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cellar/iZZ9iIz6YWIyJ1d60EM8rT6vDLE>
List-Archive: <https://mailarchive.ietf.org/arch/browse/cellar>
List-Help: <mailto:cellar-request@ietf.org?subject=help>
List-Owner: <mailto:cellar-owner@ietf.org>
List-Post: <mailto:cellar@ietf.org>
List-Subscribe: <mailto:cellar-join@ietf.org>
List-Unsubscribe: <mailto:cellar-leave@ietf.org>

Hi Roman and everyone,

I added a new Pull Request to restrict the range of characters in TagName even further. Rather than “UTF-8 capital letters”, which is vague and doesn’t really exist as such, the letters are now limited to A-Z.

This capital-letter rule, which turned into a UTF-8 capital-letter rule, was really meant to limit names to a simple set of characters anyway, back when emojis didn’t exist (I think).

I also added an ABNF notation so that there is no confusion and the names can be easily parsed:
https://github.com/ietf-wg-cellar/matroska-specification/pull/1065
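
As a rough illustration of the restricted character set, a check could look like the sketch below. This is not the ABNF from the pull request, which is not reproduced here. The function name is made up for this example, and the sketch assumes a name is a non-empty string made only of A-Z, 0-9 and '_'; it does not treat a leading underscore specially.

```python
import re

# Assumption: an official TagName is a non-empty run of A-Z, 0-9 and "_".
# The authoritative grammar is the ABNF in the pull request, not this pattern.
_OFFICIAL_TAG_NAME = re.compile(r"[A-Z0-9_]+")


def is_valid_official_tag_name(name: str) -> bool:
    """Return True if every character of name is A-Z, 0-9 or '_'."""
    # fullmatch() requires the whole string to match, so "" is rejected.
    return _OFFICIAL_TAG_NAME.fullmatch(name) is not None
```

With this sketch, "TITLE" and "PART_NUMBER" would pass, while "Title", "ÉCOLE" and "PART NUMBER" would be rejected.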

Best regards,
Steve

> On 11 Jan 2026, at 12:21, Steve Lhomme <slhomme@matroska.org> wrote:
> 
> Hi Roman,
> 
> Thanks for your review. My comments are inline below.
> 
>> On 5 Jan 2026, at 18:33, Roman Danyliw via Datatracker <noreply@ietf.org> wrote:
>> 
>> Roman Danyliw has entered the following ballot position for
>> draft-ietf-cellar-tags-20: Discuss
>> 
>> When responding, please keep the subject line intact and reply to all
>> email addresses included in the To and CC lines. (Feel free to cut this
>> introductory paragraph, however.)
>> 
>> 
>> Please refer to https://www.ietf.org/about/groups/iesg/statements/handling-ballot-positions/ 
>> for more information about how to handle DISCUSS and COMMENT positions.
>> 
>> 
>> The document, along with other ballot positions, can be found here:
>> https://datatracker.ietf.org/doc/draft-ietf-cellar-tags/
>> 
>> 
>> 
>> ----------------------------------------------------------------------
>> DISCUSS:
>> ----------------------------------------------------------------------
>> 
>> **  Section 3.2.1
>>   Official TagName values MUST consist of UTF-8 capital letters,
>>   numbers and the underscore character '_'.
>> 
>> What is definition of “UTF-8 capital letters”?
> 
> There is no proper UTF-8 definition for that. See my answer to Éric’s review https://mailarchive.ietf.org/arch/msg/cellar/6la3CAjSVA1w-P0HA8yJx2YPb6Y/
> 
> "That could be problem if someone proposes tag names in some language that don’t have capital letters (chinese ? japanese ?). No sure what to do here. We cannot change the “TagName" type from UTF-8 to ASCII anymore. Maybe we could say only latin capital letters ? A lot of languages have a latinized version.”
> 
> 
>> ** What is the format for an “official TagName”
>> 
>> -- Section 3.2.1 say “Official TagName values MUST consist of UTF-8 capital
>> letters,    numbers and the underscore character '_'.”
>> 
>> -- Section 6.1 says “The Name corresponds to the value stored in the TagName
>> element.  The Name SHOULD always be written in all capital letters and contain
>> no space as defined in Section 3.2,”
>> 
>> The text in Section 6.1 suggests weaker conformance to only “capital letters”
>> is required.
> 
> Indeed, they should have the same requirement. Addressed in https://github.com/ietf-wg-cellar/matroska-specification/pull/1056 
> 
>> ** Section 3.2.2.2
>>   In legacy media containers, it is possible that the "," character
>>   might have been used as a separator or that digit grouping delimiters
>>   might have been used.  A Matroska Reader SHOULD consider the
>>   following character handling to parse such legacy formats:
>> 
>> How does an implementation know it is processing a “legacy media container” to
>> apply these alternative parsing rules?
> 
> If you know the tag is a Number, you can parse it as a number and assume a single "," corresponds to a decimal separator. But there is no guarantee and no real way to tell.
> 
> The rules have been made stricter since the original Matroska specifications, so that parsing can be stricter too. New software writing tags from other sources should try to convert values to the stricter format. But readers should be aware that older formats may exist, and they choose what to do with these uncertain values.
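
A minimal sketch of such a lenient reader is shown here. It is only one possible policy, not the handling defined in Section 3.2.2.2: it assumes a strict value is a plain decimal with '.' as the separator, treats a single ',' with no '.' as a decimal separator, and gives up on anything else. The function name is hypothetical.

```python
import re
from typing import Optional

# Assumption: a strict numeric value is an optional sign, digits, and an
# optional fractional part introduced by ".". The draft may allow more.
_PLAIN_DECIMAL = re.compile(r"[+-]?[0-9]+(?:\.[0-9]+)?")


def parse_legacy_number(text: str) -> Optional[float]:
    """Best-effort parse of a numeric tag value from a legacy source."""
    value = text.strip()
    # Guess: a lone "," with no "." present is a decimal separator.
    if value.count(",") == 1 and "." not in value:
        value = value.replace(",", ".")
    # Anything still containing "," (digit grouping, mixed separators)
    # or other unexpected characters is reported as unparseable.
    if _PLAIN_DECIMAL.fullmatch(value) is None:
        return None
    return float(value)
```

Note that "1,234" comes out as 1.234 under this guess even if the writer meant 1234, which is exactly the kind of uncertainty a reader has to accept or reject.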
> 
>> ----------------------------------------------------------------------
>> COMMENT:
>> ----------------------------------------------------------------------
>> 
>> Thank you to Ines Robles for the GENART review.
>> 
>> I support the DISCUSS position of Mohamed Boucadair, Gorry Fairhurst, and Éric
>> Vyncke.
>> 
>> ** Section 3.1
>>   There is a debate between people who think all tags should be free
>>   and those who think all tags should be strict.  Our recommendations
>>   are in between.
>> 
>> -- “Our recommendations”, is this the opinion of the WG or the authors?
> 
> Addressed in https://github.com/ietf-wg-cellar/matroska-specification/pull/1052
> 
>> -- I found the terms “free” and “strict” confusing.  Is this “free-form” as
>> opposed to being without cost?  “Strict” relative to what?
> 
> Yes, it’s free-form. I proposed a new wording in https://github.com/ietf-wg-cellar/matroska-specification/pull/1057
> 
> “There is a debate between people who think all tags should be free-form and those who think
> all tags should be limited to a set of names."
> 
>> ** Section 3.1
>>   Both have their needs, but it's usually a
>>   bad idea to use custom or exotic tags because you will probably be
>>   the only person to use this information even though everyone else
>>   could benefit from it.
>> 
>> Why using “custom or exotic tags” is a “bad idea” would benefit from more
>> explanation.  The negative implications of low use tags is not clear.
> 
> “you will probably be the only person to use this information even though everyone else could benefit from it"
> Seems to me like a very good reason not to go that way.
> 
>> ** Section 3.1
>>   So hopefully, when someone wants to put
>>   information in one's file, they will find an official one that fits
>>   their need and hopefully use it.
>> 
>> What makes a tag “official”?
> 
> Assigned a name in the IANA registry.
> 
>> ** Section 3.1
>>   It's hard to define what should be in and
>>   what ought not be in a file because it doesn't make sense; thus, each
>>   request needs to be evaluated to determine if it makes sense to be
>>   carried over in a file for storage and/or sharing or if it doesn't
>>   belong there.
>> 
>> It is unclear how a reader should take action on this assessment.
> 
> That’s why it says it’s “hard to define”. And two people might not even agree on what metadata should be in a file and what should not.
> 
> For example, I could propose a tag “HAS_SET_FOOT_IN_ROUBAIX”. It could make it into the ~official~assigned list, but no one would really use it. If some people want to put that information in their file, why not. But that should rather qualify for tags that start with an underscore for private use.
> 
>> ** Section 3.2.1
>>   Official TagName values MUST NOT contain any space.
>> 
>> Is this the same as saying must not contain “space characters”?
> 
> Yes, although we may get rid of that line since we already have a set of characters that can be used.
> 
>> ** Section 3.3.  Table 1 and 2.
>>   | 70              | COLLECTION      | the high hierarchy consisting |
>>   |                 |                 | of many different lower items |
>> What is a “high hierarchy”?  Does this mean the highest element in a hierarchy
>> of elements?
> 
> No, as someone could come up with values even higher than 70 for an even larger set of “COLLECTION”, and even higher values for a set of these sets, etc.
> 
> If anyone has a better (concise) way to describe a COLLECTION of items, I can make the change.  
> 
>> ** Section 4
>> 
>> 4.  Official Tags
>> 
>>   The following is a complete list of the supported Matroska Tags.  As
>> 
>> What is being implied by “supported”?  Supported by what?
> 
> Addressed in https://github.com/ietf-wg-cellar/matroska-specification/pull/1049
> “The following is the initial list of assigned Matroska Tags."
> 
>> ** Section 4.5.  Table 7.
>> 
>> Consider if further specificity in the format of the FAX and PHONE is needed
>> and would be helpful.  E.164 specifies the 15-digit country code + national
>> destination code + subscriber number format.
>> 
>> [E.164] ITU Telecommunication Standardization Sector, "The International Public
>> Telecommunication Numbering Plan", ITU-T Recommendation E.164, November 2010.
> 
> We never defined a formal format for this. So, as with decimal numbers, forcing a fixed format now means we also have to describe all the other ways that the numbers may have been stored. But I don’t think we can come up with such a list.
> 
> For this kind of thing it’s probably better to define a new tag name like PHONE_E164 with a strict format.
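
To give an idea of what a strict format could mean for such a tag, here is a small sketch. PHONE_E164 is only a name floated in this thread, not an assigned tag, and the function name is made up. The sketch assumes the commonly used written form of an E.164 number: a '+' followed by at most 15 digits, the first of which is not 0, with no spaces or punctuation.

```python
import re

# Assumption: "+" followed by 2 to 15 digits, first digit 1-9.
# This checks the shape only, not whether the number is actually assigned.
_E164 = re.compile(r"\+[1-9][0-9]{1,14}")


def is_e164(value: str) -> bool:
    """Return True if value has the shape of an E.164 number."""
    return _E164.fullmatch(value) is not None
```

A value such as "+33123456789" would pass, while "0123456789" or "+1 555 0100" would not, so a writer would have to normalize the number before storing it.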