Re: [DNSOP] SVCB ALPN value presentation format

Larry Campbell <lcampbel@akamai.com> Wed, 01 July 2020 01:51 UTC

From: Larry Campbell <lcampbel@akamai.com>
In-Reply-To: <CB413C11-A43A-4B28-A368-EF0A5D9E73DB@isc.org>
Date: Tue, 30 Jun 2020 21:51:20 -0400
Cc: Ben Schwartz <bemasc@google.com>, dnsop <dnsop@ietf.org>
Content-Transfer-Encoding: quoted-printable
Message-Id: <FB199FF3-F673-473A-B073-439FFDBC2DC7@akamai.com>
References: <80758944-FC52-4349-9C8B-EF4083C62F1B@akamai.com> <CAHbrMsDcvJnij9UmxJ_JBMCCxYUzq=YcG+YEd5JHVyB_H3g6Rw@mail.gmail.com> <7172A73F-DF24-4375-A4DF-F13426A7C96C@akamai.com> <CDC48C93-8634-489F-A0A7-A4D79CE1170E@isc.org> <AA031103-38AA-4E57-B8BF-CE3385FB6FEF@akamai.com> <CB413C11-A43A-4B28-A368-EF0A5D9E73DB@isc.org>
To: Mark Andrews <marka@isc.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/YIRif9jaIILdtlvRhokaeDHt8bw>
Subject: Re: [DNSOP] SVCB ALPN value presentation format

Parsing as a character-string need not unescape any sequence other than \". So you first parse as a character-string (unescaping only \"), then split the result on unescaped commas, and finally unescape each component.

Input: "a\,\000,b\,\000\""

Parsed as character-string: a\,\000,b\,\000"

Split on unescaped comma:
    a\,\000
    b\,\000"

Unescaped:
    %x61 %x2c %x00
    %x62 %x2c %x00 %x22
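
The three passes above can be sketched in Python as follows. This is only an illustration of the walkthrough, not text or code from the draft: the function names (strip_quotes, split_unescaped_commas, unescape, parse_alpn) are made up here, and it assumes \DDD is a three-digit decimal escape as in ordinary character-strings and that each alpn-id must be 1-255 octets.

```python
def strip_quotes(token: str) -> str:
    """Pass 1: drop enclosing double quotes and unescape only \\"."""
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        token = token[1:-1]
    out, i = [], 0
    while i < len(token):
        if token[i] == "\\" and i + 1 < len(token):
            pair = token[i:i + 2]
            # Every other escape is kept as-is for the later passes.
            out.append('"' if pair == '\\"' else pair)
            i += 2
        else:
            out.append(token[i])
            i += 1
    return "".join(out)


def split_unescaped_commas(value: str) -> list[str]:
    """Pass 2: split on commas that are not preceded by a backslash."""
    parts, cur, i = [], [], 0
    while i < len(value):
        if value[i] == "\\" and i + 1 < len(value):
            cur.append(value[i:i + 2])  # escaped character, never a separator
            i += 2
        elif value[i] == ",":
            parts.append("".join(cur))
            cur = []
            i += 1
        else:
            cur.append(value[i])
            i += 1
    parts.append("".join(cur))
    return parts


def unescape(component: str) -> bytes:
    """Pass 3: resolve \\DDD and \\X escapes into raw octets."""
    out, i = bytearray(), 0
    while i < len(component):
        ch = component[i]
        if ch != "\\":
            out += ch.encode("latin-1")
            i += 1
            continue
        rest = component[i + 1:i + 4]
        if rest[:1].isascii() and rest[:1].isdigit():
            if len(rest) != 3 or not (rest.isascii() and rest.isdigit()) or int(rest) > 255:
                raise ValueError("bad \\DDD escape")
            out.append(int(rest))
            i += 4
        elif rest:
            out += rest[0].encode("latin-1")
            i += 2
        else:
            raise ValueError("dangling backslash")
    return bytes(out)


def parse_alpn(token: str) -> list[bytes]:
    """Run the three passes and check the assumed 1-255 octet limit."""
    ids = [unescape(p) for p in split_unescaped_commas(strip_quotes(token))]
    for alpn_id in ids:
        if not 1 <= len(alpn_id) <= 255:
            raise ValueError("alpn-id must be 1-255 octets")
    return ids
```

With the input above, parse_alpn returns the two octet sequences listed under "Unescaped". An input such as h2\044h3,h4 also comes out as two ids, because \044 is still escaped at the point where the split happens.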

But I am still (or again) confused by this text:

   ALPNs are identified by their registered "Identification Sequence"
   (alpn-id), which is a sequence of 1-255 octets.

   alpn-id = 1*255(OCTET)

   The presentation value of "alpn" is a comma-separated list of one or
   more "alpn-id"s.  Any commas present in the protocol-id are escaped
   by a backslash:

   escaped-octet = %x00-2b / "\," / %x2d-5b / "\\" / %x5D-FF
   escaped-id = 1*(escaped-octet)
   alpn-value = escaped-id *("," escaped-id)


(1) The text mentions "protocol-id" which is a phrase not found anywhere else in the text. I think it should probably have said "alpn-id".

(2) The productions above imply that %x00 (null) is a valid character *in the presentation format*. I think that has to be a mistake. Do we really want literal nulls in the presentation format?

(3) Or perhaps the escaped-octet rule means that these are terminals found *after* unescaping sequences like \000?
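
To make question (2) concrete, here is a minimal sketch of an emitter that never writes a literal null or control character, assuming the answer is that such octets should appear as \DDD in the presentation format. The function name alpn_to_presentation is hypothetical, and the choice to escape space as \032 and to emit a value meant to sit inside double quotes is an assumption of this sketch, not something the draft states.

```python
def alpn_to_presentation(ids: list[bytes]) -> str:
    """Render alpn-ids as a comma-separated value using only printable ASCII."""
    parts = []
    for alpn_id in ids:
        if not 1 <= len(alpn_id) <= 255:
            raise ValueError("alpn-id must be 1-255 octets")
        chunk = []
        for octet in alpn_id:
            ch = chr(octet)
            if ch in ',\\"':
                # Comma, backslash and double quote get a backslash escape.
                chunk.append("\\" + ch)
            elif 0x21 <= octet <= 0x7E:
                # Remaining printable, non-space ASCII is written literally.
                chunk.append(ch)
            else:
                # Nulls, controls, space and non-ASCII become \DDD (decimal).
                chunk.append("\\%03d" % octet)
        parts.append("".join(chunk))
    return ",".join(parts)
```

For the two octet sequences in the example above, this produces a\,\000,b\,\000\" which contains no raw nulls.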

- lc


> On Jun 26, 2020, at 17:11, Mark Andrews <marka@isc.org> wrote:
> 
> Except you can’t actually do that.  ‘\044' becomes ‘,' on the first pass if you parse it as a character string first. The ONLY way this works is if you remember which commas are escaped or not (\044 or \, vs ,).  It’s dead easy to split it into alpn-id as you unescape the string.
> 
> Mark
> 
>> On 18 Jun 2020, at 23:53, lcampbel@akamai.com wrote:
>> 
>> OK, I think I now understand the intent, and refactored my code accordingly, and it is now simpler and cleaner. Yay.
>> 
>> I think it would be clearer to implementers if section 2.1.1 said that all values are initially parsed as character-strings (allowed to exceed 255 characters), and then further parsed by SvcParamKey-specific parsing which may, for example, split on comma. I think the current text isn't entirely clear on the functional separation between generic parsing and key-specific parsing.
>> 
>> - lc
>> 
>> 
>>> On Jun 15, 2020, at 22:04, Mark Andrews <marka@isc.org> wrote:
>>> 
>>> 
>>> 
>>>> On 14 Jun 2020, at 05:01, Larry Campbell <lcampbel=40akamai.com@dmarc.ietf.org> wrote:
>>>> 
>>>> I think there's an implementation difficulty. Consider:
>>>> 
>>>> 1.  alpn=h2		; clear enough
>>>> 2.  alpn="h2"		; should be equivalent
>>>> 3.  alpn=\h\2		; should also be equivalent
>>>> 4.  alpn=h2,h3		; ok (two values)
>>>> 5.  alpn="h2","h3"	; should be equivalent
>>> 
>>> No, as it is key=quoted-string as per 2.1.1 not key=quoted-string(,quoted-string)*
>>> 
>>>> 6.  alpn="h2,h3"	; malformed? or a single alpn value of h2,h3? or two three-character values, "h2 and h3”?
>>> 
>>> this is correct
>>> 
>>>> 7.  alpn=h2\,h3,h4	; how should this be parsed?
>>> 
>>> 0x05 0x68 0x32 0x2c 0x68 0x33 0x02 0x68 0x34
>>> 
>>>> Section 2.1.1 tempts one to build the obvious implementation of using one's existing character-string parser, and then passing the parsed character-string to the individual handler for each key type. The alpn and ipv*hint handlers are going to want to split that character-string on comma. That would treat #6 as two two-character values (h2,h3). But #7 is problematic: the generic character-string parser would remove the backslash, and then the alpn handler would treat this as three alpn values when you probably wanted just two.
>>> 
>>> When you are also parsing domain names you have to deal with \. being a literal period not a domain separator.
>>> exa\.mple.com and “exa\.mple.com” both parse as two labels, ‘exa.mple’ and ‘com’.  This is not really different.
>>> 
>>> That said we do need to address this issue.
>>> 
>>> In BIND we extract quoted-string preserving the escapes (except for ‘\”’) then pass the token to a domain name parser or a text parser. Having ‘key=‘ preceding the quoted-string is more of an issue and we have to shift modes mid-token.
>>> 
>>>> We could make a special character-string parser for alpn and ipv*hint, that handles commas, but it feels odd to have to use a special parser just for certain key types. However, if we must allow commas in alpn names, then we have no choice.
>>> 
>>> You need to reparse the value for port, alpn, and ipv*hint.
>>> 
>>>> Perhaps it would be clearer to simply remove the three paragraphs of section 2.1.1 beginning with "The presentation for for SvcFieldValue is..." and ending with "...not limited to 255 characters.)". Since the previous paragraph says "Values are in a format specific to the SvcParamKey", perhaps it would be best to leave the description of each value format in the appropriate part of section 6 and for section 2.1.1 to discuss only how to represent and parse unrecognized keys.
>>> 
>>> 
>>>> 
>>>> To keep the implementation simple, the alpn value could be defined as a comma-separated list of sequences of printing ASCII characters, with embedded comma represented as \, backslash as \\, and all nonprinting and non-ASCII characters represented as \nnn. (In other words, the full generality of character-string, particularly double-quotes, is not needed here.)
>>>> 
>>>> The other comma-separated value types -- ipv4hint and ipv6hint -- do not have this difficulty; they also don't need the full generality of character-string handling, because the individual values can contain only hex digits, periods, and colons, so their specification and implementation can be much simpler.
>>>> 
>>>> And I think section 2.1.1 would be clearer if
>>>> 
>>>>  using decimal escape codes (e.g. \255) when necessary
>>>> 
>>>> were replaced by
>>>> 
>>>>  using decimal escape codes (e.g. \255) for all nonprinting and non-ASCII characters, and using \\ to represent backslash
>>>> 
>>>> - lc
>>>> 
>>>> 
>>>>> On Jun 13, 2020, at 11:25, Ben Schwartz <bemasc=40google.com@dmarc.ietf.org> wrote:
>>>>> 
>>>>> Larry,
>>>>> 
>>>>> I think that's the intent of the current text, especially the ABNF for "element".  If you think it's unclear, we should adjust it.  Please suggest text!
>>>>> 
>>>>> --Ben Schwartz
>>>>> 
>>>>> On Sat, Jun 13, 2020, 10:53 AM Larry Campbell <lcampbel=40akamai.com@dmarc.ietf.org> wrote:
>>>>> Section 6.1 says:
>>>>> 
>>>>>> The presentation value of "alpn" is a comma-separated list of one or more "alpn-id"s. Any commas present in the protocol-id are escaped by a backslash:
>>>>>> 
>>>>>>  escaped-octet = %x00-2b / "\," / %x2d-5b / "\\" / %x5D-FF
>>>>>>  escaped-id = 1*(escaped-octet)
>>>>>>  alpn-value = escaped-id *("," escaped-id)
>>>>> 
>>>>> If I read this correctly, the presentation value is allowed to contain nulls and control characters. This seems likely to make such records very difficult to edit. Wouldn't it be better to require these to be encoded as \nnn?
>>>>> 
>>>>> - lc
>>>>> 
>>>>> _______________________________________________
>>>>> DNSOP mailing list
>>>>> DNSOP@ietf.org
>>>>> https://www.ietf.org/mailman/listinfo/dnsop
>>>> 
>>> 
>>> -- 
>>> Mark Andrews, ISC
>>> 1 Seymour St., Dundas Valley, NSW 2117, Australia
>>> PHONE: +61 2 9871 4742              INTERNET: marka@isc.org
> 
> -- 
> Mark Andrews, ISC
> 1 Seymour St., Dundas Valley, NSW 2117, Australia
> PHONE: +61 2 9871 4742              INTERNET: marka@isc.org
>