[core] Re: 🔔 Working Group Last Call (WGLC) of draft-ietf-core-coap-pubsub-18

Christian Amsüss <christian@amsuess.com> Mon, 17 March 2025 18:27 UTC

Date: Mon, 17 Mar 2025 19:27:19 +0100
From: Christian Amsüss <christian@amsuess.com>
To: core <core@ietf.org>
Message-ID: <Z9hph_OFXXd-bExn@hephaistos.amsuess.com>
References: <9126E846-4025-454E-AC80-9C60A682E60B@tzi.org>
In-Reply-To: <9126E846-4025-454E-AC80-9C60A682E60B@tzi.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/core/mhflFWieVsoLNe8Zjcfspg73M08>

Hello PubSub authors, CoRE,

On Fri, Feb 28, 2025 at 11:10:12PM +0100, Carsten Bormann wrote:
> This email starts a Working Group Last Call for the document:

I've read the published draft; thanks for driving this to a WGLC. My
impression is that this document is generally in a mature state, but has
issues with restatement of existing protocols, some missing context, and
some minor oddities in the protocol.

Full list of comments, roughly in sequential order:

* Much as it saddens me to demote core-interfaces, the "topic
  collection" uses that reference in a normative way, and that'd block
  publication. I suggest rewording it while keeping the reference, eg.

  | A topic collection is hosted as one collection resource at the
  | broker. A collection resource is a resource that contains links to
  | other resources that a client can add or remove; that concept is
  | described more generally in {{Section 3.1 of
  | ?I-D.ietf-core-interfaces}}.

* topic: The definition is recursive.

* The definition of broker makes it sound like any number of topic
  collections on a server would necessarily be managed by the same
  entity. Rephrasing like "A CoAP server component" would help to fend
  off that interpretation.

  Reading on in 2.3.1 and 2.3.2, the broker is an extra entity
  discoverable. I'd have expected the broker to be an abstract,
  intangible entity -- roughly corresponding to the software that backs
  the topic collections; exposing it separately appears needless to me.
  After all, the choice of whether different collections are managed by
  different software or a single multi-tenant system would be an
  implementation choice.

  I found no operations this document defines on the broker. So: What
  does it do? If it doesn't do anything, why express it as a resource?
  ("Because it is there" alone is no reason to expose it as a resource.)

* Figure 1: The mention of the dedicated administrator indicates that
  there can be CoAP clients that are neither publishers nor subscribers.
  Would it make sense to add them in the figure, or at least to point
  out in the figure's text that "Clients that merely interact with topic
  configuration but not its data are not depicted"?

* By the way, I like the hexagon resources.

* "occur at a link, where the link target": I think you mean "Occur at
  the target of a link, which is".
 
* "It encodes the URI as a CBOR text string.": Is it necessarily encoded
  as a URI? I think it's encoded as a URI reference.

* How fast would core-href need to complete to become a viable
  replacement for the URI references in text strings here?

* topic-data: Some high scaling designs we discussed during IETF105
  involve the topic-data being a resource on another address. While
  there is technically no need to make this explicit here (it already
  says "URI" without any qualification as to its origin), implementers
  of subscribers might easily miss this possibility and create
  implementations that break with publishers that avail themselves of
  this feature.
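
A minimal sketch of what a subscriber could do to stay compatible with that possibility, assuming topic-data arrives as a URI reference in a text string. The helper name and the example URIs are hypothetical and not taken from the draft:

```python
from urllib.parse import urlsplit


def topic_data_origin(config_uri, topic_data):
    """Return the (scheme, authority) at which topic-data has to be reached.

    config_uri is the URI the topic configuration was fetched from;
    topic_data is the reference found in the topic-data property.
    """
    base = urlsplit(config_uri)
    ref = urlsplit(topic_data)
    if ref.scheme:
        # Full URI: may point at a different host than the broker.
        return (ref.scheme, ref.netloc)
    if ref.netloc:
        # Network-path reference ("//host/path"): same scheme, other host.
        return (base.scheme, ref.netloc)
    # Path-only reference: same origin as the topic configuration.
    return (base.scheme, base.netloc)


same = topic_data_origin("coap://broker.example/ps/h9392", "/ps/data/1bd0d6d")
other = topic_data_origin("coap://broker.example/ps/h9392",
                          "coap://data.example/ps/data/1bd0d6d")
print(same, other)
```

A subscriber that always reuses its connection to the broker would get the second case wrong.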

* resource-type: The way resource types are usually encoded is a list
  of multiple RTs (eg. space delimited in link-format). Is that also
  allowed here? (If so, it'd probably be list typed, and the value
  should at least contain "core.ps.data").

  Also, if this is required, why is it sent around mandatorily in some
  places (eg. in 2.4.3 resource creation), but not in others (eg. 2.4.1
  topic discovery)?
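
To illustrate the list reading of resource-type, here is a small sketch of the check a client would then make. It assumes the value is carried as a single space-delimited string as in link-format; the function name and the second RT value are made up for the example:

```python
TOPIC_DATA_RT = "core.ps.data"


def has_topic_data_rt(rt_value):
    """True if the space-delimited rt value lists core.ps.data."""
    # Compare whole tokens, so that a longer name sharing the prefix
    # does not match by accident.
    return TOPIC_DATA_RT in rt_value.split()


print(has_topic_data_rt("core.ps.data"))
print(has_topic_data_rt("example.sensor core.ps.data"))
print(has_topic_data_rt("core.ps.database"))
```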

* content-format: CoAP resources are sometimes available in multiple
  formats; AIUI it is not planned to have many of them sent in parallel
  (that makes sense). Is it planned that a client might ask the broker
  to convert content formats? If so, that property might say that it
  "specifies the +canonical+ CoAP Content-Format identifier".

* expiration-date: Why a string? CBOR's tag 1 (or untagged integer, or
  1001 but likely not) would be way more accessible to constrained
  devices.
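
For a sense of the difference, the following sketch hand-encodes both variants with a deliberately tiny CBOR encoder. It assumes the string form is an RFC 3339 timestamp in a plain text string; the example date is arbitrary:

```python
import struct
from datetime import datetime, timezone


def cbor_short_text(s):
    """Encode a text string shorter than 24 bytes (major type 3)."""
    data = s.encode("utf-8")
    if len(data) >= 24:
        raise ValueError("this sketch only handles short strings")
    return bytes([0x60 | len(data)]) + data


def cbor_epoch_tag1(seconds):
    """Encode tag 1 (0xC1) around an unsigned 32-bit integer (0x1A)."""
    return b"\xc1\x1a" + struct.pack(">I", seconds)


when = datetime(2025, 3, 17, 18, 27, 19, tzinfo=timezone.utc)
as_string = cbor_short_text(when.strftime("%Y-%m-%dT%H:%M:%SZ"))
as_tag1 = cbor_epoch_tag1(int(when.timestamp()))
print(len(as_string), len(as_tag1))  # 21 and 6 bytes
```

Beyond the size, the tag 1 form spares the constrained device from parsing a calendar date before it can compare against its clock.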

* What's the point of max-subscribers?

  I'd understand if there were some priorities on clients and/or topics
  so that a broker would allocate resources to high-priority topics,
  but max-subscribers sounds rather unsuitable for that, a bit as if
  you'd use file system quotas but set the quota per file.

  Frankly, this sounds like something that was thought up in a "Hm, if
  we had a configuration resource, what might be properties the topic
  has that we could describe?" moment as an example, and then carried on
  because it was not questioned.

* Observer-check almost also falls into the same category. I do see how
  this would make sense as a per-host tunable, but per-resource?

* initialize: I remember very old discussions on this. I see how sending
  no Content-Format might seem like a way to weasel out of 7641's
  requirement that "The Content-Format specified in a 2.xx notification
  MUST be the same as the one used in the initial response", but I think
  it still violates its letter: "indeterminate" can't be "equal" to any
  format without violating the inherent transitivity of equality
  relations.

  Given that a resource has a static discovered content-format, a client
  may be justified in inferring the format of a received representation
  to be the advertised one, and interpret the empty byte string in that
  context.

  I'd prefer a solution where the initial representation has the right
  content-format already (which is an option I don't think should be
  sent around all the time), and that initialization is only practically
  usable if that format has an "empty" representation, which may or may
  not be the empty byte sequence; that representation would be passed
  around in "initialize". For example, for CBOR SenML, that's an empty
  array (1 byte).
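
A rough sketch of that idea from the broker's side, assuming a per-format table of "empty" representations. The table and function names are hypothetical, and only the two SenML formats are filled in:

```python
SENML_JSON = 110  # application/senml+json
SENML_CBOR = 112  # application/senml+cbor
PLAIN_CBOR = 60   # application/cbor

# Known "empty" representations; formats without one are simply absent.
EMPTY_REPRESENTATION = {
    SENML_JSON: b"[]",     # empty JSON array
    SENML_CBOR: b"\x80",   # empty CBOR array, a single byte
}


def initial_payload(content_format):
    """Payload to serve before the first publication, or None if the
    format has no known empty representation (initialization unusable)."""
    return EMPTY_REPRESENTATION.get(content_format)


print(initial_payload(SENML_CBOR))
print(initial_payload(PLAIN_CBOR))
```

The initial response then carries the topic's real Content-Format, so later notifications can satisfy 7641's equality requirement.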

* Similar to initialize, earlier versions had a "tombstone"
  representation. That's a frequently praised feature of MQTT (as
  "testament and last will"), and I'd expect that potential users would
  have questions on it -- what happened? (It could use the mechanism
  suggested above for initialize).

  Granted, this can also be done in an extension, as this would need
  further elaboration on what it means for a publisher to be "gone" --
  can a publisher shape the broker's expectations on how often it sends,
  and make the data expire? (This is subtly different from
  expiration-date: Once that is exceeded, the topic data goes; exceeded
  value data lifetime would just make it go to half-created, and would
  rather use relative time stamps).

* 2.3.3 vs. 2.4.1: I'm missing guidance on when to use which, in either
  role (as a server or client). The justification of authentication does
  not seem to be a criterion: RFC6690 Section 6 explicitly mentions that
  .well-known/core can apply per-entry access control ("and allow
  servers to return different lists of links").

  That 2.3.4 is required to be supported doesn't make this any clearer
  (like, what would *that* do with topics whose metadata are
  confidential?).

* 2.4.2: The way a filter is applied as "give me any item where the
  item's content matches the filter" (as opposed to "filter the content
  of the list") is practical here, because a TBD606 resource only "fits"
  on the values, and a yet-unspecified "filter on a list of links"
  format only "fits" the list.

  When CoRAL makes progress, we'll be in the hard situation that a CoRAL
  FETCH request payload may both apply to the list and to the content.
  That's likely manageable, but life would be easier if this section
  specifically only described FETCH semantics of a collection resource
  for TBD606 request payloads.

* 2.4.3: Why is topic-name mandatory? This could be a SHOULD, eg. for
  devices that start publishing data on startup, and then a tool comes
  along and sorts out what is what. (Those publishers might, for
  example, place custom binary identifiers in a property they put in at
  creation time).

* "MUST respond with a 4.03 (Forbidden) error": I recommend relaxing
  this to a "MUST respond with a 4.xx class error". This makes it clear
  that it is the client's fault, but opens for other errors, such as
  4.01 "I could but you'll need to show me authorization" (including
  4.01 with Echo "Tell me that you mean it"), 4.29 "Not so fast" and
  4.13 "Keep it short".

  Same goes for 2.5.1, and several later places.

  (And I should start providing alias names to codes in CoAP libraries.)
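
What a client gains from the relaxed wording can be sketched as follows: it checks the class of the code and only looks at the detail for logging or retry decisions. The code byte layout (3-bit class, 5-bit detail) is from RFC 7252; the function names and the alias table are illustrative:

```python
def coap_code(cls, detail):
    """Build the code byte from its class and detail parts."""
    return (cls << 5) | detail


def dotted(code_byte):
    """Render a code byte in the usual c.dd notation."""
    return f"{code_byte >> 5}.{code_byte & 0x1F:02d}"


def is_client_error(code_byte):
    """True for any 4.xx response, whatever the detail."""
    return code_byte >> 5 == 4


ALIASES = {
    coap_code(4, 1): "Unauthorized",
    coap_code(4, 3): "Forbidden",
    coap_code(4, 13): "Request Entity Too Large",
    coap_code(4, 29): "Too Many Requests",
}

for code_byte, name in ALIASES.items():
    print(dotted(code_byte), name, is_client_error(code_byte))
```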

* Similarly, 2.5.1 requires a 2.05 Content response, when really a
  client may apply the very regular CoAP mechanism of using an ETag to
  verify freshness (and then the response is 2.03 Valid).

  Rather than enumerating possibilities, I suggest using "returns a
  successful response (typically 2.05 Content)". The solution is *not*
  to enumerate other options: Things such as observing a resource are
  orthogonal to application specifications such as this, and should not
  need extra spec work to fill gaps. It does make sense to point out
  some examples (as is done well for 4.29 in 3.2.1).

  Same goes for later mentions of successful codes. (For example, 2.5.3
  mentions 2.04 -- but both 2.05 and 2.04 are, in their cases, the one
  successful code that the request method has unless something
  particular happens, like 2.03 when an ETag is used, or 2.31 during a
  block-wise operation.)

  Some particular examples: 

  * A DELETE can not only result in a 2.02, but, due to idempotency and
    the thus relaxed rules on deduplication, also in a 4.04 (which is
    also fine). Likewise, even the first PUT on a topic can not be detected
    reliably, and could also return a 2.04 Changed. (The language of
    7252 is really sufficient here, and this document makes this
    needlessly stricter).

  * "MUST return 2.05" in 3.2.2: In all this strictness, this misses
    that a 4.29 is a perfectly good response even when the topic is
    present and the operation could succeed.

  * 3.2.3 spells out how Observe works (this is a normative dependency;
    describing it is fine, but this is normative restatement), and
    thereby pulls in mechanisms that don't even apply to all updates
    that have been made to 7641 (when using 8323, there is no RST).
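
As a sketch of how a tolerant client could treat the outcomes named in these examples, assuming it only has the response code byte at hand. The helper names are hypothetical; the code values follow RFC 7252:

```python
# Response codes as code bytes (class << 5 | detail).
CREATED = 0x41    # 2.01
DELETED = 0x42    # 2.02
VALID = 0x43      # 2.03
CHANGED = 0x44    # 2.04
CONTENT = 0x45    # 2.05
NOT_FOUND = 0x84  # 4.04


def delete_achieved(code_byte):
    """A retransmitted DELETE may see 4.04; the topic is gone either way."""
    return code_byte in (DELETED, NOT_FOUND)


def publish_succeeded(code_byte):
    """The first PUT cannot be told apart reliably, so accept both."""
    return code_byte in (CREATED, CHANGED)


def read_succeeded(code_byte, sent_etag):
    """2.03 is a success only if the request carried an ETag."""
    return code_byte == CONTENT or (code_byte == VALID and sent_etag)


print(delete_achieved(NOT_FOUND), publish_succeeded(CHANGED),
      read_succeeded(VALID, sent_etag=True))
```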

* conf-filter: Why is this even in content format TBD606? This feels
  like it is forced into a format that it has nothing in common with.

  And: Why are the values strings rather than numbers? (Might just be
  an oversight in the example, 6.4 states that the names are not used in
  encoding).

* "MUST have Content-Format set to" (and possibly similar language in
  other places): This precludes extension to other formats without
  updating this document. Content formats are a natural extension point,
  please consider phrasing this such that doing the operations with
  those content formats is a default and mandatory-to-implement.

* 'Note that updating the "topic-data" path' ... was just forbidden two
  paragraphs prior.

  (Same in 2.5.4)

* topic-data property going away when topic data is DELETEd: If a
  half-created topic has no topic-data property, how can a publisher
  know where to PUT data to put it back into fully created?

  It may not even be the same publisher that returns, and even if it
  was, it'd be odd to require it to remember the path after the server
  removed the property.

  I think that the right thing to do here would be to keep the
  topic-data present (and constant) over the whole life and halflife of
  the topic resource.

* 6.4: This allows text strings next to numbers.

  I've seen this pattern in many places, have not seen any entries in
  the respective registries that actually were text, and consequently
  never implemented tolerance for strings there in constrained systems.

  On the "use it or lose it" dichotomy, I suggest just losing it
  already, and make this numeric only.

* Missing: Context within the larger CoRE ecosystem.

  The document describes a PubSub broker as a new application, without
  guiding the reader to understand how well it integrates in the
  existing CoAP ecosystem. This comprises two aspects, both written in
  the hope that they can already be a starting point for new
  introduction text:

| # How does this interoperate with CoAP clients unaware of PubSub?
|
| Neither the publisher nor the subscriber necessarily need to be aware
| of PubSub happening. Any CoAP client that can be configured to PUT
| data to a particular URI can be set up to be a publisher, and any CoAP
| client that can be configured to GET or observe a particular URI can
| be set up to be a subscriber. A tool that performs the setup can
| create or discover a topic, and configure the clients to perform the
| relevant operations.
|
| # How does this compare to REST access to the publisher?
|
| Compared to the REST baseline case of an information recipient GETting
| (observing) a resource at its producer (or from an intermediary/proxy,
| which itself GETs it from the original source), using a pubsub broker
| alters two separate aspects:
|
| * It moves the name authority (the URI) of the resource under the
|   control of the pubsub server.
| * It inverts the initiative on the connection to the producer from
|   "pull" to "push". (While CoAP observation is a sequence of data
|   pushes, the original trigger is still with the recipient).
|
| Conceptually, those are independent, and the architecture supports
| removing those aspects without any change to the subscriber side.
| Specifying that is out of scope for this document, but can be done
| using the extension point described in Section 6.4. Without the name
| authority (URI) change, a pub-sub broker is an eagerly populated
| caching forward proxy (which may even refuse to or fail to reach out
| on a cache miss). Without the initiative change, a pubsub broker acts
| as a reverse proxy, whose per-path forwarding rules are configured
| through its management interface.

  (When neither the address is changed nor the initiative changes, the
  broker degenerates into a directory not entirely unlike an RD, but with
  individually published resources -- but pointing that out in the text
  would be utterly confusing to readers.)

Best regards
Christian

-- 
You don't become great by trying to be great. You become great by
wanting to do something, and then doing it so hard that you become great
in the process.
  -- Marie Curie (as quoted by Randall Munroe)

Attachment: signature.asc
