[DNSOP] WGLC session-state-signal comments

Edward Lewis <edward.lewis@icann.org> Fri, 02 February 2018 17:26 UTC

(Due to weirdness with email, the WGLC announcement for me came on DNSSD and not DNSOP.  Shoulder shrug - something to debug for later.)

I was told of this last call:

referring to the document at https://tools.ietf.org/html/draft-ietf-dnsop-session-signal-05

Overall - the approach looks promising.  I would want to see this run through workshops to see how compatible it is with the installed base before too much is built on top of it.  The idea of state-per-channel has long been a taboo, one that I think should be broken.  Nevertheless, we have to understand this change in thinking.  "Corner cases" are always a concern.

Section 2, terminology

# The unqualified term "session" in the context of this document means
#    the exchange of DNS messages over a connection where:
# 
#    o  The connection between client and server is persistent and
#       relatively long-lived (i.e., minutes or hours, rather than
#       seconds).
# 
#    o  Either end of the connection may initiate messages to the other.

I fear that this may cause confusion down the line, given the history of 
how terms have been used in previous RFC documents.

A session to those who recall the 7-layer OSI model conjures up ideas of
end-to-end associations that make use of one or more transport"s", or
connections.  I.e., a session might use multiple, parallel in time, transport
channels or may use successive in time transport channels.

In this use, the DSO session refers to the management of the transport
connection between two intermediate elements of the DNS.

# A "DSO Session" is established between two endpoints

The endpoints - is that the stub resolver and authoritative server, or (in
DNSSEC terms) the signer and the validator?  I believe here it is the
client-server actors for the channel.

# A "DSO Session" is terminated when the underlying connection is
# closed.

This is in conflict with the older notion of a session persisting across
transports.

To support that notion, using (for convenience):
 https://en.wikipedia.org/wiki/Session_layer, this is stated

"If a connection is not used for a long period, the session-layer
protocol may close it and re-open it."

Note - this is a terminology issue, not a conceptual one.  However, I've seen
terminology become a greater problem "down the road" as new people come into
the field.

As a suggestion I'd work in that this is "DNS channel management".  It
sounds like the document is defining a DNS channel management protocol.
Or perhaps DNS transport management protocol.

Section 4.1

This section intermingles text on whether each DSO request elicits a
response with the process of DSO "session" establishment.  With proper
editing, these should be separated to lessen confusion.

Section 4.1.1

Requirements (first MUST) and recommendations to operate in a certain way
tend to become dated quickly.  Instead of placing requirements on clients
to behave well, let the server refuse workload.  I am thinking of the issue
surrounding the iterations in NSEC3 and recent surveys of operators. Despite
the documents saying a low value is better, operators use high values.

With that in mind, I'd not have "clients MUST take care".  If not for the
reasoning above, then "how does one test whether a client has taken care"?

Instead, reinforce the notion that a server has the right to deny opening a
connection on its own grounds (local policy), including load considerations.

Section 4.1.3

This section reinforces the notion that this is "transport channel management"
and not "session management in the OSI layer 5 sense."

Section 4.2.1

The phrase "this is a fatal error" is confusing and unnecessary.  The logic
to that point is clear that the situation doesn't happen and the
prescribed behavior ("close the connection") makes sense.

This is clumsy, when describing the RCODE: "generally set to zero on
transmission, and silently ignored on reception, except".  I'd suggest
saying ... well, reading it again, I don't understand what the paragraph
is saying.  It starts out with RCODE being 0 on send, then makes an exception
for when it conveys the reason for termination, which I'd expect to be in a
response.
However, it might be better to say that the RCODE value may be set according
to the definition of the request, but in most cases, will be 0.  Maybe?

Section 4.2.2

"  Unacknowledged
   request messages are only appropriate in cases where the sender
   already knows that the receiver supports and wishes to receive these
   messages."

This passage concerns me.  The entire notion of unacknowledged
requests troubles me as a protocol designer.  There are three outcomes of
sending a request over a reliable transport - the receiver doesn't understand
it, the receiver acts accordingly, or the receiver misinterprets the request.
The latter includes software bugs (or other unintended consequences) and
could cover outright refusal to perform.  It's not that an acknowledgement
is needed, the question is how the sender can confirm that the request was
handled to the sender's expectations.  (We have this problem, for example,
with "Automated Updates of DNSSEC Trust Anchors" where there is no feedback
loop.)

Ok, I get this: "For example, after a client has subscribed for Push
Notifications" as a plausible use case.  In this case I see that the
requests do not need message responses, as turning them on and off
is done via an acknowledged request action.  Maybe it's the term that is
confusing, but I can see the concept.

Perhaps these are not "unacknowledged requests" but "subsequent responses".

Section 4.2.2.1

"  Where domain names appear within TYPE-DEPENDENT DATA, they MAY
   be compressed using standard DNS name compression [RFC1035]."

Do not do name compression!  No No No No.  The compression was originally
defined to be for the well-known types (in the STD 13 documents), then came
to be used for newer and newer ones up through DNSSEC - until someone
realized that this is a mistake.  Consult "Handling of Unknown DNS RR Types"
(RFC 3597), specifically this:

   To avoid such corruption, servers MUST NOT compress domain names
   embedded in the RDATA of types that are class-specific or not well-
   known.  This requirement was stated in [RFC1123] without defining the
   term "well-known"; it is hereby specified that only the RR types
   defined in [RFC1035] are to be considered "well-known".
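
To make the hazard concrete, here is a minimal Python sketch.  It is an
illustration only, not text from the draft or from the RFC, and the helper
names encode_name and decode_name are hypothetical.  It shows that a
compressed name is only meaningful relative to the whole message: once code
that does not understand the TLV copies the bytes elsewhere, the compression
pointer dangles.

```python
def encode_name(name: str) -> bytes:
    """Encode a domain name in uncompressed DNS wire format."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out.append(len(raw))  # one length octet per label
            out += raw
    out.append(0)  # the root label terminates the name
    return bytes(out)


def decode_name(message: bytes, offset: int) -> str:
    """Decode the name at `offset`, following compression pointers."""
    labels = []
    hops = 0
    while True:
        length = message[offset]
        if length & 0xC0 == 0xC0:
            # Two-octet pointer: a 14-bit offset from the start of the message.
            offset = ((length & 0x3F) << 8) | message[offset + 1]
            hops += 1
            if hops > 16:
                raise ValueError("too many compression pointers")
            continue
        if length == 0:
            break
        labels.append(message[offset + 1 : offset + 1 + length].decode("ascii"))
        offset += 1 + length
    return ".".join(labels) + "."


# A toy message: 12-octet header placeholder, then "example.com." at offset 12,
# then a field holding "www" followed by a pointer (0xC00C) back to offset 12.
base = encode_name("example.com")
message = bytes(12) + base + b"\x03www\xc0\x0c"
field_offset = 12 + len(base)

# With the whole message in hand, the compressed name decodes correctly.
in_context = decode_name(message, field_offset)

# Code that treats the field as opaque data and copies it out loses the
# context the pointer refers to.
copied_field = message[field_offset:]
try:
    decode_name(copied_field, 0)
    copied_ok = True
except IndexError:
    copied_ok = False
```

An uncompressed name (the output of encode_name alone) survives the same
copy intact, which is the property the quoted requirement protects.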

Section 4.2.2.4

   "If DSO request is received containing an unrecognized Primary TLV,
   with a zero MESSAGE ID (indicating that no response is expected), the
   receiver MUST silently ignore the message.  A response MUST NOT be
   sent."

I would have thought this would warrant tearing down the connection given
the words earlier that this ought never happen.  (I'd want the receiver
to alert the sender that there's perhaps a capability mismatch though.)

Silently ignoring a problem is an example of a receiver acting in a manner
that is not expected by the sender and leads to wedged state machines.

Section 4.3

   "The namespaces of 16-bit MESSAGE IDs are disjoint in each direction.
   For example, it is *not* an error for both client and server to send
   a request message with the same ID."

This will, someday, confuse a young and inexperienced DNS hosting engineer.

Precedent - having DNSKEYs with the same key_id is allowed by protocol but
popular DNSSEC key management tools will discard any key matching another's
key_id, to preserve the sanity of the human who will debug.
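
As a sketch of what "disjoint in each direction" asks of an implementation
(and of anyone reading its logs), the table of outstanding requests has to be
keyed by direction as well as by ID.  The class and names below are
hypothetical, written for this note and not taken from the draft.

```python
from enum import Enum


class Direction(Enum):
    CLIENT_TO_SERVER = "c2s"
    SERVER_TO_CLIENT = "s2c"


class OutstandingRequests:
    """Tracks requests awaiting a response, one ID namespace per direction."""

    def __init__(self):
        self._pending = {}

    def sent(self, direction: Direction, message_id: int, description: str):
        key = (direction, message_id)
        if key in self._pending:
            # Reuse within ONE direction is the real error.
            raise ValueError(f"ID {message_id} already outstanding for {direction.value}")
        self._pending[key] = description

    def answered(self, direction: Direction, message_id: int) -> str:
        # `direction` is the direction the original request travelled in.
        return self._pending.pop((direction, message_id))


table = OutstandingRequests()
# Both sides happen to pick ID 7; this is allowed and must not collide.
table.sent(Direction.CLIENT_TO_SERVER, 7, "client keepalive")
table.sent(Direction.SERVER_TO_CLIENT, 7, "server-initiated request")
first = table.answered(Direction.SERVER_TO_CLIENT, 7)
second = table.answered(Direction.CLIENT_TO_SERVER, 7)
```

A log line that prints only "ID 7" is ambiguous here, which is the debugging
trap described above.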

Section 5.3

   'Just because a DSO Session has no traffic for an
   extended period of time does not automatically make that DSO Session
   "inactive", if it has an active operation that is awaiting events.'

There will be a need to fight cruft, or garbage collect.  Inactive objects
tend to be forgotten while still using up resources.  There ought to be some
means to manage what might be otherwise "forgotten."  Like subscribing to
events that are no longer coming.
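
One possible shape for such a means, as a hedged sketch: keep the draft's
idea that a session with an active operation is not "inactive", but add a
second, much longer bound after which a silent session is at least flagged
for review.  The Session record, the function name, and both limits are
assumptions made up for this illustration; the draft defines none of them.

```python
from dataclasses import dataclass


@dataclass
class Session:
    peer: str
    last_traffic: float         # timestamp of the last message seen, in seconds
    active_operations: int = 0  # e.g. subscriptions still awaiting events


def stale_sessions(sessions, now, idle_limit, forgotten_limit):
    """Return sessions that have been quiet for longer than their limit.

    Sessions with no active operation use `idle_limit`.  Sessions that do
    have an active operation are not idle, but are still reported once they
    exceed the (hypothetical) `forgotten_limit`.
    """
    stale = []
    for session in sessions:
        quiet_for = now - session.last_traffic
        limit = forgotten_limit if session.active_operations else idle_limit
        if quiet_for > limit:
            stale.append(session)
    return stale


sessions = [
    Session("192.0.2.1", last_traffic=0.0),                         # idle, no operations
    Session("192.0.2.2", last_traffic=0.0, active_operations=1),    # quiet but subscribed
    Session("192.0.2.3", last_traffic=-90000.0, active_operations=1),  # long forgotten
]
flagged = [s.peer for s in stale_sessions(sessions, now=600.0,
                                          idle_limit=300.0,
                                          forgotten_limit=86400.0)]
```

What to do with a flagged session (probe it, log it, or close it) is a local
policy question that this sketch leaves open.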

Section 5.5

"and attempt re-connection if appropriate."

I thought that if a connection ends, the DSO session ends.

Section 5.6.3.2

Sometimes a client can't tell whether this condition holds:

"If reconnecting to the same server,"

as some server processes have multiple addresses and names.

The following MUST ought to apply to "the same IP address" at best.  (But
if the underlying routing of anycast traffic changes, that's overkill.) Still,
make this rule IP address (and maybe port number) specific, not "server".
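
A minimal sketch of what an address-specific rule could look like on the
client side, assuming the client simply remembers a "do not reconnect
before" time per (IP address, port) pair.  The ReconnectGate class and its
method names are hypothetical and do not come from the draft.

```python
import ipaddress


class ReconnectGate:
    """Remembers reconnect delays per (IP address, port), not per server name."""

    def __init__(self):
        self._not_before = {}

    @staticmethod
    def _key(ip: str, port: int):
        # Normalise the textual form so equivalent spellings share one entry.
        return (ipaddress.ip_address(ip), port)

    def record_delay(self, ip: str, port: int, now: float, delay_seconds: float):
        self._not_before[self._key(ip, port)] = now + delay_seconds

    def may_connect(self, ip: str, port: int, now: float) -> bool:
        return now >= self._not_before.get(self._key(ip, port), 0.0)


gate = ReconnectGate()
gate.record_delay("2001:db8::1", 853, now=100.0, delay_seconds=30.0)

same_address_early = gate.may_connect("2001:0db8:0:0:0:0:0:1", 853, now=110.0)
other_address = gate.may_connect("2001:db8::2", 853, now=110.0)
same_address_later = gate.may_connect("2001:db8::1", 853, now=130.0)
```

The client never has to decide whether two names or addresses are "the same
server"; it only applies the delay to the exact address it was told to leave
alone.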

Section 6.2

"The RECOMMENDED value is 10 seconds." Probably a bad idea to codify this
because implementations will set it to exactly 10 and not scatter it when it
should be scattered.  (Like when closing out many connections in a
load-shedding panic.)
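
For illustration, scattering could be as simple as drawing each value from a
range around the nominal figure.  This is a sketch under assumptions: the
function name, the 50% spread, and the uniform distribution are choices made
for this example, not anything the draft specifies.

```python
import random


def scattered_delay(base_seconds: float = 10.0, spread: float = 0.5,
                    rng: random.Random = None) -> float:
    """Return a delay drawn uniformly from base*(1-spread) to base*(1+spread)."""
    rng = rng or random.Random()
    low = base_seconds * (1.0 - spread)
    high = base_seconds * (1.0 + spread)
    return rng.uniform(low, high)


# A server shedding 1000 connections at once hands each one its own value,
# so the reconnection attempts do not all arrive in the same instant.
rng = random.Random(2018)  # seeded only to make the example repeatable
delays = [scattered_delay(rng=rng) for _ in range(1000)]
```

With a fixed 10 seconds every client would come back at the same moment;
with the spread above the returns are smeared over a ten-second window.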