Re: [dnssd] Genart last call review of draft-ietf-dnssd-push-20

If a client implements PUSH, it implements DSO, which means it implements KEEPALIVE and RETRY DELAY.

That doesn’t mean it will honor every part, though; it might retry before the delay expires.

But the server sent the retry delay and knows the timeout value, so it can filter this client for that period regardless of whether the client honors it. In fact, a server SHOULD do the filtering, because the RETRY DELAY is really saying, “I’m not going to listen to you until after this timeout.”
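
To make the filtering idea concrete, here is a rough sketch of the bookkeeping a server might do. It is illustrative only, not anything the draft or RFC 8490 specifies: the class name, the method names, and the choice to key clients by address string are all assumptions of mine (keying by address would lump together clients behind the same NAT, for instance). I am also assuming the delay is handed over in milliseconds, which is how I read the RETRY DELAY TLV.

```python
import time


class RetryDelayFilter:
    """Hypothetical helper: remembers which clients were sent a RETRY DELAY
    and refuses them until that delay has elapsed."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so the behavior can be exercised without sleeping.
        self._clock = clock
        self._blocked_until = {}  # client key -> deadline on the clock's timeline

    def record_retry_delay(self, client, delay_ms):
        """Call when the server sends RETRY DELAY to `client` (delay in ms, assumed)."""
        self._blocked_until[client] = self._clock() + delay_ms / 1000.0

    def should_accept(self, client):
        """Return True if the server is willing to listen to `client` right now."""
        deadline = self._blocked_until.get(client)
        if deadline is None:
            return True  # never sent this client a retry delay
        if self._clock() >= deadline:
            del self._blocked_until[client]  # delay has expired; forget it
            return True
        return False  # client came back early; filter it
```

A server would call record_retry_delay() at the point it sends the RETRY DELAY, and check should_accept() when a new connection or request arrives, whether or not the client is well behaved.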

Also, even if the client closes because of an error, that doesn’t preclude it from using TLS session resumption for the next subscription.

So I’m in favor of always using close_notify and sending a RETRY DELAY for critical errors when needed.

But I think it would be helpful to outline the actual errors that could occur on either end and verify that this works in every case. Sending as much information to the other side as possible helps with diagnosing bugs. TCP RST signaling doesn’t convey much information.

Tom

> On Jul 11, 2019, at 1:19 PM, Ted Lemon <mellon@fugue.com> wrote:
> 
>> On Jul 9, 2019, at 10:22 PM, Stuart Cheshire <cheshire=40apple.com@dmarc.ietf.org> wrote:
>> This is a fine observation.
>> 
>> You then suggested changing TCP RST to TLS close_notify, not realizing (a) this is only for fatal errors, and (b) the precedent already set by RFC 8490.
>> 
> >> We have in fact updated the document, but I think this was too hasty, and we should revert to the way it was before.
>> 
>> If not, we at least need to have a thorough DNSSD Working Group discussion about this before making a last-minute change to the protocol.
> 
> To add some further nuance from a discussion that Stuart and I had today on this, there are actually several different cases where connection closes are done, and how they should be done is something we should talk about.
> 
> I think in all cases where the client is closing the connection, there’s a case to be made that we don’t want to use close_notify.   It’s true that an attacker can kill our DNS Push connection in this case by forging an RST to the server.   We should discuss whether this is a serious concern that we need to take into account.   If it is, then using close_notify would protect against this iff the server ignores TCP RSTs for active TLS sessions. 
> 
> But the main argument for using close_notify in this case is that we want to be able to resume.   This will not be the case if the client closed the connection because of a protocol error.   It will be the case when the client is closing the connection due to inactivity.
> 
> There is a case where the server closes the connection when the client sends a duplicate subscribe.   That’s because this is a protocol error: the client is broken, and cannot be expected to take corrective action.   Then the question is, do we close the connection down with a retry-delay to make the client go away, or do we just send an RST?
> 
> Argument in favor of sending retry-delay:
> - If the client implements it, it will shut up for a while.
> 
> Arguments against:
> - If the client doesn’t implement it, it won’t shut up, so we haven’t gained anything.
> - Making things “sort of work” when the client is broken isn’t all that helpful: we actually want the behavior in this case to be dysfunctional, so that it is noticed and fixed.
> 
> I think that the working group should consider these issues and come to a consensus.
> 
> My own personal opinion is that we should always do close_notify, because if we can assume this, then an attacker can’t kill the connection by sending an RST, if that behavior is implemented in the TLS/TCP stack.   My one doubt about this is that if we are going through a NAT, will the NAT drop its mapping when it sees the RST?   If so, then close_notify doesn’t protect against this attack for a majority of current users.   It still might be worth doing for IPv6, of course.
> 
> As to whether we should use retry-delay, I have really mixed feelings about this.   I want implementations to be visibly broken when they are broken, but I don’t want to have to operate a server that has to deal with broken clients.   The question is whether forcibly disconnecting will actually cause implementors to take action, or whether it will not be noticed and contribute to dysfunction.
> 
> My personal experience is that breaking badly is actually conducive to improvement, so that’s the direction I’m leaning at the moment.
> 