Re: [dnssd] Short TTLs in Discovery Responses leads frequent cache refreshes for long-lived queries...

Ted Lemon <mellon@fugue.com> Tue, 18 October 2022 13:37 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/dnssd/u3lp_Gn23Kp5ub5_CDF7m-1XxNk>

That sounds fine, but we have an existing practice of doing long-lived
queries using unicast DNS, so we can't simply pretend that nobody will ever
do it. The reason this came up is that I was seeing it in real life. Also,
in some cases a Do53 query may go through where a DoTLS query won't, so
having the Do53 fallback seems necessary.

In practice, e.g. in a stub network situation like with Thread, the client
will in fact be directly querying the Discovery Proxy, so if it uses the
pattern you described, it will never be penalized by an intermediate cache
holding stale data or failing to do a refresh. Even if there were an
intermediate cache, we would expect it to use DNS Push for records in its
cache, rather than unicast DNS, so it would get immediate updates when
services were added or removed. We can't assume this in general, but I
think it's reasonable to assume in a specific case like Thread (and maybe
we should require it).
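
As a minimal sketch of why an intermediate cache matters here, the Python
toy below models a resolver that holds one answer set until its TTL expires.
The class name `TtlCache`, the service names, and the 10-second TTL are
illustrative assumptions, not taken from any real resolver.

```python
class TtlCache:
    """Toy resolver cache: one cached answer set, kept until its TTL expires."""

    def __init__(self, authoritative, ttl_seconds):
        self.authoritative = authoritative  # callable returning the current answers
        self.ttl = ttl_seconds
        self.entry = None                   # (expiry_time, answers) or None
        self.upstream_queries = 0

    def query(self, now):
        # Serve from cache while the entry is still within its TTL.
        if self.entry is not None and now < self.entry[0]:
            return self.entry[1]
        # Otherwise ask the authoritative source and cache a snapshot.
        self.upstream_queries += 1
        answers = list(self.authoritative())
        self.entry = (now + self.ttl, answers)
        return answers


services = ["printer-1"]                  # hypothetical advertised services
cache = TtlCache(lambda: services, ttl_seconds=10)

first = cache.query(now=0)                # cache miss: goes upstream
services.append("printer-2")              # a new service appears at t=1
during = cache.query(now=5)               # within TTL: new service not visible
after = cache.query(now=10)               # TTL expired: refetch sees both

print(first, during, after, cache.upstream_queries)
```

A client that queries the Discovery Proxy directly skips this layer, which is
why the stale window only shows up when something sits in between.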

Also, the tradeoff we're discussing here is with regard to specific
implementations. What I'm trying to avoid in this particular case is a
fairly low-power device being slammed with queries because it sets a low
TTL. If the device is capable of handling a high query load, there's no
reason it can't set shorter TTLs. We see this a lot in practice with web
services, where small TTLs are quite typical. E.g.:

www.dradis.netflix.com. 31 IN CNAME www.eu-west-1.internal.dradis.netflix.com.

(not to pick on Netflix, just an example).

The reason I brought this up is that at present RFC 8766 says that the TTL
in unicast query responses MUST be capped at 10s. That is not an
unreasonable number for a high-performance server, but the query rate it
implies seems too high to set as an absolute requirement for all servers.
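
To put rough numbers on that, here is a back-of-the-envelope sketch in
Python. It assumes, purely for illustration, that a client keeping a
long-lived query alive over plain unicast DNS re-queries once per TTL. The
function names, the configurable cap, and the candidate caps of 60 s and
300 s are hypothetical; only the 10 s cap and the typical 75-minute mDNS TTL
come from this thread.

```python
def capped_ttl(record_ttl: int, cap: int = 10) -> int:
    """TTL placed in a unicast response: the record's own TTL, limited to `cap`."""
    return min(record_ttl, cap)


def refresh_queries_per_hour(ttl_seconds: int) -> float:
    """Queries per hour if exactly one refresh is sent per TTL interval.

    Assumption: one re-query per TTL. Clients that refresh before expiry
    send more than this, so treat it as a lower bound.
    """
    if ttl_seconds <= 0:
        raise ValueError("TTL must be positive")
    return 3600 / ttl_seconds


MDNS_TTL = 75 * 60  # typical mDNS record TTL mentioned in this thread, in seconds

if __name__ == "__main__":
    for cap in (10, 60, 300):
        ttl = capped_ttl(MDNS_TTL, cap)
        rate = refresh_queries_per_hour(ttl)
        print(f"cap {cap:>3}s -> TTL {ttl:>3}s -> {rate:.0f} queries/hour")
```

Under these assumptions, going from a 10 s cap to a 60 s cap cuts the
per-client refresh load by a factor of six, and a 300 s cap by a factor of
thirty.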

On Tue, Oct 18, 2022 at 8:55 AM Esko Dijk <esko.dijk@iotconsultancy.nl>
wrote:

> Ok, I see how the TTL can be used to limit the load on servers and it
> implies that updates to service/host information take longer to propagate
> to the client.
>
>
>
> It’s just surprising to learn that this limitation not only applies to the
> information that was found (and is cached) but also to other service
> information (answer) records that the client or its resolver cache has not
> received yet, e.g. because these newer services were added to the server
> after the first query had been made.
>
> So, the TTL that was used for service #1 records defines how long service
> #2 records remain unfindable, even though service #1 and #2 may be
> unrelated.  Since DNS Push isn’t mentioned at all in RFC 6763, for unicast
> scenarios this means that the server’s TTL choice is crucial to getting a
> discovery speed for new services that fits the use case. RFC
> 8766 gives some good considerations on this (that would have been useful
> also in RFC 6763 given that it doesn’t have the Push solution).
>
>
>
> What about the following thought: TTL 10 seconds per RFC 8766 is better
> than longer (1, 2 or 5 mins) TTL for discoverability of new
> services/devices. If a client wants a long-lived query, it uses Push or
> LLQ. If a client uses the one-shot unicast UDP/TCP query, it selects the
> needed service from the list returned and starts using the service for as
> long as it remains responsive – not sending any new DNS query for such
> service(s) while it’s using the service. Only if the service becomes
> unresponsive, the client needs to redo the DNS query for services. Because
> the TTL is low (10 sec), the client likely receives an up-to-date list of
> services and not ‘stale’ information.
>
> This assumes a client that cannot do DNS Push and never runs a non-stop
> long-lived query. So if at some point it cannot find any matching service,
> it stops using such service until a user performs some explicit interaction
> with the device e.g. a retry button.
>
>
>
> Esko
>
>
>
>
>
> *From:* Ted Lemon <mellon@fugue.com>
> *Sent:* Monday, October 17, 2022 18:20
> *To:* Esko Dijk <esko.dijk@iotconsultancy.nl>
> *Cc:* Jonathan Hui <jonhui=40google.com@dmarc.ietf.org>; dnssd <
> dnssd@ietf.org>
> *Subject:* Re: [dnssd] Short TTLs in Discovery Responses leads frequent
> cache refreshes for long-lived queries...
>
>
>
> Remember that DNS-SD is built on the DNS protocol. This isn't a limitation
> of the DNS protocol per se. It's just that the DNS protocol tries to limit
> the load on authoritative servers by having caching servers. A Discovery
> Proxy is necessarily an authoritative server from the perspective of the
> DNS. If you have a caching server between you and the authoritative server,
> then TTLs matter, not because they prevent you from repeating queries, but
> because they limit the utility of repeating queries. However, you can of
> course always just query the authoritative server directly, and this should
> always give you current data. Again in the case of a Discovery Proxy,
> "current data" is still going to be coming from a cache, but if new
> information has been advertised, or if a service was removed cleanly, it
> should be the case that it's no longer in the cache. This is a best effort
> process, though—since mDNS records have TTLs of 75 minutes (typically), if
> the service doesn't successfully transmit a "goodbye" packet to the
> Discovery Proxy's cache, then the information will still be in that cache.
>
>
>
> If you start an mDNS query (as may happen when the Discovery Proxy
> receives a new DNS request), then there will be periodic attempts to
> discover new services. At the same time, when a new service shows up, it
> has to announce itself, and that should land it in the mDNS cache even if
> no query has been made, but also if there's an ongoing query. So in
> practice, a direct query to the Discovery Proxy is likely to produce all of
> the relevant current information if it fits in a response. But this is not
> guaranteed.
>
>
>
> Anyway, as you said earlier, DNS Push is always preferable. This is a
> question about what to do when for whatever reason it's not being used.
>
>
>
> On Mon, Oct 17, 2022 at 11:13 AM Esko Dijk <esko.dijk@iotconsultancy.nl>
> wrote:
>
> So is this a limitation of unicast DNS-SD in general?  Given that RFC 6763
> doesn’t mention Push / LLQ operation, I had assumed unicast DNS would work
> in principle.
>
>
>
> But if for example I do a QTYPE=PTR query to find services of a particular
> type, and there is one answer with a TTL of one day.  And then for the rest
> of the day I can’t query any other services anymore, because the first
> found service is served from cache for the entire day?
>
> If the local resolver/cache is aware of how DNS-SD works it really
> shouldn’t behave like this as the assumption is that services may come and
> go (see RFC 6763).  But probably if the resolver/cache is a regular DNS
> one, not aware of DNS-SD, it would work like that.
>
>
>
> Esko
>
>
>
> *From:* Ted Lemon <mellon@fugue.com>
> *Sent:* Monday, October 17, 2022 16:19
> *To:* Esko Dijk <esko.dijk@iotconsultancy.nl>
> *Cc:* Jonathan Hui <jonhui=40google.com@dmarc.ietf.org>; dnssd <
> dnssd@ietf.org>
> *Subject:* Re: [dnssd] Short TTLs in Discovery Responses leads frequent
> cache refreshes for long-lived queries...
>
>
>
> The issue is that the thing doing the query is probably not the client,
> but rather the resolver. The resolver might cache the entry and not inquire
> further until the TTL expires. If the client asks twice, the second answer
> comes out of the cache, not from a second query. So we need to set the TTL
> in such a way that we don’t create an unreasonable query load, but also
> don’t cause unreasonable delays when the data changes.
>
>
>
> On Mon, Oct 17, 2022 at 08:51, Esko Dijk <esko.dijk@iotconsultancy.nl>
> wrote:
>
> > My first reaction is that long-lived queries should happen over DNS
> > Push, so this really shouldn't be an issue.
>
>
>
> Similar thought as I had: if the client is a “traditional” Unicast DNS
> client it asks its question, the server answers, and the server doesn’t
> have to refresh anything.  And if the client uses LLQ or DNS Push then the
> TTLs aren’t modified.
>
> So the problem may be rather that the Unicast DNS client sees the small
> TTL (10 seconds) and redoes the query every few seconds?
>
>
>
> > That said, I think it makes sense to increase the TTL. The larger TTL
> > might encourage using DNS Push if there was a desire to notice changes
> > more quickly.
>
>
>
> Agree, it could be e.g. 60 or 120 seconds for the TTL of a positive
> answer. I’m guessing the remote client seeking access to the (renumbered)
> service may in any case redo a DNS query well before the TTL has expired,
> to find an alternative service to replace the one it can’t connect to?
>
> No rule says it has to wait until the TTL of its old service has
> expired. It can then find the original service again, now with a new
> (renumbered) IP address.
>
>
>
> Esko
>
>
>
>
>
> *From:* dnssd <dnssd-bounces@ietf.org> *On Behalf Of *Jonathan Hui
> *Sent:* Thursday, October 13, 2022 23:36
> *To:* Ted Lemon <mellon@fugue.com>
> *Cc:* dnssd <dnssd@ietf.org>
> *Subject:* Re: [dnssd] Short TTLs in Discovery Responses leads frequent
> cache refreshes for long-lived queries...
>
>
>
> My first reaction is that long-lived queries should happen over DNS Push,
> so this really shouldn't be an issue.
>
>
>
> That said, I think it makes sense to increase the TTL. The larger TTL
> might encourage using DNS Push if there was a desire to notice changes more
> quickly.
>
>
> --
>
> Jonathan Hui
>
>
>
>
>
>
>
> On Thu, Oct 13, 2022 at 12:11 PM Ted Lemon <mellon@fugue.com> wrote:
>
> I've run into a problem with the discovery proxy specification. In section
> 5.5.1, we are instructed that for non-DNS-Push responses, we should limit
> the TTL of the response to ten seconds. But this is problematic: this means
> that if our DNSSD server is asked to do a long-lived query on a particular
> name, and it gets back a response, it's going to have to refresh that
> response every few seconds to keep it alive. Otherwise the TTL will expire
> and it'll be removed from the cache.
>
>
>
> This entire section optimizes the TTL for best response times to changes,
> but usually things don't change. Of course, when they do change, we'd like
> to know about it as soon as possible, but maybe in that case we should be
> using DNS Push and not DNS-over-UDP.
>
>
>
> The reason this comes up is that I'm actually seeing this behavior in the
> wild: I have a long-lived query running to track the existence of SRP
> replication servers, and when this is done using DNS as opposed to mDNS,
> and DNS Push isn't working (which is a bug; it should be), I get a query
> at random intervals of less than ten seconds. This seems really bad: a big
> routine expenditure to speed things up in an uncommon case.
>
>
>
> I'm feeling like this section needs to be tweaked a bit. Maybe we need to
> set a longer TTL, but not too long. Like a minute or five minutes. That
> would reduce the query load substantially. Or maybe we don't care—maybe
> this is an okay query load. It's unicast, after all—what's a packet every
> few seconds?
>
>
>
> But anyway, that's the question. Any theories?
>
> _______________________________________________
> dnssd mailing list
> dnssd@ietf.org
> https://www.ietf.org/mailman/listinfo/dnssd
>
>