Re: [v6ops] An Update to Happy Eyeballs

David Schinazi <dschinazi@apple.com> Wed, 15 March 2017 02:54 UTC

Sender: dschinazi@apple.com
From: David Schinazi <dschinazi@apple.com>
Date: Tue, 14 Mar 2017 19:54:00 -0700
References: <148899860042.20118.391380898590855642.idtracker@ietfa.amsl.com> <A609BABB-BDF2-4CCB-8452-F489C019748C@apple.com> <m1clvfj-0000FCC@stereo.hq.phicoh.net> <ABE752F6-895B-431C-9E94-E0CD2FDDB2E3@apple.com> <m1cmTQX-0000IcC@stereo.hq.phicoh.net> <92EEB875-288D-4CF9-B81F-3B5C8EA49F53@apple.com> <CAKC-DJjeUX1rRB_e99SGJS06RoFZ6E6A8Tpj0hPAvfS6+L+XWA@mail.gmail.com>
To: IPv6 Operations <v6ops@ietf.org>
In-reply-to: <CAKC-DJjeUX1rRB_e99SGJS06RoFZ6E6A8Tpj0hPAvfS6+L+XWA@mail.gmail.com>
Message-id: <BAEBBDCE-790E-43D7-BD2A-AE1BF9B81B34@apple.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/v6ops/ROMDNI_Rhc3u5kut-cn5fB8ETUc>
Subject: Re: [v6ops] An Update to Happy Eyeballs

Mark,

We have operational data showing that asynchronous DNS is not "for no good reason".
DNS queries rarely fail outright (though it does happen), but they can often take several hundred milliseconds longer.
The user shouldn't have to sit through that delay before their webpage loads, and our data shows that this timer
fires regularly in the field, indicating that users would otherwise have been waiting needlessly for their content.
Why should we artificially delay the user's experience for an A response if we already have the AAAA?
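
As an illustration only (not taken from the draft or from any shipping implementation), the behavior argued for above could be sketched as follows. The names query_aaaa and query_a are hypothetical coroutines standing in for a real asynchronous resolver, and the 50 ms grace period is an assumed value chosen for the example.

```python
import asyncio


def _addresses(task):
    """Return a finished task's address list, or [] if it failed or was cancelled."""
    if task.cancelled() or task.exception() is not None:
        return []
    return task.result()


async def resolve_without_blocking(query_aaaa, query_a, resolution_delay=0.05):
    """Start both queries at once and return as soon as it is reasonable to connect.

    query_aaaa / query_a: hypothetical zero-argument coroutine functions that
    return a list of address strings. resolution_delay (seconds) is an assumed
    grace period granted to AAAA when the A answer arrives first.

    Returns (addresses, other_task); other_task may still be pending, so the
    caller can pick up the second family later for fallback.
    """
    aaaa = asyncio.ensure_future(query_aaaa())
    a = asyncio.ensure_future(query_a())

    done, _ = await asyncio.wait({aaaa, a}, return_when=asyncio.FIRST_COMPLETED)

    if aaaa in done and _addresses(aaaa):
        # AAAA is already here: do not hold the connection back for the A reply.
        return _addresses(aaaa), a

    if aaaa not in done:
        # A arrived first: give AAAA a short, bounded head start.
        await asyncio.wait({aaaa}, timeout=resolution_delay)

    if aaaa.done() and _addresses(aaaa):
        return _addresses(aaaa), a

    # No usable AAAA within the grace period: proceed with the A answer.
    await asyncio.wait({a})
    return _addresses(a), aaaa
```

The point of the sketch is that neither query ever blocks the other: a slow or lost A response costs nothing once AAAA has answered, and a slow AAAA costs at most the grace period.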

Erik,

Thanks for the suggestions!
- We'll add a section about the problems that Happy Eyeballs does not solve.
- Regarding the increased retry timer, this is somewhat covered by historical RTT information.
    I don't think failures to reach the first address should penalize the next ones, as that defeats redundancy.
- I do agree that the main downside of Happy Eyeballs is that it hides failures,
    but requiring clients to have a reporting system seems uncommon for IETF protocols. Do you know of any that do?
- Regarding TFO / TLS 1.3 0RTT, I think the requirement is that you MUST NOT use those
    if your data isn't idempotent, as packets could be duplicated in the network. This is orthogonal to Happy Eyeballs.
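
For readers following the retry-timer point, here is a minimal sketch of the two schedules being compared: a fixed delay between connection attempts and a delay that grows with jitter. It is an illustration under stated assumptions, not something specified in the draft. The 250 ms base is the example value from the quoted message below; the growth factor, jitter fraction, and cap are hypothetical parameters.

```python
import random


def fixed_delays(attempts, delay=0.25):
    """Delay (seconds) before each subsequent attempt when the timer is fixed."""
    return [delay] * attempts


def backoff_delays(attempts, base=0.25, factor=2.0, jitter=0.1, cap=2.0, rng=None):
    """Delays that grow geometrically, capped, with +/- `jitter` fractional noise.

    factor, jitter and cap are assumed values for illustration only.
    """
    rng = rng or random.Random()
    delays = []
    for i in range(attempts):
        nominal = min(cap, base * factor ** i)
        # Spread clients out so they do not all retry at the same instant.
        delays.append(nominal * (1 + rng.uniform(-jitter, jitter)))
    return delays
```

With the fixed schedule, a failure on the first address does not slow down attempts to later addresses, which preserves redundancy. The growing schedule trades some of that for less pressure on an overloaded server.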

Thanks,
David Schinazi


> On Mar 13, 2017, at 19:02, Erik Nygren <erik+ietf@nygren.org> wrote:
> 
> It's great to see an updated version of this guidance!
> 
> > There is zero reason for making async DNS a MUST.  
> 
> The bare minimum is that resolvers must do A and AAAA lookups in parallel.
> The current behavior of some stacks is to do the AAAA and A lookups in series.
> This effectively means that adding IPv6 connectivity to a client adds an
> extra RTT for almost all DNS lookups.
> For example, see section 5 in: https://www.akamai.com/us/en/multimedia/documents/technical-publication/a-case-for-faster-mobile-web-in-cellular-ipv6-networks.pdf
> For clients such as mobile devices visiting sites with lots of hostnames,
> this can be a very substantial performance hit.  This also shows up
> in some RUM-based measurement reports making IPv6 look slower 
> (due to clients with IPv6 spending more time doing DNS lookups before 
> doing page loads). 
> 
> Doing the lookups in parallel but not waiting for both responses is better than serial,
> but still has a perf hit (whatever the recovery time is, plus an RTT) when
> the A or AAAA lookup packet is lost.
> 
> Some additional comments/thoughts after reading through the -01 version:
> 
> * It would be good to add a section on failure cases NOT detected/mitigated by this form of Happy Eyeballs.
>    Even if not covering mitigations, it would still be good to discuss them for awareness.
> 
>     - In particular, PMTUD seems to be the most common.  ie, if there is a PMTUD issue between
>       the client and server, then the connect will often succeed but the connection will fail to function.
>       I think most large-scale IPv6 server operators (plus many of the small ones) have broken this at least once.
>       One client-side mitigation for TCP might be for the client to offer progressively smaller MSS 
>       as it retries different IPs within a protocol family.
>       (I don't know if anyone does or has tried this?  There is the server-side pmtud probing feature.)
>       For UDP protocols, using full-frame packets for the SYN and the SYN-ACK (as QUIC does) seems
>       like one approach to at least detect breakage early, although QUIC doesn't yet have a PMTUD
>       mitigation solution AFAIK.
> 
>     -  Servers that return different content for IPv6 vs IPv4 (eg, "404 not found" due to an unconfigured server on the IPv6 side).
>        "Don't do this" as advice to server operators is likely the best way to fix it as hacking around it on the client side is unhelpful.
> 
> * It may make sense to recommend some form of back-off in the retry timing.  Rather than a fixed value (eg, 250ms), adding
>   an increasing time value with some jitter into each retry may be safer in the cases of overloaded servers
>   or a network connection that is borderline near the retry time.  I've seen congestive failure and lack-of-progress
>   scenarios from having a fixed retry timer.  For example, with servers that do FIFO queueing of connections to accept,
>   if the queue becomes longer than the retry period then all clients fail to make forward progress and you reach 
>   congestive collapse.  The same can happen with links that become high-latency, however.
> 
> * It may be worth adding some guidance on reporting and visibility, though I'm not sure how.
>   One of the big complaints against Happy Eyeballs is that it masks brokenness (latency spikes
>   but things keep working so no one complains enough to fix the root cause).
>   Having a recommendation that stacks or applications at least keep counters and telemetry
>   on failures may at least make it more viable to debug?
> 
> * It would be good to provide guidance or a reminder around protocols that send data along with a SYN
>   or an initial flight  (eg, TCP Fast Open / TFO and TLS 1.3 0RTT).  In particular, you likely
>   want to send this ONLY on one connection attempt (eg, the IPv6 attempt?) as otherwise 
>   the operation may be executed twice by the server.  This may cause undue server load
>   and for apps/clients incorrectly using TFO or 0RTT for non-idempotent operations 
>   it could cause duplicate actions. 
> 
> Thanks!  Erik
> 
> On Sun, Mar 12, 2017 at 11:53 PM, David Schinazi <dschinazi@apple.com> wrote:
> Hi everyone,
> 
> Thanks a lot for the comments and feedback.
> We've incorporated them into -01, please let us know if they were properly addressed.
> https://www.ietf.org/internet-drafts/draft-pauly-v6ops-happy-eyeballs-update-01.txt
> 
> Regards,
> David Schinazi
> 
> 
>> On Mar 10, 2017, at 14:54, Philip Homburg <pch-v6ops-6@u-1.phicoh.com> wrote:
>> 
>> In your letter dated Fri, 10 Mar 2017 09:29:55 -0800 you wrote:
>>> We can certainly soften some of the language to make it clear that if
>>> your system has no such option, you are not necessarily out of spec, but if
>>> such an option is available, we believe that it SHOULD indeed be used. This fits
>>> with the Happy Eyeballs paradigm: if I am waiting for one of the DNS responses
>>> to come back, I could have already made my connection in that time, getting
>>> the user the resource loaded more quickly.
>> 
>> If the DNS requirements can be toned down to the point that an application
>> can use getaddrinfo if that fits the application, then that's fine
>> with me.
>> 
>> 
>> _______________________________________________
>> v6ops mailing list
>> v6ops@ietf.org
>> https://www.ietf.org/mailman/listinfo/v6ops