RE: UDP send costs in Linux

Praveen Balasubramanian <pravb@microsoft.com> Wed, 04 April 2018 18:39 UTC

From: Praveen Balasubramanian <pravb@microsoft.com>
CC: Subodh Iyengar <subodh@fb.com>, Ian Swett <ianswett@google.com>, Phillip Hallam-Baker <phill@hallambaker.com>, IETF QUIC WG <quic@ietf.org>
Subject: RE: UDP send costs in Linux
Date: Wed, 4 Apr 2018 18:39:36 +0000
To: Mikkel Fahnøe Jørgensen <mikkelfj@gmail.com>, Frederick Kautz <fkautz@alumni.cmu.edu>, Praveen Balasubramanian <pravb@microsoft.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/quic/CGqpDe2NXWyBHo2_vLbbgJH6GdA>

Hardware support is not orthogonal to DPDK and kernel bypass; even user-mode implementations can take advantage of it. DPDK has a lot of other issues, including the lack of battle-hardened TCP/IP stacks. IMO we need to make progress on performance regardless of whether implementations live in user mode or in the kernel. Especially on client devices, user-space implementations cannot be given direct hardware access for security reasons. For cloud VMs, there is SR-IOV for direct hardware access, which is now supported by more and more cloud providers.

From: QUIC [mailto:quic-bounces@ietf.org] On Behalf Of Mikkel Fahnøe Jørgensen
Sent: Wednesday, April 4, 2018 11:32 AM
To: Frederick Kautz <fkautz@alumni.cmu.edu>; Praveen Balasubramanian <pravb=40microsoft.com@dmarc.ietf.org>
Cc: Subodh Iyengar <subodh@fb.com>; Ian Swett <ianswett@google.com>; Phillip Hallam-Baker <phill@hallambaker.com>; IETF QUIC WG <quic@ietf.org>
Subject: Re: UDP send costs in Linux

Got it before, but the reason netmap and DPDK are important, in contrast to kernel support, is the very rich application interface that a custom user-space QUIC implementation provides. Spending the next 10 years discussing epoll vs. poll for QUIC is not going to cut it.

OS support that makes it easier to use netmap and friends concurrently with other network traffic would be welcome, though.
The question is what hardware offload support can be given to a custom QUIC implementation. One thing that comes to mind is pre-decryption on netmap, where the OS first intercepts the packet and forwards the decrypted, DoS-filtered packet to the app, which then handles all the detailed framing, ACKs, pacing, etc.

Kind Regards,
Mikkel Fahnøe Jørgensen


On 4 April 2018 at 20.25.07, Frederick Kautz (fkautz@alumni.cmu.edu) wrote:
I apologize to anyone here who gets a repeat of this message; the mailing list ate my earlier reply because my mail provider changed my default email address during a transition. Here is the original message:

I agree with this sentiment. If we see traction, then we should see better kernel support and offloading.

It should be trivial to get this working in something like VPP, which can use DPDK for hardware offload, skipping the kernel entirely in the data path.

There are also techniques to bypass the initial memory allocation in the kernel, such as eBPF (used by Cilium) and memif (used by VPP).

My main concern at this point would be cloud-native environments with limited hardware offload support. For example, if we run in AWS or GCE, better kernel support will probably be necessary.


On Wed, Apr 4, 2018 at 10:57 AM, Praveen Balasubramanian <pravb=40microsoft.com@dmarc.ietf.org> wrote:
The need to worry is not on the client side, at least not immediately. 802.11ad, 802.11ax, and 5G LTE will bring gigabit rates to the client side, so it is certainly going to become a problem in the longer term.

The current worry is on the server side. A large part of the work we do is performance optimization of the network stack. Google's SIGCOMM QUIC paper shows a 2x CPU increase going from TCP to QUIC. Most web services will NOT be able to make that trade-off to get latency improvements at the 90th percentile.

IMO, improving UDP performance and hardware offloads is absolutely necessary for QUIC to become widely adopted (not just by the biggest corporations with large budgets). We have work under way on both these fronts, and I am very happy to see Linux also investing here in preparation for what's coming. We need to do our best to democratize this technology: multiple implementations and performance will both play a big role.


From: QUIC [mailto:quic-bounces@ietf.org] On Behalf Of Phillip Hallam-Baker
Sent: Wednesday, April 4, 2018 9:29 AM
To: Mikkel Fahnøe Jørgensen <mikkelfj@gmail.com>
Cc: Subodh Iyengar <subodh@fb.com>; IETF QUIC WG <quic@ietf.org>; Ian Swett <ianswett=40google.com@dmarc.ietf.org>
Subject: Re: UDP send costs in Linux

I would not worry too much at this point.

The reason we want to be able to work at the application level is backwards compatibility. It has to be possible to deploy QUIC on any machine, even without OS support, or it won't be deployable.

It does not have to be performant on every platform. If people are using QUIC, whatever needs to be moved into the kernel for performance reasons will move there.




On Wed, Apr 4, 2018 at 11:32 AM, Mikkel Fahnøe Jørgensen <mikkelfj@gmail.com> wrote:
I have no data to add on the Linux UDP stack, but another issue is the lack of netmap support in cloud hosting environments.
I have not yet been working with this, but have looked into the problem and asked around.

netmap is the default in FreeBSD and optional in Linux. But neither works efficiently without a hypervisor patch, which is also available for netmap. With netmap support, a user-space application can send directly to the network adapter with very little overhead. There is also DPDK, along with some other interfaces that might be slightly faster but are more vendor-specific.

Assuming an application has access to optimized netmap, the only hurdle is address lookup, but if the application also manages that, or at least does the caching, there shouldn’t be much in the way of OS interference.

Of course, netmap blocks the entire network stack, so no ping or SSH. CloudFlare added a netmap patch so that only some traffic is routed from the network interface to netmap, and netmap also supports efficient packet forwarding to the OS or other applications.

None of this works well in general, but for a cloud host that can be booted automatically and destroyed rather than serviced, there is some opportunity.

But only if cloud service providers start adding support to their supported images and hypervisors. Not sure if any are working on this now.


Mikkel



On 4 April 2018 at 15.11.17, Ian Swett (ianswett=40google.com@dmarc.ietf.org) wrote:
I hope some of these patches will be available soon, but I'm not sure if soon is a month or 6.

On Tue, Apr 3, 2018 at 10:50 PM Subodh Iyengar <subodh@fb.com<mailto:subodh@fb.com>> wrote:

Thanks for sharing this Ian.



This definitely matches some of the observations we've seen in the UDP write path as well. Another path that we saw adding overhead was the route table lookup in the Linux UDP stack. Connected UDP sockets did amortize that cost.
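To make the connected-socket point concrete, here is a minimal sketch, assuming Linux and IPv4, of connecting a UDP socket once and then sending with plain send(). On Linux the route resolved at connect() time is cached on the socket, so each send can skip the per-packet routing lookup that sendto() on an unconnected socket performs. The helper names open_connected_udp and send_burst are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a UDP socket and connect() it to one peer. No packet is sent:
 * connect() on a datagram socket only fixes the default destination and,
 * on Linux, caches the resolved route on the socket.
 * Returns the fd, or -1 on error. */
static int open_connected_udp(const char *ip, unsigned short port)
{
    struct sockaddr_in peer;
    memset(&peer, 0, sizeof peer);
    peer.sin_family = AF_INET;
    peer.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &peer.sin_addr) != 1)
        return -1;

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (const struct sockaddr *)&peer, sizeof peer) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Send n copies of buf with send(): there is no destination argument,
 * so the kernel uses the connected peer and its cached route each time.
 * Returns the number of datagrams sent before the first failure. */
static int send_burst(int fd, const void *buf, size_t len, int n)
{
    assert(fd >= 0 && n >= 0);
    int sent = 0;
    while (sent < n && send(fd, buf, len, 0) == (ssize_t)len)
        sent++;
    return sent;
}
```

The trade-off is that a connected socket talks to a single peer, so a server using this approach would need one socket per client connection.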



I'm looking forward to a smarter sendmmsg with GSO and zero-copy. Is there any indication of the timeline for these patches to make it into Linux? I would be happy to try any of these out to help iron out the API.
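For reference, plain sendmmsg(2) already exists on Linux. The sketch below uses only that call, without the GSO or zero-copy extensions discussed here, and assumes a connected datagram socket and Linux/glibc; send_batch and MAX_BATCH are hypothetical names. Batching amortizes the system-call cost across many datagrams, but the kernel still builds and routes one packet per datagram. That remaining per-packet work is what GSO aims to cut.

```c
#define _GNU_SOURCE /* sendmmsg() and struct mmsghdr are GNU/Linux extensions */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define MAX_BATCH 64 /* datagrams per sendmmsg() call (assumed) */

/* Send n datagrams (bufs[i], lens[i]) on a connected datagram socket
 * using as few sendmmsg() calls as possible. Each mmsghdr describes one
 * datagram; msg_name stays NULL because the socket is connected.
 * Returns the number of datagrams accepted, or -1 if nothing was sent. */
static int send_batch(int fd, char *const bufs[], const size_t lens[], int n)
{
    struct mmsghdr msgs[MAX_BATCH];
    struct iovec iovs[MAX_BATCH];
    int total = 0;

    assert(n >= 0);
    while (total < n) {
        int chunk = n - total;
        if (chunk > MAX_BATCH)
            chunk = MAX_BATCH;

        memset(msgs, 0, sizeof(msgs[0]) * (size_t)chunk);
        for (int i = 0; i < chunk; i++) {
            iovs[i].iov_base = bufs[total + i];
            iovs[i].iov_len = lens[total + i];
            msgs[i].msg_hdr.msg_iov = &iovs[i];
            msgs[i].msg_hdr.msg_iovlen = 1;
        }

        int rc = sendmmsg(fd, msgs, (unsigned int)chunk, 0);
        if (rc < 0)
            return total > 0 ? total : -1;
        total += rc;
        if (rc < chunk)
            break; /* partial batch: the caller can retry the remainder */
    }
    return total;
}
```

On an unconnected socket, each msg_hdr would also need msg_name and msg_namelen set to the destination address.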



Subodh

________________________________
From: QUIC <quic-bounces@ietf.org> on behalf of Ian Swett <ianswett=40google.com@dmarc.ietf.org>
Sent: Tuesday, April 3, 2018 5:20:08 PM
To: IETF QUIC WG
Subject: UDP send costs in Linux

One challenge with QUIC at the moment is the increased CPU cost of sending UDP packets vs TCP payloads.  I've seen this across every platform Google has deployed QUIC on, so it's a widespread issue.

Here's an excellent presentation from Willem de Bruijn on what's causing the increased CPU consumption on Linux (UDP starts on slide 9):
http://vger.kernel.org/netconf2017_files/rx_hardening_and_udp_gso.pdf

And while you're thinking about CPU usage, it's worth looking at the presentation on timing-wheel-based packet pacing, which uses minimum release times and is ideal for QUIC (and TCP, for that matter): https://conferences.sigcomm.org/sigcomm/2017/files/program/ts-9-4-carousel.pdf
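The core idea behind timing-wheel pacing fits in a few lines of C. Each packet carries its earliest release time and is dropped into a time-indexed bucket in O(1). The sender then drains buckets as the clock advances, instead of keeping per-flow timers or a sorted queue. This is a minimal single-threaded illustration of that general technique, not the Carousel implementation from the linked talk. struct pkt, the wheel_* functions, the 1024-bucket size, and the 10 µs bucket width are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WHEEL_SLOTS 1024 /* number of buckets (assumed) */
#define SLOT_US     10   /* bucket width in microseconds (assumed) */

/* Hypothetical queued-packet record: only what the wheel needs. */
struct pkt {
    uint64_t release_us; /* earliest time the packet may be sent */
    int id;
    struct pkt *next;    /* intrusive FIFO link */
};

struct slot {
    struct pkt *head, *tail;
};

struct timing_wheel {
    struct slot slots[WHEEL_SLOTS];
    uint64_t now_slot;   /* absolute index of the next bucket to drain */
};

static void wheel_init(struct timing_wheel *w, uint64_t now_us)
{
    for (int i = 0; i < WHEEL_SLOTS; i++)
        w->slots[i].head = w->slots[i].tail = NULL;
    w->now_slot = now_us / SLOT_US;
}

/* O(1) enqueue keyed by release time. Late packets go into the next
 * bucket to be drained; times past the horizon are clamped to the last
 * bucket (a simplification). */
static void wheel_insert(struct timing_wheel *w, struct pkt *p)
{
    assert(p != NULL);
    uint64_t s = p->release_us / SLOT_US;
    if (s < w->now_slot)
        s = w->now_slot;
    if (s >= w->now_slot + WHEEL_SLOTS)
        s = w->now_slot + WHEEL_SLOTS - 1;

    struct slot *b = &w->slots[s % WHEEL_SLOTS];
    p->next = NULL;
    if (b->tail)
        b->tail->next = p;
    else
        b->head = p;
    b->tail = p;
}

/* Drain every bucket whose time has come and return the due packets as
 * one list, in bucket order (FIFO within a bucket). */
static struct pkt *wheel_advance(struct timing_wheel *w, uint64_t now_us)
{
    uint64_t target = now_us / SLOT_US;
    uint64_t end = target;
    struct pkt *head = NULL, *tail = NULL;

    /* All queued packets lie within one horizon, so each bucket is
     * visited at most once even after a long idle gap. */
    if (end >= w->now_slot + WHEEL_SLOTS)
        end = w->now_slot + WHEEL_SLOTS - 1;

    for (; w->now_slot <= end; w->now_slot++) {
        struct slot *b = &w->slots[w->now_slot % WHEEL_SLOTS];
        if (!b->head)
            continue;
        if (tail)
            tail->next = b->head;
        else
            head = b->head;
        tail = b->tail;
        b->head = b->tail = NULL;
    }
    if (w->now_slot <= target)
        w->now_slot = target + 1;
    return head;
}
```

A caller would typically run wheel_advance from a periodic tick and transmit the returned list; the bucket width bounds how closely each release time is honored. Release times more than one horizon ahead (about 10 ms with these parameters) are clamped to the last bucket and therefore leave early, so a real pacer would need a policy for that case.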

-Ian