Re: Packet Number Encryption Performance

Ian Swett <ianswett@google.com> Fri, 22 June 2018 11:45 UTC

From: Ian Swett <ianswett@google.com>
Date: Fri, 22 Jun 2018 07:45:20 -0400
Message-ID: <CAKcm_gPxYu9jNFmYR0_vQfawuC+T_E9UJbcDPOycrUAMuVJabg@mail.gmail.com>
Subject: Re: Packet Number Encryption Performance
To: Kazuho Oku <kazuhooku@gmail.com>
Cc: Nick Banks <nibanks@microsoft.com>, IETF QUIC WG <quic@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/quic/uE_p4UURUVjqzANLzH-OuYw91nU>

Thanks for digging into the details of this, Kazuho.  A <4% increase in
crypto cost is a bit more than I originally expected (~2%), but crypto is
less than 10% of my CPU usage, so it's still less than 0.5% total, which is
acceptable to me.
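
As a rough check of that arithmetic, here is a minimal sketch. The function name is purely illustrative, and the 4% and 10% inputs are the upper bounds mentioned above rather than measured values:

```python
def total_overhead(crypto_overhead: float, crypto_share: float) -> float:
    """Fraction of total CPU added when the crypto portion grows by crypto_overhead.

    crypto_overhead: relative increase in crypto cost (0.04 means 4%).
    crypto_share:    fraction of total CPU spent in crypto (0.10 means 10%).
    """
    return crypto_overhead * crypto_share


# Upper bounds from the discussion: <4% crypto increase, crypto <10% of CPU.
upper_bound = total_overhead(0.04, 0.10)
print(f"total CPU increase is below {upper_bound:.1%}")
```

With both inputs at their upper bounds the product is 0.4% of total CPU, which stays under the 0.5% figure.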

On Fri, Jun 22, 2018 at 2:45 AM Kazuho Oku <kazuhooku@gmail.com> wrote:

>
>
> 2018-06-22 12:22 GMT+09:00 Kazuho Oku <kazuhooku@gmail.com>:
>
>>
>>
>> 2018-06-22 11:54 GMT+09:00 Nick Banks <nibanks@microsoft.com>:
>>
>>> Hi Kazuho,
>>>
>>>
>>>
>>> Thanks for sharing your numbers as well! I'm a bit confused where you say
>>> you can reduce the 10% overhead to 2% to 4%. How do you plan on doing that?
>>>
>>
>> As stated in my previous mail, the 10% overhead consists of three
>> parts, each consuming a comparable number of CPU cycles. Two of the
>> three are related to the abstraction layer and how CTR is implemented, while
>> the third is the core AES-ECB operation cost.
>>
>> It should be possible to remove the costly abstraction layer.
>>
>> It should also be possible to remove the overhead of CTR, since in PNE
>> we need to XOR at most 4 octets (applying XOR is the only difference
>> between CTR and ECB). It should be possible to reduce that cost to
>> almost nothing.
>>
>> Considering these aspects, and looking at the numbers in the OpenSSL
>> source code (as well as considering the overhead of GCM), my expectation
>> is 2% to 4%.
>>
>
> Just did some experiments and it seems that the expectation was correct.
>
> The benchmarks tell me that the overhead goes down from 10.0% to 3.8%, by
> doing the following:
>
> * remove the overhead of CTR abstraction (i.e. use the ECB backend and do
> XOR by ourselves)
> * remove the overhead of the abstraction layer (i.e. call the method
> returned by EVP_CIPHER_meth_get_do_cipher instead of calling
> EVP_EncryptUpdate)
>
> Of course the changes are specific to OpenSSL, but I would expect
> similar numbers elsewhere, assuming that you have access to an optimized
> AES implementation.
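
To illustrate the "use the ECB backend and do XOR by ourselves" point, here is a minimal sketch of why a single block encryption plus a short XOR gives the same result as generic CTR when the input is at most 4 octets. Everything here is an assumption for illustration: `block_encrypt` is a hash-based stand-in and NOT real AES, and the names `ctr_xor`, `pne_mask`, and `sample` are hypothetical, not picotls or OpenSSL APIs.

```python
import hashlib

BLOCK = 16  # AES block size in octets


def block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for one AES-ECB block encryption (NOT real AES, illustration only)."""
    assert len(block) == BLOCK
    return hashlib.sha256(key + block).digest()[:BLOCK]


def ctr_xor(key: bytes, counter_block: bytes, data: bytes) -> bytes:
    """Generic CTR: encrypt successive counter blocks and XOR them into the data."""
    out = bytearray()
    counter = int.from_bytes(counter_block, "big")
    for off in range(0, len(data), BLOCK):
        keystream = block_encrypt(key, counter.to_bytes(BLOCK, "big"))
        chunk = data[off:off + BLOCK]
        out += bytes(a ^ b for a, b in zip(chunk, keystream))
        counter = (counter + 1) % (1 << (8 * BLOCK))
    return bytes(out)


def pne_mask(key: bytes, sample: bytes, packet_number: bytes) -> bytes:
    """Shortcut for PNE: one block encryption, then XOR at most 4 octets by hand."""
    assert 1 <= len(packet_number) <= 4
    keystream = block_encrypt(key, sample)
    return bytes(a ^ b for a, b in zip(packet_number, keystream))


key = bytes(16)                # all-zero demo key
sample = bytes(range(BLOCK))   # hypothetical 16-octet counter block
pn = b"\x00\x00\x12\x34"       # 4-octet packet number

masked = pne_mask(key, sample, pn)
print(masked == ctr_xor(key, sample, pn))   # both paths agree
print(pne_mask(key, sample, masked) == pn)  # XOR masking is its own inverse
```

Because the input never exceeds one block, the counter increment and the per-call CTR setup are never needed, which is the part of the cost the shortcut avoids.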
>
>
>>
>>
>>>
>>>
>>>
>>> ------------------------------
>>> *From:* Kazuho Oku <kazuhooku@gmail.com>
>>> *Sent:* Thursday, June 21, 2018 7:21:17 PM
>>> *To:* Nick Banks
>>> *Cc:* quic@ietf.org
>>> *Subject:* Re: Packet Number Encryption Performance
>>>
>>> Hi Nick,
>>>
>>> Thank you for bringing the numbers to the list.
>>>
>>> I have just run a small benchmark using Quicly, and I see comparable
>>> numbers.
>>>
>>> To be precise, I see a 10.0% increase in CPU cycles when encrypting an
>>> Initial packet of 1,280 octets. I expect that we will see similar numbers
>>> on other QUIC stacks that also use picotls (with OpenSSL as a backend).
>>> Note that this number only compares the cost of encryption; the overhead
>>> ratio will be much smaller if we look at the total number of CPU cycles
>>> spent by a QUIC stack as a whole.
>>>
>>> Looking at the profile, the overhead consists of three operations that
>>> each consume comparable CPU cycles: the core AES operation (using AES-NI),
>>> CTR operation overhead, and CTR initialization. Note that picotls at the
>>> moment provides access to CTR crypto beneath the AEAD interface, which is
>>> to be used by the QUIC stacks.
>>>
>>> I would assume that we can cut down the overhead to somewhere between 2%
>>> and 4%, but it might be hard to go down to somewhere near 1%, because we
>>> cannot parallelize the AES operation of PNE with that of AEAD (see
>>> https://github.com/openssl/openssl/blob/OpenSSL_1_1_0h/crypto/aes/asm/aesni-x86_64.pl#L24-L39
>>> about the impact of parallelization).
>>>
>>> I do not think that 2% to 4% of additional crypto overhead is an
>>> issue for QUIC/HTTP, but the current overhead of 10% is something that we
>>> might want to decrease. I am glad to be able to learn that now.
>>>
>>>
>>> 2018-06-22 5:48 GMT+09:00 Nick Banks <
>>> nibanks=40microsoft.com@dmarc.ietf.org>:
>>>
>>>> Hello QUIC WG,
>>>>
>>>>
>>>>
>>>> I recently implemented PNE for WinQuic (using bcrypt APIs) and I
>>>> decided to get some performance numbers to see what the overhead of PNE
>>>> was. I figured the rest of the WG might be interested.
>>>>
>>>>
>>>>
>>>> My test just encrypted the same buffer (size dependent on the test case)
>>>> 10,000,000 times and measured the time it took. The test then did the same
>>>> thing, but also encrypted the packet number. I ran all that 10
>>>> times in total. I then collected the best times for each category to
>>>> produce the following graphs and tables (full excel doc attached):
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Bytes  No PNE (ms)  PNE (ms)  PNE Overhead  No PNE (Mbps)  PNE (Mbps)
>>>> -----  -----------  --------  ------------  -------------  ----------
>>>>     4     2284.671  3027.657           33%        140.064     105.692
>>>>    16     2102.402  2828.204           35%        608.827     452.584
>>>>    64     2198.883  2907.577           32%        2328.45     1760.92
>>>>   256       2758.3   3490.28           27%        7424.86     5867.72
>>>>   600     4669.283  5424.539           16%          10280     8848.68
>>>>  1000     6130.139  6907.805           13%        13050.3     11581.1
>>>>  1200     6458.679  7229.672           12%        14863.7     13278.6
>>>>  1450     7876.312   8670.16           10%        14727.7     13379.2
>>>>
>>>>
>>>>
>>>> I used a server-grade lab machine I had at my disposal, running the
>>>> latest Windows 10 Server DataCenter build. Again, these numbers are for
>>>> crypto only. No QUIC or UDP is included.
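
The overhead and rate figures in Nick's numbers can be re-derived from the raw times. This is a minimal sketch under two assumptions: each time covers the stated 10,000,000 encryptions of a buffer of the given size, and Mbps means 10^6 bits per second. The helper names are illustrative only.

```python
ITERATIONS = 10_000_000  # encryptions per measurement, as stated in the mail

# (buffer size in bytes, time without PNE in ms, time with PNE in ms)
ROWS = [
    (4, 2284.671, 3027.657),
    (16, 2102.402, 2828.204),
    (64, 2198.883, 2907.577),
    (256, 2758.3, 3490.28),
    (600, 4669.283, 5424.539),
    (1000, 6130.139, 6907.805),
    (1200, 6458.679, 7229.672),
    (1450, 7876.312, 8670.16),
]


def overhead_pct(no_pne_ms: float, pne_ms: float) -> float:
    """Relative extra time with PNE enabled, in percent."""
    return (pne_ms / no_pne_ms - 1.0) * 100.0


def rate_mbps(size_bytes: int, time_ms: float) -> float:
    """Throughput in megabits per second for ITERATIONS buffers of size_bytes."""
    bits = size_bytes * 8 * ITERATIONS
    return bits / (time_ms / 1000.0) / 1e6


for size, no_pne, pne in ROWS:
    print(f"{size:5d}  overhead {overhead_pct(no_pne, pne):5.1f}%  "
          f"no PNE {rate_mbps(size, no_pne):9.1f} Mbps  "
          f"PNE {rate_mbps(size, pne):9.1f} Mbps")
```

The absolute extra time per 10,000,000 operations stays roughly constant (about 700 to 800 ms) across buffer sizes, which is why the relative overhead shrinks as the buffer grows.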
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> - Nick
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Kazuho Oku
>>>
>>
>>
>>
>> --
>> Kazuho Oku
>>
>
>
>
> --
> Kazuho Oku
>