Re: HTTP Partial POST Replay

Alan Frindell <afrind@fb.com> Mon, 01 July 2019 17:22 UTC

From: Alan Frindell <afrind@fb.com>
To: Matthew Stock <stock@csgeeks.org>
CC: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Date: Mon, 01 Jul 2019 17:19:27 +0000
Message-ID: <B0002A63-813E-42D3-8455-97055C3DAE43@fb.com>
References: <BCDF2644-1D6A-40FF-9AF7-7FA26A57E3A9@fb.com> <948e26e8-ce3d-894a-17f6-fb17194a37b8@treenet.co.nz> <5A755881-F0C0-4E68-AA21-98E423BD0B1C@fb.com> <CAJEih4_-wV1CwJPudoMdA1p=qrqAWuo9zTdfoH=RFJq8EK_fgg@mail.gmail.com>
In-Reply-To: <CAJEih4_-wV1CwJPudoMdA1p=qrqAWuo9zTdfoH=RFJq8EK_fgg@mail.gmail.com>
Subject: Re: HTTP Partial POST Replay
Archived-At: <https://www.w3.org/mid/B0002A63-813E-42D3-8455-97055C3DAE43@fb.com>

Thanks for your feedback; comments are inline.

> If I assume the use case as you describe, where the RTT between the proxy and the server is tiny, and that the transactions are small, doesn't that mean it would be better to just let any existing requests complete before shutting down?

I suppose I'm targeting something of a Goldilocks request size: too large to wait for (e.g., it will take longer than 10-15 seconds to complete), but not so large that retransmitting it to the proxy and on to another server wastes too much bandwidth. Many of these requests are hundreds of KB to low tens of MB, coming from slower clients. The worst-case scenario, of course, is a large request from a slow client that begins immediately before shutdown. One can minimize the number of requests that are replayed by delaying the replay until ~1s remains before hard shutdown.
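
To make the timing concrete, here is a minimal sketch of that drain loop in Python. All names are hypothetical; this is not from the draft or from our implementation.

import time

HARD_SHUTDOWN_S = 15.0   # hard kill after the 10-15s drain window
REPLAY_MARGIN_S = 1.0    # start replaying when ~1s remains

def drain(in_flight, started_at):
    # Let requests finish on their own; replay stragglers at the last moment.
    while True:
        remaining = HARD_SHUTDOWN_S - (time.monotonic() - started_at)
        incomplete = [r for r in in_flight if not r.body_complete]
        if not incomplete:
            return  # everything finished; nothing to replay
        if remaining <= REPLAY_MARGIN_S:
            for req in incomplete:
                req.replay_to_intermediary()  # hand the partial POST back
            return
        time.sleep(0.1)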

> It seems like a better strategy would be to eliminate new requests to the new instance (e.g. changing the routing at the proxy) and then wait until all existing (short) requests are completed.  There are some corner cases, but that seems more straightforward.

Yes, we do route new requests around a draining web server, but we cannot wait more than 10-15s before shutting down the old instance. This constraint comes from the fact that we have a very large number of web servers, and our continuous deployment schedule means we must be able to restart the entire fleet within X minutes. We found that across our infrastructure we were still failing enough slow requests that we needed to implement something to address it.

>  In the restart case, you could have the new instance on the server take over accepting new requests while the old instance stays active to resolve those that are open.

I agree that this is a good solution. In our case, however, it is not feasible: the memory requirements of the web server are such that we cannot run two instances on a single host. I tried to address this in the Existing Solutions section of the draft.

Thanks

-Alan


On Mon, Jul 1, 2019 at 12:18 PM Alan Frindell <afrind@fb.com> wrote:
Thanks for taking the time to read and comment.  My responses are inline.

    Overall impression is that this is an overly complex and resource-expensive
    replacement for the very simple 307 status mechanism. There is no harm in
    telling a client it has to retry its request.

I disagree that there is no harm in asking the client to retry the request. In our case, the intermediary (a load-balancing reverse proxy) is < 1 millisecond away from the server, but hundreds of milliseconds away from the client. Further, the client may be charged for bandwidth, and it may have already spent a fair amount of time and bandwidth transmitting the request. I agree with you that for a generic intermediary or forward proxy, particularly one very close to the client, the feature doesn't make as much sense. I can clarify that in the draft.

Also, I should mention in the draft that the server can return a 307 rather than abort requests it cannot complete at shutdown time.
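
As a rough sketch of that shutdown path (the handler names are hypothetical):

def finish_or_redirect(request, remaining_drain_s):
    # Hypothetical shutdown handler: finish what fits in the drain window,
    # and 307 the rest back to the same URL rather than aborting them.
    if request.estimated_completion_s() <= remaining_drain_s:
        return request.complete()
    return request.respond(status=307, headers={"Location": request.url})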

    * The stated case of "server wants to shutdown" does not correlate well
    with the fact that to use this mechanism the server has to store and
    re-deliver all the initial request bandwidth back to the intermediary.

Again, because the server and intermediary are close together, replaying the partial request doesn't prevent the server from shutting down. The type of server and the size of a typical request also make a difference: we use this primarily for dynamic web traffic (as opposed to something like video upload). These dynamic requests tend not to be huge, and the web server was already holding the entire POST body in memory before dispatching it to the handler.

    * That re-sending is where the bandwidth issue is coming from. The
    initial request uses N bytes to arrive from the client. If M of those
    bytes are a) delivered to each of D servers, b) received back from the
    initial D-1 servers, and c) delivered to the second server, that makes a
    total bandwidth consumption of (N + (D-1)*M).
      Whereas 307 only consumes (N + M).

It seems you are only counting proxy-to-server bandwidth. Assuming the intermediary cannot buffer the body, and a 307 triggers the client to resend the entire request, the 307 will consume 2*N + (N + M), or (3*N + M), compared to N + (N + (D-1)*M), or (2*N + (D-1)*M), for replay.
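
A quick worked comparison using the accounting above (the numbers are purely illustrative):

def replay_total(n, m, d):
    # Client link carries N once; proxy/server links carry N + (D-1)*M.
    return n + (n + (d - 1) * m)

def redirect_total(n, m):
    # Client link carries N twice; proxy/server links carry N + M.
    return 2 * n + (n + m)

N, M, D = 10_000_000, 5_000_000, 2   # 10 MB request, half relayed, one replay
print(replay_total(N, M, D))   # 25000000 bytes, N sent once by the client
print(redirect_total(N, M))    # 35000000 bytes, N sent twice by the client

The extra N in the 307 case falls on the client link, which is the slow and potentially metered one.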

    Also keep in mind that even a blind intermediary just pushing data to a
    single server is handling twice the traffic that server does. That is
    the minimum best-case situation. With multiple servers and/or clients
    the difference increases rapidly to orders of magnitude more resource
    consumption. That is the existing situation, before this feature even
    starts to force more resource consumption.

We find that only a fraction of the requests in progress at shutdown time are POST requests, and of those, an even smaller fraction have incomplete bodies, say, 1 second after shutdown is initiated. The number of requests that get redirected in this manner from a single server in our deployment (which spans hundreds of thousands of servers) is relatively small. We do find, though, that a tiny number get quite unlucky during a web-tier restart and get replayed a few times.

    * All this re-sending of data could delay the server shutdown an
    unreasonable amount of time, turning what would be a few seconds into
    minutes or even hours. Relying on the intermediary load being
    reasonable is not a good idea.

I agree with you that in a completely generic and blind case, this is a problem. In our deployment, we find this feature does not impact the shutdown time of the web server, which is 10-15 seconds.

    * Every millisecond of delay added by the re-receive and re-send of data
    makes it more likely the client will terminate early. If that happens,
    all the time, bandwidth, memory, and CPU cycles spent are completely
    wasted.

This is precisely the reason I feel this is a valuable feature. If the client must resend the entire request, it's likely to take even longer.

    Consider the case of a system which is undergoing a DoS at the
    public-facing interface of the intermediary. Enacting this feature is a
    huge resource expenditure for an already highly loaded intermediary.

If a system is undergoing a DoS, I expect it to be able to jettison less critical functions to save resources. Such an intermediary could always convert a partial POST replay into a 307 or 500, and reset the upstream request. I can add this to the security considerations.
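
A sketch of that fallback at the intermediary (all names are hypothetical, and replay_to_new_server stands in for the normal replay path):

def on_partial_post_replay(upstream, client, under_heavy_load):
    # Hypothetical intermediary logic: under load, jettison the feature.
    if under_heavy_load:
        upstream.reset_stream()  # drop the replayed body instead of buffering it
        if client.safe_to_retry():
            client.respond(status=307, headers={"Location": client.request_url})
        else:
            client.respond(status=500)
        return
    replay_to_new_server(upstream, client)  # normal path: re-send the partial POST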

    * Section 2.1 says "The server MUST have prior knowledge"

    Yet no mechanism is even hinted at for how a server may acquire such
    knowledge. Defining a specific negotiation signal would be far better
    and would avoid a huge headache with implementations choosing
    different signals and mechanisms for negotiating that knowledge.

I agree. I will elaborate on feature negotiation in the draft. We set this up out of band by configuring the proxies and servers to use this feature. H2 or H3 SETTINGS are appropriate for announcing support for the feature, but H1 would most likely need a header added to each request indicating support.
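
For illustration only, the signals might look like this; neither the SETTINGS codepoint nor the header name is defined anywhere yet:

# HTTP/2 or HTTP/3: an experimental SETTINGS entry announcing support.
SETTINGS_ENABLE_PARTIAL_POST_REPLAY = 0xff0b  # placeholder codepoint
settings_frame = {SETTINGS_ENABLE_PARTIAL_POST_REPLAY: 1}

# HTTP/1.1 has no SETTINGS, so each request would be tagged instead.
h1_headers = {"Partial-Post-Replay": "1"}  # illustrative header name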

    * The Echo- or Pseudo-Echo mechanism is very clunky. I believe it to be
    unlikely that any intermediary implementing this feature is unable to
    simply store the initial request headers for re-use as needed.

That is valuable feedback. Our implementation stores the initial request headers, but I didn't know whether that was a valid assumption, so I added the Echo- mechanism to the draft. I can remove it entirely and make storing the request a requirement for using the feature.
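
To illustrate the two options as I understand them (field names are illustrative): reconstructing the request from Echo- prefixed fields on the replay response, versus reusing headers the intermediary stored:

def rebuild_from_echo(replay_response_headers):
    # Echo- mechanism: the server reflects the original request headers
    # back on the replay response, prefixed with "Echo-".
    return {
        name[len("Echo-"):]: value
        for name, value in replay_response_headers.items()
        if name.startswith("Echo-")
    }

def rebuild_from_store(stored_request_headers):
    # Stored-headers alternative: the intermediary simply kept the originals.
    return dict(stored_request_headers)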

Thanks

-Alan