[sbm] Re: Kernel API invariants and surprising behavior in TCP_NOTSENT_LOWAT
Jonathan Lennox <jonathan.lennox42@gmail.com> Thu, 03 April 2025 20:18 UTC
List-Id: Source Buffer Management <sbm.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/sbm/p4A4p60_bbnwogYeZTXvI2jNETQ>
I discussed this with some people at the IETF, and it turns out I was wrong: this is not actually an invariant that the Linux kernel assumes.

In fact, Linux already has the property that epoll() and select() don't report a TCP socket as writable until the free space in its send buffer is at least half the amount of currently queued data. This holds even though (as far as I can tell from my experiments) a write to the socket will succeed as soon as any space at all is available, i.e., once some data has been acknowledged.

So I think a macOS-semantics implementation of TCP_NOTSENT_LOWAT (or a desired-semantics implementation of TCP_REPLENISH_TIME) would not do the violence I feared to the Linux kernel's assumptions about sockets, and hopefully would be relatively straightforward to write. It would more or less amount to modifying the logic that makes the writability decision described above.

That said, I'm still asking whether there are any Linux kernel developers here, or whether anyone knows any who could be recruited to help.

On Wed, Mar 12, 2025 at 1:36 PM Jonathan Lennox <jonathan.lennox42@gmail.com> wrote:
> One surprising thing about the original macOS semantics for TCP_NOTSENT_LOWAT (and thus the intended semantics for TCP_REPLENISH_TIME) is that it breaks an implied kernel invariant. I hypothesize that this is why the Linux implementation behaves differently from the macOS one.
>
> This invariant is that the status returned by select() / kevent() / epoll() is *true*: if those system calls say that a socket isn't writable, then it really isn't, and attempting to write to it will block or return EWOULDBLOCK / EAGAIN.
>
> The macOS semantics of TCP_NOTSENT_LOWAT instead break this: select() or kevent() will claim that a socket is not writable when it has more than TCP_NOTSENT_LOWAT bytes in its buffer, even though a write will in fact succeed up to SO_SNDBUF bytes. Thus, once TCP_NOTSENT_LOWAT is set, there is no way to tell on macOS whether a write to the socket will actually succeed or block.
>
> I believe this invariant is pretty baked into the Linux kernel code, and that's why the implementation made it impossible to write more than TCP_NOTSENT_LOWAT bytes to the socket. (I'm not an expert in the Linux socket code, but it looks like it uses the same code to trigger epoll and to decide when a tcp_sendmsg() call truncates its write.)
>
> I'm thinking that it may be cleaner from an API design perspective (though, sadly, less portable) to instead add new kevent and epoll event types specifically for when the socket hits its low-water point, whether by size or by time, and leave the existing writable events with their original semantics. (This wouldn't allow triggering low-water from select(), but I don't think that's a huge hardship for modern code.)
>
> Are there Linux kernel developers on this list? I suspect this will need input from that community.