Re: [Bpf] Review of draft-thaler-bpf-isa-01

David Vernet <void@manifault.com> Mon, 14 August 2023 16:18 UTC

Return-Path: <dcvernet@gmail.com>
X-Original-To: bpf@ietfa.amsl.com
Delivered-To: bpf@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 03DA5C1519B9 for <bpf@ietfa.amsl.com>; Mon, 14 Aug 2023 09:18:08 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.407
X-Spam-Level:
X-Spam-Status: No, score=-1.407 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, FREEMAIL_FORGED_FROMDOMAIN=0.25, FREEMAIL_FROM=0.001, HEADER_FROM_DIFFERENT_DOMAINS=0.249, RCVD_IN_MSPIKE_H2=-0.001, RCVD_IN_ZEN_BLOCKED_OPENDNS=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01, URIBL_BLOCKED=0.001, URIBL_DBL_BLOCKED_OPENDNS=0.001, URIBL_ZEN_BLOCKED_OPENDNS=0.001] autolearn=no autolearn_force=no
Received: from mail.ietf.org ([50.223.129.194]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id cONT9Qz5YU31 for <bpf@ietfa.amsl.com>; Mon, 14 Aug 2023 09:18:03 -0700 (PDT)
Received: from mail-qv1-f51.google.com (mail-qv1-f51.google.com [209.85.219.51]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 59825C151997 for <bpf@ietf.org>; Mon, 14 Aug 2023 09:18:03 -0700 (PDT)
Received: by mail-qv1-f51.google.com with SMTP id 6a1803df08f44-64726360433so4178586d6.3 for <bpf@ietf.org>; Mon, 14 Aug 2023 09:18:03 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692029882; x=1692634682; h=user-agent:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=S6pghZcCVmSXG7aFJ1PmYcrKWfTMAcd8O51fDjdEqXY=; b=SzNcy4t6erFB+xMxG22UXzI70x9hHcn5S9fOy1jXokPdc+tgZNDcyg3qWwftKxGl/p l/r1ECpDlNDN8baIIJIXDTZjpzRUhbvvQsiexFzm0D7K3Z5/wqrJwE5ip7eoJnQmHKz3 SiakrY9rt7XUFT80ZrjWRDgwJc0r7CyZ7D63OK+OfbbT4Qqsgn1tLFYE3jKX0/WghUYL 05/oOj1OMpfMs75X52ircACQR4qRqzDFtP/JUIL0kD2FmtSXMVUl4EHEna9PKyREZ1oZ rrOTobKuw2kzPLZ6IJ/TVAxCFL6PzhfyCRKhfN9fvfD1Eiyy0HDa0N8F3jgvge/dCxPb khXA==
X-Gm-Message-State: AOJu0YzFVsJKT5lIP8+DPJWsramGNYy4sl1hR/vsd+o00TPKRESN120w lxxzF4v/c57QzQdHl231B1E=
X-Google-Smtp-Source: AGHT+IEBm0gaVL+UiqRDA2jxrKxIJgbxCGHDXvwhJcVLqJGJkPjzn/riy+kyyz70ZvQwcD47zJUKSA==
X-Received: by 2002:a0c:e1d3:0:b0:641:8cb7:b089 with SMTP id v19-20020a0ce1d3000000b006418cb7b089mr10995856qvl.27.1692029882104; Mon, 14 Aug 2023 09:18:02 -0700 (PDT)
Received: from maniforge ([2620:10d:c091:400::5:93a1]) by smtp.gmail.com with ESMTPSA id g4-20020a0cf084000000b006301d31e315sm3479721qvk.10.2023.08.14.09.18.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 14 Aug 2023 09:18:01 -0700 (PDT)
Date: Mon, 14 Aug 2023 11:17:59 -0500
From: David Vernet <void@manifault.com>
To: Watson Ladd <watsonbladd@gmail.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>, Dave Thaler <dthaler@microsoft.com>, Christoph Hellwig <hch@infradead.org>, "bpf@ietf.org" <bpf@ietf.org>, bpf <bpf@vger.kernel.org>
Message-ID: <20230814161759.GF542801@maniforge>
References: <CACsn0ckZO+b5bRgMZhOvx+Jn-sa0g8cBD+ug1CJEdtYxSm_hgA@mail.gmail.com> <PH7PR21MB3878D8DCEF24A5F8E52BA59DA303A@PH7PR21MB3878.namprd21.prod.outlook.com> <CAADnVQJ1fKXcsTXdCijwQzf0OVF0md-ATN5RbB3g10geyofNzA@mail.gmail.com> <CACsn0cmf22zEN9AduiRiFnQ7XhY1ABRL=SwAwmmFgxJvVZAOsg@mail.gmail.com> <CAADnVQ+O0CZQ1-5+dBiPWgZig3MVRX92PWPwNCrL7rG+4Xrbag@mail.gmail.com> <CACsn0cmvuGBKd3erDQKugygZfhT-Cu8xYBJ3hCETp6a-1HNbYw@mail.gmail.com> <20230811172116.GC542801@maniforge> <CACsn0cmbDGpj8R98=DF00-hhjAKph+kHofAs3LF=KKonFYZeuA@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <CACsn0cmbDGpj8R98=DF00-hhjAKph+kHofAs3LF=KKonFYZeuA@mail.gmail.com>
User-Agent: Mutt/2.2.10 (2023-03-25)
Archived-At: <https://mailarchive.ietf.org/arch/msg/bpf/eqdpBRdQFYPaNY6rg-jPu8KTidc>
Subject: Re: [Bpf] Review of draft-thaler-bpf-isa-01
X-BeenThere: bpf@ietf.org
X-Mailman-Version: 2.1.39
Precedence: list
List-Id: Discussion of BPF/eBPF standardization efforts within the IETF <bpf.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/bpf>, <mailto:bpf-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/bpf/>
List-Post: <mailto:bpf@ietf.org>
List-Help: <mailto:bpf-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/bpf>, <mailto:bpf-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 14 Aug 2023 16:18:08 -0000

On Fri, Aug 11, 2023 at 02:36:04PM -0700, Watson Ladd wrote:
> Dear David,

Hi Watson,

> Thank you very much for your lengthy and kind email. I agree that we

And thank you for offering your time and expertise as a reviewer of the
document.

> should punt on contentious points and aim to standardize what has
> already been implemented across a wide range of implementations. Most
> of my issues are with the format and presentation of the text, and I
> think the content changes it would take are pretty noncontenous. I

Ack, sounds good. Just FYI, I realize (and appreciate) that you are
fulfilling your role as a reviewer, as you offered to do at IETF117.
That said, even just as a reviewer, it can often be easier to discuss
proposed changes if there are accompanying patches which illustrate your
intent. For example, moving the Maps section further up the document,
etc. This is not strictly necessary, especially if you're just reviewing
the document, but I do think it would help to ground some of the
discussions. For what it's worth, this is more the "Linux kernel way",
but I'm not sure if this is the norm for IETF.

> don't really have any insight into what the content should be, and I'm
> sure that for those who live and breath BPF every day, much of what I
> am confused about is indeed obvious.

There is a difference between something being obvious and being well
specified, so your review is appreciated either way.

> Concretely I think the following would help improve the
> understandability of the document:
> * After the register paragraph, describe the memory. As I understand

Just FYI, we are going to be moving the register paragraph to a separate
ABI document, as it really belongs in an ABI (Informational) document
than a Proposed Standard for the ISA.

> it it is a 64 bit, byte addressed, flat space, and maps are just
> special regions in it. Maybe I'm wrong. There's some stuff about types
> in the big space of instructions that maybe makes me think I am wrong.

I think it's OK to specify that it's a 64 bit, byte addressed flat
space, so folks can feel free to submit a patch to that effect. My only
worry is that it may set a precedent that we'll have to explain a lot of
other architectural details in the ISA standard, where they don't really
belong. Take a look at how ARM [0] does this, for example. There is an
entirely separate document [1] on the AArch64 memory model, and for good
reason. It describes the hierarchical model of page tables on ARM, what
the memory model is when there's no MMU, how device memory works, etc.

[0]: https://www.arm.com/architecture/learn-the-architecture/a-profile
[1]: https://developer.arm.com/documentation/102376/latest/

So while I don't disagree that it should be non-controversial and would
help clarify the programming environment for the instructions, I want to
make sure that folks are OK with us adding this, but not necessarily
expounding on all of the details and implications. Hopefully this should
be fine, as it's been the norm for other ISAs (e.g. RISC-V) as well.

FWIW, I think this applies to maps as well. It's definitely something we
need to expand on at some point, but the possibility for scope creep is
high.

This is what's there now:

> =========================  ======  ===  =========================================  ===========  ==============
> opcode construction        opcode  src  pseudocode                                 imm type     dst type
> =========================  ======  ===  =========================================  ===========  ==============
> BPF_IMM | BPF_DW | BPF_LD  0x18    0x0  dst = imm64                                integer      integer
> BPF_IMM | BPF_DW | BPF_LD  0x18    0x1  dst = map_by_fd(imm)                       map fd       map
> BPF_IMM | BPF_DW | BPF_LD  0x18    0x2  dst = map_val(map_by_fd(imm)) + next_imm   map fd       data pointer
> BPF_IMM | BPF_DW | BPF_LD  0x18    0x3  dst = var_addr(imm)                        variable id  data pointer
> BPF_IMM | BPF_DW | BPF_LD  0x18    0x4  dst = code_addr(imm)                       integer      code pointer
> BPF_IMM | BPF_DW | BPF_LD  0x18    0x5  dst = map_by_idx(imm)                      map index    map
> BPF_IMM | BPF_DW | BPF_LD  0x18    0x6  dst = map_val(map_by_idx(imm)) + next_imm  map index    data pointer
> =========================  ======  ===  =========================================  ===========  ==============
>
> where
>
> * map_by_fd(imm) means to convert a 32-bit file descriptor into an address of a map (see `Maps`_)
> * map_by_idx(imm) means to convert a 32-bit index into an address of a map
> * map_val(map) gets the address of the first value in a given map
> * var_addr(imm) gets the address of a platform variable (see `Platform Variables`_) with a given id
> * code_addr(imm) gets the address of the instruction at a specified relative offset in number of (64-bit) instructions
> * the 'imm type' can be used by disassemblers for display
> * the 'dst type' can be used for verification and JIT compilation purposes
>
> Maps
> ~~~~
>
> Maps are shared memory regions accessible by eBPF programs on some
> platforms. A map can have various semantics as defined in a separate
> document, and may or may not have a single contiguous memory region,
> but the 'map_val(map)' is currently only defined for maps that do have
> a single contiguous memory region.

There are a lot of caveats here about the semantics of maps, whether
they do or don't have a contiguous memory region, etc. There likely
isn't a single definition of a "map memory region" that will apply to
all maps. For example, there is a BPF_MAP_TYPE_PERCPU_ARRAY map, which
in contrast with BPF_MAP_TYPE_ARRAY, need not be a contiguous memory
region and could instead leverage however percpu memory is implemented
on the platform. In my personal opinion, what Dave wrote here is rather
ideal in that it's giving the reader some context on what maps are,
without over prescribing them. With that in mind, are you ok with
leaving the maps section as is (modulo moving it as you suggested
below), with the intention being that we'll fill in the details on the
various types of cross-platform maps when we write the proposed standard
on cross-platform map types?

> * Say this is a 2's complement architecture

Hmmm, so, I realize that others have suggested this as well (e.g. Will
pointed out in [2] that he agrees that we should include this), but to
be honest I'm still not quite following why this is something that folks
feel should be included.

[2]: https://lore.kernel.org/all/CADx9qWiJCQyLdz5rG33K2iWtsgXQ65K3aiwQiEsjSwY2ofNy1Q@mail.gmail.com/

The RISC-V standard does indeed specify that signed types are
represented with two's complement, but as far as I know that is the
exception rather than the rule. As Jose explained in [3], it's more
common for ISA specifications to specify how numerical immediates are
encoded in stored instructions rather than dictating how signedness is
represented across the entire architecture.

[3]: https://lore.kernel.org/bpf/871qhe7des.fsf@gnu.org

The ARM64 ISA [4] aligns with what Jose said as well. The Intel and AMD
x86_64 ISAs specify that certain instructions perform two's-complement
negation, but as far as I know they don't dictate the type semantics for
the entire architecture.

[4]: https://developer.arm.com/documentation/ddi0602/latest

The reason that I think it may be prudent to leave this off is that I
don't think we really gain anything by specifying it. I expect that
we'll include this in the psABI document, but it seems unnecessarily
constraining to dictate two's complement in the PS _ISA document_. At
the end of the day I expect our goal will be source-code compatibility
rather than binary compatibility, and while it's entirely possible that
we'll never see it, I think it would be prudent to give platforms the
flexibility to implement signedness in other ways if they have a reason
to do so.

Case in point, while it's not an ISA standard, section 6.2.6
("Representations of types") of the C standard specifies the following:

> 1 The representations of all types are unspecified except as stated in this subclause.
> 2 Except for bit-fields, objects are composed of contiguous sequences of one or more bytes, the number,
> order, and encoding of which are either explicitly specified or implementation-defined.
> ...
> For signed integer types, the bits of the object representation shall be divided into three groups:
> value bits, padding bits, and the sign bit. There need not be any padding bits; signed char shall
> not have any padding bits. There shall be exactly one sign bit. Each bit that is a value bit shall have
> the same value as the same bit in the object representation of the corresponding unsigned type (if
> there are M value bits in the signed type and N in the unsigned type, then M <= N). If the sign bit is
> zero, it shall not affect the resulting value. If the sign bit is one, the value shall be modified in
> one of the following ways:
>
> — the corresponding value with sign bit 0 is negated (sign and magnitude);
> — the sign bit has the value −(2M) (two's complement);
> — the sign bit has the value −(2M − 1) (ones' complement).

In my opinion, the intention of the standard to not over-specify should
apply here as well.

> * I finally understand why the code fields have their low nybble zero.
> We should maybe say this.

Hmmm, sorry, I'm not quite following this part. Could you please clarify
which encoding / section you're referring to specifically?

> * Explicitly call out after 5.2 that there is no memory model yet

I'm not fundamentally opposed to this, though I'll share my subjective
opinion that specifying the absence of something seems unnecessary for a
standard. It seems like it would fit better in an Informational
document? Not sure.

> * Pull up section 5.3.1 to the top, or figure out some way to punt it
> to an extension. Maybe introduce maps up top then explain how they are
> indexed here.

I'm fine with this be reworked if you think it would clarify things.
Please feel free to submit patches that reformats in a way that you
think is clearer. Or Will, or someone else, etc :-)

> For extensions if I think I understand the conversation at IETF 117,
> it's easy to add more calls to the host system as functions. It's a

I'll give a bit of background here on the Linux side. It's easy to add
what we call kfuncs [5], which are host-system / main-kernel functions
that can be called from BPF. These have no UAPI guarantees, so we can
safely add them without having to worry about supporting them forever /
indefinitely. In contrast, we no longer add [6] helper functions because
they do come with UAPI requirements, and thus with much more significant
potential long-term overhead.

[5]: https://www.kernel.org/doc/html/latest/bpf/kfuncs.html
[6]: https://man7.org/linux/man-pages/man7/bpf-helpers.7.html

More on this below.

> lot more of a difficulty to add more instructions, but in the wide
> encoding space there is room. We could definitely say that. The memory

To the point above about helpers vs. kfuncs, I think there's a
nontrivial amount of subtlety we'll have to capture. And at the risk of
being a broken record, I don't think we necessarily want to try and
capture all of the nuances of the difficulty in adding instructions,
certain types of host-system functions, etc, in the ISA document. What I
feel does belong in the ISA document would be a very clear prescription
for how the ISA can be _extended_. The difficulty in extending the ISA
should hopefully be self-evident from that.

In contrast, the [I] architecture and framework document seems like a
more appropriate place to go into details on platform-specific
host-system functions vs. helpers, etc. We also have plans to implement
another proposed standard on the cross-platform helper functions:

* [PS] cross-platform helper functions, e.g., for manipulation of maps,

and I'm sure we'll also describe how that PS can be extended in that
document.

Please let me know if I've misunderstood your suggestion.

> model should only modify the behavior of environments with races, so
> if things aren't racy, nothing changes. That should work, but maybe I

Just to be clear, I think you mean the memory _consistency_ model,
correct? But yes, if you're running in e.g. a uniprocessor environment,
we shouldn't have to worry about any of that. As I said in the prior
email, I definitely agree with you that we'll need to formally define
all of this in a subsequent extension of the document, but think that
we'll be OK with sidestepping it for now. It sounds like we're on the
same page for this one?

> don't understand what other extensions that people would want to add.
> Verification might be an extension, but probably not in the sense we
> need to worry about it here?

Hmmm, I don't expect that verification would ever be relevant to
extending the ISA. There may be some platforms that don't implement
verification at all. Verification will likely be relegated to:

* [I] verifier expectations and building blocks for allowing safe
  execution of untrusted BPF programs,

from our charter, and will likely also be mentioned in the architecture
and framework informational doc.

I expect that future extensions would be something like adding a fence
instruction, etc were we to decide to go that route.

> I hope the above is helpful: as always my ignorance can completely
> negate the value of the concrete suggestion, but I do hope it
> highlights areas that could use some TLC.

Your input is very much appreciated, and I don't disagree with you that
the doc could (as with any document) always be further improved.

Just as a friendly suggestion, as I said above, if you feel strongly
about these proposals I expect that it would be easier to incorporate
them by submitting patches with the proposed changes rather than simply
describing the perceived gaps. That's not a hard requirement by any
means, and I appreciate you volunteering your time to review the
document regardless. Just a suggestion for greasing the wheels.

Kind regards,
David