Re: [Int-area] IP Protocol number allocation request for Transparent Inter Process Communication (TIPC) protocol

Tom Herbert <tom@herbertland.com> Mon, 23 March 2020 15:39 UTC

References: <DC440B28-DA08-499F-8A2A-7A8ACF880724@kaloom.com> <A6B82786-FB50-4AAA-8D69-0A55FEB5DC3B@strayalpha.com> <4bad2d30-0220-a836-451d-b01fdba4d098@redhat.com> <0C774D74-89A9-44CB-BCE7-A0ACC138C10F@strayalpha.com> <4cd43b9b-f7fa-0fc5-3ba9-11a735268288@redhat.com> <BAAD573B-497C-4F86-AF7A-776781698717@strayalpha.com> <eb054946-0bbe-ce6b-3a7d-6e2630ae4c6f@redhat.com> <E206BEE8-C157-4733-924F-649C94321E03@strayalpha.com> <ab1de07e-6284-5fe6-ef0d-46303f996354@redhat.com>
In-Reply-To: <ab1de07e-6284-5fe6-ef0d-46303f996354@redhat.com>
From: Tom Herbert <tom@herbertland.com>
Date: Mon, 23 Mar 2020 08:38:54 -0700
Message-ID: <CALx6S35zKeQGOzDHNM5iHbdO2Mxykse++khP74V=9z78nzRi2w@mail.gmail.com>
To: Jon Maloy <jmaloy@redhat.com>
Cc: Joseph Touch <touch@strayalpha.com>, int-area <int-area@ietf.org>, Suresh Krishnan <suresh@kaloom.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/int-area/o93UlMHZzzOuF18AYf4ez3iR7-Q>
Subject: Re: [Int-area] IP Protocol number allocation request for Transparent Inter Process Communication (TIPC) protocol

On Mon, Mar 23, 2020 at 7:32 AM Jon Maloy <jmaloy@redhat.com> wrote:
>
>
>
> On 3/20/20 11:04 AM, Joseph Touch wrote:
>
>
>
> On Mar 20, 2020, at 7:09 AM, Jon Maloy <jmaloy@redhat.com> wrote:
>
> Adding cc to int-area@ietf.org, since I forgot that in my original response.
>
>
> On 3/19/20 9:18 PM, Joseph Touch wrote:
>
>
>
> On Mar 19, 2020, at 4:46 PM, Jon Maloy <jmaloy@redhat.com> wrote:
>
> IP addresses are no good in the *user API*, because they are location bound.
> That is also why DNS was invented, I believe.
>
>
> DNS names are intended to be a human-rememberable alias to an IP address. They do not indicate a location any more than an IP address does or does not.
>
> Exactly. Read what I wrote again.
>
>
> IP addresses are no good in the USER API because they are location bound.
> False. DNS names are provided as an alternative for the user API because they are easier for people to remember and type.
>
> Then I should probably rephrase this by saying that "IP addresses AND DNS names are no good in the user API...", although I don't quite agree with that. DNS names are of course much more convenient for a user to deal with than IP addresses.
>
>
> Type in www.google.com
>
> Now type in its IPv6 address.
>
> Now see if you remember google’s website DNS or its IPv6 address. That’s what the DNS was originally intended for.
>
> Yes. But this case also demonstrates that both DNS names and IP addresses may be location independent. We have no clue whether a call will end up in a server farm in the US or Europe, let alone which server it will be handled on. So, even though the original purpose of DNS may have been something else, it has clearly followed the obvious path of becoming a tool for location independence. This is good, but not good enough for our purposes.
>
>
> DNS names are no more or less location-independent than IP addresses.
>
> This is also why DNS was invented...
>
> False. The reason the DNS exists has nothing to do with location. It’s simply string substitution for convenience, or at least was ONLY that originally.
>
>
> I think you just supported my case for a location independent addressing scheme.
>
>
> I am - but then I’m baffled why you want to run direct over IP. Ethernet has location independent addresses; IP does not* (see next part).
>
>
> When I am talking about location independence I am always talking about what the socket programmer/user sees. We don't want him to handle IP addresses, and we probably don't want him to hard code DNS names either.
>
> But at some level further down in the stack we can never get around translating location independent addresses to some location dependent form in order to transmit the packets to the right node and socket, be it MAC, IPv4, IPv6 or anything else.
>
> This is what we do in TIPC :
>
> Socket Layer:         {service type, service instance}            {port number}
> -------------                         |                                 ^
>                                       v                                 |
> TIPC Binding Table:      {port number, node number}                     |
> -------------------                   |                                 |
>                                       v                                 |
> TIPC Link Layer:           {UDP port, IP address}            {UDP port, IP address}
> ----------------              or {MAC address}                  or {MAC address}
>                                       |                                 ^
>                                       v                                 |
>                                       +-------------------------------->+
>
>
> The {UDP port, IP address} tuple (or MAC address) at the link layer is never visible to the user, and may change on-the-fly without him ever noticing.
> The same is true for the {port number, node number} tuple, although the user here has the option to use those directly, at the expense of location transparency.
> So, our request is simply about enabling us to use a third mapping at the link layer, an IP address only. This does not in any way interfere with the location transparency that is already provided at the socket level.
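As a rough illustration of the two-stage translation described above, here is a minimal sketch. It is not taken from the TIPC implementation: the table names, the resolve() helper, and all numeric values and addresses are hypothetical. It only shows that the link-layer mapping can change while the service address seen by the user stays the same.

```python
# Hypothetical sketch of the two-stage translation; names and values are
# illustrative assumptions, not taken from the TIPC implementation.

# Stage 1: the binding table maps a service address to a socket address.
# (service type, service instance) -> (port number, node number)
binding_table = {
    (1000, 1): (4711, 1),
}

# Stage 2: the link layer maps a node to its current bearer address.
# node number -> (UDP port, IP address); a MAC address could sit here instead.
link_table = {
    1: (40000, "192.168.100.17"),
}


def resolve(service_type, service_instance):
    """Translate a service address down to a link-layer destination."""
    port, node = binding_table[(service_type, service_instance)]
    return port, link_table[node]


before = resolve(1000, 1)

# The node's link-layer address changes on the fly; the caller keeps using
# the same service address and never sees the difference.
link_table[1] = (40000, "192.168.100.42")
after = resolve(1000, 1)
```

An IP-only mapping, as requested in the thread, would simply be a third kind of value in the hypothetical link_table; the lookup by service address would be unchanged.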
>
>
> This was one of the original motivations for developing TIPC in the first place.  A programmer using TIPC can hard code his service addresses if he wants to, ignoring the number of or location of the corresponding endpoints, even as those move around or scale up/down quite fast.
>
>
> Anycast gives you location independent addresses at the cost of doing discovery “inside the network layer”.
>
>
> Yes, and that is what we do. But for this to be of any use, that discovery/translation has to be blistering fast, and that is also what we do.
>
>
> However, even if you have those addresses, you still need to identify the service types (which is what we use ports for).
>
>
> UDP (at the link level) has only one service type in this case: "TIPC"
> At the socket level we are using TIPC service addresses for this, i.e., a {service type, service instance} tuple, each element being a 32-bit integer.
>
>
> ——
>
> I’m still stuck at why you want to run direct over IP. If you want Ethernet that bridges across routers, GRE does that.
>
>
> Yes, we could use VxLAN or Geneve or whatever. But that always comes at a cost both in performance and maintenance.
> We want TIPC to be both performant and really simple to use.
>
> If you want loc-independent addresses for services, UDP over IP using anycast does that.
>
>
> Again yes, but IP is normally not location independent inside clusters. 8.8.8.8 may be perceived as location independent, but 192.168.100.17 is typically not. And UDP has well-known limitations:
>
> 1) - UDP has 16-bit port numbers, a number space which has to be strictly managed.
>     - TIPC has a 32-bit+32-bit service address instead. This is what we want
>       to extend to 128+128 bits, so that nobody ever needs to register a
>       well-known address for TIPC. At least not for the purpose of
>       avoiding collisions.
> 2) - UDP is best effort.
>     - Standard TIPC anycast is "better than best" effort, because packets will
>       never be lost in transport. Due to lack of socket level flow control, there
>       is still a risk of seeing messages being dropped, though.
>     - Group anycast DOES have end-to-end flow control, so such messages
>       will never be lost or disordered.
> 3) Furthermore, we have reliable multicast and broadcast using the same
>     address type. There is no way you can get that with UDP.
>
>
> What is the specific gain of needing IP but not allowing a transport? AFAICT, it’s all down to GSO - which is an implementation. If GSO doesn’t do what you want, it would be useful to take your issues there or edit the code yourself and submit the patches.
>
>
> In that respect this is only an implementation issue, as you say, but it is not a TIPC only one.
> The slides Tom Herbert referred me to describe GSO on large UDP messages, but they don't describe how we go one step further and do it on the inner messages, or how we identify those as being TIPC in the first place. Furthermore, we would have to re-write the host level GSO support, which I am highly uncertain that the Linux network community would accept, given that everything needed is already there (i.e., if we only have a proper protocol number).

I don't understand why you think you need to rewrite GSO; there has
been an enormous amount of work to make this usable and extensible. I
suggest you take this up on the netdev list since this is about
implementation. I'd also point out that having a separate protocol
number is hardly a guarantee of acceptance in Linux; we would still be
asking for a justification and why this wasn't done in UDP.

>
> GSO is only one of the reasons for our request. There are more reasons:
> - Performance. The difference is not dramatic, but clearly measurable.
>   Terminating sockets in kernel space comes at a cost.

And what exactly is the performance difference that your measurements show?

> - The need to be able to register a new socket type, which will map down
>   to a (compatible) TIPC v3 protocol.

A new socket type does not require a new protocol number. There are
many examples of that. AF_KCM for instance.
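To illustrate the distinction, here is a minimal sketch in Python, assuming a Linux or other POSIX host. AF_UNIX is used as a stand-in for a socket family such as AF_KCM, chosen only for portability. It shows that the socket family and the IP protocol number are separate things: a socket family can exist with no IP protocol number behind it.

```python
import socket

# An AF_INET datagram socket explicitly bound to IP protocol 17 (UDP).
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
udp_proto = udp.proto
udp.close()

# An AF_UNIX datagram socket: a distinct socket family that is not tied to
# any IP protocol number (the protocol argument is left at its default, 0).
local = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
local_family = local.family
local_proto = local.proto
local.close()
```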

> - Acceptance. We want to have TIPC recognized as a part of the IP protocol
>   family, controlled by IETF, like most other protocols.

Well, "most other protocols" nowadays are being defined over UDP, e.g.
QUIC and all the various encapsulation protocols. The reasons for this
are: 1) there are only 256 IP protocol numbers, but 65536 port numbers,
hence it's obviously going to be easier to get a port number
assignment as opposed to a protocol number. 2) Network devices
notoriously don't handle new protocols well. If a protocol number is
assigned for TIPC and a packet is sent with that number, somewhere and
sometime an intermediate device will drop the packet. 3) UDP is really
cheap in wire overhead (eight bytes) and we've put a lot of effort into
optimizing it in implementation, at least in Linux (like all the
aforementioned GSO/GRO work).
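
The numbers in points 1) and 3) follow directly from the field widths. A small sketch of the arithmetic, with the port values in the packed header being arbitrary placeholders:

```python
import struct

IP_PROTOCOL_FIELD_BITS = 8   # IPv4 Protocol / IPv6 Next Header field
UDP_PORT_FIELD_BITS = 16     # UDP source and destination port fields

protocol_numbers = 2 ** IP_PROTOCOL_FIELD_BITS
port_numbers = 2 ** UDP_PORT_FIELD_BITS

# UDP header: source port, destination port, length, checksum, each a
# 16-bit field in network byte order. The port values are placeholders.
udp_header = struct.pack("!HHHH", 49152, 40000, 8, 0)
udp_header_len = len(udp_header)
```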

Tom

>
>
> Regards
> ///jon
>
>
> Joe
>
>