Re: [Int-area] IP Protocol number allocation request for Transparent Inter Process Communication (TIPC) protocol

Jon Maloy <jmaloy@redhat.com> Fri, 20 March 2020 00:21 UTC

To: Tom Herbert <tom@herbertland.com>
Cc: Joseph Touch <touch@strayalpha.com>, Suresh Krishnan <suresh@kaloom.com>, int-area <int-area@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/int-area/_53TjLB1lR6mN8djLeslfFsUQy8>


On 3/19/20 8:07 PM, Tom Herbert wrote:
> On Thu, Mar 19, 2020 at 4:33 PM Jon Maloy <jmaloy@redhat.com> wrote:
>>
>>
>> On 3/19/20 7:03 PM, Tom Herbert wrote:
>>> On Thu, Mar 19, 2020 at 3:43 PM Jon Maloy <jmaloy@redhat.com> wrote:
>>>>
>>>> On 3/18/20 12:04 AM, Joseph Touch wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I’m quite confused by this request.
>>>>
>>>> It seems like they either have an implementation issue (in Linux).
>>>>
>>>> Linux "passthru" GSO is implemented so that any IP based protocol which wants to benefit
>>>> from it needs its own IP protocol number. Doing this generically through the already existing
>>>> UDP protocol number is not possible, because GSO on a host must be implemented
>>>> specifically (e.g., regarding segmentation) per carried protocol. That is just a fact, and not
>>>> an implementation issue.
>>> Jon,
>>>
>>> I'm not sure I understand your point. Linux already supports GSO, and
>>> GRO for that matter, for several protocols encapsulated over UDP. I
>>> don't see any requirement for a protocol to need its own IP protocol
>>> number in this regard.
>>>
>>> Tom
>> Yes, but this is not about guest GSO. What we need is something more
>> similar to TCP TSO, where we can send full-size buffers down to the
>> host OS, and only do segmentation (or in our case, a TIPC specific
>> fragmentation where each fragment gets an individually numbered header)
>> when we find that the destination is off-host.
>> Basically we want to transport full-size messages between VMs when those
>> are located on the same host. So far, I haven't found any way to
>> do this on the host by looking at the inner protocol carried over UDP.
>> But I may of course be wrong at this point, I know you are the expert.
>>
> Jon,
>
> You might want to look at Willem's work in UDP GSO
> (http://vger.kernel.org/lpc_net2018_talks/willemdebruijn-lpc2018-udpgso-presentation-20181104.pdf).
> That might be useful as a generic method assuming the proper APIs are
> supported (this is exactly how QUIC GSO was solved without needing
> explicit kernel support for QUIC).
>
> Tom
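For readers following the pointer above, here is a minimal sketch of how an application can opt in to UDP GSO on Linux, assuming a kernel that supports the UDP_SEGMENT socket option (4.18 or later). The helper names enable_udp_gso and gso_segment_count are made up for illustration, and the fallback constants are only used when the system headers do not provide them.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/udp.h>

#ifndef SOL_UDP
#define SOL_UDP 17          /* same value as IPPROTO_UDP */
#endif
#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103     /* value from linux/udp.h; assumed absent in old headers */
#endif

/* Ask the kernel to split every large send on this UDP socket into
 * datagrams carrying at most gso_size payload bytes each.
 * Returns 0 on success, -1 on failure (e.g. no kernel support). */
int enable_udp_gso(int fd, int gso_size)
{
    return setsockopt(fd, SOL_UDP, UDP_SEGMENT, &gso_size, sizeof(gso_size));
}

/* Number of datagrams a single send of len bytes is split into.
 * The last datagram may be shorter than gso_size. */
size_t gso_segment_count(size_t len, size_t gso_size)
{
    if (gso_size == 0)
        return 0;
    return (len + gso_size - 1) / gso_size;
}
```

With this option set, one send() of a large buffer is segmented late in the stack, so the per-packet cost in the upper layers is paid once per buffer rather than once per datagram. Whether this covers the TIPC-specific per-fragment headers discussed above is not something the sketch addresses.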
Hi Tom,
I'll take a look at this. Thank you for the tip.
///jon
>
>> ///jon
>>
>>>> I checked their documentation, which includes something that looks a little like an Internet Draft:
>>>> http://tipc.io/protocol.html
>>>> but it’s quite confusing. Taken at face value, they make their own argument that IP addresses won’t work - at which point running raw over IP serves no utility (sec 3.1.1),
>>>>
>>>> That is not a correct interpretation of the text. Nowhere is it stated that IP addresses won't work for TIPC,
>>>> neither in sec. 3.1.1 nor anywhere else. Of course they work, *for transport purposes*, just as they have
>>>> for many years already when running TIPC over UDP. What we state elsewhere in the document is that
>>>> IP addresses are no good in the *user API*, because they are location bound.
>>>> That is also why DNS was invented, I believe.
>>>>
>>>> We also state that using IP addresses is less optimal than omitting the IP layer altogether
>>>> and using MAC addresses, but that doesn't mean the former are useless; it just makes
>>>> IP the only viable alternative in cases where a network owner doesn't allow non-IP
>>>> protocols through their backplanes, or when routing gets involved.
>>>>
>>>> even though most of those claims are debatable (DNS-SD is too static? And expensive?? How so?). Then they reinvent the DNS in Section 6.
>>>>
>>>> There is no doubt that DNS is not the best choice for the type of environments (tight clusters) where
>>>> we use TIPC. All DNS implementations I know run in user land, and doing a service discovery typically
>>>> means at least one, and often several inter-process and potentially inter-node hops. Even if there is
>>>> a process local lookup cache in each sender, that cache has to be populated before it is of any use.
>>>> Instead, TIPC uses a tailor-made kernel resident translation service which normally contains a complete
>>>> copy of the lookup database, so there are no unnecessary hops and no cache misses.
>>>>
>>>> This would have been of less importance if TIPC were only a connection oriented TCP-like service where
>>>> service lookup is only needed at connection setup. But a just as important feature of TIPC is its reliable
>>>> connectionless transport mode. Here, the lookup service is not primarily about service discovery
>>>> (although that is also important), but about efficient on-the-fly translation between user level service
>>>> addresses (aka "port names") and location bound socket addresses (aka "port identities"). This
>>>> translation has to be performed per message, not per connection, since the destination may change
>>>> between each message.
>>>>
>>>> If we were to make an analogy with the IP world, we could imagine that we use UDP to send high
>>>> volume traffic to many different destinations, each having its own domain name. Making a
>>>> separate DNS lookup for each sent message would certainly work, but it would be far less
>>>> performant than having a tailor-made "always cache resident" translation table, shared between
>>>> all processes, like we do in TIPC.
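The per-message translation described above can be pictured with a small sketch. It is not the TIPC implementation: the type names, the fixed table size and the linear-probing scheme are all assumptions chosen for brevity. It only illustrates the idea of a resident table that maps a service name to a socket identity and can be rebound between two messages.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types, loosely modelled on "port names" and "port identities". */
struct service_name { uint32_t type; uint32_t instance; };
struct socket_id    { uint32_t port; uint32_t node; };

struct binding {
    struct service_name name;
    struct socket_id sock;
    int in_use;
};

#define TABLE_SIZE 64
struct name_table { struct binding slots[TABLE_SIZE]; };

static size_t slot_of(struct service_name n)
{
    return (n.type * 31u + n.instance) % TABLE_SIZE;
}

/* Bind a name to a socket, or rebind it if it is already present.
 * Returns 0 on success, -1 if the table is full. */
int table_bind(struct name_table *t, struct service_name n, struct socket_id s)
{
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        struct binding *b = &t->slots[(slot_of(n) + i) % TABLE_SIZE];
        int same = b->in_use && b->name.type == n.type &&
                   b->name.instance == n.instance;
        if (!b->in_use || same) {
            b->name = n;
            b->sock = s;
            b->in_use = 1;
            return 0;
        }
    }
    return -1;
}

/* Translate a name for one outgoing message; NULL if nothing is bound. */
const struct socket_id *table_lookup(const struct name_table *t,
                                     struct service_name n)
{
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        const struct binding *b = &t->slots[(slot_of(n) + i) % TABLE_SIZE];
        if (!b->in_use)
            return NULL;    /* an empty slot ends the probe sequence */
        if (b->name.type == n.type && b->name.instance == n.instance)
            return &b->sock;
    }
    return NULL;
}
```

A sender would call table_lookup once per message, so a rebind between two sends redirects the second message without any resolver round trip.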
>>>>
>>>> Furthermore, when the connectionless service is used, sockets might be created/deleted and
>>>> bound/unbound at extremely high rates, much higher than DNS with its hierarchical updates
>>>> is meant to deal with. This is what we mean by DNS being too "static". It is not saying that
>>>> DNS is bad, it is just stating that it is not designed for the very high performance requirements
>>>> and dynamism we have in TIPC.
>>>>
>>>> There is no doubt that a few things in TIPC could have been done differently, but the decision
>>>> to design our own topology/lookup service is not among those. This request is an attempt to
>>>> open the way for moving beyond some current limitations, e.g., by enabling the introduction of a more
>>>> versatile 128-bit service addressing concept. Along with this request we are aiming at having
>>>> an updated version of the protocol description adopted as an informational RFC, so that
>>>> TIPC can be regarded as an IETF supported protocol in its own right.
>>>>
>>>> Whatever the viewpoints, TIPC is currently what it is, and rather than focusing on the motivation
>>>> for certain implementation choices and how they work, I think IETF should consider the fact
>>>> that this is a well-established service used by dozens of small and big companies, running high-volume
>>>> traffic at hundreds of telco sites around the globe. They should also consider that TIPC has
>>>> existed as a stable and well-maintained implementation in all major Linux distros for many years.
>>>>
>>>> IETF now has a genuine chance to help us make TIPC even more useful for existing and new users.
>>>>
>>>> BR
>>>> Jon Maloy
>>>>
>>>>
>>>> Frankly, IMO this would probably have a difficult time arguing for a transport protocol port number, much less an IP protocol number.
>>>>
>>>> Joe
>>>>
>>>>
>>>> On Mar 17, 2020, at 3:34 PM, Suresh Krishnan <suresh@kaloom.com> wrote:
>>>>
>>>> Hi all,
>>>>     IANA received an IP protocol number allocation request from Jon Maloy <jmaloy@redhat.com> for the Transparent Inter Process Communication (TIPC) protocol. I picked up this request as Internet AD as the registration procedure requires IESG Approval. I had provided the information below to the IESG and discussed this with a favorable view of this request. I am recommending allocation of an IP protocol number for this. If you have any concerns that you think I might have overlooked, please let me know by end of day March 24 2020.
>>>>
>>>> After several round trips of back and forth probing I had collected the following information regarding the protocol number request for TIPC. There were two main questions I had for him:
>>>>
>>>> * Q1: Why did they want an IP protocol number?
>>>> * Q2: Is the protocol implemented and deployed widely?
>>>>
>>>> Q1: Why did they want an IP protocol number?
>>>> ====================================
>>>>
>>>> There are two main reasons why they want to reserve an IP protocol number:
>>>>
>>>> 1)  Performance
>>>> They are currently working on adding GSO support to TIPC, including a TSO-like "full-size buffer pass-thru" through virtio and the host OS tap interface. They have experimentally implemented GSO across UDP tunnels, but performance is not good because of the way the tunnel GSO is implemented, and there is no 'pass-thru' support for this in Linux. They have even done the same at the pure L2 level, but L2 transport is sometimes not accepted by the cloud maintainers or the telco operators, and hence they need an alternative. The best alternative, from both a performance and an acceptability viewpoint, would be to establish TIPC as a full-fledged IP protocol, in addition to the traditional L2 bearer many users are still using.
>>>>
>>>> 2) Currently TIPC has two user address types:
>>>>
>>>> struct tipc_service_addr{
>>>>       uint32_t type;
>>>>       uint32_t instance;
>>>>       uint32_t node;
>>>> };
>>>> struct tipc_socket_addr{
>>>>       uint32_t port;
>>>>       uint32_t node;
>>>> };
>>>>
>>>> They want to complement this with a new API that has a unified address type:
>>>> struct tipc_addr{
>>>>      u8 type[16];
>>>>      u8 instance[16];
>>>>      u8 node[16];
>>>> };
>>>>
>>>> This would give a 128-bit value range for each of 'type', 'instance' and 'node', and opens up new opportunities:
>>>> - Users will never need to coordinate 'type' values since there will be no risk of collisions.
>>>> - Users can put whatever they want into the fields, e.g., an IPv6 address, a Kubernetes or Docker container id, a LUKS disk UUID or just a plain string.
>>>> For the 'node' id this has already been implemented and released, but it is not reflected in the API yet.
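As a rough illustration of what "put whatever they want into the fields" could look like, the sketch below fills 128-bit fields from arbitrary byte strings. It assumes the layout quoted above; the struct name tipc_addr_sketch, the helper field_set and the zero-padding convention are hypothetical and not part of any released TIPC API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TIPC_FIELD_LEN 16   /* 128 bits per field, as in the proposed layout */

/* Mirrors the proposed unified address; the name is hypothetical. */
struct tipc_addr_sketch {
    uint8_t type[TIPC_FIELD_LEN];
    uint8_t instance[TIPC_FIELD_LEN];
    uint8_t node[TIPC_FIELD_LEN];
};

/* Copy up to 16 bytes into a field and zero-pad the remainder, so that
 * short identifiers compare equal regardless of what was there before.
 * Returns 0 on success, -1 if the data does not fit. */
int field_set(uint8_t field[TIPC_FIELD_LEN], const void *data, size_t len)
{
    if (len > TIPC_FIELD_LEN)
        return -1;
    memset(field, 0, TIPC_FIELD_LEN);
    memcpy(field, data, len);
    return 0;
}
```

A plain string such as "billing-service" fits directly, and so does a 16-byte IPv6 address or UUID; anything longer would need to be hashed or truncated by the user, which this sketch deliberately rejects.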
>>>>
>>>> For the API extension they need a new IPPROTO_TIPC socket type which can be registered and instantiated independently from the traditional AF_TIPC socket type.
>>>>
>>>> You can find more info about this at http://tipc.io
>>>>
>>>> Q2: Is the protocol implemented and deployed widely?
>>>> ==========================================
>>>>
>>>> The requester provided the following information when I asked about who was currently using TIPC (pretty much about adoption and deployment):
>>>>
>>>> I can give you a list of current or recently active code contributors and companies/people who have been asking for support:
>>>>
>>>> Huawei:
>>>> For natural reasons I don't know any details about them, I can only name persons I have seen contributing to netdev or being active on our mailing lists. Huawei people sometimes use gmail addresses when posting questions and patches, so there are more persons than I have listed here.
>>>> Dmitry Kolmakov <kolmakov.dmitriy@huawei.com>
>>>> Ji Qin <jiqin.ji@huawei.com>
>>>> Wei Yongjun <weiyongjun1@huawei.com>
>>>> <songshuaishuai2@huawei.com>
>>>> Yue Haibing <yuehaibing@huawei.com>
>>>> Junwei Hu <hujunwei4@huawei.com>
>>>> Jie Liu <liujie165@huawei.com>
>>>> Qiang Ning <ningqiang1@huawei.com>
>>>> Zhiqiang Liu <liuzhiqiang26@huawei.com>
>>>> Miaohe Lin <linmiaohe@huawei.com>
>>>> Wang Wang <wangwang2@huawei.com>
>>>> Kang Zhou <zhoukang7@huawei.com>
>>>> Suanming Mou <mousuanming@huawei.com>
>>>>
>>>> Hu Junwei is the one I see most active at the moment.
>>>>
>>>> Nokia:
>>>> Tommi Rantala <tommi.t.rantala@nokia.com>
>>>>
>>>> Verizon:
>>>> Amar Nv <amar.nv@in.verizon.com>
>>>> Jayaraj Wilson <jayaraj.wilson@in.verizon.com>
>>>>
>>>> Hewlett Packard Enterprise:
>>>> <jonas.arndt@hpe.com>
>>>>
>>>> WindRiver:
>>>> Ying Xue <ying.xue@windriver.com>
>>>> He is my co-maintainer at netdev and SourceForge.
>>>> WindRiver has several products in the field based on TIPC, e.g. a control system for Sikorsky helicopters.
>>>>
>>>> Orange:
>>>> Christophe JAILLET <christophe.jaillet@wanadoo.fr>
>>>>
>>>> Redhat:
>>>> The person contacting me to have TIPC integrated and maintained in RHEL-8.0 was
>>>> Sirius Rayner-Karlsson <akarlsson@redhat.com>
>>>> He motivated it with a request from "a telco vendor", but I don't know which one.
>>>> Hence, TIPC is now integrated in and officially supported from RHEL 8.1
>>>>
>>>> ABB:
>>>> https://new.abb.com/pl
>>>> Mikolaj K. Chojnacki <mikolaj.k.chojnacki@pl.abb.com>
>>>> Krzysztof Rybak <krzysztof.rybak@pl.abb.com>
>>>>
>>>> Ericsson:
>>>> All (dozens of) applications based on the TSP and Core Middleware/Components Based Architecture (CMW/CBA) platforms are by definition based on TIPC. They have not yet started to use TIPC on their Kubernetes based ADP platform, but there is work ongoing on this.
>>>>
>>>> I also see numerous other people being active, from small (I believe) companies, universities and private contributors. E.g.,
>>>> Innovsys Inc  http://www.innovsys.com/innovsys/
>>>> Allied Telesis https://www.alliedtelesis.com/
>>>> Telaverge Communications http://www.telaverge.com/
>>>> Ivan Serdyuk <local.tourist.kiev@gmail.com> (seems to be responsible for the ZeroMQ port of TIPC)
>>>> Johns Hopkins University / Fast LTA, Munich <peter.hans.froehlich@gmail.com>
>>>> Just to mention a few...
>>>>
>>>> TIPC is currently maintained jointly by Ericsson, WindRiver, Redhat, and the Australian consulting company DEK Technologies https://www.dektech.com.au/
>>>>
>>>> Thanks
>>>> Suresh
>>>>
>>>> _______________________________________________
>>>> Int-area mailing list
>>>> Int-area@ietf.org
>>>> https://www.ietf.org/mailman/listinfo/int-area