Re: [Int-area] IP Protocol number allocation request for Transparent Inter Process Communication (TIPC) protocol

Jon Maloy <jmaloy@redhat.com> Mon, 23 March 2020 18:40 UTC

To: Joseph Touch <touch@strayalpha.com>
Cc: int-area <int-area@ietf.org>, Suresh Krishnan <suresh@kaloom.com>
References: <DC440B28-DA08-499F-8A2A-7A8ACF880724@kaloom.com> <A6B82786-FB50-4AAA-8D69-0A55FEB5DC3B@strayalpha.com> <4bad2d30-0220-a836-451d-b01fdba4d098@redhat.com> <0C774D74-89A9-44CB-BCE7-A0ACC138C10F@strayalpha.com> <4cd43b9b-f7fa-0fc5-3ba9-11a735268288@redhat.com> <BAAD573B-497C-4F86-AF7A-776781698717@strayalpha.com> <eb054946-0bbe-ce6b-3a7d-6e2630ae4c6f@redhat.com> <E206BEE8-C157-4733-924F-649C94321E03@strayalpha.com> <ab1de07e-6284-5fe6-ef0d-46303f996354@redhat.com> <6BD67898-E2AB-4628-9A0D-4AEAC790EFA0@strayalpha.com>
From: Jon Maloy <jmaloy@redhat.com>
Message-ID: <e3ce1ee0-d862-dd3e-2dc6-8b2bbf92e1b4@redhat.com>
Date: Mon, 23 Mar 2020 14:40:37 -0400
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.5.0
MIME-Version: 1.0
In-Reply-To: <6BD67898-E2AB-4628-9A0D-4AEAC790EFA0@strayalpha.com>
Content-Language: en-US
Content-Type: multipart/alternative; boundary="------------E3EDCE0C1705075120DCCA27"
Archived-At: <https://mailarchive.ietf.org/arch/msg/int-area/xTpTZYreTaN_H-ELTyWKuDfhDgs>
Precedence: list


On 3/23/20 11:41 AM, Joseph Touch wrote:
> Jon,
>
> First, if you’re going to come to the IETF asking for something as 
> core as an IP protocol number, you need to be able to explain your 
> system to us in our terms.
>
> That means explaining things below in the following terms:
> UDP/TCP and nearly anything over IP = transport
> IP = network
> Ethernet = link
>
> UDP/IP isn’t a link layer to us;

It isn't for us either. It is a transport for our own link layer, just 
as VxLAN and some other protocols use it.

> what you’re really asking for, FWIW, is to be a *transport protocol*, 
> but that’s not quite what you want either (see below).
>
> Second, if you want an IP protocol number, your system has to “buy-in” 
> to the IP model in which IP unicast addresses are endpoints, not 
> logical identifiers.
>
>>> ...
>>> Type in www.google.com
>>>
>>> Now type in its IPv6 address.
>>>
>>> Now see if you remember google’s website DNS or its IPv6 address. 
>>> That’s what the DNS was originally intended for.
>> Yes. But this case also demonstrates that both DNS names and 
>> IP addresses may be location independent. We have no clue whether a 
>> call will end up in a server farm in the US or Europe, let alone 
>> which server it will be handled on. So, even though the original 
>> purpose of DNS may have been something else, it has clearly followed 
>> the obvious path of becoming a tool for location independence. This 
>> is good, but not good enough for our purposes.
>
> Please be more specific in what you’re seeking then.

Once more: location transparency at the user/socket level.

>
>>>
>>>>> DNS names are no more or less location-independent than IP addresses.
>>>>>
>>>>> This is also why DNS was invented...
>>>>>
>>>>> False. The reason the DNS exists has nothing to do with location. 
>>>>> It’s simply string substitution for convenience, or at least was 
>>>>> ONLY that originally.
>>>>
>>>> I think you just supported my case for a location independent 
>>>> addressing scheme.
>>>
>>> I am - but then I’m baffled why you want to run direct over IP. 
>>> Ethernet has location independent addresses; IP does not* (see next 
>>> part).
>>
>> When I am talking about location independence I am always talking 
>> about what the socket programmer/user sees.
>
> IP isn’t about that. It’s about what the network sees.

True. But that is IP.
*TIPC* is about location independence at the socket level. This exists, 
and will continue to exist independent of whether we are using IP or 
something else.

>
>> We don't want him to handle IP addresses, and we probably don't want 
>> him to hard code DNS names either.
>
> Please clarify - do you want to hard-code anything? Or have the user 
> type it in?

Programmers typically hard code the service type, while the service 
instance more typically is configured or calculated based on 
configuration data.

>
>> But, at some level further down in the stack we can never get around 
>> translating location independent addresses to some location 
>> dependent form in order to transmit the packets to the right node 
>> and socket, be it MAC, IPv4, IPv6 or anything else.
>>
>> This is what we do in TIPC :
>>
>> Socket Layer:         {service type, service instance}         {port number}
>
> The Internet uses service names for that (e.g., HTTP, HTTPS, etc).
>
> If service name lookup over the Internet using DNS is too slow, then 
> replace it with a different lookup mechanism or implementation.

That is what we have done.

> But it’s still DNS and DNS SRV records equivalent at that point.
>
>> ------------------                  |                               A
>>                                     v                               |
>> TIPC Binding Table:   {port number, node number}                    |
>
> Please explain what a node number is...
>
>> -------------------------           |                               |
>>                                     v                               |
>> TIPC Link Layer:      {UDP port, IP address}              {UDP port, IP address}
>> -----------------------   or {MAC address}                   or {MAC address}
>
> How is a UDP port different from your port?

It is the endpoint for the node-to-node transport, only. There is one or 
two of those per node pair.
The TIPC port is a 32-bit endpoint identifying the socket that 
terminates a service, bound to a {service type, service instance} tuple.
>
> How is a node number different from your number?

A node number is just a 32-bit identifier for a node. It could nowadays 
be mapped to an IPv4 address, if any such is available, but since we are 
also running directly on Ethernet, that is not the way it is done now.
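
A minimal sketch of the lookup chain described above, assuming a plain 
in-memory table. The names (ServiceAddr, SocketAddr, binding_table, 
bearer_table, bind, resolve) are hypothetical and only illustrate the 
layering; this is not the actual TIPC implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ServiceAddr:
    service_type: int  # 32-bit service type, typically hard-coded
    instance: int      # 32-bit service instance, typically configured


@dataclass(frozen=True)
class SocketAddr:
    port: int  # 32-bit TIPC port identifying the socket
    node: int  # 32-bit node number


# Hypothetical binding table: service address -> sockets bound to it.
binding_table: Dict[ServiceAddr, List[SocketAddr]] = {}

# Hypothetical link-level table: node number -> (IP address, UDP port).
bearer_table: Dict[int, Tuple[str, int]] = {}


def bind(service: ServiceAddr, sock: SocketAddr) -> None:
    """Register a socket as one endpoint of a service address."""
    binding_table.setdefault(service, []).append(sock)


def resolve(service: ServiceAddr) -> Tuple[SocketAddr, Tuple[str, int]]:
    """Translate a service address to a socket, then to a link-level address.

    Picks the first bound socket; a real implementation would choose
    among several endpoints.
    """
    sock = binding_table[service][0]
    return sock, bearer_table[sock.node]
```

In this sketch the caller only ever holds a ServiceAddr. Changing the 
bearer_table entry for a node changes what resolve() returns, but not 
the address the user code refers to.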

>
>> The {UDP port, IP address} tuple (or MAC address) at the link layer 
>> are never visible to the user,
>
> That’s how Internet protocols already work...

Then I guess you see this as a correct level of abstraction, just like I do.

>
>> and may change on-the-fly without him ever noticing.
>
> That’s where you lose me. You want IP, but this isn’t IP. This is 
> Ethernet, at least as I uses it.

It is a transport for our link layer. If we have two transport channels 
between two nodes, and one of them fails, the remaining one will take 
over all traffic. When the lost channel comes back, it may theoretically 
have a new IP address, but normally it will not. Anyway, this is a 
detail that has nothing to do with our request.
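
The failover behaviour can be sketched roughly as follows, assuming a 
simple up/down flag per channel. The NodeLink class and its methods are 
hypothetical names for illustration, not TIPC code.

```python
class NodeLink:
    """Hypothetical set of transport channels between one pair of nodes."""

    def __init__(self, channels):
        # Insertion order doubles as preference order; all start as up.
        self.up = {channel: True for channel in channels}

    def set_state(self, channel, is_up):
        """Mark a channel as up or down."""
        self.up[channel] = is_up

    def pick(self):
        """Return the first channel that is up, or raise if none is."""
        for channel, is_up in self.up.items():
            if is_up:
                return channel
        raise ConnectionError("no transport channel available")
```

With two channels configured, marking the first as down makes pick() 
return the second, and marking it up again restores the first. The 
caller never needs to know which one carried the traffic.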

>
>> The same is true for the {port number, node number} tuple,
>
> Why?

Another design decision which has nothing to do with our application. In 
some cases it is practical and more efficient to use this address type 
when setting up connections.

>
> If everything in your system changes on the fly, what stays the same?

Service addresses.

>
>> although the user here has the option to use those directly, at the 
>> expense of location transparency.
>> So, our request is simply about enabling us to use a third mapping at 
>> the link layer, an IP address only. This does not in any way 
>> interfere with the location transparency that is already provided at 
>> the socket level.
>
> My point is that you’re not showing us how this helps. You simply want 
> something - I understand that. But you have to show you NEED it. 
> Everything you’re saying are reasons why you actually don’t want or 
> need it.
>
> Further, let’s say you get an IP protocol number. Why wouldn’t that be 
> among the many things here that needs to “change on the fly” too?

???

>
>>
>>>
>>>> This was one of the original motivations for developing TIPC in the 
>>>> first place.  A programmer using TIPC can hard code his service 
>>>> addresses if he wants to, ignoring the number of or location of the 
>>>> corresponding endpoints, even as those move around or scale up/down 
>>>> quite fast.
>>>
>>> Anycast gives you location independent addresses at the cost of 
>>> doing discovery “inside the network layer”.
>>
>> Yes, and that is what we do. But for this to be of any use, that 
>> discovery/translation has to be blistering fast, and that is also 
>> what we do.
>
> You don’t need an IP protocol number for that….

I never said that. But you were originally questioning our use of an 
internal lookup service; that is why I mentioned this.
>
>>
>>>
>>> However, even if you have those addresses, you still need to 
>>> identify the service types (which is what we use ports for).
>>
>> UDP (at the link level) has only one service type in this case: “TIPC”
>
> That’s an identifier for your service - you can easily add whatever 
> additional identifiers you want inside that and demux to support 
> dozens or even billions of different sub-services.

That is exactly what we are doing.
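
A minimal sketch of that kind of demultiplexing, assuming a made-up 
header of two 32-bit fields (service type, service instance) in front 
of the payload. The layout and the function names are illustrative 
assumptions only; this is not the real TIPC message format.

```python
import struct

# Hypothetical header: 32-bit service type + 32-bit service instance,
# both unsigned, in network byte order.
HEADER = struct.Struct("!II")


def pack(service_type, instance, payload):
    """Prefix a payload with the service type/instance header."""
    return HEADER.pack(service_type, instance) + payload


def unpack(datagram):
    """Split a datagram into (service type, instance, payload)."""
    service_type, instance = HEADER.unpack_from(datagram)
    return service_type, instance, datagram[HEADER.size:]


def dispatch(handlers, datagram):
    """Hand the payload to the handler registered for its service type.

    Returns True if a handler was found, False otherwise.
    """
    service_type, instance, payload = unpack(datagram)
    handler = handlers.get(service_type)
    if handler is None:
        return False
    handler(instance, payload)
    return True
```

Everything arriving on the one outer endpoint passes through dispatch(), 
so the outer layer sees a single service while the inner identifiers 
select the actual sub-service.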

>
>> At the socket level we are using TIPC service addresses for this, 
>> i.e., a {service type, service instance} tuple, each element being a 
>> 32-bit integer.
>
> That, IMO, is an ID that belongs *inside* UDP port TIPC. That is YOUR 
> service type/instance, not the Internet’s. The Internet should 
> consider this all a single TIPC service.

Yes. That is why we are asking for a protocol number.

>
>>>
>>> ——
>>>
>>> I’m still stuck at why you want to run direct over IP. If you want 
>>> Ethernet that bridges across routers, GRE does that.
>>
>> Yes, we could use VxLAN or Geneve or whatever. But that always comes 
>> at a cost both in performance and maintenance.
>
> I can’t speak for IP protocol numbers, but Internet transport port 
> numbers are not assigned for performance reasons.
>
>> We want TIPC to be both performant and really simple to use.
>
> You seem to have a lot of competing goals. You should consider the 
> rule of home contractors - fast, cheap, good - pick two. The same 
> applies to nearly all systems design decisions.

True. This has always been a conflict, and still I think we have, after 
a lot of effort, succeeded quite well in combining all three.

>
>>
>>> If you want loc-independent addresses for services, UDP over IP 
>>> using anycast does that.
>>
>> Again yes, but IP is normally not location independent inside 
>> clusters. 8.8.8.8 may be perceived as location independent, but 
>> 192.168.100.17 is typically not. And UDP has well-known limitations:
>>
>> 1) - UDP has 16-bit port numbers, a number space which has to be 
>> strictly managed.
>>     - TIPC has a 32-bit+32-bit service address instead. This is what 
>> we want
>>       to extend to 128+128 bits, so that nobody ever needs to register a
>>       well-known address for TIPC. At least not for the purpose of
>>       avoiding collisions.
>
> What you want, IMO, is a field at the front of a UDP TIPC port packet. 
> YOUR service IDs are not the Internet transport port services; they’re 
> components of what the Internet architecture considers “the 
> application layer” (which is merely whatever runs over UDP/TCP/SCTP/DCCP).
Why not UDP/TCP/SCTP/DCCP/TIPC?
>
>> 2) - UDP is best effort.
>>     - Standard TIPC anycast is "better than best" effort, because 
>> packets will
>>       never be lost in transport. Due to lack of socket level flow 
>> control, there
>>       is still a risk of seeing messages being dropped, though.
>>     - Group anycast DOES have end-to-end flow control, so such messages
>>       will never be lost or disordered.
>
> Raw UDP isn’t what you seek, so do what you want *over* UDP. Nothing 
> stops you and nothing makes your protocol offer only what UDP 
> has. E.g., see QUIC.

Functionally there is no showstopper.  But we want TIPC to be comparable 
to or better than TCP even regarding performance.

>
>> 3) Furthermore, we have reliable multicast and broadcast using the same
>>     address type. There is no way you can get that with UDP.
>
> See my response to #2.
>
>>>
>>> What is the specific gain of needing IP but not allowing a 
>>> transport? AFAICT, it’s all down to GSO - which is an 
>>> implementation. If GSO doesn’t do what you want, it would be useful 
>>> to take your issues there or edit the code yourself and submit the 
>>> patches.
>>
>> In that respect this is only an implementation issue, as you say, but 
>> it is not a TIPC only one.
>
> Perhaps, but you’re the only one asking for a new IP protocol number 
> to solve it.
So be it.
>
>> The slides referred to me by Tom Herbert describe GSO on large UDP 
>> messages, but they don't describe how we go one step further and do 
>> it on the inner messages, or how we identify those as being TIPC in 
>> the first place. Furthermore, we would have to re-write the host 
>> level GSO support, which I am highly uncertain that the Linux network 
>> community would accept, given that everything needed already is there 
>> (i.e., if we only have a proper protocol number.)
>
> So let me get this straight:
>
> - you want an IP protocol number, a limited resource of the entire 
> global Internet
> - because you’re concerned that Linux won’t take your code?
>
> I strongly suggest trying that first and if it fails, then perhaps 
> make your own Linux release or patch.
>
> I.e., this is not an Internet protocol problem.
>
>> GSO is only one of the reasons for our request. There are more reasons:
>> - Performance. The difference is not dramatic, but clearly measurable.
>>   Terminating sockets in kernel space comes at a cost.
>
> That’s an implementation issue...
>
>> - The need to be able to register a new socket type, which will map down
>>   to a (compatible) TIPC v3 protocol.
>
> That’s another implementation issue.
>
>> - Acceptance. We want to have TIPC recognized as a part of the IP 
>> protocol
>>   family, controlled by IETF, like most other protocols.
>
> It already is, but it’s recognized for what it is to the Internet 
> protocol family - a service, not a transport.
>
> Also, FWIW, making it be its own transport will only ensure it won’t 
> get through most firewalls.

See my response to Tom regarding this.

///jon

>
> Joe
