Re: [Taps] New rev of udp-usage (01) and review comments on taps-transport-usage-04

More below.

On 12/05/2017, 16:27, Michael Welzl wrote:
>> On May 12, 2017, at 3:24 PM, Gorry Fairhurst<gorry@erg.abdn.ac.uk>  wrote:
>>
>> See below.
>>
>> On 12/05/2017, 13:31, Michael Welzl wrote:
>>> Hi,
>>>
>>> Thanks a lot for all your comments (plus the nits we authors of the other -usage draft received offline).
>>>
>>> I’ll try to address them all - but there are two technical questions in this email that made me stop, so I’ll cut all the editorial stuff away and discuss them here - in line below:
>>>
>>>
>>>> - Why do this??? - Isn't it better to set flow labels per interface or for the whole stack, how can any specific transport or application pick unique labels?
>>>> TEXT:
>>>>>    o  Specify IPv6 flow label field
>>>>>       Protocols: SCTP
>>>> (i.e., Is this automatable by the host and a host wide
>>>> configuration?)
>>> Somehow the question seems irrelevant in the context of this draft, which is a list of transport features of protocols. These features are defined in the RFCs spec’ing the protocols - for SCTP, this is defined, and that’s why it’s here.
>>>
>>> We can discuss this for the proposed services that a system should offer, which we try to write up in the minset draft:
>>> I do think that an application should be allowed to assign a label to a TAPS flow (as we call them), which could then map to this function. I mean, isn’t a flow label supposed to identify a transport flow? Then a system-wide configuration wouldn't seem right to me.
>> I think we may disagree. Flow ids identify flows to the network layer; they have no role at the transport layer, and need to be unique (as best they can) for a source address.
> We disagree indeed - in particular about the “unique (as best they can)..” bit. Where is this written??
I'm taking the position of using this as input to an ECMP or LAG hash 
algorithm.
>
>> I much prefer the idea that the Flow id is generated by the IP system, by using a hash - possibly utilising transport data as a part of this hash, and including the protocol number.
> RFC 6437 introduces the flow label as a replacement for the 5-tuple - “possibly utilising transport data as a part of this hash” seems to me to be a very weak requirement here!
OK, which I think is the idea in RFC6438.
> Anyway classifiers in the network wouldn’t work on the flow label alone, but, from RFC 6437, section 2 which is called “specification”:
>    "Packet classifiers can
>     use the triplet of Flow Label, Source Address, and Destination
>     Address fields to identify the flow to which a particular packet
>     belongs."
Yes.
> Then what is the flow label good for, if it’s unique per source address? It doesn’t add any information to this 3-tuple in this case!
Aha - I mean each "microflow" sent from a specific source address should 
be identified by a different and unique flow ID.
>> That seems to be what ECMP is expecting and I suspect ECMP is an important use-case.
>>
>> The alternative (if I understand) could be: I could imagine each application could (in theory) be provided with an API to find out what flow-ids are currently being used for each interface it cares about and to then reserve one of the unused IDs for the specific interface(s) that it wishes to use. Then we need to ensure all upper layer entities coordinate on this. To me, this seems over-kill, and the approach taken with ephemeral port assignment is much simpler - the application simply doesn't get involved with choosing the number.
>>
>> Now if what you are saying is that you want the App to somehow signal that it can use an existing flow ID that is in use, and combine data with that flow to get the same network treatment, I can understand the case. However, that's not exactly the same thing.
> I understand that it would be nice to avoid upper-layer coordination here. However, I see at least two use cases for the application being more in control:
> 1) avoiding fate sharing (encouraging ECMP), e.g. for increased resilience
Yes. Part of the idea here is that microflows (say with the same IPsec 
ESP) can now be separately forwarded if that is what is desired by the 
sending endpoint.
> 2) the opposite: grouping flows, to be able to apply priorities on them, using a mechanism such as the Congestion Manager or https://tools.ietf.org/html/draft-welzl-tcp-ccc
That's the converse of the IPsec ESP example above, and also ok if the 
endpoint wishes this.
> So this is not about giving the application control of the specific flow label number, but allowing it to say “use the same number for these flows” or not.
That's fine with me, provided it is *NOT* the flow-id, but an input to 
the function that determines the flow-id.
> I think this could nicely be done by letting it number flows, and grouping them via equal numbering - without guaranteeing that these numbers map onto the exact same numbers as a flow label.
OK.
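
As a rough illustration of the point agreed above (the application only supplies a group number, and the stack derives the flow label from it), here is a minimal Python sketch. The helper name `flow_label`, the per-host secret and the choice of SHA-256 are assumptions made purely for illustration; they are not taken from RFC 6437 or from either draft.

```python
import hashlib
import ipaddress

FLOW_LABEL_BITS = 20  # the IPv6 Flow Label field is 20 bits wide


def flow_label(src, dst, protocol, group, secret=b"per-host-secret"):
    """Hypothetical helper: derive a 20-bit flow label from a hash.

    `group` is the application-chosen flow number. It is only one input
    to the hash; it is never used as the flow label itself.
    """
    material = b"".join([
        secret,                               # assumed per-host secret
        ipaddress.IPv6Address(src).packed,    # 16-byte source address
        ipaddress.IPv6Address(dst).packed,    # 16-byte destination address
        protocol.to_bytes(1, "big"),          # IP protocol number (e.g. 17 = UDP)
        group.to_bytes(4, "big"),             # application-supplied group number
    ])
    digest = hashlib.sha256(material).digest()
    label = int.from_bytes(digest[:4], "big") & ((1 << FLOW_LABEL_BITS) - 1)
    # A label of zero means "not labelled", so avoid returning it.
    return label or 1


# Two flows that the application placed in the same group get the same label.
a = flow_label("2001:db8::1", "2001:db8::2", 17, group=1)
b = flow_label("2001:db8::1", "2001:db8::2", 17, group=1)
```

Flows given different group numbers will usually, though not always, map to different labels, since a hash into 20 bits can collide. That matches the idea of not guaranteeing any particular mapping between the application's numbers and the label on the wire.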
>
>>>> -------------------
>>>> Get Interface MTU is missing from pass 2 and 3:
>>>>
>>>> ADD to pass 2:
>>>>
>>>> 	GET_INTERFACE_MTU.UDP:
>>>> 		Pass 1 primitive: GET_INTERFACE_MTU
>>>> 		Returns: Maximum datagram size (bytes)
>>> But this doesn’t exist!
>> I think I don't understand your comment ... and interpreting low-numbered RFCs is never easy - I'll use RFC1122 as my basis:
>>
>> RFC 1122 says:
>>        " A host MUST implement a mechanism to allow the transport layer
>>          to learn MMS_S, the maximum transport-layer message size that
>>          may be sent for a given {source, destination, TOS} triplet..."
>>        " and EMTU_S must be less than or equal to the MTU of the network
>>          interface corresponding to the source address of the datagram."
>>
>> TCP handles this for the app.
> … and UDP is another such transport layer. If you try to send a message that’s too large, it throws an error, based on the information it gets via the paragraph you quote above.
> But that’s UDP, not the application on top.
I don't expect each UDP app to have to start by trying to send 64,000B 
and reducing to see what works. Apps need to be able to find this out.

(If you happened to implement host fragmentation, any size of send up to 
your fragmentation limit would always return true when writing.)

>>>   It’s strictly an IP function and I couldn’t find it described for UDP anywhere. I think we agreed on how a TAPS system should handle this, and this is reflected in
>>> https://tools.ietf.org/html/draft-gjessing-taps-minset-04#section-6.4.1
>>> … which may require a system to implement new local functionality, maybe based on this MTU function - but to my understanding it’s just not something that UDP offers.
>> It's something that a UDP App really needs to pay attention to as per RFC8085 - we may differ on whether you call that "offers" or needs to function. Either way, an app that plans to use any form of PMTUD needs to use this number.
> I agree; and we have put related functions into the minset draft.
Yes we do agree. If you want to redefine that to bytes permitted in the 
UDP payload, I would also be really happy.
> But here we’re describing what it is that UDP itself (not a full-fledged TAPS system) currently offers…
>
And for that, the UDP app must assume either the network-wide default, 
or base this on maths from the MTU info at the network-layer (section 
3.2 of RFC8085). That's part of using UDP.
>> As put in RFC1122:
>>        " A host that does not implement local fragmentation MUST ensure
>>          that the transport layer (for TCP) or the application layer
>>          (for UDP) obtains MMS_S from the IP layer and does not send a
>>          datagram exceeding MMS_S in size.”
> Okaaay, you found an instance of  “ _ or the application layer (for UDP) _ “.    I agree this should be included!  But, this is not the interface MTU.
Maybe it's better to read PMTU, and I'd be fine with a socket (or 
whatever API) retrieving that in place of the Interface-MTU.
>  From RFC 1122:
>
> ***
>           If no local fragmentation
>           is performed, the value of MMS_S will be:
>
>              MMS_S = EMTU_S - <IP header size>
>
>           and EMTU_S must be less than or equal to the MTU of the network
>           interface corresponding to the source address of the datagram.
> ***
>
> So first of all, there’s the “if” - this would only be the value without local fragmentation. Can we assume that there won’t be local fragmentation?
We put the discussion text of host fragmentation in the "DF" discussion 
of our draft.

You could argue that we separate these because "DF" can be set on host 
fragments. (IPv6
> Then, EMTU_S <= interface MTU,
OK
> and MMS_S is even smaller: very reasonably, the IP header size is subtracted.
OK, I'd also be happy to say it retrieves MMS_S. (I'm not sure, though, 
whether this is implemented in practice.) But does this primitive exist 
in the real world, or do we need to explain both?
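
To make the arithmetic concrete, here is a small Python sketch of the RFC 1122 relation quoted above, MMS_S = EMTU_S - <IP header size>, for the case without local fragmentation. The function names and the example value of 1500 bytes for EMTU_S (a common Ethernet MTU) are assumptions for illustration; the header sizes are the minimum IPv4 header, the fixed IPv6 header and the UDP header.

```python
IPV4_HEADER_MIN = 20  # bytes: IPv4 header without options
IPV6_HEADER = 40      # bytes: fixed IPv6 header, no extension headers
UDP_HEADER = 8        # bytes


def mms_s(emtu_s, ip_header_size):
    """MMS_S = EMTU_S - <IP header size>, assuming no local fragmentation."""
    if emtu_s <= ip_header_size:
        raise ValueError("EMTU_S must be larger than the IP header")
    return emtu_s - ip_header_size


def max_udp_payload(emtu_s, ip_header_size):
    """Bytes an application can put in one UDP datagram without fragmentation."""
    return mms_s(emtu_s, ip_header_size) - UDP_HEADER


# Assumed example: EMTU_S equal to an interface MTU of 1500 bytes.
print(mms_s(1500, IPV4_HEADER_MIN))            # 1480
print(max_udp_payload(1500, IPV4_HEADER_MIN))  # 1472
print(max_udp_payload(1500, IPV6_HEADER))      # 1452
```

IPv4 options or IPv6 extension headers would reduce these values further, and a path MTU smaller than the interface MTU would lower EMTU_S itself, which is why an API returning MMS_S or the UDP payload limit directly would be more useful to an app than the raw interface MTU.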
> Cheers,
> Michael
>

Gorry