Re: [Softwires] [Int-area] [nvo3] Is it feasible to perform fragmentation on UDP encapsulated packets.

On Fri, Jun 3, 2016 at 12:38 AM, Xuxiaohu <xuxiaohu@huawei.com> wrote:
>
>
>> -----Original Message-----
>> From: Joe Touch [mailto:touch@isi.edu]
>> Sent: Friday, June 03, 2016 12:34 PM
>> To: Xuxiaohu; otroan@employees.org
>> Cc: Softwires WG; nvo3@ietf.org; int-area@ietf.org; lisp@ietf.org;
>> tsvwg@ietf.org
>> Subject: Re: [nvo3] [Softwires] Is it feasible to perform fragmentation on UDP
>> encapsulated packets.
>>
>>
>>
>> On 6/2/2016 7:53 PM, Xuxiaohu wrote:
>> >
>> >> -----Original Message-----
>> >> From: Joe Touch [mailto:touch@isi.edu]
>> >> Sent: Friday, June 03, 2016 2:08 AM
>> >> To: otroan@employees.org; Xuxiaohu
>> >> Cc: Softwires WG; nvo3@ietf.org; int-area@ietf.org; lisp@ietf.org;
>> >> tsvwg@ietf.org
>> >> Subject: Re: [nvo3] [Softwires] Is it feasible to perform
>> >> fragmentation on UDP encapsulated packets.
>> >>
>> >>
>> >>
>> >> On 5/27/2016 3:50 AM, otroan@employees.org wrote:
>> >>> It is not possible to implement reassembly complying with IETF RFCs.
>> >> a) ATM does this at ridiculously high fragment rates. Granted IP
>> >> frags can come out of order, but the fragments are generally much larger.
>> > As pointed out in RFC4459,
>> >
>> >      "At the time of reassembly, all the information (i.e., all the
>> >       fragments) is normally not available; when the first fragment
>> >       arrives to be reassembled, a buffer of the maximum possible size
>> >       may have to be allocated because the total length of the
>> >       reassembled datagram is not known at that time. Furthermore, as
>> >       fragments might get lost, or be reordered or delayed, the
>> >       reassembly engine has to wait with the partial packet for some
>> >       time (e.g., 60 seconds [9]).  When this would have to be done at
>> >       the line rate, with, for example 10 Gbit/s speed, the length of
>> >       the buffers that reassembly might require would be prohibitive. "
>>
>> Yes, but the alternative of declaring that you don't reassemble has a cost in
>> terms of dropped segments too.
>
> The alternative is to configure the MTU of the core large enough to accommodate the added encapsulation header. This is a feasible and proven approach in well-managed SP networks. No packet loss and no forwarding performance degradation.
>
>> Taking that drop into account, buffering for a smaller amount of potential
>> reordering and accounting for reasonable reassembly sizes (for IPv6, the
>> smallest "max" that's required is 1500) need not be prohibitive.
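
To put rough numbers on the two positions above, here is a
back-of-the-envelope sketch in Python. The 10 Gbit/s rate and the 60
second timeout come from the RFC4459 text quoted above, and 1500 bytes
is the reassembly size mentioned for IPv6. The flow count and the
buffers-per-flow figure are made-up illustrations, not measurements,
and the function names are hypothetical.

```python
def worst_case_buffer_bytes(line_rate_bps: int, timeout_s: int) -> int:
    """Upper bound: every byte on the wire is a fragment that waits
    out the full reassembly timer before being released."""
    return line_rate_bps * timeout_s // 8


def bounded_buffer_bytes(flows: int, buffers_per_flow: int,
                         max_datagram: int) -> int:
    """Memory needed if each flow gets a fixed number of reassembly
    buffers, each sized for the largest datagram accepted."""
    return flows * buffers_per_flow * max_datagram


# Figures from the RFC4459 quote: 10 Gbit/s, 60 s timeout.
worst = worst_case_buffer_bytes(10 * 10**9, 60)

# Assumed figures for illustration only: 100k flows, 4 buffers each.
bounded = bounded_buffer_bytes(flows=100_000, buffers_per_flow=4,
                               max_datagram=1500)

print(f"worst case: {worst / 1e9:.0f} GB")   # 75 GB
print(f"bounded:    {bounded / 1e6:.0f} MB")  # 600 MB
```

The gap between the two results is the whole argument: the first is
what unbounded waiting costs, the second is what a fixed allocation
costs, at the price of dropping whatever does not fit.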
>
> In the IPv6-in-IPv4 Softwire network case, if the MTU of the core is smaller than 1280+20, all the IPv6 traffic across the tunnels would have to be fragmented and then reassembled. Not a small amount.
>
>> > Have you heard the wide adoption of 622M (STM-1) beyond ATM interfaces
>> between ATM switches in the previous ATM networks? If not, the highest
>> non-ATM interface in the past ATM networks was the FE interface which is 100M
>> bps (around the year of 2000), If I remembered it correctly.
>>
>> OC12 ATM NICs were commonly available in 1998. That was back when
>
> Sorry, there was a spelling mistake ( s/STM-1/STM-4).
>
> OC12/STM-4=622M. My question is: have you heard of the wide adoption of 622M (STM-4) BEYOND ATM interfaces (e.g., STM-16/OC48) at that time?
>
>> ethernet was pushing 100M and the only other near-gigabit tech was Myricom
>> (a spin-off of USC/ISI and Caltech). I.e., with the tech available at that time,
>> 600Mbps SAR was possible.
>
> Was that 600Mbps SAR capability proven in real networks? And for what service?
>
>> > Furthermore, have you heard of a reordering issue with ATM cells? If not, that
>> means once an ATM cell of a given AAL PDU gets lost, all the received ATM cells
>> of that AAL PDU would be dropped and therefore the reassembly buffer for that
>> AAL datagram would be released. In other words, there is no need to wait for
>> the lost or reordered fragment for a certain period of time before releasing the
>> reassembly buffer.
>> Reordered no, but just because they arrive in order does not mean they are
>> required to arrive *adjacent*. The same requirement applies in terms of
>> needing reassembly buffers for a number of interleaved segments.
>
> However, the reassembly buffer requirement is limited due to the fact that only one buffer is needed per VC and the total number of VCs is not large on the edge of ATM networks. In contrast, the number of IP flows is uncontrollable.
>
>> >  Such behavior is not possible for IP fragmentation and reassembly. Last but
>> not least, there is no need to assign a reassembly buffer per AAL PDU (as
>> opposed to per IP datagram in the IP fragmentation and reassembly case);
>> instead, only one reassembly buffer is needed per VC, since all cells within a given VC would
>> be received in order. Since the SAR is only needed on the edge of the ATM
>> networks, the number of VCs is very limited.
>> You could easily decide to allocate a fixed number of reassembly buffers per IP
>> flow to conserve resources too. The amount of reordering, together with the
>
> What about the DDoS attack case? Do you believe that SPs would accept such an obvious DDoS attack risk in their networks and tolerate the significant forwarding performance degradation, especially when there is already a feasible and proven solution to the MTU issue (i.e., configure the MTU of the core large enough)?
>
That is precisely solution #3 in RFC4459:

"Ensure that in the specific environment, the encapsulated packets
will fit in all the paths in the network, e.g., by using MTU bigger
than 1500 in the backbone used for encapsulation."

If you're able to implement this in your network that's great, but it
is not feasible in all situations and hence can't be the standard. We
may not control all the underlay networks (definitely not over the
Internet) and hence may not be able to set all the MTUs large enough.
Also, an MTU greater than 1500 requires jumbo frames, and not all
networks have enabled those.
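
As a rough illustration of the headroom involved, the sketch below
adds up encapsulation header sizes and checks whether an inner packet
still fits in a given underlay MTU. The header sizes are the usual
minimums (IPv4 20 bytes without options, IPv6 40, UDP 8, GRE 4 without
optional fields). The function names are hypothetical, and 9000 is
only an assumed, commonly configured jumbo value.

```python
# Minimum header sizes in bytes (no IPv4 options, no GRE optional fields).
HEADER_BYTES = {"ipv4": 20, "ipv6": 40, "udp": 8, "gre": 4}


def encap_overhead(headers):
    """Total bytes added in front of the inner packet."""
    return sum(HEADER_BYTES[h] for h in headers)


def required_underlay_mtu(inner_mtu, headers):
    """Smallest underlay MTU that carries a full-size inner packet
    without fragmenting the encapsulated result."""
    return inner_mtu + encap_overhead(headers)


def fits_without_fragmentation(inner_mtu, headers, underlay_mtu):
    return required_underlay_mtu(inner_mtu, headers) <= underlay_mtu


# IPv6-in-IPv4 at the IPv6 minimum MTU: the 1280+20 case from the thread.
print(required_underlay_mtu(1280, ["ipv4"]))                 # 1300

# GRE-in-UDP over IPv4 carrying a 1500-byte inner packet.
gre_udp = ["ipv4", "udp", "gre"]
print(required_underlay_mtu(1500, gre_udp))                  # 1532
print(fits_without_fragmentation(1500, gre_udp, 1500))       # False
print(fits_without_fragmentation(1500, gre_udp, 9000))       # True
```

So a 1500-byte inner packet only avoids fragmentation when every hop
of the underlay is configured above 1500, which is exactly the part
we cannot guarantee over networks we do not control.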

One other point: when a device sources or sinks IP packets, as in
tunneling, it is taking on the role of a host, and so we expect host
requirements to apply. For instance, routers may perform fragmentation
in IPv4, but they do not perform fragmentation for IPv6, nor
reassembly (those are host functions). There has been at least one
attempt to carve out exceptions in applying host requirements to
devices that perform tunneling, namely an allowance to send a zero UDP
checksum in IPv6. Documenting this took two RFCs, and we still needed
a whole lot of description in at least GRE/UDP on the requirements for
when to use the non-zero checksum. I don't think standardizing
exceptions in fragmentation for tunnels would be any easier.

Tom

> Best regards,
> Xiaohu
>
>> number of flows supported would determine your ability to avoid packet drops
>> -- but recall that drops at high load are not unusual for IP.
>>
>> Joe
>