Re: [mpls] Last Call: <draft-ietf-mpls-in-udp-04.txt> (Encapsulating MPLS in UDP) to Proposed Standard

In message <290E20B455C66743BE178C5C84F1240847E63346C9@EXMB01CMS.surrey.ac.uk>
l.wood@surrey.ac.uk writes:

> Curtis
>  
> http://lmgtfy.com/?q=Jonathan+Stone+CRC+checksum

That is the Sigcomm 2000 paper that a number of people have said is no
longer relevant, in some cases giving quite a bit of detail about the
error causes in the paper and why these will be rare today.  I thought
you were referring to something more recent.

> HDLC is just whatever is over the last hop. You said HDLC, I reused that as
> an example.

I gave HDLC as an example because in 2000 it was in use and a source
of errors but in 2014 it is likely not to be in use just about
anywhere in North America and Europe at least.

> Any link technology could be substituted - 10Mbps Ethernet, say, though
> you'd criticise that as not being 10Gbps Ethernet and therefore out of date.

My point was that if you substituted 10Mbps Ethernet with a 32 bit
FCS, for HDLC which often just counted and passed along errored
packets, you would have far fewer errors, almost none.

> Again, the point is that the link check is not end-to-end, and that errors
> can creep in from the most unexpected places. By analogy with security,
> if I have security across each hop, why would I need security end-to-end?
> I already have it across each hop! Each link is highly and absolutely
> unbreakably secure! What is this end-to-end of which you speak?

You have security end to end because someone may have a motivation to
monitor or alter your packet.

If you have robust L2 FCS, then there is no one with a motivation to
disable that FCS and corrupt your data.  If they did for MPLS carrying
IP, then there is a check, albeit not a great one, that might catch
it.  If the MPLS payload is a PW carrying an L2 carrying IP, the same
applies.

Note that if the PW payload is TDM, Ethernet, or most other L2, at
least a checksum if not more is available in the payload so the errors
would be detected and the customer of the PW would complain.  Those
types of complaints have been a complete non-issue.

> If you don't get that point because it's a bit abstract and timeless, that's
> fine. (and if you have to explain the joke, the joke wasn't funny.)
> The link CRC doesn't apply across the entire path; do the maths for
> the path and a series of concatenated links.

The joke is that you didn't realize any mention of HDLC being used
now, just like X.25 being used now, was a joke.  It actually isn't
funny at all, except perhaps when someone doesn't get it.

> [As it happens, I'm familiar with UDP/IP/HDLC internet infrastructure installed
> within the decade and  in daily operational use to deliver imagery from orbit.
> But the paper I wrote on that dates from 2007, so is old and
> won't be of interest to you.]

That is clearly the exception.  TDM is still alive and well in the
third world and in access parts of the developed world that have not
been upgraded, but elsewhere alternatives exist for terrestrial Internet
and TDM is gone from any non-third world network core that I am aware
of.

So don't run MPLS over UDP without a UDP checksum *on that
infrastructure* if you can avoid it.  But for most modern
infrastructures MPLS over UDP without a UDP checksum would be fine.

We are asking for a SHOULD, not a MUST NOT wrt UDP checksum.

> Wait, this thread is all about putting 90s MPLS technology over UDP
> technology specified in 1980. Clearly, if MPLS has to rely on an older
> technology in this way, the MPLS crowd should give up and go home. 

And MPLS carried nothing but IP in those days, so from an error
detection standpoint it was and still is a NOOP.

> We've learned repeatedly that zero checksums are a bad idea. IPv6
> RFC2460:
>  
>         Unlike IPv4, when UDP packets are originated by an IPv6 node,
>          the UDP checksum is not optional.  That is, whenever
>          originating a UDP packet, an IPv6 node must compute a UDP
>          checksum over the packet and the pseudo-header, and, if that
>          computation yields a result of zero, it must be changed to hex
>          FFFF for placement in the UDP header.  IPv6 receivers must
>          discard UDP packets containing a zero checksum, and should log
>          the error.
>  
> which RFC6935 simply rewrote because it was inconvenient to tunnelers, without
> considering how it affected everything else in the network.

How it affects everything else in the network should be
considered before ignoring a SHOULD, as is always the case.
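
A minimal Python sketch of the RFC 2460 rule quoted above may help make it concrete: the checksum is computed over an IPv6 pseudo-header plus the UDP header and payload, a computed zero is sent as 0xFFFF, and (before RFC 6935 relaxed this for tunnels) a receiver discards any packet whose checksum field is zero. The addresses, ports, and function names are illustrative assumptions, not taken from the draft.

```python
import ipaddress
import struct

def ones_complement_sum16(data: bytes) -> int:
    """16-bit ones'-complement sum (RFC 1071); odd-length input is padded with a zero byte."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total

def udp6_checksum(src: str, dst: str, src_port: int, dst_port: int, payload: bytes) -> int:
    length = 8 + len(payload)
    # IPv6 pseudo-header: source, destination, upper-layer length,
    # three zero bytes, next header (17 = UDP)
    pseudo = (ipaddress.IPv6Address(src).packed
              + ipaddress.IPv6Address(dst).packed
              + struct.pack("!I3xB", length, 17))
    # UDP header with the checksum field zeroed while computing
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    csum = ~ones_complement_sum16(pseudo + header + payload) & 0xFFFF
    # RFC 2460: a computed result of zero is transmitted as 0xFFFF
    return csum if csum != 0 else 0xFFFF

def receiver_accepts(checksum_field: int) -> bool:
    # RFC 2460 receiver rule: a zero checksum field means discard (and log)
    return checksum_field != 0

if __name__ == "__main__":
    c = udp6_checksum("2001:db8::1", "2001:db8::2", 49152, 49153, b"example MPLS payload")
    print(hex(c), receiver_accepts(c), receiver_accepts(0))
```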

> Those that do not learn from history are doomed to repeat it.
> (Unless it's recent history, in which case they're doomed to reject it
> as irrelevant.)
>  
> Lloyd Wood
> http://about.me/lloydwood

Lloyd.  There are a lot of papers that are still relevant.  Almost all
of the causes of errors studied in the 2000 network are gone today.  Apparently an
exception is your HDLC infrastructure, and that would be gone too as
an issue if 32 bit CRC is enabled for HDLC and packets dropped on
error, not just counted.

That paper reported:

   2,209 M packets

   468,434 errors
   389,934 ACK-of-FIN bug (old Windows NT bug, fixed)
    78,500 remaining
   Other errors cited but often not quantified
     CRLF replacement (thought to be Solaris bug)
     VJHC or other header compression bug
     possible PowerMac OS-X bug (fixed by publication)
     router memory error (ECC is used now)
     bad host DMA (PCIe has CRC32 today)
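
As a quick check on the figures listed above, this short Python sketch computes the per-packet rates they imply. The counts are exactly those listed; the variable names are illustrative.

```python
# Counts as listed above from the Stone/Partridge SIGCOMM 2000 traces
TOTAL_PACKETS = 2_209_000_000   # "2,209 M packets"
CHECKSUM_ERRORS = 468_434
ACK_OF_FIN_BUG = 389_934

remaining = CHECKSUM_ERRORS - ACK_OF_FIN_BUG

def rate(count: int, total: int) -> float:
    """Fraction of packets affected."""
    return count / total

for label, count in (("all errors", CHECKSUM_ERRORS),
                     ("excluding ACK-of-FIN", remaining)):
    r = rate(count, TOTAL_PACKETS)
    print(f"{label}: {count:,} errors, {r:.2e} per packet, "
          f"about 1 in {round(1 / r):,}")
```

With these counts the overall rate is roughly 1 in 4,700 packets, and roughly 1 in 28,000 once the ACK-of-FIN bug is excluded.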

The paper has a breakdown by type of error and discussion of the
causes.  In some cases a particular type of error came mostly from a
small set of hosts, but the cause is only likely to be host-related,
and the entire error category can't with certainty be
attributed to host error.

   Bad Hosts

      Another surprise in our traces was the large fraction of errors
      that are due to persistently-misbehaving hosts.

In fact the paper says:

   In general, link errors should be caught by the CRC.  However there
   are cases where the link level protocol can interact to cause
   higher level checksum errors.  The most notable situation is header
   compression and we looked vigorously for errors of this sort.

Link errors are really not considered in the paper as a significant
source of errors.  Router memory and other hardware errors are cited
but with today's hardware those types of errors should also be gone.

I discussed the sources of errors in this paper previously in
  http://www.ietf.org/mail-archive/web/mpls/current/msg11247.html
You did respond to that but only by top posting and ignoring the
discussion in that email of the causes of errors in the paper.

"No longer relevant" does not mean "not good work".  It would be worth
it for the Stone/Partridge study to be repeated.  For this context it
would have to be done with the cooperation of a provider and over the
type of infrastructure on which MPLS is intended to run.

Curtis

> ________________________________________
> From: Curtis Villamizar [curtis@ipv6.occnc.com]
> Sent: 15 January 2014 01:42
> To: Wood L  Dr (Electronic Eng)
> Cc: curtis@ipv6.occnc.com; jmh@joelhalpern.com; lars@netapp.com; xuxiaohu@huawei.com; mpls@ietf.org; ietf@ietf.org
> Subject: Re: [mpls] Last Call: <draft-ietf-mpls-in-udp-04.txt> (Encapsulating MPLS in UDP) to Proposed Standard
>  
> In message <290E20B455C66743BE178C5C84F1240847E63346C4@EXMB01CMS.surrey.ac.uk>
> l.wood@surrey.ac.uk writes:
>  
> > The HDLC part here is last link, not the scope of the whole path.  Any
> > 'low' bit error rate given actually becomes quite high once you
> > consider the number of bits per packet and the line rate...
> >
> > Do read Jonathan Stone's papers on where errors creep in - not just in
> > the link, but at any point along the path, including regeneration
> >
> > Lloyd Wood
>  
>  
> Lloyd,
>  
> There is no HDLC hop.  No one has used HDLC for internet
> infrastructure in ages.  It was a joke, like Scott's comment on
> wanting to use X.25.  HDLC was disappearing when the Stone/Partridge
> Sigcomm 2000 paper was written.
>  
> Links please.  And how old is that paper?  Not another 15 year old
> work is it?
>  
> If you have one bit error per day, how many packets do you lose that
> day?  (hint: one).
>  
> If you have one bit error per day, how many undetectable packet errors
> do you have?  (hint: CRC32 catches all one bit errors, therefore zero).
>  
> A 10^-12 bit error rate is one error every 10 seconds on 100 Gb/s, one
> every 100 seconds on 10 Gb/s, and is generally considered high enough
> to take a link down immediately.  A 1500 byte packet is 12,000 bits,
> about 10^4.  That would yield a packet error rate as high as 10^-8 if
> bit errors were mostly one bit error per packet.  In that case all
> errors would be detectable.  It is only when there are many bit errors
> per packet that the CRC can be defeated, and then it's about a 10^-9 chance.
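
The arithmetic in the paragraph above can be reproduced with a few lines of Python. This is a sketch assuming independent bit errors (real errors are often bunched, as the next paragraph notes); the 2^-k line is the usual rough estimate for a random multi-bit error pattern slipping past a k-bit CRC, not a figure from the message.

```python
import math

def seconds_between_bit_errors(ber: float, line_rate_bps: float) -> float:
    """Mean time between bit errors for a given bit error rate and line rate."""
    return 1.0 / (ber * line_rate_bps)

def packet_error_probability(ber: float, packet_bytes: int) -> float:
    """P(at least one bit error in the packet), assuming independent bit errors.

    Computes 1 - (1 - ber)**bits with log1p/expm1 so tiny rates stay accurate."""
    bits = packet_bytes * 8
    return -math.expm1(bits * math.log1p(-ber))

BER = 1e-12
for rate_bps in (100e9, 10e9):
    secs = seconds_between_bit_errors(BER, rate_bps)
    print(f"{rate_bps / 1e9:.0f} Gb/s: one bit error every {secs:.0f} s")

print(f"1500-byte packet error probability: {packet_error_probability(BER, 1500):.1e}")

# Rough chance that a random multi-bit error pattern passes a k-bit CRC: 2**-k
for k in (16, 32):
    print(f"CRC-{k}: about {2.0 ** -k:.1e}")
```

The output gives 10 s and 100 s between bit errors, a packet error probability of about 1.2e-08, and 2^-32 of roughly 2.3e-10.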
>  
> So at a packet error rate much less than 10^-8 (tightly bunched
> errors with multiple bit errors per packet) some 10^-9 might be
> undetectable with a CRC32.  One packet every 10^6 seconds at 100 Gb/s
> could have an undetectable error.  About one undetectable error a day
> or one a week for a continuous, fully loaded 100 Gb/s link.
>  
> Note that the same low error rate does not apply to a GbE or 10GbE
> over colored optics over ROADM in the metro since there is no FEC
> there.  It also may not apply to the enterprise or campus Ethernets.
> In those hops the error rate is likely to be higher.  Needless to say,
> wireless hops can have very high error rates.
>  
> This is why it could make sense to have the UDP checksum optional in
> MPLS over UDP.  It wouldn't hurt to provide the checksums but in some
> cases it might be OK to disable them.  That is what SHOULD is for in
> an IETF document.
>  
> Curtis
>  
>  
> > From: Curtis Villamizar [curtis@ipv6.occnc.com]
> > Sent: 14 January 2014 20:54
> > To: Wood L  Dr (Electronic Eng)
> > Cc: jmh@joelhalpern.com; lars@netapp.com; xuxiaohu@huawei.com; mpls@ietf.org; ietf@ietf.org
> > Subject: Re: [mpls] Last Call: <draft-ietf-mpls-in-udp-04.txt> (Encapsulating MPLS in UDP) to Proposed Standard
> >
> > In message <290E20B455C66743BE178C5C84F1240847E63346C3@EXMB01CMS.surrey.ac.uk>
> > l.wood@surrey.ac.uk writes:
> >
> > > It stands to reason that if tunnelers can turn off udp checksums
> > > because their performance is degraded, they can turn off
> > > congestion control because it will degrade their performance.
> > >
> > > Rest of the internet getting congested and getting
> > > misdelivered corrupted packets? Really not their problem.
> > >
> > > There are important vendors trying to sell products here,
> > > and they need performance to do so.
> > > Get with the program!
> > >
> > > Lloyd Wood
> > > http://about.me/lloydwood
> >
> >
> > OK, perhaps if you are running MPLS/UDP/IP over HDLC and the HDLC
> > configuration is set to count FCS errors but not drop errored frames, you will still
> > *really* need the UDP checksum.  Otherwise it isn't going to do much
> > for you.  Any checksum is really bad for some types of errors such as
> > chunk reordering and multiple bit errors.
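
The point about chunk reordering applies in particular to the 16-bit ones'-complement Internet checksum used by UDP and TCP: because addition is commutative, swapping aligned 16-bit words leaves it unchanged. The sketch below contrasts it with CRC-32 on the same change; the 64-byte buffer is arbitrary example data.

```python
import zlib

def internet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: complement of the ones'-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF

original = bytes(range(64))
# Swap the first two 16-bit words: an aligned "chunk reordering" error
reordered = original[2:4] + original[0:2] + original[4:]

print(internet_checksum(original) == internet_checksum(reordered))  # True: the sum ignores order
print(zlib.crc32(original) == zlib.crc32(reordered))                # False: CRC-32 catches it
```

The CRC-32 result is guaranteed here because the change is confined to 32 consecutive bits, and a 32-bit CRC detects every error burst of that length or shorter.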
> >
> > On HDLC or PPP with a 16 bit CRC you may see a low error rate, but
> > in theory that would be much less than 10^-5 since few multiple bit
> > errors will coincidentally match the CRC, even for a 16 bit CRC.
> >
> > I suspect most routers would be able to do the checksum anyway and for
> > modern links if they come up with a zero error count that's fine.
> >
> > <ot>
> >
> > Modern OTN-based transport networks use forward error correction (FEC)
> > which accounts for a fair amount of overhead and a lot of processing
> > gates on the receiving end.  The measure of effectiveness of a given FEC
> > and typical FEC in the high tens of dB.  The target corrected error
> > rate is often 10^-15 or one bit error in 24 hours for 10 Gb/s, one bit
> > error in 2.5 hours for 100 Gb/s.  Any link with corrected bit error
> > rates approaching 10^-12 is taken out of service.  This is roughly
> > equivalent to the old ES (errored seconds) and SES (severely errored
> > seconds) metric where an ES is one second with any bit errors and an
> > SES is one second with 10 or more errors (I think it's 10).  More than
> > some number of ES or SES and a link is taken down.  The uncorrected
> > errors are passed through.
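
The ES/SES accounting described above can be sketched as a small counter. This is a hypothetical illustration: the SES threshold of 10 is the figure the message itself is unsure of, the take-down limits are invented parameters, and the formal ITU-T definitions may differ.

```python
from typing import Iterable, Tuple

def count_es_ses(errors_per_second: Iterable[int], ses_threshold: int = 10) -> Tuple[int, int]:
    """Count errored seconds (ES: at least one bit error) and severely errored
    seconds (SES: at least ses_threshold bit errors).

    The default threshold of 10 is an assumption taken from the message,
    not a value from a standard."""
    es = ses = 0
    for n in errors_per_second:
        if n >= 1:
            es += 1
        if n >= ses_threshold:
            ses += 1
    return es, ses

def should_take_down(es: int, ses: int, max_es: int, max_ses: int) -> bool:
    # Hypothetical policy: the message only says "more than some number"
    return es > max_es or ses > max_ses

# Hypothetical per-second bit error counts over one minute
samples = [0] * 50 + [1, 3, 0, 12, 0, 0, 25, 2, 0, 0]
es, ses = count_es_ses(samples)
print(es, ses, should_take_down(es, ses, max_es=10, max_ses=1))  # 5 2 True
```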
> >
> > A packet may traverse an entire continent with 2-3 such links
> > separated by regeneration or could stop at a number of routers along
> > the way.  Typically today the router uses 10GbE or 100GbE (growing
> > use) which are then passed as a bit stream in the transport network.
> > At the other end the uncorrected errors from transport are picked up
> > by Ethernet 32 bit FCS.  Since a 32 bit FCS picks up 100% of single
> > bit errors and most instances where a small number of bits are in
> > error, and all but 1 in 2^32 where many bits are in error, few errors
> > are going to get through.  If GFP is used, the per packet FCS is
> > checked at each hop and for GFP-T also checked end to end.
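
The claim that a 32 bit FCS catches 100% of single bit errors is easy to check exhaustively for one frame-sized buffer. Python's `zlib.crc32` uses the same CRC-32 polynomial as the IEEE 802.3 Ethernet FCS; the payload below is arbitrary data standing in for a real frame, so this is a sketch of the property rather than a frame-level test.

```python
import zlib

def single_bit_errors_missed(frame: bytes) -> int:
    """Flip each bit of the frame in turn and count flips that leave CRC-32 unchanged."""
    good_crc = zlib.crc32(frame)
    buf = bytearray(frame)
    missed = 0
    for bit in range(len(buf) * 8):
        buf[bit // 8] ^= 1 << (bit % 8)   # inject a single-bit error
        if zlib.crc32(buf) == good_crc:
            missed += 1
        buf[bit // 8] ^= 1 << (bit % 8)   # undo it
    return missed

# Arbitrary 1500-byte payload standing in for a full-size Ethernet frame
frame = bytes((i * 37 + 11) % 256 for i in range(1500))
print(single_bit_errors_missed(frame))  # 0
```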
> >
> > A bad local ethernet is more likely to contribute an error (again
> > better than 1 in 2^32 detection is expected) due to something like a
> > bad CAT-{5,5e,6} connection or too many sharp turns.  A DSL or DOCSIS
> > link is also more likely to contribute an error.  With CRC32 on all
> > links and no bad hardware in between (ie: circa 1990s equipment with
> > no parity RAM and no correction on DMA, buses, etc) you would expect
> > on the order of 10^-8 errors (10^-9 per hop, a few errored hops).
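
The per-hop to per-path step above ("doing the maths for the path and a series of concatenated links") can be written out as follows. This sketch assumes hops fail independently; the per-hop rates are hypothetical inputs chosen to match the orders of magnitude in the paragraph.

```python
import math
from typing import Sequence

def path_error_probability(per_hop: Sequence[float]) -> float:
    """Probability that a packet is corrupted on at least one hop, assuming
    independent hops: 1 - prod(1 - p_i), computed with log1p/expm1 for tiny p_i."""
    return -math.expm1(math.fsum(math.log1p(-p) for p in per_hop))

# Hypothetical path: ten hops at 1e-9 each
print(f"{path_error_probability([1e-9] * 10):.2e}")  # 1.00e-08

# Hypothetical path with one noisy hop, which dominates the total
print(f"{path_error_probability([1e-9] * 5 + [1e-6]):.2e}")
```

For small rates the result is close to the plain sum of the per-hop rates, which is why a single noisy hop tends to dominate.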
> >
> > For example, two hosts on my home LAN had non-zero TCP checksum error counts.
> > Each had < 10^-6 packet error rate.  It is hard to tell if these are
> > host errors at the other end.  The only hosts I have with non-zero counts are
> > on the service provider DMZ LAN, so that would include any bot attacks,
> > etc., where sending hosts could be old junk.  Hosts behind those have
> > zero UDP and TCP checksum errors.  This seems similar to Stewart's
> > quick check.
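
The message does not say how the host checksum counters were read. On a Linux host, one option is the `InCsumErrors` fields that reasonably recent kernels expose in `/proc/net/snmp`. The sketch below is an assumption-laden illustration: the function names are hypothetical, and dividing by `InSegs`/`InDatagrams` gives only an approximate per-packet rate.

```python
from typing import Dict

def parse_proc_net_snmp(text: str) -> Dict[str, Dict[str, int]]:
    """Parse /proc/net/snmp-style text: each protocol is a header line of
    field names followed by a line of values with the same prefix."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    stats: Dict[str, Dict[str, int]] = {}
    for names, values in zip(lines[0::2], lines[1::2]):
        proto = names[0].rstrip(":")
        stats[proto] = {k: int(v) for k, v in zip(names[1:], values[1:])}
    return stats

def checksum_error_rate(stats: Dict[str, Dict[str, int]], proto: str, in_field: str) -> float:
    """InCsumErrors divided by the given receive counter (0.0 if either is absent)."""
    s = stats.get(proto, {})
    received = s.get(in_field, 0)
    return s.get("InCsumErrors", 0) / received if received else 0.0

if __name__ == "__main__":
    try:
        with open("/proc/net/snmp") as f:
            stats = parse_proc_net_snmp(f.read())
        print("TCP:", checksum_error_rate(stats, "Tcp", "InSegs"))
        print("UDP:", checksum_error_rate(stats, "Udp", "InDatagrams"))
    except (OSError, ValueError):
        print("/proc/net/snmp not available or not in the expected format")
```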
> >
> > In the T1/T3 days the transport layer just had parity and just counted
> > parity errors.  Providers in those days were notorious for ignoring ES
> > and SES counters until the customer complained.  HDLC then had its 16
> > bit CRC, optional 32 bit.  If an ISP wasn't paying attention to their
> > HDLC error counters then it was up to the IP end customer to complain
> > and hope the problem got escalated rather than dropped.
> >
> > </ot>
> >
> > As to whether congestion control is in practice needed see
> > http://www.ietf.org/mail-archive/web/mpls/current/msg11222.html
> >
> > It's fine to make them both optional and to make congestion control
> > mechanisms out of scope and the topic of a later document if needed.
> >
> > Curtis
> >
> >
> >
> > > ________________________________________
> > > From: ietf [ietf-bounces@ietf.org] On Behalf Of Joel M. Halpern [jmh@joelhalpern.com]
> > > Sent: 10 January 2014 15:36
> > > To: Eggert, Lars; Xuxiaohu
> > > Cc: mpls@ietf.org; IETF
> > > Subject: Re: Last Call: <draft-ietf-mpls-in-udp-04.txt> (Encapsulating MPLS in UDP) to Proposed Standard
> > >
> > > Maybe I am completely missing things, but this looks wrong.
> > > If the MPLS LSP is carrying fixed rate pseudo-wires, adding congestion
> > > control will make it more likely that the service won't work.  Is that
> > > really the goal?
> > >
> > > We do not perform congestion control on MPLS LSPs.
> > > Assuming that a UDP tunnel is carrying just MPLS and was established
> > > just for MPLS, why would we expect it to behave differently than an MPLS
> > > LSP running over the exact same path, carrying the exact same traffic?
> > >
> > > Yours,
> > > Joel
> > >
> > > On 1/10/14 3:47 AM, Eggert, Lars wrote:
> > > > Hi,
> > > >
> > > > that sounds good. What congestion control are you going to be specifying for your tunnel?
> > > >
> > > > Lars
> > > >
> > > > On 2014-1-10, at 4:46, Xuxiaohu <xuxiaohu@huawei.com> wrote:
> > > >
> > > >> Hi Lars,
> > > >>
> > > >> Thanks a lot for your comments.
> > > >>
> > > >> I wonder whether the following modified text for Congestion Consideration section is OK from your point of view:
> > > >>
> > > >> Since the MPLS-in-UDP encapsulation causes MPLS packets to be forwarded through "UDP tunnels", the congestion control guidelines for UDP tunnels as defined in Section 3.1.3 of [RFC5405] SHOULD be followed. Specifically, MPLS can carry a number of different protocols as payloads. When the payload traffic is IP-based and congestion-controlled, the UDP tunnel SHOULD NOT employ its own congestion control mechanism, because congestion losses of tunneled traffic will already trigger an appropriate congestion response at the original senders of the tunneled traffic. When the payload traffic is not known to be IP-based, or is known to be IP-based but not congestion-controlled, the UDP tunnel SHOULD employ an appropriate congestion control mechanism. Furthermore, because UDP tunnels are usually bulk-transfer applications as far as the intermediate routers are concerned, the guidelines as defined in Section 3.1.1 of [RFC5405] SHOULD apply.
> > > >>
> > > >> Best regards,
> > > >> Xiaohu
> > > >>
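
The decision rule in the proposed Congestion Considerations text can be summarized as a tiny function. This is a hypothetical encoding for illustration only: the category names are invented, and it shows why Joel's fixed-rate pseudowire case (payload not known to be IP) would fall under the tunnel congestion control SHOULD.

```python
from enum import Enum

class Payload(Enum):
    IP_CONGESTION_CONTROLLED = "IP, congestion-controlled"
    IP_NOT_CONGESTION_CONTROLLED = "IP, not congestion-controlled"
    UNKNOWN_OR_NON_IP = "not known to be IP (e.g. a fixed-rate pseudowire)"

def tunnel_should_apply_congestion_control(payload: Payload) -> bool:
    """Proposed text: the UDP tunnel SHOULD NOT add its own congestion control
    for congestion-controlled IP payloads, and SHOULD add one otherwise."""
    return payload is not Payload.IP_CONGESTION_CONTROLLED

for p in Payload:
    print(f"{p.value}: tunnel congestion control = {tunnel_should_apply_congestion_control(p)}")
```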
> > > >>> -----Original Message-----
> > > >>> From: mpls [mailto:mpls-bounces@ietf.org] On Behalf Of Eggert, Lars
> > > >>> Sent: 8 January 2014 18:22
> > > >>> To: IETF
> > > >>> Cc: mpls@ietf.org
> > > >>> Subject: Re: [mpls] Last Call: <draft-ietf-mpls-in-udp-04.txt> (Encapsulating MPLS
> > > >>> in UDP) to Proposed Standard
> > > >>>
> > > >>> Hi,
> > > >>>
> > > >>> On 2014-1-2, at 16:14, The IESG <iesg-secretary@ietf.org> wrote:
> > > >>>> - 'Encapsulating MPLS in UDP'
> > > >>>> <draft-ietf-mpls-in-udp-04.txt> as Proposed Standard
> > > >>>
> > > >>>
> > > >>> this document needs to describe how it addresses the issues raised in BCP145
> > > >>> (RFC5405). It already contains some text about messages sizes and congestion
> > > >>> considerations, which is great. Unfortunately, the text about congestion
> > > >>> considerations is not fully in line with RFC5405.
> > > >>>
> > > >>> Lars
> > > >
> > > _______________________________________________
> > > mpls mailing list
> > > mpls@ietf.org
> > > https://www.ietf.org/mailman/listinfo/mpls