[Teas] Re: Comments about rfc2205 Resource ReSerVation Protocol (RSVP)

Tuấn Anh Vũ <anhvt.hdg@gmail.com> Fri, 25 October 2024 07:13 UTC

To: Vishnu Pavan Beeram <vishnupavan@gmail.com>
CC: adrian@olddog.co.uk, TEAS WG <teas@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/teas/kWKBejEC6S7YwdJgqcnwEueKKjw>

Hi Pavan,

*Since R2 is (oblivious of the link flap on R3 and) still maintaining
state, it should eventually refresh the path state which would result in R3
and R4 re-instantiating the LSP state. When the Resv message with the new
label reaches R2, it would delete the existing reservation state (send a
ResvTear) and recreate the state with the new label (and send a Resv
upstream). And R1 would also delete and add reservation state as part of
the ResvTear and Resv processing (and the LSP would be deemed healthy
again). Is this not happening in your setup?*

*AnhVT:* Cleaning up the reservation state based on a timeout or refresh
message takes a long time, which is unacceptable for ISP services.


*If you use RSVP Hellos per neighbor interface, R3 could update the
instance value when the link flaps. And this change in the instance value
can also prompt R2 to refresh state immediately (even without any Graceful
Restart procedures coming into play) resulting in R3 and R4
re-instantiating state again.*

*AnhVT:* Tracking the RSVP direct (link) neighbor is one mechanism for
bringing down an LSP: the LSP is brought down when the direct neighbor's
state changes from UP to DOWN. In my case, the RSVP direct neighbor had not
come up before that, because the AE had flapped many times before the stuck
issue happened. Every time the AE between R2 and R3 comes back up, it takes
1-9 s to bring up the RSVP direct neighbor (the hello timer is 9 s). Before
the issue that I described in my first email (the LACP flap on R3 but not on
R2), the AE had been flapping for a few seconds (less than 9 s), so the RSVP
direct neighbor had not come up yet.
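A simplified model of the timing described above, assuming the direct neighbor is declared UP at the first Hello exchanged after the AE comes back (which gives the 1-9 s range with a 9 s hello timer). The function names, the Hello phase, and the example times are hypothetical:

```python
import math


def next_hello_tick(t: float, interval_s: float, phase_s: float = 0.0) -> float:
    """First Hello tick at or after time t, with ticks at phase_s + n * interval_s."""
    n = math.ceil((t - phase_s) / interval_s)
    return phase_s + n * interval_s


def neighbor_reaches_up(link_up_at: float, link_down_at: float,
                        interval_s: float, phase_s: float = 0.0) -> bool:
    """True if the neighbor comes UP before the AE drops again.

    If the AE drops before the next Hello tick, the neighbor never reaches UP,
    so there is no later UP -> DOWN transition to bring the LSP down.
    """
    return next_hello_tick(link_up_at, interval_s, phase_s) < link_down_at


if __name__ == "__main__":
    # Hellos every 9 s starting at t=0; the AE is up only from t=10 to t=16.
    print(neighbor_reaches_up(10.0, 16.0, 9.0))  # False: next tick is t=18
    print(neighbor_reaches_up(10.0, 20.0, 9.0))  # True: neighbor UP at t=18
```

Under this model, any AE up-period that ends before the next Hello tick leaves the neighbor DOWN, so neighbor-based LSP teardown never fires.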


Regards,

AnhVT

On Fri, Oct 25, 2024 at 13:11 Vishnu Pavan Beeram <
vishnupavan@gmail.com> wrote:

> AnhVT, Hi!
>
> Thanks for the additional details. The key detail that was missing in your
> earlier email was that the link comes back up immediately on R3.
>
>
>
> Since R2 is (oblivious of the link flap on R3 and) still maintaining
> state, it should eventually refresh the path state which would result in R3
> and R4 re-instantiating the LSP state. When the Resv message with the new
> label reaches R2, it would delete the existing reservation state (send a
> ResvTear) and recreate the state with the new label (and send a Resv
> upstream). And R1 would also delete and add reservation state as part of
> the ResvTear and Resv processing (and the LSP would be deemed healthy
> again). Is this not happening in your setup?
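A minimal sketch of the R2 behavior described above, with hypothetical class and field names and example label values. It tracks only the label received from downstream and the messages R2 sends upstream; refresh timers, label allocation, and message encoding are left out:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TransitResvState:
    """Hypothetical reservation state for one LSP on a transit node (R2)."""
    downstream_label: Optional[int] = None
    sent_upstream: List[str] = field(default_factory=list)

    def on_resv(self, label: int) -> None:
        """Handle a Resv arriving from downstream (R3)."""
        if self.downstream_label == label:
            return  # same label: an ordinary refresh, nothing to rebuild
        if self.downstream_label is not None:
            # New label: delete the existing reservation and tell upstream.
            self.sent_upstream.append("ResvTear")
        # (Re)create the reservation with the new label and propagate it.
        self.downstream_label = label
        self.sent_upstream.append("Resv")


if __name__ == "__main__":
    r2 = TransitResvState(downstream_label=100)  # stale state held by R2
    r2.on_resv(200)  # R3 re-instantiated the LSP and advertised label 200
    print(r2.sent_upstream)  # ['ResvTear', 'Resv']
```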
>
>
>
> If you use RSVP Hellos per neighbor interface, R3 could update the
> instance value when the link flaps. And this change in the instance value
> can also prompt R2 to refresh state immediately (even without any Graceful
> Restart procedures coming into play) resulting in R3 and R4
> re-instantiating state again.
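In RSVP Hellos (RFC 3209) each side advertises a Src_Instance value, and a change in the neighbor's value indicates that the neighbor has reset. A minimal sketch of R2's side, assuming R3 picks a new Src_Instance whenever the link flaps, as suggested above; the class name and instance values are hypothetical:

```python
from typing import Optional


class HelloNeighbor:
    """Hypothetical tracker for one RSVP Hello neighbor (R2's view of R3)."""

    def __init__(self) -> None:
        self.last_src_instance: Optional[int] = None

    def on_hello(self, src_instance: int) -> bool:
        """Record the neighbor's Src_Instance from a received Hello.

        Returns True when the value differs from the previously seen one,
        i.e. the neighbor reset its Hello state; this sketch treats that as
        the trigger for R2 to refresh its Path state towards R3 right away.
        """
        changed = (self.last_src_instance is not None
                   and src_instance != self.last_src_instance)
        self.last_src_instance = src_instance
        return changed


if __name__ == "__main__":
    r3 = HelloNeighbor()
    print(r3.on_hello(0x1111))  # False: first Hello from R3
    print(r3.on_hello(0x1111))  # False: steady state
    print(r3.on_hello(0x2222))  # True: R3 picked a new instance after the flap
```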
>
>
> Hope this helps.
>
>
> Regards,
>
> -Pavan
>
> On Thu, Oct 24, 2024 at 1:31 PM Tuấn Anh Vũ <anhvt.hdg@gmail.com> wrote:
>
>> Hi,
>> Thanks for all your answers; please find my views below:
>> *I./ Hi Adrian:*
>>
>> *There are two questions that arise…*
>>
>>    1. *Why isn’t R2 able to notice? Presumably the link failure
>>    detection is relying on a lower layer (L2 or L1) failure indication, and
>>    that is not happening. The answer to this is to run some other link failure
>>    detection mechanism such as BFD.*
>>
>> *Such a mechanism would allow R2 to declare the link down and possibly
>> re-route/repair the LSP via R5, or notify the head end (R1) to let it
>> re-route.*
>>
>> *AnhVT:* We run both LACP (1 s x 3) and micro-BFD (300 ms x 3) on this
>> link, but somehow LACP timed out on R3 first (this link runs over a
>> transmission system, so the physical link does not go down), and that
>> triggered micro-BFD to send an AdminDown notification to the remote end
>> (R2). This notification brings down micro-BFD on R2 but not the LACP
>> client, which is the expected behavior described in the BFD RFC. After 2 s,
>> the link was stable again. Because of that, R2 did not know that LACP had
>> flapped on the R3 side.
>>
>>    2. *How could R3 let R2 know that the LSP has been torn down? The
>>    answer is “by sending a PathErr or ResvTear or Notification”. In general,
>>    those messages are sent hop by hop, and so they would fail to be routed on
>>    the failed link R3-R2, however, it is possible to IP-tunnel to
>>    direct-address RSVP packets so that they would be IP-routed to R2 (or
>>    direct to R1) via R5.*
>>
>>           *AnhVT:* Could you point me to the document that describes this
>> behavior? I have read several RSVP RFCs but did not find it described.
>> *II./ Hi Pavan:*
>>
>>    - *Stale state cleanup based on soft state time out (RFC2205): Since
>>    the link is down, the reservation state isn’t getting refreshed. So, when
>>    the reservation state times out (in about 157.5 secs for a refresh-interval
>>    of 30 secs), R2 is expected to clean up the reservation state and signal a
>>    ResvTear to the ingress*
>>
>>           *AnhVT:* Cleaning up the reservation state based on the timeout
>> takes too long: more than 2 minutes of blackholed traffic. This is
>> unacceptable for ISP services.
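The 157.5 s figure follows from the soft-state lifetime rule in RFC 2205 (Section 3.7), L >= (K + 0.5) * 1.5 * R, with the default K = 3 and a refresh period R of 30 s. A minimal sketch of that arithmetic, assuming those defaults; the function name is hypothetical:

```python
def state_lifetime_s(refresh_period_s: float, k: int = 3) -> float:
    """Soft-state lifetime L = (K + 0.5) * 1.5 * R from RFC 2205, Section 3.7.

    k is the number of consecutive refreshes that may be lost before the
    state times out (RFC 2205 default: 3). The 1.5 factor covers refresh
    jitter, since refreshes are randomized within [0.5R, 1.5R].
    """
    return (k + 0.5) * 1.5 * refresh_period_s


if __name__ == "__main__":
    # Default 30 s refresh period, as in the thread.
    print(state_lifetime_s(30.0))  # 157.5
```

Shortening R shortens the timeout proportionally, at the cost of more refresh traffic.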
>>
>>
>>    - Use of RSVP Hello Session based on the Node-ID (RFC4558) for
>>    detection of RSVP-TE signaling adjacency failure: If there was an RSVP
>>    Hello session maintained between R2 and R3, R2 would be able to couple the
>>    state of the LSP with the state of signaling adjacency. And when the
>>    signaling adjacency failure is detected (Hello State timed out -- for a 9
>>    sec hello interval, the time out takes 31.5 secs), R2 would clean up the
>>    reservation state and signal a ResvTear to the ingress. This option can be
>>    used to clean up stale state when long refresh intervals are used.
>>    - *AnhVT:* The LACP interface on R3 flaps for only 2 s, so the node
>>    Hello cannot detect it in this case. And 31.5 s is also a long time; I
>>    think we need something much faster. R3 should send a ResvTear to R2
>>    over the IGP path (R3->R5->R2).
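The 31.5 s figure is 3.5 times the 9 s hello interval. A minimal sketch of why a 2 s flap never trips the Hello timeout, assuming that 3.5-interval multiplier and treating the flap as one contiguous outage; the names are hypothetical:

```python
# Assumption: 3.5 missed Hello intervals, matching 31.5 s / 9 s in the thread.
HELLO_MISS_MULTIPLIER = 3.5


def hello_dead_interval_s(hello_interval_s: float) -> float:
    """Time without Hellos after which the Hello session is declared down."""
    return HELLO_MISS_MULTIPLIER * hello_interval_s


def outage_detected(outage_s: float, hello_interval_s: float) -> bool:
    """Conservative check: True if one contiguous outage outlasts the dead
    interval. (Time already elapsed since the last Hello before the outage
    is ignored, so detection can come up to one hello interval earlier.)"""
    return outage_s > hello_dead_interval_s(hello_interval_s)


if __name__ == "__main__":
    print(hello_dead_interval_s(9.0))  # 31.5
    print(outage_detected(2.0, 9.0))   # False: a 2 s flap goes unnoticed
```

Even with the earliest possible detection (31.5 - 9 = 22.5 s), a 2 s outage is far too short to be seen.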
>>
>>
>> *III./ Hi Tarek*
>> *RSVP PathErr can be used to propagate errors upstream – there’s
>> Path_State_Removed that RFC3473 introduced to also notify that state has
>> been removed. Does that address your need?*
>> *AnhVT:* Let me check whether R3 sends a PathErr to R2. As far as I know,
>> R2 will not bring down the LSP even if it receives a PathErr from R3, based
>> on https://datatracker.ietf.org/doc/html/rfc2209
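The two views above differ in whether the PathErr carries the Path_State_Removed flag (RFC 3473). A minimal sketch of R2's decision; the function name and state layout are hypothetical, it assumes R2 chooses to honor the flag, and the flag value 0x04 should be checked against RFC 3473 before relying on it:

```python
from typing import Dict

# Path_State_Removed flag of the ERROR_SPEC object, as defined in RFC 3473.
PATH_STATE_REMOVED = 0x04


def handle_path_err(path_state: Dict[str, str], lsp_id: str,
                    error_spec_flags: int) -> bool:
    """Sketch of PathErr processing on a transit node such as R2.

    A plain PathErr does not change path state at the nodes it passes
    through, so R2 keeps the LSP. If the sender set Path_State_Removed,
    this sketch assumes R2 removes its own path state as well. Returns True
    if local state was removed; forwarding the PathErr upstream is not modeled.
    """
    if error_spec_flags & PATH_STATE_REMOVED:
        return path_state.pop(lsp_id, None) is not None
    return False


if __name__ == "__main__":
    state = {"lsp-1": "path state"}
    print(handle_path_err(state, "lsp-1", 0))                   # False
    print(handle_path_err(state, "lsp-1", PATH_STATE_REMOVED))  # True
    print(state)                                                # {}
```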
>>
>> Regards,
>> AnhVT
>>
>>
>>
>> On Wed, Oct 23, 2024 at 01:23 Vishnu Pavan Beeram <
>> vishnupavan@gmail.com> wrote:
>>
>>> AnhVT, Hi!
>>>
>>>
>>>
>>> Since you are referring to an LSP in an IP/MPLS network, I’m assuming
>>> that you are using in-band RSVP signaling. I’m also assuming that this is
>>> an LSP that does not have any form of local-protection enabled.
>>>
>>>
>>>
>>> When R3 detects an upstream link-down event, it cleans up the local path
>>> state and sends a PathTear downstream -- in this scenario, the onus is not
>>> on R3 to notify the ingress of this outage. The typical expected behavior
>>> on R2 is to detect the downstream link-down event and send a PathErr to the
>>> ingress (signaled hop-by-hop) of the LSP. R2 would also clean up the
>>> reservation state and send a ResvTear to the ingress (again, signaled
>>> hop-by-hop). If R2 is not able to detect the link-down event for some
>>> reason (and no other link state detection mechanism like BFD is available),
>>> there are a couple of control-plane options that RSVP already provides to
>>> clean up state (in due course of time) and bring down the LSP:
>>>
>>>    - Stale state cleanup based on soft state time out (RFC2205): Since
>>>    the link is down, the reservation state isn’t getting refreshed. So, when
>>>    the reservation state times out (in about 157.5 secs for a refresh-interval
>>>    of 30 secs), R2 is expected to clean up the reservation state and signal a
>>>    ResvTear to the ingress
>>>    - Use of RSVP Hello Session based on the Node-ID (RFC4558) for
>>>    detection of RSVP-TE signaling adjacency failure: If there was an RSVP
>>>    Hello session maintained between R2 and R3, R2 would be able to couple the
>>>    state of the LSP with the state of signaling adjacency. And when the
>>>    signaling adjacency failure is detected (Hello State timed out -- for a 9
>>>    sec hello interval, the time out takes 31.5 secs), R2 would clean up the
>>>    reservation state and signal a ResvTear to the ingress. This option can be
>>>    used to clean up stale state when long refresh intervals are used.
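The expected hop-by-hop behavior from the paragraph before this list can be sketched for the R1-R2-R3-R4 LSP as (message, sender, receiver) tuples. This is a simplification that assumes, as Pavan does, in-band signaling and no local protection; the function name is hypothetical:

```python
from typing import List, Tuple

Msg = Tuple[str, str, str]  # (message type, sender, receiver)


def teardown_messages(path: List[str], failed_link: Tuple[str, str],
                      upstream_end_detects: bool) -> List[Msg]:
    """Messages triggered when link (u, d) of an unprotected LSP fails.

    - d (which saw the failure) removes path state and sends PathTear
      downstream, hop by hop, to the egress;
    - if u also detects the failure, it sends PathErr and ResvTear
      upstream, hop by hop, to the ingress.
    """
    u, d = failed_link
    msgs: List[Msg] = []
    i = path.index(d)
    msgs += [("PathTear", a, b) for a, b in zip(path[i:], path[i + 1:])]
    if upstream_end_detects:
        j = path.index(u)
        hops = list(zip(path[j:0:-1], path[j - 1::-1]))  # u -> ... -> ingress
        msgs += [("PathErr", a, b) for a, b in hops]
        msgs += [("ResvTear", a, b) for a, b in hops]
    return msgs


if __name__ == "__main__":
    lsp = ["R1", "R2", "R3", "R4"]
    print(teardown_messages(lsp, ("R2", "R3"), True))
    print(teardown_messages(lsp, ("R2", "R3"), False))
```

The second call is the case in this thread: R2 never detects the failure, so the only message sent is the PathTear from R3 to R4, and nothing reaches R1.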
>>>
>>>
>>> Hope this helps.
>>>
>>>
>>>
>>> Regards,
>>>
>>> -Pavan
>>>
>>> On Tue, Oct 22, 2024 at 4:34 PM Adrian Farrel <adrian@olddog.co.uk>
>>> wrote:
>>>
>>>> Hi Anh,
>>>>
>>>>
>>>>
>>>> [Redirecting from MPLS to TEAS as suggested by Tony Li]
>>>>
>>>>
>>>>
>>>> I think that (given you mention LSPs) you're talking about RSVP-TE (RFC
>>>> 3209), not plain old RFC 2205 RSVP.
>>>>
>>>>
>>>>
>>>> In your example, the link R2-R3 has failed in a way that R3 is aware of
>>>> the failure, but R2 is not aware.
>>>>
>>>>
>>>>
>>>> There are two questions that arise…
>>>>
>>>>    1. Why isn’t R2 able to notice? Presumably the link failure
>>>>    detection is relying on a lower layer (L2 or L1) failure indication, and
>>>>    that is not happening. The answer to this is to run some other link failure
>>>>    detection mechanism such as BFD.
>>>>
>>>> Such a mechanism would allow R2 to declare the link down and possibly
>>>> re-route/repair the LSP via R5, or notify the head end (R1) to let it
>>>> re-route.
>>>>
>>>>    2. How could R3 let R2 know that the LSP has been torn down? The
>>>>    answer is “by sending a PathErr or ResvTear or Notification”. In general,
>>>>    those messages are sent hop by hop, and so they would fail to be routed on
>>>>    the failed link R3-R2, however, it is possible to IP-tunnel to
>>>>    direct-address RSVP packets so that they would be IP-routed to R2 (or
>>>>    direct to R1) via R5.
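The second option can be illustrated with the topology from the original email. If R3 addresses its PathErr, ResvTear, or Notify message directly to R2 (or to R1) instead of relying on the failed R2-R3 hop, ordinary IP routing delivers it via R5. A minimal sketch, using a fewest-hop BFS as a stand-in for the IGP (a real IGP uses link metrics); the function name and data layout are hypothetical:

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Topology from the original email (undirected links).
LINKS: List[Tuple[str, str]] = [
    ("R1", "R2"), ("R2", "R3"), ("R3", "R4"), ("R2", "R5"), ("R3", "R5"),
]


def ip_path(src: str, dst: str, links: List[Tuple[str, str]],
            down: Set[FrozenSet[str]]) -> Optional[List[str]]:
    """Fewest-hop path from src to dst that avoids failed links (plain BFS)."""
    adj: Dict[str, List[str]] = {}
    for a, b in links:
        if frozenset((a, b)) in down:
            continue  # skip links that the sender considers down
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    prev: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


if __name__ == "__main__":
    failed = {frozenset(("R2", "R3"))}
    print(ip_path("R3", "R2", LINKS, failed))  # ['R3', 'R5', 'R2']
```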
>>>>
>>>>
>>>>
>>>> Cheers,
>>>>
>>>> Adrian
>>>>
>>>>
>>>>
>>>> *From:* Tuấn Anh Vũ <anhvt.hdg@gmail.com>
>>>> *Sent:* 22 October 2024 04:45
>>>> *To:* mpls@ietf.org
>>>> *Subject:* [mpls] Comments about rfc2205 Resource ReSerVation
>>>> Protocol (RSVP)
>>>>
>>>>
>>>>
>>>> Hi IETF team,
>>>>
>>>> I'm AnhVT from SVTech in Vietnam. I have run into some RSVP issues in an
>>>> IPv4 MPLS network.
>>>>
>>>> I suspect that RSVP has a gap that needs to be addressed. I describe it
>>>> below:
>>>>
>>>>
>>>>
>>>> I./ Topology:
>>>>
>>>>     ---------LSP-------->
>>>>     R1----R2----R3-----R4
>>>>           |    /
>>>>           |  /
>>>>            R5
>>>>
>>>> II./ Issue
>>>>
>>>> 1./ Because of some bug (e.g., R3 sees the R2-R3 link flap, but R2 does
>>>> not recognize the interface flap), R3 declares the LSP down, deletes the
>>>> LSP state, and sends a PathTear downstream to R4.
>>>>
>>>> 2./ Because R2 does not recognize the interface flap, R2 keeps the LSP
>>>> up. It does not know that the LSP should be deleted.
>>>>
>>>> 3./ Due to 1./ and 2./, R1 does not know that the LSP is stuck (R3 and R4
>>>> have deleted the LSP state), so R1 keeps forwarding traffic into the LSP.
>>>> This takes the service down.
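The inconsistency in 1./ to 3./ can be checked mechanically: walk the LSP from the ingress and find the node that still holds state while its downstream neighbor does not. A minimal sketch with a hypothetical helper name and state layout:

```python
from typing import Dict, List, Optional


def find_stuck_hop(path: List[str], has_state: Dict[str, bool]) -> Optional[str]:
    """Return the first node on the LSP (walking from the ingress) that still
    holds LSP state while its downstream neighbor does not. Labeled traffic
    sent by that node reaches a neighbor with no forwarding state for the
    LSP and is typically dropped there. Returns None if state is consistent.
    """
    for up, down in zip(path, path[1:]):
        if has_state[up] and not has_state[down]:
            return up
    return None


if __name__ == "__main__":
    path = ["R1", "R2", "R3", "R4"]
    # After 1./ and 2./: R3 and R4 removed the LSP state, R1 and R2 kept it.
    state = {"R1": True, "R2": True, "R3": False, "R4": False}
    print(find_stuck_hop(path, state))  # R2
```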
>>>>
>>>>
>>>>
>>>> III./ My comment
>>>>
>>>> I think that RSVP needs a mechanism by which R3 signals R2, so that R2
>>>> knows that R3 has deleted the LSP. Based on that signal, R2 would bring
>>>> down the LSP and then send a ResvTear to R1.
>>>>
>>>>
>>>>
>>>> I hope that you take a look at my comment.
>>>>
>>>>
>>>>
>>>> Regards,
>>>>
>>>> AnhVT
>>>>
>>>