[Idr] Re: Fwd: I-D Action: draft-wang-idr-dpf-00.txt

Robert Raszuk <robert@raszuk.net> Thu, 04 December 2025 22:17 UTC

From: Robert Raszuk <robert@raszuk.net>
Date: Thu, 04 Dec 2025 23:17:00 +0100
Message-ID: <CAOj+MMH=F-p4LPTas0UeB_fuE6TAhKvcjmUYgFEhv_tpqdOeBg@mail.gmail.com>
To: "Wang, Kevin" <kevin.wang@hpe.com>
CC: "idr@ietf.org" <idr@ietf.org>, lsr <lsr@ietf.org>
Subject: [Idr] Re: Fwd: I-D Action: draft-wang-idr-dpf-00.txt
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/diVwKzL2U6dHAAfcuUyRozTqeJI>

Hi Kevin,

I am sceptical that the proposed BGP extension is desirable in the BGP protocol.
But this is just my own opinion, and, as Jeff says, I may end up in the "rough" on it.

But reading your proposal, I do think that marking this coloring on a per
BGP session basis (strict or loose) is a very bad idea. We departed from
per-session marking when the MP-BGP extensions were introduced.
So if you want to continue, I recommend a much more granular coloring
capability, ideally on a per-NLRI/UPDATE message basis.

With your current proposal you have created a physical partitioning, not a
logical one.
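A minimal Python sketch of the difference between per-session marking and per-route coloring follows. The `Session` class, the `red`/`blue` plane names, the prefixes, and both helper functions are hypothetical and are not taken from the draft; the sketch only tracks which links can carry which plane.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Session:
    """Hypothetical model of one EBGP session, i.e. one physical fabric link."""
    name: str
    session_color: Optional[str] = None  # color under per-session marking
    # Color(s) attached to each advertised prefix under per-route coloring.
    route_colors: Dict[str, Set[str]] = field(default_factory=dict)


def planes_by_session(sessions) -> Dict[Optional[str], Set[str]]:
    """Per-session marking: every link ends up in exactly one plane."""
    planes: Dict[Optional[str], Set[str]] = {}
    for s in sessions:
        planes.setdefault(s.session_color, set()).add(s.name)
    return planes


def planes_by_route(sessions) -> Dict[str, Set[str]]:
    """Per-route coloring: one link may carry routes for several planes."""
    planes: Dict[str, Set[str]] = {}
    for s in sessions:
        for colors in s.route_colors.values():
            for color in colors:
                planes.setdefault(color, set()).add(s.name)
    return planes


# Two uplinks from the same leaf, each advertising one red and one blue prefix.
links = [
    Session("leaf1-spine1", "red", {"10.0.0.0/24": {"red"}, "10.0.1.0/24": {"blue"}}),
    Session("leaf1-spine2", "blue", {"10.0.0.0/24": {"red"}, "10.0.1.0/24": {"blue"}}),
]

print(planes_by_session(links))  # each plane owns exactly one link
print(planes_by_route(links))    # both planes can use both links
```

With per-session marking the color is a property of the link, so planes can only be carved out of physical links; attaching the color to each route lets the same link serve several logical planes.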

Also, can you elaborate in your draft (keeping in mind BGP's native
recursiveness) why the BGP CAR or BGP CT proposals fail to address your
objectives? Are they broken and in need of fixing, or do you just prefer to
start fresh with yet another way to achieve the same thing?

Thank you,
R.



On Thu, Dec 4, 2025 at 10:18 PM Wang, Kevin <kevin.wang@hpe.com> wrote:

> Hi Robert,
>
> Unless we have perfect load balancing, congestion is always possible, even
> in a non-blocking Clos fabric. Also, there are other scenarios where
> avoiding fate-sharing paths is crucial.
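The point that congestion is possible even in a non-blocking Clos fabric can be quantified with the birthday-problem calculation below. This is a minimal Python sketch assuming idealized ECMP: equal-size long-lived flows, each hashed uniformly and independently onto one of n equal-cost uplinks. Real hash functions and flow mixes will differ.

```python
from math import perm


def p_no_collision(flows: int, uplinks: int) -> float:
    """Probability that `flows` flows, each hashed uniformly and independently
    onto one of `uplinks` equal-cost uplinks, all land on distinct uplinks."""
    if flows > uplinks:
        return 0.0  # pigeonhole: at least two flows must share an uplink
    return perm(uplinks, flows) / uplinks ** flows


for n in (4, 8, 16):
    # n equal-size flows over n uplinks: probability that some uplink gets 2+ flows
    print(n, 1 - p_no_collision(n, n))
```

For n = 4 the result is 1 - 24/256 = 0.90625, i.e. roughly a 91% chance that at least one uplink carries two of the four flows while another sits idle, and the probability only grows with n.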
>
> Thanks,
> Kevin
>
> *From: *Robert Raszuk <robert@raszuk.net>
> *Date: *Wednesday, December 3, 2025 at 3:59 PM
> *To: *Wang, Kevin <kevin.wang@hpe.com>
> *Cc: *Gyan Mishra <hayabusagsm@gmail.com>, idr@ietf.org <idr@ietf.org>,
> lsr <lsr@ietf.org>
> *Subject: *Re: [Idr] Re: Fwd: I-D Action: draft-wang-idr-dpf-00.txt
>
> Hi Kevin,
>
> Your draft explains how to do poor man's flex algo in BGP - ok.
>
> But could you elaborate on why anyone would do that (and push more
> complexity) in a non-blocking Clos fabric?
>
> Cheers,
> R.
>
>
>
> On Wed, Dec 3, 2025 at 7:35 PM Wang, Kevin <kevin.wang@hpe.com> wrote:
>
> Hi Robert,
>
> Thank you for providing further details about your thoughts. What I heard
> is that IGPs were not initially adopted in DC fabrics due to their scaling
> issues (mostly due to LSDB flooding), especially for the hyperscalers. I
> understand that there were later efforts to address the scaling issues on
> the IGP side. I see your experience of using ISIS to successfully
> construct the fabric as a good example. Yes, it might be worth writing an
> informational RFC on ISIS for DC fabrics, serving as an alternative to
> RFC 7938. There are also other efforts trying to bring traffic engineering
> technologies, such as RSVP, MPTE, etc., to DC fabrics. Like any other
> network, DC fabrics will probably also evolve over time.
>
> Having said that, most of today’s DC fabrics (at least for those DC
> customers I have dealt with) are designed following RFC 7938:
>
>    - Use Clos topology
>    - Use IP forwarding
>    - Use EBGP as the underlay routing protocol
>
> I guess the choices above were made for technical as well as business
> reasons. BGP DPF is developed under the assumptions/observations above. I
> agree that DC fabrics might evolve and adopt other technologies, such as
> IGPs or RSVP, in the future. For the time being and the foreseeable future,
> BGP DPF would help provide lightweight traffic engineering for DC
> fabrics.
>
> Thanks,
> Kevin
>
> *From: *Robert Raszuk <robert@raszuk.net>
> *Date: *Tuesday, December 2, 2025 at 2:46 PM
> *To: *Wang, Kevin <kevin.wang@hpe.com>
> *Cc: *Gyan Mishra <hayabusagsm@gmail.com>, idr@ietf.org <idr@ietf.org>,
> lsr <lsr@ietf.org>
> *Subject: *Re: [Idr] Re: Fwd: I-D Action: draft-wang-idr-dpf-00.txt
>
> Dear Kevin,
>
> I know very well what RFC 7938 says. In fact I did review this document
> well before it became an RFC :)
>
> But what happened next is that, while RFC 7938 makes a valid observation on
> how one can build MSDCs, lots of folks misinterpreted it as the only guide
> on how to build even a few racks of DC fabric.
>
> So yes, using BGP for dynamic routing in DC fabrics has its use cases,
> but they really apply to only a handful of deployments. And I am not
> aware of any MSDC asking you for logical transport planes within their
> fabrics.
>
> All other DCs would be much better off using IGP for underlay and BGP for
> overlay as a design pattern.
>
> When I constructed 10 full racks of hardware using ISIS, folks were shocked
> and pointed out that I was not using an IETF standard approach :). Then,
> when I demonstrated that connectivity is restored in less than 50 ms upon
> any node or link failure, the masks came off.
>
> Maybe what is actually needed is an informational RFC, just like RFC 7938,
> simply illustrating that one can construct a DC using ISIS. It is obvious
> to me, but I admit I am not aware of any RFC showing operators that
> "Large-Scale Data Centers" can be robustly built with IGPs.
>
> Kind regards,
> Robert
>
>
> On Tue, Dec 2, 2025 at 7:24 PM Wang, Kevin <kevin.wang@hpe.com> wrote:
>
> Hi Robert and Gyan,
>
> Thanks for your feedback! Your observation is correct that IGP Flex Algo
> could achieve the same. BGP DPF can be thought of as a BGP counterpart of
> IGP Flex Algo to some extent (though not precisely).
>
> As explained in the “Introduction” section of this draft, BGP DPF is
> designed for the current IP fabric environment where EBGP is usually the
> only protocol used for routing. Section 5 of RFC 7938 explains why DC
> fabrics use EBGP as the sole routing protocol.
>
> Thanks,
> Kevin
>
> *From: *Gyan Mishra <hayabusagsm@gmail.com>
> *Date: *Tuesday, December 2, 2025 at 7:43 AM
> *To: *Robert Raszuk <robert@raszuk.net>
> *Cc: *idr@ietf.org <idr@ietf.org>, lsr <lsr@ietf.org>
> *Subject: *[Idr] Re: Fwd: I-D Action: draft-wang-idr-dpf-00.txt
>
> I agree with Robert that you could use RFC 9502 IGP Flex Algo in IP
> networks to build the desired disjoint planes.
>
> You could also use SRv6 with IGP Flex Algo (RFC 9350), which uses the
> IPv6 data plane, to build your disjoint planes.
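As a rough illustration of how per-algorithm SPF yields disjoint planes, here is a minimal Python sketch. It is a heavy simplification: real Flex-Algo (RFC 9350) advertises a Flexible Algorithm Definition with a metric type and include/exclude admin-group rules in the IGP, whereas this sketch only prunes a hypothetical link list by color and runs Dijkstra on what is left. The topology, colors, and function names are all illustrative.

```python
import heapq
from typing import Dict, List, Set, Tuple

# Hypothetical link list: (node_a, node_b, metric, affinity colors).
Link = Tuple[str, str, int, Set[str]]


def constrained_spf(links: List[Link], src: str, include_any: Set[str]):
    """Dijkstra SPF over only the links whose affinities intersect
    `include_any` (loosely modelled on a Flex-Algo include-any rule).
    Returns (distance, predecessor) maps."""
    adj: Dict[str, List[Tuple[str, int]]] = {}
    for a, b, metric, colors in links:
        if colors & include_any:  # prune links outside this algorithm's topology
            adj.setdefault(a, []).append((b, metric))
            adj.setdefault(b, []).append((a, metric))
    dist: Dict[str, int] = {src: 0}
    prev: Dict[str, str] = {}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, metric in adj.get(node, []):
            nd = d + metric
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return dist, prev


def path(prev: Dict[str, str], dst: str) -> List[str]:
    """Walk predecessors back from dst to the source."""
    hops = [dst]
    while hops[-1] in prev:
        hops.append(prev[hops[-1]])
    return hops[::-1]


# Two leaves, two spines; each spine's links carry a single color.
fabric: List[Link] = [
    ("leaf1", "spine1", 10, {"red"}),
    ("leaf2", "spine1", 10, {"red"}),
    ("leaf1", "spine2", 10, {"blue"}),
    ("leaf2", "spine2", 10, {"blue"}),
]

for color in ("red", "blue"):
    _, prev = constrained_spf(fabric, "leaf1", {color})
    print(color, path(prev, "leaf2"))  # red via spine1, blue via spine2
```

Because the red and blue topologies share no spine, the two computed paths are node-disjoint apart from the source and destination leaves.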
>
> Thanks
>
> Gyan
>
> On Tue, Dec 2, 2025 at 6:32 AM Robert Raszuk <robert@raszuk.net> wrote:
>
> Hi,
>
> With respect to the subject draft ... why would you not use IGP Flexible
> Algorithm for it?
>
> Are you now going to port years of work from IGP to BGP to achieve the
> same?
>
> Besides, in a non-blocking fabric latency is really not a factor. So you
> want to logically partition it, making it blocking, and then worry about
> what travels on which logical plane? Is this a reasonable direction?
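The remark that partitioning makes a non-blocking fabric blocking comes down to simple arithmetic. The sketch below assumes a hypothetical leaf with equal host-facing and spine-facing capacity, and a worst case in which every host sends the pinned traffic class at line rate at the same time; the port counts are illustrative only.

```python
def oversubscription(down_gbps: float, up_gbps: float) -> float:
    """Ratio of host-facing to fabric-facing bandwidth on a leaf."""
    return down_gbps / up_gbps


# Hypothetical leaf: 16 x 100G host ports and 16 x 100G spine uplinks.
host_bw = 16 * 100
uplink_bw = 16 * 100
print(oversubscription(host_bw, uplink_bw))  # 1.0, i.e. non-blocking

# The same leaf when one traffic class is confined to a plane built from
# 4 of the 16 uplinks, and every host sends that class at line rate.
print(oversubscription(host_bw, 4 * 100))    # 4.0, i.e. 4:1 for that class
```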
>
> Thx,
> R.
>
> ---------- Forwarded message ---------
> From: <internet-drafts@ietf.org>
> Date: Mon, Dec 1, 2025 at 10:49 PM
> Subject: I-D Action: draft-wang-idr-dpf-00.txt
> To: <i-d-announce@ietf.org>
>
>
> Internet-Draft draft-wang-idr-dpf-00.txt is now available.
>
>    Title:   BGP Deterministic Path Forwarding (DPF)
>    Authors: Kevin Wang
>             Michal Styszynski
>             Wen Lin
>             Mahesh Subramaniam
>             Thomas Kampa
>             Diptanshu Singh
>    Name:    draft-wang-idr-dpf-00.txt
>    Pages:   18
>    Dates:   2025-12-01
>
> Abstract:
>
>    Modern data center (DC) fabrics typically employ Clos topologies with
>    External BGP (EBGP) for plain IPv4/IPv6 routing.  While hop-by-hop
>    EBGP routing is simple and scalable, it provides only a single best-
>    effort forwarding service for all types of traffic.  This single
>    best-effort service might be insufficient for increasingly diverse
>    traffic requirements in modern DC environments.  For example, loss
>    and latency sensitive AI/ML flows may demand stronger Service Level
>    Agreements (SLA) than general purpose traffic.  Duplication schemes
>    which are standardized through protocols such as Parallel Redundancy
>    Protocol (PRP) require disjoint forwarding paths to avoid single
>    points of failure.  Congestion avoidance may require more
>    deterministic forwarding behavior.
>
>    This document introduces BGP Deterministic Path Forwarding (DPF), a
>    mechanism that partitions the physical fabric into multiple logical
>    fabrics.  Flows can be mapped to different logical fabrics based on
>    their specific requirements, enabling deterministic forwarding
>    behavior within the data center.
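The abstract's idea of mapping flows onto logical fabrics, together with the disjointness requirement for PRP-style duplication, can be sketched in a few lines of Python. The plane names, the spine-based partition, and the DSCP-based classifier are all hypothetical; the draft's actual BGP signalling is not modelled here, only the resulting flow-to-plane mapping and a disjointness check.

```python
from typing import Dict, FrozenSet

# Hypothetical partition of a 4-spine, two-tier Clos fabric into logical planes.
PLANES: Dict[str, FrozenSet[str]] = {
    "default": frozenset({"spine1", "spine2", "spine3", "spine4"}),
    "plane-a": frozenset({"spine1", "spine2"}),
    "plane-b": frozenset({"spine3", "spine4"}),
}

# Hypothetical classifier: DSCP value -> plane name.
DSCP_TO_PLANE: Dict[int, str] = {0: "default", 26: "plane-a", 34: "plane-b"}


def plane_for(dscp: int) -> str:
    """Map a flow to a logical plane; unknown markings fall back to 'default'."""
    return DSCP_TO_PLANE.get(dscp, "default")


def disjoint(p1: str, p2: str) -> bool:
    """In a two-tier Clos, leaf-to-leaf paths in planes with no common spine
    share no spine, so no single spine failure can hit both."""
    return PLANES[p1].isdisjoint(PLANES[p2])


# PRP-style duplication: the two copies of a flow are marked DSCP 26 and 34.
copy_a, copy_b = plane_for(26), plane_for(34)
print(copy_a, copy_b, disjoint(copy_a, copy_b))  # plane-a plane-b True
print(disjoint("plane-a", "default"))            # False: they share spine1/spine2
```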
>
> The IETF datatracker status page for this Internet-Draft is:
> https://datatracker.ietf.org/doc/draft-wang-idr-dpf/
>
> There is also an HTML version available at:
> https://www.ietf.org/archive/id/draft-wang-idr-dpf-00.html
>
> Internet-Drafts are also available by rsync at:
> rsync.ietf.org::internet-drafts
>
>
>
>