Re: [mpls] Concerns about ISD

Tony Li <tony.li@tony.li> Sat, 16 April 2022 01:21 UTC

Cc: John E Drake <jdrake=40juniper.net@dmarc.ietf.org>, Tianran Zhou <zhoutianran=40huawei.com@dmarc.ietf.org>, "mpls@ietf.org" <mpls@ietf.org>
To: Haoyu Song <haoyu.song@futurewei.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/mpls/D50WNiZTQRnWpZJYdy3xB6lhA34>

Hi Haoyu,

> You’re assuming a TCAM and your only metric here seems to be the storage cost.  Many folks don’t do parsing using a TCAM and the performance cost is in the reads, not in the storage.
>  
> [HS2] TCAM is actually the better choice to store states involving bitmaps. Otherwise, parsing bitmap requires extra calculations in addition to storage, making it even less favorable. 


Ok, I disagree. I would MUCH rather have normal ALU functionality.


> [HS2] We propose to use an explicit FEC to signal that there are no EHs in the packet, to avoid scanning the label stack in such cases. This doesn’t nullify the need to carry extra metadata for some user packets to support some in-network functions.


I will wait to see a concrete complete proposal before commenting.


> 
> For the EH encoding efficiency, I appreciate your input. Currently it follows the IPv6 EH type (i.e., using NH + LENGTH to delineate every EH). There could certainly be room for improvement.
>  
>  
> Ok.  If I understand your proposal correctly, there are 4 octets of overhead (HEH) at the start. Then, for each network action, there would seem to be at least 3 octets of overhead, plus any associated data, plus alignment.
>  
> Let’s suppose that we want to encode NFFRR, entropy, and GISS in one packet.  By my math, the cost is:
>  
> EHI: 4 octets
> HEH: 4 octets
> NFFRR: 4 octets
> Entropy: 8 octets
> GISS: 8 octets
>  
> Total: 28 octets
>  
> Do I have that right?
>  
> [HS] It’s right if you think 8 octets are needed for Entropy and GISS
>  
>  
> If I understand your proposal, and we want >8 bits for both of these fields, then each would require 3 octets of overhead. Storing the EL/GIS value might take 2 or 3 octets.  Alignment would take you to 8.
>  
> If I look at the same thing with the FAI draft, I get:
>  
> FAI: 4 octets
> NFFRR: included in the above, so 0 octets
> Entropy: 4 octets (30 bits, plus overhead)
> GISS: 4 octets (30 bits, plus overhead)
>  
> Total: 12 octets
>  
> [HS2] Header overhead is an important dimension for comparison.  To make it apples to apples, let’s first clarify several points.
> If an action doesn’t need any extra data, there’s no point in allocating an EH for it. A flag in EHI is sufficient (this applies to the NFFRR case).


Ok, that saves you 4 octets, but costs you a bit in EHI. There’s only a finite number of bits available.  What do you do when they’re consumed?


> You are using the bitmap encoding since you mention FAI. Bitmap is more compact than Type/Length for sure. Here the implication is that each ISD must be 4 octets long, otherwise it will need every node to understand the size of every data item, further complicating the design.  


In FAI, the length of the ISD is defined based on the action and may have different lengths depending on bit combinations. Please see the EG bits, for example.

Yes, determining the length requires reading the full ISD.


> For extensibility, how many ISD data items are planned to be supported? If the number exceeds the capacity of one label, you will need to extend it. So the actual overhead of FAI could be 8 octets instead of 4.


Absolutely true. Allocate popular functions first. :) If there are too many actions, then we could also consider a second SPL.


> Let’s temporarily put all these details aside and assume the minimum sized FAI. Basically, EH has a fixed 4 byte overhead for HEH. For each use case that does need metadata, EH will add 4 more byte overhead. This is the header overhead comparison. (In your example, it’s 24 bytes PSD vs. 12 bytes ISD)



So EH is twice as expensive as FAI.
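
The arithmetic behind that ratio can be checked with a short sketch. The per-item octet counts are copied from the worked example quoted above; the function name and the `flag_only` parameter are illustrative only and come from neither proposal.

```python
# Per-item octet counts copied from the worked example quoted above.
EH_COSTS = {"EHI": 4, "HEH": 4, "NFFRR": 4, "Entropy": 8, "GISS": 8}
FAI_COSTS = {"FAI": 4, "NFFRR": 0, "Entropy": 4, "GISS": 4}

def total_octets(costs, flag_only=()):
    """Sum the octets, charging nothing for actions carried as a bare flag."""
    return sum(n for name, n in costs.items() if name not in flag_only)

eh_full = total_octets(EH_COSTS)                        # NFFRR as its own EH
eh_flag = total_octets(EH_COSTS, flag_only=("NFFRR",))  # NFFRR as an EHI flag
fai = total_octets(FAI_COSTS)

print(eh_full, eh_flag, fai)   # 28 24 12
print(eh_flag / fai)           # 2.0
```

With NFFRR carried as an EHI flag, the EH total drops from 28 to 24 octets, which is where the 24 versus 12 comparison comes from.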


> Now let’s look at the other dimensions.
> We know some use cases require data too big to fit in the stack, so PSD is needed anyway. PSD can be used for the use cases with smaller metadata size, although it’s relatively inefficient as discussed above. So essentially we can have just one mechanism for everything, but to support ISD we end up with two.


That’s true.  We’re trading added complexity for better performance.


>   
> Bitmap, albeit succinct, is inflexible and less extensible. The semantics and order are fixed at the design time; the total size of each data item must be equal to a label size to make bitmap useful in finding the corresponding data item.


That’s incorrect. The size can vary. An implementation must support the NAI that are set.  Handling cases where a node only knows some of them still needs to be discussed, but I favor just not supporting that case at all.


> Use cases identified in the future can only use unused bitmap bits, regardless of their importance. If a use case needs to extend its data size, it’s out of luck; if some common use cases also apply to other types of networks (especially those still under investigation), the use cases can be seriously limited by such a size constraint, which hampers design sharing and interoperability.


Yes, future actions can only allocate unused bitmap bits.  Reusing existing bitmap bits would overlap with existing functions.  Your statement seems to be a non-sequitur.

A new action may define whatever length of ISD it likes, including being variable length. It’s not recommended, but it is possible.


>     
> Parsing based on a bitmap significantly bloats either the parser size or the parsing latency or both (as shown in my analysis). People can write pseudocode themselves to verify. This is bad for both software and hardware data plane implementations. No wonder a bitmap has never been used for header encoding before (it’s sometimes used in a header for its sub-structure, which is related to the header processing but not the header parsing).


Only if you insist on using a TCAM for parsing. From my perspective, that’s exactly the wrong tool for the job. If you use a conventional ALU, then parsing becomes a straightforward ‘if’ chain.

We are not obligated to optimize the solution for the wrong tools.
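
As a rough illustration of what an ‘if’ chain over a flag bitmap could look like, here is a minimal sketch. The flag names, bit positions, and per-action data lengths are all hypothetical; none of them are taken from the FAI draft.

```python
# Hypothetical flag bits and in-stack data lengths.
# None of these values come from the FAI draft.
FLAG_A = 0x1   # action with no in-stack data
FLAG_B = 0x2   # action followed by one 32-bit entry of data
FLAG_C = 0x4   # action followed by one 32-bit entry of data

def parse_actions(flags, entries):
    """Walk the in-stack data with one test per flag, in a fixed order.

    flags   -- integer bitmap of requested actions
    entries -- list of 32-bit words that follow the bitmap
    Returns a dict mapping action name to its data (None for flag-only actions).
    """
    offset = 0
    actions = {}
    if flags & FLAG_A:
        actions["A"] = None            # flag only, consumes no entry
    if flags & FLAG_B:
        actions["B"] = entries[offset]
        offset += 1
    if flags & FLAG_C:
        actions["C"] = entries[offset]
        offset += 1
    return actions

print(parse_actions(FLAG_A | FLAG_C, [0x1234]))
```

Each flag costs one test and, when set, one read at a running offset, so the work grows with the number of defined flags rather than with the number of flag combinations.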

> The encoding of each ISD item itself is awkward. You have 30 bits and only 30 bits at your disposal with a BoS bit in between. It creates a hole in data, which is bad for both software and hardware.


We already discussed this. Yes, it’s not optimal, but it’s only a few instructions to fix, if necessary.
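
For a sense of the cost, here is a sketch of closing the hole with a shift, two masks, and an OR. It assumes the bottom-of-stack bit sits at bit 8 of the 32-bit entry, as in the standard label stack entry layout, with 22 data bits above it and 8 below; leaving the top bit of the word unused is an assumption made here to arrive at 30 bits, not something taken from the FAI draft.

```python
S_BIT = 1 << 8  # bottom-of-stack bit in a 32-bit label stack entry

def pack30(value, bos=0):
    """Spread a 30-bit value around the S bit (hypothetical layout)."""
    hi = (value >> 8) & 0x3FFFFF   # upper 22 bits go above the S bit
    lo = value & 0xFF              # lower 8 bits go below it
    return (hi << 9) | (bos << 8) | lo

def unpack30(word):
    """Rejoin the two halves, dropping the S bit in between."""
    return (((word >> 9) & 0x3FFFFF) << 8) | (word & 0xFF)

word = pack30(0x2ABCDEF1, bos=1)
print(hex(unpack30(word)))   # 0x2abcdef1
```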

> (BTW, I wonder why we want to redefine entropy. We already have a standard for it and we don’t need to do anything more about it).
>  
>  
> It’s true that we have a standard for it. It’s fairly common, so then the interesting question is what happens when it is present in the label stack along with MNA?  If we do not incorporate EL into MNA, then an implementation has to deal with both MNA and EL independently.  What order are they in? How do they interact? If we use the standard ELI/EL encoding, that’s 8 octets just for entropy.  OTOH, if we incorporate EL into MNA, then when using both, we can collapse the encoding.  For example, in the example above, entropy is 4 octets, resulting in a savings of 4 octets.  Half price! And that’s assuming that you don’t use the smaller entropy/GISS encodings which would give an even greater savings.
>  
> [HS2] I think the new design won’t nullify the EL/ELI standard so they need to coexist in an incremental deployment scenario given EL/ELI might have been realized (It’s possible to provide yet another entropy solution using the new mechanism but the necessity is subject to further discussion IMO). But they don’t need to interact with each other because each has its clear function and operational procedure.
> 


Well, you could do things that way, but IMHO, that’s suboptimal.  If you’re including MNA, then you’re already limiting yourself to MNA-capable nodes. You might as well take the benefit of the better encoding.

Tony