[Sidrops] Re: draft-ietf-sidrops-8210bis-23 is ambiguous session mismatch handling

Ralph Covelli <rcovelli@he.net> Fri, 26 December 2025 05:13 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/sidrops/VA4E0JCTsVPuaOnPdx1iO1_W4t8>

Hi Tom!

I read your changes.  Thank you for taking the time to address this 
ambiguity!  It is much appreciated!!

In my humble opinion, we need to tread very lightly here.  In this
case I think "less is more": one sentence added at most, and no
change to existing logic.

The language in question is also in RFC8210 and probably should NOT be
changed, because it *IS* technically correct; it is just incomplete.

RFC8210
5.1.  Fields of a PDU

       ... If, at any time ***after the
       protocol version has been negotiated*** (Section 7), either the
       router or the cache finds that the value of the Session ID is not
       the same as the other's, the party which detects the mismatch MUST
       immediately terminate the session with an Error Report PDU with
       code 0 ("Corrupt Data")...
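For concreteness, here is a minimal Python sketch of what "terminate the
session with an Error Report PDU with code 0" puts on the wire.  It
assumes the RFC8210 PDU layout and protocol version 1 (8210bis would
use version 2); the function name encode_error_report is hypothetical
and not taken from any implementation.

```python
import struct

PDU_ERROR_REPORT = 10  # Error Report PDU Type (RFC8210)
ERR_CORRUPT_DATA = 0   # Error Code 0: "Corrupt Data"


def encode_error_report(version: int, error_code: int,
                        bad_pdu: bytes, text: str) -> bytes:
    """Build an Error Report PDU that encapsulates the offending PDU.

    Layout: the common 8-byte header (version, PDU type, error code in
    the Session ID slot, total length), then the length and bytes of
    the erroneous PDU, then the length and bytes of the UTF-8 text.
    """
    text_bytes = text.encode("utf-8")
    total_len = 8 + 4 + len(bad_pdu) + 4 + len(text_bytes)
    header = struct.pack("!BBHI", version, PDU_ERROR_REPORT,
                         error_code, total_len)
    return (header
            + struct.pack("!I", len(bad_pdu)) + bad_pdu
            + struct.pack("!I", len(text_bytes)) + text_bytes)


# A Serial Query carrying a stale Session ID (0x1234) and Serial Number 42.
stale_query = struct.pack("!BBHII", 1, 1, 0x1234, 12, 42)
report = encode_error_report(1, ERR_CORRUPT_DATA, stale_query,
                             "Session ID mismatch")
print(len(report))  # 47
```

Note that the cache sends this and then closes the connection, which is
exactly the disconnect the rest of this mail is trying to avoid.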

In my reading, the authors (correctly) carve out an exception for the
Initial Serial Query (the first PDU after connecting).  In that case,
protocol version negotiation is happening via the Serial Query PDU
itself.

If the intention was to treat ALL Serial Queries with mismatched Session
IDs as fatal errors, then ***why mention the protocol negotiation at
all***?  There is no protocol negotiation *before* the Serial Query PDU
because the information is *inside* that Serial Query PDU.  Your
interpretation would make sense if there were a separate PDU for version
negotiation, but there isn't.  Why would the authors explicitly carve
out an exception for a PDU that doesn't exist?  :-)
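To make the point about a single PDU concrete, here is a minimal Python
sketch that decodes a Serial Query, assuming the RFC8210 layout.  The
protocol version and the Session ID arrive side by side in the same
8-byte header, so the cache sees both at once.  parse_serial_query and
SerialQuery are hypothetical names for illustration.

```python
import struct
from typing import NamedTuple

PDU_SERIAL_QUERY = 1  # PDU Type for Serial Query


class SerialQuery(NamedTuple):
    version: int     # protocol version the router is offering
    session_id: int  # Session ID the router expects the cache to have
    serial: int      # last Serial Number the router received


def parse_serial_query(data: bytes) -> SerialQuery:
    """Decode a 12-byte Serial Query PDU.

    Version and Session ID share the same 8-byte header, so the cache
    learns both from one PDU; there is no earlier negotiation step.
    """
    if len(data) < 12:
        raise ValueError("short PDU")
    version, pdu_type, session_id, length = struct.unpack("!BBHI", data[:8])
    if pdu_type != PDU_SERIAL_QUERY or length != 12:
        raise ValueError("not a Serial Query PDU")
    (serial,) = struct.unpack("!I", data[8:12])
    return SerialQuery(version, session_id, serial)


first_pdu = struct.pack("!BBHII", 1, PDU_SERIAL_QUERY, 0x1234, 12, 42)
print(parse_serial_query(first_pdu))
# SerialQuery(version=1, session_id=4660, serial=42)
```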

The problem here is that the authors don't say what you should do when
it *IS* an Initial Serial Query.  They only tell you what *NOT* to do:
*don't* send a fatal error.  RTR cache servers that do send one are not
technically conforming to RFC8210.  It's relatively harmless apart from
the killed session, but I don't think we should remove or change this
text.  It's correct as written.  Changing it now would create
contradictions with RFC8210 that we don't need.

So if we shouldn't send a fatal error, what should we do?  We should
send a Cache Reset, because we literally just reset the cache.  :-)
Cache Reset seems right at home here!  This is exactly the same
condition as a router's Serial Number having fallen out of the cache's
window: the RTR cache should signal the router to clear its cache and
issue a Reset Query.  That is Cache Reset's job.
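A minimal Python sketch of the cache-side decision argued for above.  It
encodes the behavior proposed in this mail, not normative RFC8210 text;
the function name, the first_pdu flag, and the plain-integer serial
window (ignoring Serial Number wraparound) are illustrative assumptions.

```python
from enum import Enum


class Reply(Enum):
    INCREMENTAL = "Cache Response, deltas, End of Data"
    CACHE_RESET = "Cache Reset"
    CORRUPT_DATA = "Error Report code 0 (Corrupt Data), then close"


def handle_serial_query(first_pdu: bool, router_session: int,
                        cache_session: int, router_serial: int,
                        oldest_serial: int, current_serial: int) -> Reply:
    """Cache-side reply to a Serial Query under the proposed behavior.

    Simplification: Serial Numbers are compared as plain integers,
    ignoring wraparound.
    """
    if router_session != cache_session:
        if first_pdu:
            # Version negotiation is still happening on this PDU, so the
            # 5.1 fatal-error rule does not apply yet: the cache has simply
            # restarted, which is the situation Cache Reset exists for.
            return Reply.CACHE_RESET
        # After negotiation, a mismatch is fatal per RFC8210 5.1.
        return Reply.CORRUPT_DATA
    if not oldest_serial <= router_serial <= current_serial:
        # Same session, but the cache can no longer supply a delta.
        return Reply.CACHE_RESET
    return Reply.INCREMENTAL


# Router reconnects after a cache restart with its old Session ID.
print(handle_serial_query(True, 0x1234, 0x5678, 42, 0, 100).value)
# Cache Reset
```

The design choice is that the same reply covers both "your Serial
Number is too old" and "I restarted", since in both cases the router
simply needs a full resync.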

It seems my worries about routers not learning new Session IDs from
Cache Resets/Reset Queries were unfounded.  This case is already
handled.  (Whoops, my fault!)

RFC8210
5.5.  Cache Response

    In response to a Reset Query, the ***new*** value of the Session ID tells
    the router the instance of the cache session for future confirmation.

From this wording it seems that routers *should always* be willing to
update their Session ID after a Reset Query, so this shouldn't be a
problem for Cache Reset.
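Here is a minimal Python sketch of the router side under that reading:
after a Cache Reset the router sends a Reset Query and adopts whatever
Session ID arrives in the Cache Response, while a mismatch on an
established session is still fatal.  RouterSession and its methods are
hypothetical; the state tracked is an assumption, not taken from any
router implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RouterSession:
    session_id: Optional[int] = None  # unknown until the first Cache Response
    serial: Optional[int] = None      # last Serial Number from End of Data
    pending_reset: bool = True        # a fresh router starts with a Reset Query

    def on_cache_reset(self) -> str:
        # Cache Reset carries no Session ID; forget the serial and resync.
        self.serial = None
        self.pending_reset = True
        return "Reset Query"

    def on_cache_response(self, session_id: int) -> Optional[str]:
        if self.pending_reset:
            # Reply to our Reset Query: adopt the cache's (possibly new) ID.
            self.session_id = session_id
            self.pending_reset = False
            return None
        if session_id != self.session_id:
            # Mismatch on an established session is fatal (RFC8210 5.1).
            return "Error Report (Corrupt Data)"
        return None


router = RouterSession(session_id=0x1234, serial=42, pending_reset=False)
print(router.on_cache_reset())    # Reset Query
router.on_cache_response(0x5678)  # cache restarted with a new Session ID
print(hex(router.session_id))     # 0x5678
```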

I understand you are worried about compatibility with implementations in 
the wild.  I am not advocating for the addition of language that would 
break any compatibility.

If an RTR cache issues a fatal error instead of a Cache Reset, the
router will synchronize via reconnect and Reset Query.

If an RTR cache issues a Cache Reset and for some reason the router
continues to use the old Session ID, the RTR cache will send a fatal
error and the router will synchronize via reconnect and Reset Query.

All roads lead to convergence... but a Cache Reset gets you there
fastest, without any session disconnects!  Why encourage people to take
the (destructive) scenic route?  :-)  A protocol should be able to
gracefully handle its own servers restarting.  RFC8210 currently leaves
this up to the implementer, so you can't depend on any particular
behavior there.  RFC6810 doesn't seem to be any help either.
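A minimal Python sketch that replays the two sequences from my earlier
mail (quoted below) side by side, to show that both converge but the
fatal-error path costs an extra connection.  It is a toy model of the
exchange, not a protocol implementation; the function name and event
strings are illustrative.

```python
from typing import List, Tuple


def recover_after_cache_restart(cache_sends_reset: bool) -> Tuple[List[str], int]:
    """Replay the exchange after the RTR cache restarts with a new Session ID.

    Returns the event log and how many transport connections the router
    opened along the way.
    """
    log = ["router connects", "router -> Serial Query (old Session ID)"]
    connections = 1
    if cache_sends_reset:
        log += ["cache -> Cache Reset", "router -> Reset Query"]
    else:
        log += ["cache -> Error Report (Corrupt Data)", "cache closes connection",
                "router connects", "router -> Reset Query"]
        connections += 1
    # Both paths end the same way: a full resync under the new Session ID.
    log += ["cache -> Cache Response (new Session ID)", "cache -> payload PDUs",
            "cache -> End of Data"]
    return log, connections


for reset in (True, False):
    events, conns = recover_after_cache_restart(reset)
    print("Cache Reset" if reset else "Error Report",
          len(events), "events,", conns, "connection(s)")
# Cache Reset 7 events, 1 connection(s)
# Error Report 9 events, 2 connection(s)
```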

I think the document should describe the *proper* behavior here, not
all the *possible* behaviors.  We only want to clarify existing
undefined behavior, not redefine any existing behavior.

I think my original changes plug all the logic holes and only add 4
words. :-)

RFC8210bis-23
5.3. Serial Query:
    The Session ID tells the cache what instance the router expects to
    ensure that the Serial Numbers are commensurate, i.e., the cache
    session has not been changed.  If the Session ID does not match
    *during protocol version negotiation*, the cache MUST respond with a
    Cache Reset.

It should be completely safe to add wording that properly covers the
Initial Serial Query / Cache Reset case, where the *actual* cache has
reset, without mentioning the misuse of the 5.1 wording.  Everyone
still has to handle fatal errors and Cache Resets as usual.

I would be interested in hearing from the authors on their section 5.1 
intentions here.

What do you think, Tom?  (Or anyone else who would like to chime in.)

Thanks!

On 12/23/2025 1:05 AM, Tom Harrison wrote:
> Hi Ralph,
>
> On Mon, Dec 22, 2025 at 10:35:53PM -0500, Ralph Covelli wrote:
>> On 12/22/2025 10:28 PM, Ralph Covelli wrote:
>>> The problem is that the "existing behavior" is *not* clearly
>>> defined in this case.  Your testing confirms this. (thank you for
>>> taking the time to look)
>>>
>>> If you treat this condition as an error you will wind up
>>> unnecessarily killing all of your router connections every time you
>>> reset the RTR cache.
> I think these are good points.
>
>>> In the end all roads lead to convergence:
>>>
>>> With Cache Reset:
>>>
>>> 1) Router disconnects from RTR cache
>>> 2) RTR cache server restarts
>>> 3) Router connects to RTR cache and sends Serial Query
>>> 4) RTR cache sends Cache Reset to Router
>>> 5) Router syncs
>>>
>>> With terminal Error:
>>>
>>> 1) Router disconnects from RTR cache
>>> 2) RTR cache server restarts
>>> 3) Router connects to RTR cache and sends Serial Query
>>> 4) RTR cache sends Error to Router and disconnects
>>> 5) Router connects to RTR cache and sends Reset Query
>>> 6) Router syncs
>>>
>>> The Session ID appears to be designed to detect cache changes. This
>>> should work both ways.  Router to RTR cache and RTR cache to
>>> router.
>> *Whether* it's treated with a Cache Reset or a Terminal Error, the
>> specific wording should be added to the document.
> Although I think clarifying the document and providing some way to
> avoid unnecessary connection terminations are useful things, I'm
> mindful that there is at least one implementation that operates in
> reliance on the reading I described in my previous mail.  I've had a
> go at updating the text to address these problems at
> https://github.com/APNIC-net/rpki-rtr-demo/commit/dd901d4f98cc3be23b798bc1e04fbd5042001965#diff-ba5205cdbf492fee8c9ee5a7c3afa6ea4f0eb87eb392a0286f9bce7eb783a631.
> Do those changes look OK to you?
>
> -Tom