Re: [Sidrops] Roman Danyliw's Discuss on draft-ietf-sidrops-8210bis-08: (with DISCUSS and COMMENT)

Randy Bush <randy@psg.com> Wed, 15 June 2022 00:47 UTC

Date: Tue, 14 Jun 2022 17:47:05 -0700
Message-ID: <m2wndi4ut2.wl-randy@psg.com>
From: Randy Bush <randy@psg.com>
To: Roman Danyliw via Datatracker <noreply@ietf.org>
Cc: The IESG <iesg@ietf.org>, draft-ietf-sidrops-8210bis@ietf.org, sidrops-chairs@ietf.org, sidrops@ietf.org, morrowc@ops-netman.net
In-Reply-To: <165517201935.45433.15262387673481973752@ietfa.amsl.com>
References: <165517201935.45433.15262387673481973752@ietfa.amsl.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/sidrops/Q375EcO81QSPXEVitLX2heC26Q8>

roman:

[ i will push a new version so iesg and directorate folk who have not
  yet read have fresher text.  i do anticipate more change; as i doubt
  we have everything correct. ]

thanks for the thorough review.

> ** Section 5.11
> 
>    The diagnostic text is optional; if not present, the Length of Error
>    Text field MUST be zero.  If error text is present, it MUST be a
>    string in UTF-8 encoding (see [RFC3629]).
> 
> How does the protocol convey the language of the diagnostic text?

by changing the field to be binary :)

   The Arbitrary Bytes field is optional; if not present, the Length of
   Arbitrary Bytes field MUST be zero.  If Arbitrary Bytes are present,
   they are, as named, arbitrary values.

   +-------------------------------------------+
   |                                           |
   |              Arbitrary Bytes              |
   |                    of                     |
   ~             Error Diagnostic              ~
   |                                           |
   `-------------------------------------------'

i am somewhat embarrassed; but this stopped a slope of complexity.

> I appreciate that this behavior in v2 comes from v1 and v0.  How was
> it handled previously?

same; i.e. not properly handled.

> ** Section 7.  The guidance on error reporting seems inconsistent.
> 
> (a) Section 5.11 says “If error text is present, it MUST be a string in UTF-8
> encoding”
> 
> (b) Section 7 says “in which case the Arbitrary Text field of the ERROR Report
> PDU MUST be a list of one octet binary integers indicating the version numbers
> the cache supports.”
> 
> The text in (b) seems to be describing an encoding inconsistent with the
> guidance in (a).

(a) has been removed and (b) now says

    in which case the Arbitrary Bytes field of the ERROR Report PDU MUST
    be a list of one octet binary integers indicating the version
    numbers the cache supports.
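
[ a minimal sketch of that encoding, not text from the draft.  it
  assumes only what the sentence above says: one unsigned octet per
  supported version.  the function names are made up for illustration. ]

```python
def encode_supported_versions(versions):
    """Pack each supported protocol version into one unsigned octet."""
    for v in versions:
        if not 0 <= v <= 255:
            raise ValueError(f"version {v} does not fit in one octet")
    return bytes(versions)


def decode_supported_versions(arbitrary_bytes):
    """Read each octet of the field back as one version number."""
    return list(arbitrary_bytes)


payload = encode_supported_versions([0, 1, 2])
print(payload.hex())                       # 000102
print(decode_supported_versions(payload))  # [0, 1, 2]
```

the length of the field is then simply the number of versions listed.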

> ** Section 9.  Use of TCP MD5.
> 
>    It is expected that, when the TCP Authentication Option (TCP-AO)
>    [RFC5925] is available on all platforms deployed by operators, it
>    will become the mandatory-to-implement transport.
>          …
>    *  Caches and routers MAY use TCP MD5 transport [RFC5925] using the
>       rpki-rtr port.  Note that TCP MD5 has been obsoleted by TCP-AO
>       [RFC5925].
> 
> The above text comes from RFC6810 (2013), repeated again in RFC8210
> (2017) and stated here (2022).  Why isn’t it appropriate 9 years later
> to drop support for TCP MD5 in v2?

because AO implementation is still far from universal; and all bgp
speakers have md5.

> If there was a compelling operational reason, could the text be
> rewritten to the effect of “TCP MD5 SHOULD NOT be used unless TLS,
> IPSec and SSH are NOT supported.”

   *  Caches and routers MAY use TCP MD5 transport [RFC2385] using the
      rpki-rtr port if no other protected transport is available.  Note
      that TCP MD5 has been obsoleted by TCP-AO [RFC5925].

> ** Section 9.  Per the TLS guidance, is there a reason why conformance
> to RFC7525 or draft-ietf-uta-rfc7525bis-07 cannot be required to
> ensure a modern version of TLS and secure ciphersuites?

   *  Caches and routers MAY use Transport Layer Security (TLS)
      transport [RFC8446] using port rpki-rtr-tls (324); see Section 15.
      Conformance to the [RFC7525] recommendations for modern cipher
      suites is REQUIRED.

> ** Relationship with RFC8210.  I support Martin Duke's DISCUSS.
>
> -- The abstract says “This document updates and replaces RFC 8210.”  When I
> read “replaces”, I took that to mean that RFC8210 is obsoleted by this document.
> 
> -- Section 1 says “This document updates [RFC8210].”  I found that confusing
> because I don’t see how I need to read this document to implement v1 of the
> protocol.  I was under the impression this was v2, something stand-alone.

i do not have a dog in this fight.  as i responded to robert, when the
tablet on this comes down from the mountain, i will make a pass doing
this in a consistent fashion.

> ** Section 4.  Per “It is configured with a semi-ordered list of
> caches …”, what kind of data structure is “semi-ordered”?  Is this a
> priority list?

what phrasing would you suggest when it is not strictly ordered, i.e.
two members may have the same ordinality?
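
[ one way to picture the structure in question, as a sketch only.  the
  assumptions are mine, not the draft's: each cache carries an integer
  preference, lower is tried first, and caches sharing a preference are
  tried in random order.  connection_order is a hypothetical name. ]

```python
import random
from itertools import groupby


def connection_order(caches, rng=random):
    """caches: list of (preference, name) pairs; ties are allowed.

    Returns names grouped by ascending preference (assumed: lower is
    more preferred), with members of the same tier shuffled.
    """
    ranked = sorted(caches, key=lambda c: c[0])
    order = []
    for _, tier in groupby(ranked, key=lambda c: c[0]):
        tier = list(tier)
        rng.shuffle(tier)          # no order is defined inside a tier
        order.extend(name for _, name in tier)
    return order


print(connection_order([(1, "cache-a"), (2, "cache-b"), (1, "cache-c")]))
```

cache-a and cache-c share ordinality 1, so either may come first;
cache-b is always last.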

> ** Section 4.
>    ... servers'
>    clocks MUST be correct to a tolerance of approximately an hour.
> 
> How does an implementer reconcile the strict “MUST” guidance of
> “approximately an hour” and still ensure interoperability?  What is
> the “tolerance” on “approximately” -- 60 minutes and 1 second, 70
> minutes, etc.  I recommend s/approximately an hour/an hour/.

fine

> ** Section 5.1.  Why do some fields have a data-type and others just
> text description.  For example, “Protocol Version” is explicitly
> described as being an 8-bit unsigned integer, but the text doesn’t say
> that a Serial number is 32-bits.

ok, hacked.  idr docs always have explicit counts.  ops doc historically
let you hunt in the packet porn.

> ** Section 5.1.
>       A cache increments its Serial Number when
>       completing a rigorously validated update from a parent cache or
>       the Global RPKI.
> 
> Is there an update which is not “rigorously validated”?  I’m trying to
> understand if there are updates which don’t bump the serial number.

:)

s/rigorously//

> ** Section 5.1.  With a 16-bit Session ID, how concerning or likely is
> a collision?  Rob Wilton asked a similar question.  A few birthday
> paradox approximations says 50% chance of a collision with 256 caches
> or 1% chance of collision with 36 caches assuming they are randomly
> selected.

the context of a session id is per {cache,router} pair/session.  i.e. if
a router has two caches with the same session id, they are drawn from
two separate spaces, namely a per cache context.

in retrospect, bits are cheap and maybe we should have used 32.  but, at
this point, is it worth the per-version code complexity in both routers
and caches?
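
[ for anyone who wants to rerun the birthday arithmetic in the
  question: a sketch that evaluates the exact product, assuming ids are
  drawn uniformly at random from a single shared 16-bit space.  per the
  reply above that is not how the session id is scoped, so this only
  bounds the hypothetical shared-space case. ]

```python
def collision_probability(n, space=2**16):
    """Probability that at least two of n uniform random ids collide."""
    p_unique = 1.0
    for i in range(n):
        p_unique *= (space - i) / space
    return 1.0 - p_unique


for n in (36, 256):
    print(n, round(collision_probability(n), 4))
```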

> ** Section 5.1
>       A seconds-since-epoch timestamp value such as the POSIX time()
>       function makes a good Session ID value.
> 
> What is the recommended approach to convert the POSIX time() result
> into the smaller 16-bit value?

      The Session ID might be a pseudorandom value, a strictly
      increasing value if the cache has reliable storage, et cetera.  A
      seconds-since-epoch timestamp value such as the low order 16 bits
      of unsigned integer seconds since 1970-01-01T00:00:00Z ignoring
      leap seconds might make a good Session ID value.
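
[ a sketch of the truncation described above, assuming the platform
  clock counts seconds since 1970-01-01T00:00:00Z as python's
  time.time() does on POSIX systems.  session_id_from_clock is a
  hypothetical name. ]

```python
import time


def session_id_from_clock(now=None):
    """Low-order 16 bits of whole seconds since the epoch."""
    seconds = int(time.time() if now is None else now)
    return seconds & 0xFFFF


print(session_id_from_clock(65536 + 5))   # 5
print(session_id_from_clock())            # some value in 0..65535
```

the value wraps every 65536 seconds, a bit over 18 hours, so it only
distinguishes restarts that are not an exact multiple of that apart.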

> ** Section 5.1.  Per the definition of a Provider AS Number, what is a
> “spacified AFI”?

see vint's work on intra-solar system comms :)

> ** Section 5.6, 5.7, and 5.10.  Editorial. In the PDU descriptions
> prior to these, the section text opened with some notional description
> of who sends the PDU and why.  Here, the section opens with a diagram
> and only covers a field level description.

got 'em.  thanks.

> ** Section 5.7.
> 
> Analogous to the IPv4 Prefix PDU, it has 96 more bits and no magic.
> 
> What is meant by “… and no magic?”

a widely known and used ops phrase hinting that the value in ipv6 is the
address length and the rest of the embellishments are pretty much a
pita.

> ** Section 5.10.  What is the Length of the PDU?  The figure suggests
> that the Subject Key Info is a fixed 4 bytes which doesn’t seem right.

hmmmm.  truth is i do not know how we got that.  it is an 8635 ecdsa
key, which leads us to 8209 3.1.2 which leads us to 3.1 of 8208, which
leads us to 5480, which ...

so, in the field description

   Subject Public Key Info:  A variable length field holding a router
      key's subjectPublicKeyInfo value, as described in [RFC8608].  This
      is the full ASN.1 DER encoding of the subjectPublicKeyInfo,
      including the ASN.1 tag and length values of the
      subjectPublicKeyInfo SEQUENCE.

and in the pdu

      As the Subject Public Key Info is a variable length field, it
      must be decoded to determine where the PDU terminates.
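
[ a sketch of the minimum decoding needed to find the end of the field,
  assuming the field starts with the DER tag and length of the
  subjectPublicKeyInfo SEQUENCE as described above.  it reads only the
  outer header and does not validate the contents.  der_total_length is
  a hypothetical helper, not something the draft defines. ]

```python
def der_total_length(buf):
    """Octets occupied by the DER SEQUENCE that starts at buf[0]."""
    if len(buf) < 2:
        raise ValueError("need at least tag and length octets")
    if buf[0] != 0x30:
        raise ValueError("expected a SEQUENCE tag (0x30)")
    first = buf[1]
    if first < 0x80:
        # short form: the octet itself is the content length
        return 2 + first
    n = first & 0x7F
    # long form: n further octets hold the content length, big-endian
    if n == 0 or len(buf) < 2 + n:
        raise ValueError("indefinite or truncated length")
    return 2 + n + int.from_bytes(buf[2:2 + n], "big")


print(der_total_length(bytes([0x30, 0x59]) + bytes(0x59)))   # 91
```

a receiver can check this against the PDU Length field rather than
trusting either one alone.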
	
> ** Section 5.11.  The section doesn’t explicitly define the contents of the
> “Encapsulated PDU” field.

      The Erroneous PDU field is a binary copy of the PDU causing
      the error condition, including all fields.

> ** Section 9.1.  Per the SSH authentication information:
>    User authentication MUST be supported; host
>    authentication MAY be supported.  Implementations MAY support
>    password authentication.
> 
> -- What is “user authentication”?  Per the authentication method names in
> https://www.iana.org/assignments/ssh-parameters/ssh-parameters.xhtml#ssh-parameters-10,
> there isn’t one called that.  Do you mean “publickey”?  Maybe:
> 
> NEW
> User authentication (“publickey”) MUST be supported; host authentication
> (“hostbased”) MAY be supported.  Implementations MAY support password
> authentication (“password”).
>

thanks for the text!

> -- Since this text appears to be profiling SSH, and this text comes from
> Section 5 of RFC4252, should it be explicitly said that “none” MUST
> NOT be used.

   Cache servers supporting SSH transport MUST accept RSA authentication
   and SHOULD accept Elliptic Curve Digital Signature Algorithm (ECDSA)
   authentication.  User authentication ("publickey") MUST be supported;
   host authentication ("hostbased") MAY be supported.  Implementations
   MAY support password authentication ("password").  "None"
   authentication MUST NOT be used.  Client routers SHOULD verify the
   public key of the cache to avoid MITM attacks.

> ** Section 9.2.
> 
>       The client router MUST set its "reference identifier" to the DNS
>       name of the rpki-rtr cache.
> 
> What is a “reference identifier”?

6.2 of 6125

> ** Section 9.3.
> 
> If TCP MD5 is used, implementations MUST support key lengths of at
>    least 80 printable ASCII bytes, per Section 4.5 of [RFC5925].
> 
> RFC5925 doesn’t have a Section 4.5, or provide guidance about using
> printable ASCII bytes

sigh.  shot foot blindly following "obsoletes."  it's RFC2385.

> ** Section 10.
> This preference merely denotes proximity, not
>    trust, preferred belief, et cetera.
> 
> “Proximity” to what?  I thought this list was rank order list per the
> preference value.

   This preference is intended to be based on proximity, a la RTT, not
   trust, preferred belief, et cetera.

> ** Section 11.
> When a cache is sending ROA PDUs to a router ...
> 
> I missed something.  What are ROA PDUs in this context?  Why are they not
> defined in Section 5.

When a cache is sending ROA (IPv4 or IPv6) PDUs to a router

> ** Section 12.
> 
>    To keep load on Global RPKI services from unnecessary peaks, it is
>    recommended that primary caches which load from the distributed
>    Global RPKI not do so all at the same times, e.g., on the hour.
> 
> I’m trying to tie this recommendation for “primary caches” back to the
> three deployment scenarios.  Is the ISP backbone the only one that
> operates a “primary cache?”

   To keep load on Global RPKI services from unnecessary peaks, it is
   recommended that caches which fetch from the Global RPKI not do so
   all at the same times, e.g., on the hour.  Choose a random time,
   perhaps the ISP's AS number modulo 60, and jitter the inter-fetch
   timing.
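
[ a sketch of that suggestion.  the 10% jitter spread and the function
  names are assumptions for illustration; the text above only says to
  pick something like AS number modulo 60 and to jitter the interval.
  AS 64496 is a documentation AS number. ]

```python
import random


def fetch_minute(asn):
    """Minute past the hour at which this cache starts fetching."""
    return asn % 60


def next_interval(base_seconds, jitter_fraction=0.1, rng=random):
    """Base inter-fetch interval plus or minus a random spread."""
    spread = base_seconds * jitter_fraction
    return base_seconds + rng.uniform(-spread, spread)


print(fetch_minute(64496))    # 56
print(next_interval(600))     # somewhere between 540 and 660 seconds
```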

> ** Section 13.  This error code text appears to be verbatim from
> RFC8210.  The text also suggests that [iana-err] is the registry for
> these error code points.  [iana-err] says the reference for these
> errors is RFC6810 for 0-7, and RFC8210 for 8.
> 
> If [iana-err] isn’t being updated to point to this text as the
> canonical definition, then why repeat it here?  Couldn’t it just cite
> RFC8210?  What new information is being added?

is that going to work if we obsolete 8210?

but i removed

        All previous entries in the IANA "rpki-rtr-error" registry <xref
        target="iana-err"/> remain valid for all protocol versions.
        Protocol version 1 added one new error code:
           Error
           Code    Description
           -----   ---------------------------
               8   Unexpected Protocol Version

> ** Section 14.
> 
>       ... they need to be
>       given consistent trust anchors to use in their internal validation
>       process.  Distribution of a consistent trust anchor is assumed to
>       be out of band.
> 
> -- Distributed to who?  Is it the routers?
> 
> -- What makes a trust anchor “consistent?”  Is the intent of this text to say
> that in order to access a variety of caches, routers need the corresponding
> trust anchors of these caches?

   Cache Validation:  In order for a collection of caches as described
      in Section 12 to provide a consistent view, they need to be given
      consistent trust anchors of the Certification Authorities to use
      in their internal validation process.  Distribution of a
      consistent trust anchor set to validating caches is assumed to be
      out of band.

> ** Section 14.
> 
>       Hence, the last link, from cache to
>       router, is secured by server authentication and transport-level
>       security.
> 
> Please be clear that this is not always the case.  Section 9 allows for TCP.

      However, this protocol document assumes that the routers cannot do
      the validation cryptography.  Hence, the last link, from cache to
      router, SHOULD be secured by server authentication and transport-
      level security; though it might not be.  Not using transport
      security is dangerous, as server authentication and transport have
      very different threat models than object security.

> ** Section 14.
> 
>       The identity of the cache server SHOULD be verified and
>       authenticated by the router client, and vice versa, before any
>       data are exchanged.
> 
> (Excluding normal TCP) Are these verification and authentication
> practices something different than specified by the protocol behavior
> in Section 9?  If not, why the “SHOULD” as conformance to these
> protocols is a MUST.

    Protected transport protocols (i.e. not raw TCP) will
    authenticate the identity of the cache server to the router
    client, and vice versa, before any data are exchanged.

> 
> ** Section 14.  The text doesn't explicitly describe the risks of using an
> unprotected TCP connection.

   However, this protocol document assumes that the routers
   cannot do the validation cryptography.  Hence, the last
   link, from cache to router, SHOULD be secured by server
   authentication and transport-level security to prevent
   monkey in the middle attacks; though it might not be.  Not

> ** Section 15.
>    The policy for adding to the registry is RFC Required per [RFC8126];
>    the document must be either Standards Track or Experimental.
> 
> Is this text needed?  The registry already says this is the policy.

ok.  less text is better.

> ** Section 15.
> 
> Protocol version 1 added one
>    new error code:
> 
>               Error
>               Code    Description
>               -----   ---------------------------
>                   8   Unexpected Protocol Version
> 
> Why is a change made by v1 germane to the IANA considerations of v2?  I don’t
> think this text is needed.

it's gone!

thanks a million for an impressively thorough review.

randy