Re: [Last-Call] Opsdir last call review of draft-ietf-anima-constrained-join-proxy-09

Peter van der Stok <stokcons@bbhmail.nl> Tue, 05 April 2022 08:05 UTC

Date: Tue, 05 Apr 2022 10:05:16 +0200
From: Peter van der Stok <stokcons@bbhmail.nl>
To: Jürgen Schönwälder <j.schoenwaelder@jacobs-university.de>
Cc: ops-dir@ietf.org, anima@ietf.org, draft-ietf-anima-constrained-join-proxy.all@ietf.org, last-call@ietf.org
Reply-To: stokcons@bbhmail.nl
Mail-Reply-To: stokcons@bbhmail.nl
In-Reply-To: <164883335420.24992.11762904207626092789@ietfa.amsl.com>
References: <164883335420.24992.11762904207626092789@ietfa.amsl.com>
Message-ID: <dd02e4368fbd5f3e4c202db9c256f589@bbhmail.nl>
Organization: vanderstok consultancy
Archived-At: <https://mailarchive.ietf.org/arch/msg/last-call/WwnW7JU3W9yp53NQ1yADJlPzr0k>


Hi Jürgen,

Thanks for the review. I sympathize with your confusion; I have often 
felt the same about other IETF documents that I thought relevant for my 
work. IETF documents are discouraged from rephrasing parts of other RFCs 
or from providing extensive operational HOWTO considerations. In fact, 
in other documents that I co-authored, people were not happy about the 
large number of examples we provided. In my view, a document should 
state the problem that is being solved and the standard that is 
proposed to solve it. I tried to do that in this document.

See below for my comments; much useful text has been added in response.

Greetings,

Peter

_____________________________________________________________________

Reviewer: Jürgen Schönwälder
Review result: Serious Issues

Let me start with a disclaimer: I am not familiar with BRSKI and ANIMA
and hence I have been reading this I-D as a confused outsider and some
of my concerns may not be valid or the result of me not understanding
the relevant technologies. That said, my conclusion after reading that
document is that it is not ready. At a high level, my concerns are:

- First, it seems to me that there are many options and there is no
   clear mandatory-to-implement baseline. Hence, I am concerned
   that this specification will not necessarily lead to interoperable
   implementations.

Pvds ==>

We could add normative language for one option only. We prefer that, 
based on use cases, an installation engineer can choose one option 
over the other. The simplest option is stateful, which is common in 
today's translation devices, but some use cases may not want to 
implement that and will do stateless only. I think it is hard for us to 
choose between these two options.

==>

- Second, it feels like more attention needs to be paid to security
   concerns. Some of the options may actually be weak from a security
   point of view and hence narrowing options down may also be desirable
   to deal with security concerns. I do not think it is sufficient to
   state that some security issues may be solved by future work.

Pvds ==>

That will be changed. Some text suggestions are made below.

==>

- Third, as an ops-dir reviewer, I am lacking information how this
   will be operationally deployed, i.e., how a shared link will be
   properly configured that may have multiple mechanisms to bootstrap
   routable IP addresses. How do I force pledges to go through this
   procedure before I hand out or let them discover a routable IP
   address?

pvds==>

I am confused here; what is a shared link?

Actually, for the link-local discovery, the document relies fully on 
techniques which are described in other RFCs. The document does not add 
anything, apart from the character sequences that need to be registered 
by IANA.

A good point is perhaps that the use of a mesh network should be 
emphasized.

OLD:

   However, the Pledge will not be IP routable until it is authenticated
   to the network.  A new Pledge can only initially use a link-local
   IPv6 address to communicate with a neighbor on the same link
   [RFC6775] until it receives the necessary network configuration
   parameters.  However, before the Pledge can receive these
   configuration parameters, it needs to authenticate itself to the
   network to which it connects.

NEW:

   However, the Pledge will not be IP routable over the mesh network
   until it is authenticated to the mesh network.  A new Pledge can only
   initially use a link-local IPv6 address to communicate with a
   mesh neighbor [RFC6775] until it receives the necessary network
   configuration parameters.  The Pledge receives these configuration
   parameters from the Registrar.  When the Registrar is not a direct
   neighbor of the Pledge but several hops away, the Pledge discovers a
   neighboring constrained Join Proxy, which transmits the
   DTLS-protected request coming from the Pledge to the Registrar.  The
   constrained Join Proxy must have been enrolled previously, so that
   the message from the constrained Join Proxy to the Registrar can be
   routed over one or more hops.

==>

I also wonder whether alternatives have been considered. Is it really
necessary to introduce proxies that rewrite IP addresses?  Could it be
easier to let Pledges discover special temporary addresses that can be
used to reach (without going through a Join Proxy) the Registrar and
once a Pledge gets enrolled, it can pick up a more general address? Or
is the stateful solution not simply the more robust solution? How many
enrollments do we expect a Join Proxy to handle concurrently? What are
the bulk enrollment scenarios where a stateless solution would be
desirable?

I skimmed through draft-richardson-anima-state-for-joinrouter-03,
which has more alternatives. While properties of various solutions are
discussed, no clear conclusions are drawn. Back to this document,
perhaps I am missing also an applicability statement for the Join
Proxy solution.

Pvds==>

The number of simultaneous enrollments will depend heavily on the 
operational conditions and chosen physical installation procedure. It 
may range from one every 15 minutes to a few hundred in half an hour. I 
doubt that the latter rate will ever be reached, but deployments have 
surprised me in the past. In short, I don't know.

This solution was chosen because the original BRSKI document mentions a 
circuit proxy for HTTPS. This constrained proxy uses DTLS with CoAP and 
requires few changes to the original BRSKI document. 
draft-richardson-anima-state-for-joinrouter also explored various 
options, but that does not mean these are deployable. Most overlap with 
the two options that we have in this draft. I think adding that many 
options would add to the confusion and burden vendors with supporting 
them all.

==>

* Abstract

   I find the abstract difficult to understand for people not familiar
   with the context of this work. You have to read until the 2nd
   paragraph to get a clue that this has something to do with BRSKI, I
   think this should be said right away in the first sentence so that
   people know that what follows is about BRSKI specific concepts.

Pvds==>

Good suggestion; will change the paragraph order

==>

   And ideally the abstract would be understandable to people not
   deeply familiar with BRSKI terminology and concepts. After reading

      This document extends the work of Bootstrapping Remote Secure Key
      Infrastructures (BRSKI) by replacing the Circuit-proxy between
      Pledge and Registrar by a stateless/stateful constrained Join
      Proxy.  It relays join traffic from the Pledge to the Registrar.

   I had little clue what this document is about. Perhaps explaining
   things in simpler terms can help, e.g., something like this:

      This document extends the work of Bootstrapping Remote Secure Key
      Infrastructures (BRSKI) by specifying how a Join Proxy can relay
      a DTLS session originating from a Pledge with only link-local
      addresses to a Registrar not directly reachable on the link to
      which the Pledge is connected.

Pvds==>

My suggestion (I leave Circuit-proxy which is essential IMO):

NEW

      This document extends the work of Bootstrapping Remote Secure Key
      Infrastructures (BRSKI) by replacing the Circuit-proxy between
      Pledge and Registrar by a stateless/stateful constrained Join
      Proxy.  The constrained Join Proxy is a mesh neighbor of the
      Pledge and can relay a DTLS session originating from a Pledge
      with only link-local addresses to a Registrar which is not a mesh
      neighbor of the Pledge.

==>

   The title and the abstract both use the term "constrained Join
   Proxy" but later almost always the term "Join Proxy" is used.  So
   why is it a "constrained Join Proxy" and not just a "Join Proxy", or
   is there a difference between a "Join Proxy" and a "constrained Join
   Proxy"? The captions of Fig. 2 and Fig. 3 state that they show a
   constrained joining message flow. Can there be others or is this
   technology for some reason only applicable for some sort of
   constrained devices?

Pvds==>

Good point.

Either I write "constrained" before every "Join Proxy", or I introduce 
a phrase stating that both terms describe one and the same concept. It 
is not yet clear what I will do.

==>

* Join Proxy functionality

   I found the text a bit confusing. It talks about why packets to
   establish a DTLS connection with a Registrar won't be delivered and
   then afterwards it says that the Pledge is not even able to discover
   the IP address of the Registrar. Perhaps this text can be simplified
   and streamlined. It is rather obvious that if a Pledge has only a
   link-local address, it won't talk with a Registrar multiple IP hops
   away.

Pvds==>

Now I am confused. I expected you to require more text here.

Something seems to be missing in the description of the baseline 
scenario, and I need more information to understand what the missing 
pieces are.

==>

   Are both modes required to be implemented? The stateless approach
   seems to require support by the Registrar while the stateful
   approach seems to be transparent from the Registrar's
   perspective. This apparently makes a big difference for the
   deployment options. To deploy the stateless Join Proxy somewhere in
   a big network, you need to update the Registrar to support it,
   right?

Pvds==>

Yes, figure 5 states the discoverable port in the Registrar.

==>
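
For illustration only (this is not text from the draft): a minimal sketch 
of why the stateful mode can stay transparent to the Registrar. The proxy 
keeps a per-Pledge table and forwards the DTLS record unchanged, so the 
Registrar sees ordinary DTLS traffic from the proxy's routable address. 
The class name, the method names and the starting port 50000 are 
assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class StatefulJoinProxy:
    """Hypothetical stateful relay: one upstream port per Pledge."""

    next_port: int = 50000  # assumed start of a local ephemeral port range
    by_pledge: dict = field(default_factory=dict)  # (ip, port) -> upstream port
    by_port: dict = field(default_factory=dict)    # upstream port -> (ip, port)

    def from_pledge(self, pledge_addr, dtls_record):
        """Relay a record towards the Registrar, allocating state on first use."""
        if pledge_addr not in self.by_pledge:
            self.by_pledge[pledge_addr] = self.next_port
            self.by_port[self.next_port] = pledge_addr
            self.next_port += 1
        # The DTLS record itself is forwarded byte for byte.
        return self.by_pledge[pledge_addr], dtls_record

    def from_registrar(self, upstream_port, dtls_record):
        """Relay a reply back to the Pledge that owns this upstream port."""
        return self.by_port[upstream_port], dtls_record
```

The cost is visible in the two dictionaries: memory grows with the number 
of concurrent enrollments, which is the state a stateless proxy avoids by 
carrying the Pledge address in the message instead.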

      IP_P:p_P = Link-local IP address and port of the Pledge
      IP_R:p_Ra = Routable IP address and join-port of Registrar
      IP_Jl:p_Jl = Link-local IP address and join-port of Join Proxy
      IP_Jr:p_Jr = Routable IP address and port of Join Proxy

   I was wondering why this is p_Ra, i.e., what the 'a' stands for. Or
   why is this not:

      IP_Pl:p_Pl = Link-local IP address and port of the Pledge
      IP_Rr:p_Rr = Routable IP address and join-port of Registrar
      IP_Jl:p_Jl = Link-local IP address and join-port of Join Proxy
      IP_Jr:p_Jr = Routable IP address and port of Join Proxy

   Well, how things are labeled may not be really important.

Pvds==>

This has been adapted as suggested by Rob Wilton in the AD review

==>

   I wondered: How does this all interact with SLAAC and/or DHCP on a
   shared link? You seem to assume that SLAAC and/or DHCP are disabled
   as long as a Pledge is not yet enrolled, right? In some networks,
   you will have also 802.1X for enabling layer 2 ports. How do all
   these things fit operationally together? What are operationally
   meaningful setups?  In a shared network scenario, how do I
   effectively prevent a Pledge from using router advertisements to
   generate a routable address? Or is in such a deployment a Join Proxy
   simply not necessary? Perhaps these questions go beyond this
   document and they just show my lack of background.

Pvds==>

Only DTLS connections are allowed on the BRSKI mesh network. 
Certificates which are signed by the Registrar are used to set up the 
DTLS connections. Unprotected messages may be routed but will never be 
accepted by the recipient.

==>

   Are there any message size issues since the stateless solution
   encapsulates the DTLS payload in another header? I see that this is
   mentioned in the table at the end as a property of the stateless
   mode, but there is no discussion of any consequences this may have.

Pvds==>

No discussion is given, because not all operational conditions are 
known. Installation engineers are given the choice.

==>
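
For illustration only (this is not text from the draft): a rough way to 
estimate the extra bytes that the stateless encapsulation adds. The sketch 
assumes a two-element CBOR array of byte strings, [header, DTLS payload], 
and a header made of a raw 16-byte IPv6 address plus a 2-byte port. Both 
the layout and the helper names are assumptions for this sketch; the draft 
defines the real format. Only the CBOR head encoding follows RFC 8949.

```python
import ipaddress


def cbor_head(major: int, n: int) -> bytes:
    """CBOR initial byte plus length argument (RFC 8949), for n < 65536."""
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 256:
        return bytes([(major << 5) | 24, n])
    if n < 65536:
        return bytes([(major << 5) | 25]) + n.to_bytes(2, "big")
    raise ValueError("length too large for this sketch")


def cbor_bstr(data: bytes) -> bytes:
    """Encode a byte string (major type 2)."""
    return cbor_head(2, len(data)) + data


def wrap(header: bytes, dtls_payload: bytes) -> bytes:
    """Assumed encapsulation: CBOR array (major type 4) of two byte strings."""
    return cbor_head(4, 2) + cbor_bstr(header) + cbor_bstr(dtls_payload)


# Assumed header contents: link-local address and port of the Pledge.
header = ipaddress.IPv6Address("fe80::1").packed + (5684).to_bytes(2, "big")
payload = bytes(200)  # stand-in for a 200-byte DTLS record
wrapped = wrap(header, payload)
overhead = len(wrapped) - len(payload)  # 22 bytes for this header and payload
```

Under these assumptions the overhead is small, but on links with a small 
MTU it still has to fit next to the largest DTLS handshake record, which 
is the consequence the review asks about.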

   There are three different discovery options. Are all three mandatory
   to implement? Is having many options to start with desirable from an
   interoperability point of view?

Pvds==>

Rob Wilton also commented on this aspect; that has been changed in the 
latest version.

==>

   I tried to figure out how in 6.1.1 the Registrar is found. I
   followed several references, discovered several options, ended up in
   GRASP as one of them. Once I have the registrar's address, I can
   query the Registrar for more details. Then we have 6.1.2 which
   details how GRASP can be used directly to provide all relevant
   information. This section says it is "normative for uses with ANIMA
   ACP". Not sure what that means, did the authors mean that it is
   mandatory to implement for ANIMA ACP or that it is mandatory to use
   for ANIMA ACP? Normative feels like the wrong word, or is the other
   text not normative or what is conditionally normative in which
   contexts? As a newcomer, I only found section 6.3.1 reasonably clear
   (there is a link-local coap multicast, I can see how that works).

Pvds==>

I am not sure about "normative for use" versus "normative to 
implement". Does "normative for use" imply "normative to implement"?

==>

* Security Considerations

   There may be more security relevant questions. How robust is this
   design against attacks? Can this be exploited for attacks?  How does
   a join proxy decide which (DTLS) traffic should be forwarded and
   which should not be forwarded, or is the idea that any traffic is
   forwarded? Is the Join Proxy required to verify that the forwarded
   traffic is actually (valid) DTLS traffic?

pvds==>

Good point. In my understanding, only DTLS connections are accepted by 
the destination. Refusing to route non-DTLS traffic may be a bit 
prohibitive. The suggestion is to add the following text after the 
first paragraph.

NEW

A malicious constrained Join Proxy has a number of routing 
possibilities:

  * It sends the message on to a malicious Registrar. This is the same
    case as the presence of a malicious Registrar discussed in RFC 8995.
  * It does not send on the request or does not return the response from
    the Registrar. This is the case of the not responding or crashing
    Registrar discussed in RFC 8995.
  * It uses the returned response of the Registrar to enroll itself in
    the network. With very low probability it can decrypt the response.
    Successful enrollment is deemed too unlikely.
  * It uses the request from the pledge to appropriate the pledge
    certificate, but then it still needs to acquire the private key of
    the pledge. Also this is assumed to be highly unlikely.

A malicious node can construct an invalid Join Proxy message. Suppose 
the destination port is the coaps port. In that case, a Join Proxy can 
accept the message and add the routing addresses without checking the 
payload. The Join Proxy then routes it to the Registrar. In all cases, 
the Registrar needs to receive the message at the join-port, check that 
the message consists of two parts, and use the DTLS payload to start the 
BRSKI procedure. It is highly unlikely that this malicious payload will 
lead to node acceptance.

A malicious node can sniff the messages routed by the constrained Join 
Proxy. It is very unlikely that the malicious node can decrypt the DTLS 
payload. A malicious node can read the header field of the message sent 
by the stateless Join Proxy. This ability does not yield much more 
information than the visible addresses transported in the network 
packets.

==>

   The stateless proxy seems to allow outside attackers to send
   arbitrary packets to any link-local address inside.

Pvds==>

Like any node that can send link-local broadcast and unicast; I don't 
think this is specific to the constrained Join Proxy.

==>

   This looks like
   a new reflection service that must be kept operationally under
   control, in particular since enrolled Pledges may later act as well
   as Join Proxies. The security considerations text indicates that
   future work may address this issue by encrypting the CBOR array.  Is
   this sufficient, do we really want to standardize a new reflection
   service that we then fix in the future? I am also not sure why level
   2 protection (what is 'level 2'? layer 2? link-layer protection?)
   will actually resolve the problem, once I can route IP packets to a
   Join Proxy, I can let it forward traffic to arbitrary link-local
   addresses, no?

Pvds==>

No; only DTLS packets can be sent to Registrars. The latter decides in 
combination with manufacturer's MASA if a node can be accepted in the 
network.

Level 2 => layer 2

Some new text is proposed.

OLD

    If such scenario needs to be avoided, then it is reasonable for the
    Join Proxy to encrypt the CBOR array using a locally generated
    symmetric key.  The Registrar would not be able to examine the
    result, but it does not need to do so.  This is a topic for future
    work

NEW

    If such scenario needs to be avoided, the constrained Join Proxy MAY
    encrypt the CBOR array using a locally generated symmetric key.  The
    Registrar is not able to examine the encrypted result, but does not
    need to.  The Registrar stores the encrypted header in the return
    packet without modifications.  The constrained Join Proxy can
    decrypt the contents to route the message to the right destination.

==>
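
For illustration only (this is not text from the draft): a sketch of the 
round trip in which the Registrar treats the header as opaque bytes and 
copies it into the reply, so that only the proxy needs to understand it. 
Real encryption is deliberately left out, because the Python standard 
library has no authenticated cipher; `seal` and `unseal` are hypothetical 
hooks where a cipher keyed with the proxy's locally generated symmetric 
key would go. The text header encoding is also an assumption made to keep 
the sketch short.

```python
from typing import Callable, Tuple

Addr = Tuple[str, int]


def encode_header(addr: Addr) -> bytes:
    """Assumed header encoding: 'ip|port' as text (not the draft's format)."""
    ip, port = addr
    return f"{ip}|{port}".encode()


def decode_header(blob: bytes) -> Addr:
    ip, port = blob.decode().rsplit("|", 1)
    return ip, int(port)


class StatelessJoinProxy:
    """Hypothetical stateless relay; keeps no per-Pledge table."""

    def __init__(self,
                 seal: Callable[[bytes], bytes] = lambda b: b,
                 unseal: Callable[[bytes], bytes] = lambda b: b):
        # Placeholders for encryption and decryption with a local key.
        self.seal, self.unseal = seal, unseal

    def to_registrar(self, pledge_addr: Addr, dtls_record: bytes):
        """Prepend the (sealed) Pledge address to the DTLS record."""
        return self.seal(encode_header(pledge_addr)), dtls_record

    def to_pledge(self, header: bytes, dtls_record: bytes):
        """Recover the Pledge address from the echoed header."""
        return decode_header(self.unseal(header)), dtls_record


def registrar_reply(header: bytes, dtls_reply: bytes):
    """The Registrar copies the header into the reply without inspecting it."""
    return header, dtls_reply
```

Because the Registrar only echoes the header, any reversible transform 
chosen by the proxy works without changes on the Registrar side. Whether 
that transform also authenticates the header decides if a forged header 
can still steer a reply to an arbitrary link-local address.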

   Is there anything that prevents an attacker from creating a packet
   with a stack of JPY_messages, effectively source routing messages
   through a chain of Join Proxies? How will I debug such things if
   they happen?

Pvds==>

Interesting. I hope you agree with the answer given in the added 
security text. I don't think debugging is necessary, although detecting 
malicious nodes is always a challenging occupation.

==>
