[Tsv-art] Tsvart last call review of draft-ietf-suit-architecture-11

Bob Briscoe via Datatracker <noreply@ietf.org> Sun, 09 August 2020 23:33 UTC
From: Bob Briscoe via Datatracker <noreply@ietf.org>
To: tsv-art@ietf.org
Cc: last-call@ietf.org, draft-ietf-suit-architecture.all@ietf.org, suit@ietf.org
Message-ID: <159701600789.9734.7112047200124687933@ietfa.amsl.com>
Reply-To: Bob Briscoe <ietf@bobbriscoe.net>
Date: Sun, 09 Aug 2020 16:33:27 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsv-art/hpal4eaJGNGPhmZ-JTU7GPawZu0>
Subject: [Tsv-art] Tsvart last call review of draft-ietf-suit-architecture-11
Reviewer: Bob Briscoe
Review result: Ready with Issues

This document has been reviewed as part of the transport area review team's
ongoing effort to review key IETF documents. These comments were written
primarily for the transport area directors, but are copied to the document's
authors and WG to allow them to address any issues raised and also to the IETF
discussion list for information.

When done at the time of IETF Last Call, the authors should consider this
review as part of the last-call comments they receive. Please always CC
tsv-art@ietf.org if you reply to or forward this review.

This review is long. For the benefit of busy readers, it is structured with 7
important issues listed first (and tagged either as technical or editorial),
followed by minor editorial comments for the authors.

Altho' it is ostensibly from the Transport Area Review Team, this review
identifies only one transport-related issue (see item #6a). Most of the major
discussion points are offered with a security hat on.

First I want to say that there's a lot of useful stuff in the draft. So I'd
like to apologize that the review comments raise issues, and do not dwell on
praising all the good stuff.

== Important Issues ==

1. Motivation for publication by the IETF [Editorial]

Until I reached the summary of the recent IoT IAB workshop in the first para of
the Security Considerations section, I was wondering why the IETF needed to
publish this. It seemed to be a description of what is already done in the
industry, but framed as an architecture. Most of this first para of the
Security Considerations section motivates this work, and ought to be moved to
the Introduction.

Even then, a document that describes what the industry already does isn't a
sufficient response to a security problem. Given (I believe) the intention is
to encourage the industry to systematically cater for firmware updates, perhaps
the draft needs to be a little more hard-hitting (without being patronizing of
course), rather than giving the impression (except in the abstract) that it is
just describing current industry practice. For instance, see item #2 below
about saying what not to do. I would also suggest that it should highlight the
simplest architecture, only giving optional more complex extras later (see item
#4 below).

2. Is Anything Not Allowed by this Architecture? [Technical+Editorial]

a) A good architecture precludes as well as includes. Would it be useful to
list some common practices that are insecure, and perhaps some common
misconceptions about secure firmware update?

b) I could hardly find anything in this draft that did not equally apply to
firmware update of "Non-Things". It would indeed be useful to define a 'Thing'
(at least what this document means by it). I suggest:
* unattended operation
* not within the operator's physical security control

c) On the subject of ruling things out, I felt the list of items ruled out of
scope in the Security Considerations include some items that are so central to
IoT that they should not have been ruled out of scope, and in the first two
cases quoted below, they didn't need to be ruled out of scope, because the
document addresses them: "
  - installing firmware updates in a robust fashion so that the update
    does not break the device functionality of the environment this
    device operates in.
  - the distribution of the actual firmware update, potentially in an
    efficient manner to a large number of devices without human
    involvement
  - energy efficiency and battery lifetime considerations.
"
And, wouldn't it be better to move scoping statements to just after the Intro,
rather than in Security Considerations? (And, yes, I know that not all Things
are energy-challenged, but the size of the subset that are is significant.)

3. Relying on Software with Security Vulnerabilities to Patch Security
Vulnerabilities [Technical]

The Intro only mentions 'software updates' generally, and doesn't explicitly
mention patching security vulnerabilities (altho the abstract does). Only
having read the Security Considerations section, do I discover that the draft
is primarily meant to be about patching firmware vulnerabilities.

That raises the question of how secure it is to download new firmware from a
device booted from firmware that is potentially already compromised. As a
minimum, surely the draft needs to mention this point. And preferably:
* whether anything can be trusted once firmware is compromised, and if so what.
* whether it is still worth updating firmware, even once a vulnerability in the
  firmware update process has been identified, given:
  o identification of a vulnerability does not necessarily imply it has been
    exploited, or not prevalently exploited
  o a vulnerability might not make the firmware update process itself
    vulnerable (with an explanation of how to tell)
* describe which aspects of the firmware update process need to be run within a
  TEE (and which not if any)
* should the TEE lock the device against booting if a firmware authentication
  or integrity check fails
  o how to prevent tampering with firmware integrity from itself being used as
    an attack, e.g.
    - by ensuring that, once a device is locked against booting, firmware
      re-update is never completely disabled
    - by ensuring firmware updates are not immediately retried without an
      exponentially increasing timer back-off, otherwise retries could lead to
      the devices flooding their own network with fruitless update traffic.
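
A minimal Python sketch of the retry back-off suggested in the last bullet. It
is not taken from the draft: the name retry_delays, the 60 s base and the 1 h
cap are illustrative assumptions. The generator never terminates, so re-update
is never completely disabled; retries simply settle at the cap.

```python
import itertools
import random

def retry_delays(base_s, cap_s, jitter=0.0, rng=None):
    """Yield the wait (in seconds) before each retry of a failed update.

    The delay doubles after every failure until it reaches cap_s, then
    stays there indefinitely. jitter is an optional fraction (0.1 means
    +/-10%) that spreads devices apart so they do not retry in lock-step.
    """
    rng = rng or random.Random()
    delay = base_s
    while True:
        spread = rng.uniform(-jitter, jitter) if jitter else 0.0
        yield delay * (1 + spread)
        delay = min(cap_s, delay * 2)

# First eight waits with a hypothetical 60 s base and 1 h cap, no jitter.
print(list(itertools.islice(retry_delays(60, 3600), 8)))
# [60.0, 120.0, 240.0, 480.0, 960.0, 1920.0, 3600.0, 3600.0]
```

The jitter parameter is an addition beyond what the bullet asks for; it only
matters when many devices fail at the same moment.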

4. Please Focus More on the Simplest Architecture [Technical]

All the following increase system complexity, but are not /essential/ for
strong security:
a) Status Tracking Per Device
b) Confidentiality of the firmware binary
c) Robustness against rendering the device unbootable
d) Supporting both Message Authentication and Object Authentication (see item
   #5)
e) Broadcast Friendly (see item #6)

This draft is meant to be persuading the 'industry of Things' to provide
built-in secure firmware update. It tends to fall into the common trap of
setting the security bar so high that practitioners might give up in despair.

a) Per-device status tracking certainly might be preferred by many operators,
but the alternative of the operator not knowing the status of each individual
device might be acceptable (as in the example in Figure 5). Per-device status
tracking introduces the following complexity:
* a need to separately identify each device, both on each device, and in the
  status tracker.
* a need to securely identify each separate device (to prevent compromised
  devices masquerading as all the other devices to give a false sense of
  security), requiring management of separate public or shared keys

b) Confidentiality certainly might provide defence in depth against reverse
engineering the binaries, but it is ultimately security by obscurity, and so
ultimately optional. By definition (see item #2b) 'Things' are not in a
physically secure environment. So, unless all devices decrypt all downloaded
binaries within a TEE and store them in tamper-proof memory, once the binaries
are stored on each device, they will be accessible to external inspection
anyway. So the document should be less dogmatic about confidentiality
protection (3rd para of Intro), and at least explain that, with IoT,
confidentiality on the wire is moot unless there is also confidential device
storage as well.

c) Robustness against rendering the device unbootable
Often, when I initiate an (attended) firmware update, the OS warns me that this
is a sensitive process that could render the device useless if the power fails
part-way through. So clearly, this is a cost-tradeoff that device designers are
willing to compromise on. Therefore, I don't think the IETF is entitled to
pronounce a requirement against this practice. I would rather see this text
moved from Requirements to somewhere else in the doc, as a commentary on the
implementation issues, rather than stating it as a requirement. Climbing down a
bit at the end by saying it is only an implementation requirement doesn't help.

5. Both Message Authentication and Whole Object Authentication? [Technical]

Message authentication codes aren't specifically mentioned, until sections 7 &
8, where they are mentioned as if they might be used, without saying why or
how. The document needs to discuss the merits of MACs vs. authentication of the
whole manifest and/or the whole firmware binary.

Ultimately, if an object's authenticity and integrity will be verified once it
is fully delivered, there is no need for MACs as well. However, using message
authentication reduces the risk that the device is talking with an imposter at
an early stage in the transmission, rather than having to wait until it is
complete. And it is easy to arrange message authentication to cumulatively
authenticate the whole object, without additional infrastructure for
whole-object verification. Therefore using MACs could avoid the need to provide
enough storage for a complete update of the firmware as well as the current
version - after verifying the manifest and the first message, the device could
even start to overwrite the firmware it is currently booted from.
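
One way to arrange message authentication so that it cumulatively
authenticates the whole object is to chain the MACs, as in the Python sketch
below. This is an illustration only, not a scheme from the draft: the function
names, the all-zero seed, the 4-byte chunk index and the use of HMAC-SHA-256
with a key already shared over unicast are all assumptions.

```python
import hashlib
import hmac

SEED = b"\x00" * 32  # assumed starting value for the chain

def chain_tags(key, chunks):
    """Return one tag per chunk; each tag also covers every earlier chunk."""
    tags, prev = [], SEED
    for index, chunk in enumerate(chunks):
        msg = prev + index.to_bytes(4, "big") + chunk
        prev = hmac.new(key, msg, hashlib.sha256).digest()
        tags.append(prev)
    return tags

def verify_stream(key, received):
    """Check (chunk, tag) pairs in arrival order.

    Returns the index of the first chunk that fails, or None if every
    chunk verifies. A failure is detected as soon as the bad chunk
    arrives, not only once the whole object has been delivered.
    """
    prev = SEED
    for index, (chunk, tag) in enumerate(received):
        msg = prev + index.to_bytes(4, "big") + chunk
        expected = hmac.new(key, msg, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, tag):
            return index
        prev = expected
    return None

key = b"k" * 32  # placeholder key
chunks = [b"boot", b"kernel", b"app"]
tags = chain_tags(key, chunks)
print(verify_stream(key, zip(chunks, tags)))                        # None
print(verify_stream(key, zip([b"boot", b"KERNEL", b"app"], tags)))  # 1
```

Because the last tag depends on every preceding chunk, accepting it is
equivalent to accepting the whole object, with no separate whole-object check.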

The above strategy would not be without risk, but my point is not just to
suggest this particular strategy. The document ought to at least discuss the
trade-offs between MACs and whole-object authentication, and whether both
are really necessary.

6. Friendly to Broadcast Delivery? [Technical]

Section 3. states this as one of the "Requirements", although the text softens
it to "may be desirable for some networks". However, broadcast delivery
introduces the three significant problems below, wrt a) reliable transport; b)
device energy efficiency; and c) broadcast message authentication.

a) Reliable Broadcast Transport
Delivery of binary objects needs to recover lost or corrupt packets. Reliable
broadcast delivery at scale is extremely challenging. It needs either fountain
coding [1] or reliable multicast.
* Fountain coding delivers an object in a continually repeating stream and
  ensures that the data in any missing packet can be reconstructed from data in
  a subsequent different packet. But this would increase device complexity.
* For broadcast delivery, per-packet acknowledgements (ACKs) from each device
  do not scale. Negative ACKs (NACKs) can be used but they also do not scale.
  If a loss is experienced close to the root of the broadcast/multicast, it
  still causes an implosion of negative ACKs (NACKs) on the sender. Reliable
  multicast (e.g. PGM [RFC3208]) arranges a spreading tree of delivery nodes
  each of which handles NACKs solely from its next-degree downstream
  neighbours. Clearly this increases network or CDN complexity.
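
The principle of reconstructing a missing packet from a different packet can
be shown with a single XOR parity packet, as in the Python sketch below. This
is deliberately much simpler than a real fountain code, which sends an
open-ended stream of XOR combinations over varying subsets of packets; the
function names and the equal-length packets are assumptions for illustration.

```python
from functools import reduce

def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(packets):
    """Sender side: one extra packet holding the XOR of all data packets."""
    return reduce(xor_bytes, packets)

def recover_missing(received, parity):
    """Receiver side: rebuild the single packet marked None in received.

    XOR-ing the parity with every packet that did arrive cancels them
    out and leaves the missing one. Works for exactly one loss.
    """
    present = [p for p in received if p is not None]
    return reduce(xor_bytes, present, parity)

packets = [b"AAAA", b"BBBB", b"CCCC"]
parity = make_parity(packets)
print(recover_missing([b"AAAA", None, b"CCCC"], parity))  # b'BBBB'
```

No feedback to the sender is needed, which is why this family of techniques
avoids ACK or NACK implosion, at the cost of extra decoding logic on the
device.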

b) Broadcast Energy Efficiency
If the IoT device is wireless and needs to take care with its energy
consumption, it will need to initiate all communications, rather than have to
sit with its radio powered up listening for an incoming message. However, of
course, it is not possible for each device to independently initiate an
incoming broadcast. It would be possible for a broadcast to be scheduled, and
for each device to poll for the schedule. But this would add complexity,
particularly because all the device clocks would have to be fairly closely
synchronized.

c) Broadcast Message Authentication
Message authentication has potential advantages over whole-object
authentication (see #5). When MACs are used over unicast, typically the cost of
asymmetric crypto for each message is avoided by using asymmetric crypto just
once to transmit a shared key, which is then used to verify each MAC. However,
that process is only secure for unicast. For broadcast or multicast delivery,
the sender only sends each message once, using one key for the MAC that would
therefore have to be shared with every receiver. Then any receiver could
masquerade as the genuine sender. TESLA is a solution to this [RFC4082], but it
would again increase the complexity of each device and the servers, not least
because it requires loose clock synch (nonetheless, uTESLA has been implemented
for challenged devices [2]).
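
The masquerading problem can be seen in a few lines of Python. This is a sketch
under assumptions, not anything specified in the draft: GROUP_KEY, make_tag and
accepts are hypothetical names, and HMAC-SHA-256 stands in for whichever MAC
would actually be used.

```python
import hashlib
import hmac

def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute the MAC a sender would attach to a message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def accepts(key: bytes, message: bytes, tag: bytes) -> bool:
    """Receiver-side check: does the tag match the message under this key?"""
    return hmac.compare_digest(make_tag(key, message), tag)

GROUP_KEY = b"shared-with-every-receiver"  # placeholder value

# The genuine sender tags a firmware block and receivers accept it.
block = b"firmware block 7"
print(accepts(GROUP_KEY, block, make_tag(GROUP_KEY, block)))  # True

# A compromised receiver holds the same key, so it can tag anything it
# likes and every other receiver will accept it as genuine.
forged = b"malicious block 7"
print(accepts(GROUP_KEY, forged, make_tag(GROUP_KEY, forged)))  # True
```

Verification and generation use the same secret, so a MAC under a group key
proves only that the sender is some member of the group.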

Aside regarding broadcast encryption:
In section 3.3. "Use state-of-the-art security mechanisms", it says:
  "The information that is encrypted individually for each device must
  maintain friendliness to Content Distribution Networks, bulk storage,
  and broadcast protocols."
That implies a magic encryption scheme that is beyond any state-of-the-art that
I am aware of! If information is encrypted individually for each device, surely
by definition it will not be friendly to broadcast protocols. Actually, I
suspect the authors did not mean to say "encrypted individually for each
device", because a shared group key is adequate for confidentiality - a shared
group key is only problematic for message or source authentication (see above).

7. Missing Security Concerns [Technical]

a) Avoiding Reliance on the Device's System Clock

I suggest that the document makes the point that it is preferable for the
firmware update process not to rely on the device's system clock.

Reasoning: Even if the TEE maintains the system clock, protection against
attacks on this clock relies on voting between multiple time sources. No amount
of authentication provides any proof of message timing. So, it is hard for a
TEE to protect against tampering with the timing of its messages, given they
pass via the untrusted execution environment of the rest of the device, similar
to the problem of a secure time source for virtualized functions [3].

I think IoT developers can be reassured that none of the requirements for
firmware update need to rely on the system clock. For instance roll-back attack
prevention (section 3.4) only requires comparison between version numbers, not
comparison between a release time and the clock.
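
A clock-free roll-back check can be as small as the Python sketch below. The
class name RollbackGuard and the tuple-of-integers version format are
assumptions made for illustration; the draft's own manifest fields may differ.

```python
class RollbackGuard:
    """Accept an update only if its version is strictly newer.

    Versions are tuples of integers such as (1, 4, 2), compared
    element by element. No timestamp or system clock is consulted.
    """

    def __init__(self, installed):
        self.installed = tuple(installed)

    def try_install(self, candidate):
        candidate = tuple(candidate)
        if candidate <= self.installed:
            return False  # same or older version: possible roll-back
        self.installed = candidate
        return True

guard = RollbackGuard((1, 4, 2))
print(guard.try_install((1, 4, 1)))  # False
print(guard.try_install((1, 5, 0)))  # True
```

In a real device the installed version would have to live in storage that an
attacker cannot rewind, which this sketch does not model.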

However, I think not relying on the clock is worth mentioning, because key
expiry and key revocation have to be designed carefully to avoid relying on
secure time, and this is a subtle point that might not be appreciated by IoT
device designers.

b) Key revocation

When keys are in tamper-resistant storage but otherwise not within a physically
secure site, the question of revocation surely has to be addressed. In
particular, there should be a discussion about the advisability or otherwise of
pre-loading the same keys into multiple devices.

== Minor Editorial Issues ==

1. Intro
  "Updates to the firmware of an IoT device are done to fix bugs in software..."
This would be a good place to highlight the focus on patching security
vulnerabilities.

"This version of the document assumes... Future versions may also describe..."
I assume this aspiration needs to be deleted now?

2. Terminology

There are ~22 occurrences of lower case 'must' in this document, and one
'should' (excluding multiple uses in rhetorical questions). I'm not sure
whether it is intentional to make it seem like this is an RFC that is mandating
behaviour, perhaps for readers who don't understand the subtleties of the IETF
informational track. I would prefer it to be clear that this document is not
mandating anything, by using alternatives to 'must' like 'ought to' or 'has
to'. Otherwise it could be considered disingenuous.

  "The term ’system on chip (SoC)’ is often used for these types of devices."
Perhaps more useful:
  "The term ’system on chip (SoC)’ is often used interchangeably with MCU, but
  MCU tends to imply more limited peripheral functions."

  "The following entities are used:"
The list is a mix of stakeholders and functions, which tends to show that the
authors themselves might not be clear about the distinction. It would be useful
to split into two lists.

  "The terms device and
  firmware consumer are used interchangeably since the firmware
  consumer is one software component running on an MCU on the
  device."
I didn't notice them being used interchangeably. If they are anywhere, why not
just edit to use whichever term is more appropriate and delete this sentence?

Status Tracker
  "While the IoT device itself runs the client-
  side of the status tracker it will most likely not run a status
  tracker itself unless it acts as a proxy for other IoT devices in
  a protocol translation or edge computing device node."
The client-side of a status tracker surely does run a status tracker itself
(the clue is in the name). I know what is intended, but the writer was clearly
in two minds as to whether a status tracker is the combination of client and
server or just the server.

3. Requirements

3.5 "High reliability" -> 'Robust against becoming unbootable'.
The title for this requirement otherwise implies a much more general
requirement than the description under it.

3.6 Small bootloader
"...again using firmware updates over serial,
USB or even wireless connectivity like a limited version of
Bluetooth Smart."
Don't see why it has to be "...a limited version of...". Suggest these words
are deleted.

s/poses a risk in reliability/
 /poses a reliability risk/

s/must fit in the available RAM/
 /must fit in the available memory/
(not necessarily RAM)

s|there are not other task/processing running|
 |there are not other tasks/processes running|

s/unlike it may be the case/
 /unlike that which may be the case/

s/Note: This is an implementation requirement./
 /Note: This last paragraph is an implementation requirement./
(Otherwise, 'this' could ambiguously refer to the whole requirement)

3.7 Small Parsers
"Since parsers are known sources of bugs they must be minimal."
To be honest, I suspect the target audience will find this sentence and others
like it rather pious. Given that the purpose of this document is to
encourage implementers to provide secure firmware update, I think these
peripheral "requirements" will just serve to make any implementers reading this
feel they are being patronized.

As with the earlier requirement about 'robustness against becoming unbootable',
I think many of these 'requirements' would be easier to stomach within a
discussion of tradeoffs, rather than as a list of pronouncements that demand
perfection.

3.8
s/Minimal impact on existing firmware formats/
 /No impact on existing firmware formats/
Reason: This is what the text underneath says.

3.9 Robust permissions

  "...the authorization policy is separated from the
  underlying communication architecture. This is accomplished by
  separating the entities from their permissions."
I'm not sure whether either of these sentences makes much sense (at least not
to me). Perhaps the first sentence means to say that
  "...the authorization policy is separated from the
  firmware it applies to"
And then the second sentence could be deleted. I'm not sure the second sentence
would ever be necessary, because entities are always separate from their
permissions (otherwise you would have to access an entity to find out you
weren't allowed to access it). To be honest, I don't really see the point of
the whole requirement. So if it is important, maybe its meaning needs to be
clarified for people like me. Otherwise, if it's just stating the obvious,
maybe it's not necessary at all.

3.10. Operating modes
Later, in S.5. the term 'delivery modes' is used. If these are meant to mean
the same thing, then the same term should be used consistently. In my
experience, the term 'interaction model' is used to describe things like polled
request-reply, push, publish-subscribe, etc.

"The pre-authorisation step involves verifying..."
When describing a distributed system, pls avoid passive sentences like this,
which don't specify which entity is performing the action. It is followed up
later by "...the firmware consumer must also...", which implies the subject is
the firmware consumer, but it's best not to rely on implication, especially not
if it requires two passes to understand.

  "Pushing a manifest and firmware image to the transfer to
  the Package resource of the LwM2M Firmware Update object"
Garbled?

  "...it may need to wait for a trigger from the
  status tracker to initiate the installation, may trigger the update
  automatically, or may go through a more complex decision making
  process to determine the appropriate timing for an update"
I had to read this a few times before realizing it was a list.
How about:
  "... to initiate the installation, it may either need to wait for a trigger
  from the status tracker; or trigger the update automatically; or go through a
  more complex decision making process to determine the appropriate timing for
  an update"

3.11.
s/Suitability to software and personalization data/
 /Suitability for software and personalization data/

The document suddenly jumps into a different style at the start of 3.11, more
like a log of WG activity than a requirement. Pls consider making the style
consistent, especially given it switches back after the first sentence of the
2nd para.

4. Claims
s/Only install firmware with a matching vendor/
 /Only install firmware with a matching author/ ?

5. Communication Architecture

The document often repeats that it's agnostic to the communication
architecture, then this section starts with the phrase:
  "Figure 1 shows the communication architecture..."
Perhaps it means 'firmware update architecture'?
Or, possibly this implies that the authors are in two minds as to what
'communications architecture' means. Or the heading was intended to be
'Communications Architectures' (plural) and the first phrase was meant to say
  "Figure 1 shows an example communication architecture..."

The text needs to make it clear that a status tracker is optional in the client
pull case but not in the server push case (see item #4a earlier).

It would be useful for the doc to say what it means for an operator circle to
enclose a function. For instance the 'Device Operator' in Fig 1 encloses the
status tracker, which to me implies it controls the status tracker. However,
the network operator encloses the device, which probably doesn't imply it
operates the device. Perhaps an enclosing circle means 'within the physical
security control of'? The network operator isn't mentioned in the text - why is
it in the diagram, given it has no role in the firmware update, other than as a
common carrier of opaque bits?

  "The following assumptions are made to allow the firmware consumer to
  verify the received firmware image and manifest before updating
  software:"
The following three bullets aren't really assumptions. Perhaps 'statements
about the verification process' would be a better phrase. Would another
reference to suit-information-model here be useful, to explain why the details
are not given here?

See item #4b) above about highlighting that confidentiality is optional, not
just 'deployment specific'.

  "There are different types of delivery modes, which are illustrated
  based on examples below."
Shouldn't this sentence start section 5? (Also see my earlier point about
'operating modes' / 'interaction modes' terminology).

Fig 3 is inconsistent with Fig 1, in that it omits the firmware consumer
function.

Fig 4 is inconsistent with Figs 1 & 3, in that there is also an arrow from the
status tracker to the author. What does this imply?

  "This architecture does not mandate a specific delivery mode but a
  solution must support both types."
Whatever for? This requirement surely over-plays the IETF's hand, which is not
in a position to make such a demand? Is the intention really that being
agnostic to the delivery mode means every solution must support all delivery
modes?

6. Manifest

Given each of the items in the second bullet list addresses one of the
questions in the first bullet list, it would be useful to tabulate them
side-by-side and to put them in a more meaningful order, e.g. in the order they
occur during firmware update. Also, the first question bullet (author
trust) is not specifically addressed in the second list - implied within the
last bullet, but not explicitly stated.

7.1
s/Combined with the non-relocatable nature of the code/
 /Due to the non-relocatable nature of the code/

7.3
  "This configuration has two or more CPUs in a single SoC that share
  memory (flash and RAM). Generally, they will be a protection
  mechanism to prevent one CPU from accessing the other’s memory."
I know what is intended, but it reads as if line 1 contradicts line 3. Perhaps:
 "...
  mechanism to prevent one CPU from unintentionally accessing memory currently
  allocated to the other."

9. Example

In at least one example figure, it would be useful to show the initial
pre-loading of keys, policy logic and trust anchor into the firmware consumer /
bootloader.

s/starting with an author uploading the new firmware to firmware server/
 /starting with an author uploading the new firmware to the firmware server/

  "This setup does
  not use a status tracker and the firmware consumer component is
  therefore responsible for periodically checking whether a new
  firmware image is available for download."
It needs to be much clearer that the status tracker has both a monitoring
function and an update triggering function. So, altho it is essential in the
server push model (to trigger updates), its monitoring function means it is
not ruled out for the client pull model.

Fig 5 & 6 are inconsistent, in that the former omits the IoT device box around
the Firmware consumer and bootloader.

s/Figure 6 shows an example follow with the device using a status tracker./
 /Figure 6 shows an example with the device using a status tracker./

  "For editorial reasons the author publishing the manifest at
  the status tracker and the firmware image at the firmware server is
  not shown."
How about:
  "Depiction of the author publishing the manifest at
  the status tracker and the firmware image at the firmware server would
  be the same as in Figure 5. So for brevity they are not shown."

11. Security Considerations

Between
  "A report about this workshop can be found at [RFC8240]."
and
  "A standardized firmware manifest format..."
there either needs to be some glue text to explain that the initial manifest
format was an output of the workshop (if it was), or a new para if the second
sentence really doesn't follow from the first.

Note also that I suggest (item #1) that the motivating text about the workshop
should be moved to the introduction. I also say (in item #2c) that the scoping
bullets would be better at the end of the Intro too. However, I can also see a
case for them remaining under Security Considerations; to admit that the
document does not fully address all possible security concerns.

Given this could leave nothing in the Security Considerations section, it would
be appropriate to merely point to all the sections of the document that already
cover security matters.

== References ==
[1] Byers, J.; Luby, M.; Mitzenmacher, M. & Rege, A. A Digital Fountain
Approach to Reliable Distribution of Bulk Data Proc. ACM SIGCOMM'98, Computer
Communication Review, 1998, 28

[2] Perrig, A.; Szewczyk, R.; Wen, V.; Culler, D. E. & Tygar, J. D. SPINS:
Security Protocols for Sensor Networks Proc. ACM International Conference on
Mobile Computing and Networks (Mobicom'01), 2001, 189-199

[3] Briscoe (Ed.), B. & others Network Functions Virtualisation; Security;
Problem Statement ETSI NFV Industry Specification Group (ISG), 2014