Re: [Fud] (quick) review of draft-moran-fud-architecture-00

Brendan Moran <Brendan.Moran@arm.com> Tue, 22 August 2017 10:12 UTC

From: Brendan Moran <Brendan.Moran@arm.com>
To: "fud@ietf.org" <fud@ietf.org>
CC: Hannes Tschofenig <Hannes.Tschofenig@arm.com>
Date: Tue, 22 Aug 2017 10:12:14 +0000
Message-ID: <8242BAEC-A5C4-48B0-94C2-C7AD28995455@arm.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/fud/Pzedq52WgoXcPBMTdjxXatEj5x4>
Subject: Re: [Fud] (quick) review of draft-moran-fud-architecture-00

Hi Emmanuel,

Please see my comments inline.

Best Regards,
Brendan

On Wed, 19 July 2017 18:33 UTC, Emmanuel Baccelli wrote:

> =====
> General comments:
> =====
>
> 1. In the proposed architecture some trust is required in the storage
> element, which is assumed to correctly:
> - serve the latest manifest/firmware (if the device pulls)
> - push/signal newer manifests to the device (if the author pushes)

I think that we are conflating two storage elements/cloud services:
1. the service that stores/distributes firmware images
2. the service that stores/distributes manifests.

These two services need not be (and probably should not be) the same. The manifest distribution service does require some trust: primarily not to deny service to the connected devices, and secondarily not to modify the manifests it distributes, although any modification will simply cause the manifests to be rejected. A secure communication channel such as DTLS might be appropriate to limit the possibility of MITM interference with manifest distribution.

The service that stores or distributes the firmware is a bit different: it can be completely untrusted. Because the firmware’s hash is stored in the manifest, any tampering with the firmware will be detected. From the point of view of a single device, tampering is still a problem; from the point of view of a whole network of devices, however, responses are possible. For example, if many devices report the same validation failure, the device manager may upload the firmware to a different location and issue a new manifest that points to the new location.
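A minimal sketch of both halves of this, assuming SHA-256 as the digest algorithm and a manifest whose signature has already been verified. The function names and the failure-report format (one image URI per report) are hypothetical, for illustration only:

```python
import hashlib
import hmac
from collections import Counter


def firmware_matches_manifest(image: bytes, manifest_digest: bytes) -> bool:
    """Device side: accept the image only if it hashes to the digest in the manifest."""
    actual = hashlib.sha256(image).digest()
    # compare_digest runs in time independent of where the digests differ
    return hmac.compare_digest(actual, manifest_digest)


def locations_to_replace(failure_reports, threshold: int) -> set:
    """Fleet side: flag image locations that many devices report as failing validation."""
    counts = Counter(failure_reports)  # each report names the image URI that failed
    return {uri for uri, n in counts.items() if n >= threshold}


# Usage
image = b"\x00\x01firmware"
digest = hashlib.sha256(image).digest()  # as carried in the (already verified) manifest
print(firmware_matches_manifest(image, digest))            # True
print(firmware_matches_manifest(image + b"\xff", digest))  # False
```

The storage service never needs a key or any trust here: the only trusted input is the digest, and that arrives via the signed manifest.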

> 2. What is specific to firmware in the document so far?
> It seems to me that this could apply to any software module on an IoT
> device, not necessarily a firmware.
> For instance it could be "application" logic (We have use cases where we
> are updating some software, but not the firmware).
> Why not generalize somehow the terminology to something more generic like
> "software update" or something similar?

The design goals, rather than the format itself, are what make this format specific to firmware. For example, one design goal is for the whole manifest to fit in the memory of a microcontroller at once, so that signature validation can be performed before the manifest is stored in NVRAM. This eliminates some problematic corner cases, such as tracking the verification status of a stored manifest. It is also a key distinction between this manifest format and RFC 4108: whereas the manifest links to a firmware image, RFC 4108 includes the whole image, which requires storing it prior to signature validation.
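To illustrate the ordering, here is a sketch in which the manifest is fully validated in RAM and only then written to non-volatile storage. It assumes a hypothetical 1 KiB RAM budget and uses HMAC-SHA256 purely as a stand-in for the manifest signature (a real deployment would more likely use an asymmetric signature); `Nvram` and `receive_manifest` are hypothetical names:

```python
import hashlib
import hmac

MAX_MANIFEST_SIZE = 1024  # hypothetical RAM budget for one manifest, in bytes


class Nvram:
    """Stand-in for non-volatile storage; it only ever receives validated manifests."""

    def __init__(self):
        self.stored = []

    def write(self, data: bytes) -> None:
        self.stored.append(data)


def receive_manifest(manifest: bytes, tag: bytes, key: bytes, nvram: Nvram) -> bool:
    # The whole manifest must fit in RAM so it can be checked in a single pass.
    if len(manifest) > MAX_MANIFEST_SIZE:
        return False
    # HMAC is a stand-in here for the real signature check.
    expected = hmac.new(key, manifest, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        # Rejected before NVRAM is touched, so there is no "unverified" state to track.
        return False
    nvram.write(manifest)
    return True
```

Because nothing reaches NVRAM until validation succeeds, a power failure mid-check leaves nothing behind that later needs re-verification.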

> =====
> Some more detailed comments:
> =====
>
> # in Section 3:
> why/how/what small bootloader?

Why:
One design goal of the firmware update architecture is to eliminate as many situations as possible in which a device could fail during an update. Ultimately, this means that there needs to be at least one piece of firmware that is never updated, since we assume that power could fail at any moment. Because this piece of firmware is never updated, it must be thoroughly tested and as close as possible to bug-free. We must also minimize its attack surface to reduce the chance of a vulnerability being discovered.

How:
We call this piece of firmware the bootloader. The bootloader's role is to take a candidate image, verify it, then replace the active image if it is valid.

What:
In practice, this seems to create two classes of bootloader, with different absolute minimum requirements:
- on-chip candidate image storage
-- a flash driver to overwrite the active image
-- a mechanism to validate the image (a digest algorithm)
- off-chip candidate image storage
-- a flash driver to overwrite the active image
-- a mechanism to authenticate the image (e.g. HMAC, or CMAC)
-- a mechanism to read the off-chip image (e.g. a SPI flash driver)
N.B. these are the absolute minimum requirements.
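The two classes can be sketched side by side. Flash is modelled as a `bytearray`, and `read_external` stands in for the SPI flash driver. The expected digest and tag are assumed to have been recorded by the running firmware after it verified the manifest; that hand-off, the function names, and the choice of SHA-256/HMAC-SHA256 are assumptions for illustration:

```python
import hashlib
import hmac
from typing import Callable


def on_chip_update(active: bytearray, candidate: bytes, expected_digest: bytes) -> bool:
    """On-chip candidate: the storage is on the same die, so a plain digest validates it."""
    if not hmac.compare_digest(hashlib.sha256(candidate).digest(), expected_digest):
        return False
    active[:] = candidate  # stands in for the flash driver overwriting the active image
    return True


def off_chip_update(active: bytearray, read_external: Callable[[], bytes],
                    device_key: bytes, expected_tag: bytes) -> bool:
    """Off-chip candidate: an attacker could replace both image and digest off-chip,
    so the image is authenticated with a key that never leaves the chip."""
    candidate = read_external()  # read once into RAM, then check and install that copy
    tag = hmac.new(device_key, candidate, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected_tag):
        return False
    active[:] = candidate
    return True
```

Both paths need only a flash driver and one cryptographic primitive, which is in keeping with keeping the non-replaceable bootloader minimal.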

If advanced features are desired, such as boot from network or in-place update over network, my recommendation is to use a 2-stage bootloader, where this minimal bootloader ensures that the second-stage bootloader is always updatable.

> why/how friendly to broadcast => no TLS, Object security

Why:
I mean "broadcast" in two contexts:
1. literal broadcast, i.e. IP multicast, radio, satellite, etc.
2. storage in an untrusted Content Distribution Network.

From a security perspective, these two media are equivalent; any part of the image could be intercepted or tampered with.

Broadcast has many benefits, especially in constrained radio networks and mesh networks. Broadcasting a firmware update will dramatically speed up deployment. This applies equally to hosting an image on a CDN and broadcasting it into a mesh network.

How:
To make an image broadcast-friendly, it must not contain any per-recipient-unique information, and each recipient must be able to determine early whether or not the image applies to it. Where firmware encryption is used, there will always be some recipient-specific information. The important aspect is that the firmware itself is not specific to a particular recipient; the manifest, however, may be.

For example, a device must be able to identify whether a broadcast image applies to its hardware and software configuration. This could be done by matching vendor and model identifiers.
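A sketch of that applicability check, assuming the manifest carries vendor and model identifiers as UUIDs. The condition keys `vendor-id` and `model-id` and the `DeviceIdentity` type are hypothetical:

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceIdentity:
    vendor_id: uuid.UUID
    model_id: uuid.UUID


def image_applies(manifest_conditions: dict, device: DeviceIdentity) -> bool:
    """Decide early, before fetching any payload, whether a broadcast image is for us.
    A missing condition is treated as a mismatch rather than a wildcard."""
    return (manifest_conditions.get("vendor-id") == device.vendor_id
            and manifest_conditions.get("model-id") == device.model_id)


# Usage with hypothetical name-based identifiers
vendor = uuid.uuid5(uuid.NAMESPACE_DNS, "vendor.example")
model_a = uuid.uuid5(uuid.NAMESPACE_DNS, "model-a.vendor.example")
me = DeviceIdentity(vendor, model_a)
```

Since the check uses only identifiers common to every device of a model, the same broadcast manifest and image serve the whole population.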

> but: are we really going to broadcast a (relatively large) firmware?

I believe that this is the only way to distribute firmware efficiently. Naturally, it will be distributed in blocks. The mechanics of this process are beyond the scope of the manifest draft and the firmware update architecture.

> # in Section 3.4:
>
> "the device is required to provide a minimum of two storage locations for
> firmware and one bootable location for firmware."
>
> Is reliability possible only with this approach?
> This seems out of scope of the generic architecture.

I think that this could be expressed better: The device is required to provide a minimum of two storage locations for firmware, at least one of which must be bootable.

As described above, there can be multiple boot stages. I think it's important to make the most minimal set of firmware non-replaceable. Equally, all of the firmware necessary to acquire a new image should be atomically replaceable. This could mean that the architecture has a slightly different structure:

+----------------------------+
| Stage 1 bootloader         |
+----------------------------+
| Stage 2 bootloader, slot A |
+----------------------------+
| Stage 2 bootloader, slot B |
+----------------------------+
| Application                |
+----------------------------+

Where:
* The Stage 1 bootloader is non-replaceable
* The Stage 2 bootloader is replaceable and capable of network upgrade of both itself and the application
* The application only gets a single slot, since the bootloader is capable of recovery if the application update is interrupted.

With this sort of layout, the requirement for two storage locations is satisfied, the requirement for resilience to failure is satisfied, but there is still only one application slot.
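One way the Stage 1 bootloader could choose between the two Stage 2 slots is sketched below. The policy (boot the highest-versioned slot whose contents match its recorded digest) and the `Slot` type are assumptions for illustration. The point is that an interrupted Stage 2 self-update leaves the slot being written invalid, so Stage 1 falls back to the other slot:

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    name: str
    image: bytes
    version: int
    digest: bytes  # digest recorded when the slot was written

    def is_valid(self) -> bool:
        # A partially written slot will not match its recorded digest.
        return hashlib.sha256(self.image).digest() == self.digest


def select_stage2(slots: List[Slot]) -> Optional[Slot]:
    """Stage 1: boot the newest intact Stage 2 bootloader, or nothing if none is intact."""
    valid = [s for s in slots if s.is_valid()]
    if not valid:
        return None
    return max(valid, key=lambda s: s.version)
```

The application needs no second slot under this scheme, because whichever Stage 2 copy boots is able to re-fetch the application after an interrupted update.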

> # in Section 3.5:
> How about rephrasing like:
> "The approach must not require a large bootloader"

I think that "minimal" conveys the spirit of the non-replaceability requirement better. The more features there are in the bootloader, the more likely it is that it will have a critical bug, which cannot be fixed for reliability reasons.

> # in Section 3.6
> Here I think the draft should reference examples of what "existing firmware
> formats" are alluded to.
> In my opinion, this section could be removed for two reasons:
> 1. because the metadata is anyways separate from the actual firmware,
> 2. the software might not be firmware, but just a software module (as per
> my comment above)

I also think this is not clear enough. The point of this section is that the update mechanism should not place any requirements on the payload that it conveys.

> # in Section 3.7
> I would suggest to have this document focus primarily on single MCU devices.
> Ideally: extensions to cover multi-firmware devices would either be in a
> later part of the document or in a separate document.
>
> At this point in the document it is obscure what the distinctions are
> between:
> Author, Store, Apply, Approve, Qualify

"Modules," in this context, was meant to mean that the architecture would cover multiple software modules within a single MCU, though this naturally extends to multiple MCUs.

I'm not certain that "permissions" is the right word. The intent is that each storage location can assert a list of permissions that must all match in a given update. There must then be sufficient signatures from certificates that have those permissions to match each of the asserted permissions.

For example, if a firmware storage location requires Author, Store, and Apply, then an update will only be installed if it is signed by a certificate with the Author permission, a certificate with the Store permission, and a certificate with the Apply permission. Alternatively, a single certificate may hold more than one permission.

This could allow some more complex use-cases, for example: If the update only has the Author and Store signatures, then the update is cached, but not installed until a new update referencing the same payload is dispatched with the Apply signature.
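A sketch of the matching rule, assuming every signature has already been cryptographically verified and each verified certificate contributes its set of permissions. The "cache when Author and Store are present" branch mirrors the example above; the function names and string results are hypothetical:

```python
from enum import Enum
from typing import Iterable, Set


class Permission(Enum):
    AUTHOR = "Author"
    STORE = "Store"
    APPLY = "Apply"
    APPROVE = "Approve"
    QUALIFY = "Qualify"


def granted(signer_permissions: Iterable[Set[Permission]]) -> Set[Permission]:
    """Union of the permissions held by all certificates that validly signed the update."""
    result: Set[Permission] = set()
    for perms in signer_permissions:
        result |= perms
    return result


def decide(required: Set[Permission], signer_permissions: Iterable[Set[Permission]]) -> str:
    have = granted(signer_permissions)
    if required <= have:
        return "install"  # every permission asserted by the storage location is covered
    if {Permission.AUTHOR, Permission.STORE} <= have:
        return "cache"    # keep the payload; wait for an update carrying the rest
    return "reject"
```

Treating permissions as a set to be covered means one certificate holding several permissions and several single-permission certificates are handled identically.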

The permissions should be amended with descriptions:
* Author: To compile, assemble, link, encode, etc., the firmware. This is the fundamental permission that encapsulates the right to create a payload that a device consumes.
* Store: To place a payload in storage without installing it
* Apply: To take a stored payload and install it
* Approve: To indicate that an owner or operator has agreed to install the payload
* Qualify: To assert that a payload has been tested within its expected operating environment

I have debated adding an additional "permission" that a CI system can use to assert that tests have passed.

> # in Section 4:
> How about being more assertive here, such as:
>
> The firmware image MAY be encrypted and MUST be integrity protected AS WELL
> AS AUTHENTICATED/AUTHORIZED.
> The meta-data MUST integrity protected and AUTHENTICATED.

I'm happy with more assertive language, unless there is a good reason (i.e. a use case it prohibits) not to. The wording, however, should align with RFC2119:

> The firmware image MAY be encrypted and MUST be integrity protected
> and MUST be authenticated/authorized. The meta-data MUST be
> integrity-protected and authenticated

> # in Section 5:
>
>    -  When should the device apply the update?
>    out of scope?
>
>    -  How should the device apply the update?
>    out of scope?
>
>    -  Where should the firmware be stored?
>    out of scope?

I'm not clear on why any of these items would be out of scope for a firmware update architecture. Could you please elaborate?

>    -  What kind of firmware binary is it?
>    is this different from the question "should it apply the firmware"?

Yes, this is different. For example, a compressed binary and a raw binary are both written directly to flash (how), but the compressed binary must be decompressed first (what). A binary in a CBOR wrapper is also written directly to flash, but the start of the binary within the wrapper must first be located. So the "how" is the same, but the "what" is different.

I realise that it seems easy to make these the same; however, working on the manifest format, I have found that where there appears to be an opportunity for optimization by merging two fields, it frequently ends up causing a problem later.
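Keeping the two fields separate can be sketched as follows: the "what" (`kind`) selects how the payload is turned into a raw image, while the "how" (`install`) is a single flash write for all three kinds. The kind names are hypothetical, zlib stands in for whichever compression is used, and the CBOR wrapper is assumed to be a single definite-length byte string (major type 2):

```python
import zlib


def _cbor_bytestring_payload(data: bytes) -> bytes:
    """Locate the binary inside a definite-length CBOR byte string (major type 2)."""
    if not data or data[0] >> 5 != 2:
        raise ValueError("not a CBOR byte string")
    info = data[0] & 0x1F
    if info < 24:
        length, offset = info, 1  # length is stored in the initial byte
    elif info in (24, 25, 26, 27):
        size = 1 << (info - 24)   # 1, 2, 4 or 8 big-endian length bytes follow
        if len(data) < 1 + size:
            raise ValueError("truncated length")
        length = int.from_bytes(data[1:1 + size], "big")
        offset = 1 + size
    else:
        raise ValueError("indefinite or reserved length not supported")
    if len(data) < offset + length:
        raise ValueError("truncated payload")
    return data[offset:offset + length]


def prepare_payload(kind: str, payload: bytes) -> bytes:
    """The 'what': turn the received payload into the raw image."""
    if kind == "raw":
        return payload
    if kind == "zlib":
        return zlib.decompress(payload)
    if kind == "cbor-bstr":
        return _cbor_bytestring_payload(payload)
    raise ValueError(f"unknown payload kind: {kind}")


def install(flash: bytearray, kind: str, payload: bytes) -> None:
    """The 'how': every kind above is installed the same way, by writing it to flash."""
    flash[:] = prepare_payload(kind, payload)
```

If the two were merged into one field, adding a new "what" (say, a differential patch) would also force a new "how" value, even where the installation step is unchanged.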

>    -  Where should the update be obtained?
>    if we are not making the assumption that metadata and firmware are
> stored in different places, is this redundant?

The device needs to answer this question, regardless of assumption. If you make the assumption that the metadata and firmware are co-located, that simply means that the device has a hard-coded answer to the question, not that the question doesn't require an answer.
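A sketch of the point that co-location is still an answer: if the manifest names a location, the device fetches from there; otherwise it falls back to a hard-coded answer, here assumed to be "the bytes that immediately follow the manifest in the same transfer". The field name `payload-uri` and the function are hypothetical:

```python
from typing import Optional, Tuple, Union


def locate_payload(manifest: dict, transfer: bytes,
                   manifest_len: int) -> Tuple[str, Union[str, bytes]]:
    """Answer 'where should the payload be obtained?' for one received transfer."""
    uri: Optional[str] = manifest.get("payload-uri")  # hypothetical field name
    if uri is not None:
        # Split case: the manifest points at separately hosted storage.
        return ("fetch", uri)
    # Co-located case: the same question, answered by a hard-coded rule.
    return ("inline", transfer[manifest_len:])
```

Both branches answer the same question; only the source of the answer differs.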

> # in Section 6:
> "information about when the firmware update has to be applied".
> I guess the default should be "as soon as possible".
> However, in cases when the time is not synchronized, I'm wondering how the
> device
> could interpret this "when" indication.

There are some parts of this architecture which will not be applicable to all devices. If a device has no notion of time, then it has two options: it can install a payload immediately, or it can store the payload and wait for a signal that authorises it to perform an installation. That is also a form of "when" information.
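A sketch of how a device might interpret "when" information, with and without a clock. Times are assumed to be integer timestamps, and the choice to defer to an authorising signal when a timestamp is given but the device has no notion of time is one hypothetical policy among those described above:

```python
from enum import Enum, auto
from typing import Optional


class Action(Enum):
    INSTALL_NOW = auto()
    WAIT_FOR_TIME = auto()
    WAIT_FOR_SIGNAL = auto()


def when_to_install(apply_after: Optional[int], now: Optional[int],
                    wait_for_signal: bool) -> Action:
    if wait_for_signal:
        return Action.WAIT_FOR_SIGNAL  # store the payload until an authorising signal arrives
    if apply_after is None:
        return Action.INSTALL_NOW      # no "when" given: install as soon as possible
    if now is None:
        # No notion of time: a timestamp cannot be evaluated, so wait for a signal instead.
        return Action.WAIT_FOR_SIGNAL
    return Action.INSTALL_NOW if now >= apply_after else Action.WAIT_FOR_TIME
```

A device without a clock therefore never sees `WAIT_FOR_TIME`; its only options are the two described above.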

> # in Section 7:
> Typically the author will produce/transfer a new firmware and its
> (manifest) metadata in one go.
> In practice, firmware and manifest might even be bundled in a single file
> (?).
> Hence I would suggest to invert the order of the steps "Upload Firmware"
>  and "Create Manifest".

This section is just one example of the split manifest/payload flow; it does not mean that a contiguous manifest and payload is prohibited.

However, while contiguous metadata/payload may be current practice, that doesn't mean it will scale well to many devices. As detailed above, there are different trust requirements for manifest distribution and for payload distribution. There are also different bandwidth requirements for manifests and for payloads.

