[E2ee] Review of review-draft-knodel-e2ee-definition-04

Paul Wouters <paul@nohats.ca> Tue, 14 June 2022 22:37 UTC

Date: Tue, 14 Jun 2022 18:37:04 -0400
From: Paul Wouters <paul@nohats.ca>
To: e2ee@ietf.org
Message-ID: <0452ff0-ff6f-816d-2deb-6624531abcd@nohats.ca>
Archived-At: <https://mailarchive.ietf.org/arch/msg/e2ee/VdFCLISRtL6jndMBvNevsWzAB6s>


Thank you for writing this document. I think it is useful, even though I
am not a fan of the writing style.

Perhaps the most important part is that it seems to want to say that
E2EE systems are end to end encrypted but more. Is that really what
we need? Should we say Privacy Enhanced systems which have as one
component E2EE? Or do we really want to say "E2EE systems are end to
end encrypted systems that also tend to have a lot of other features". I
think the latter is meant, but then I think at a minimum the abstract and
introduction should be a lot more clear about this.

Also the use of E2EE versus "end-to-end" is confusing, eg:

    Below is an exhaustive, yet vaguely summarised, list of the
    challenges currently faced by protocol designers of end-to-end
    encrypted systems.

I thought it would say E2EE systems, as that is the term this document
is defining, and end-to-end encryption is only one aspect of an E2EE
system - or at least I think the document wants to define this as such?


I have reviewed this in the form of an AD ballot, even though this is
just my early personal review. For more information on how the ballot
position system works, see https://www.ietf.org/blog/handling-iesg-ballot-positions/

Paul


DISCUSS:

I think section 2.1 has some fluff and could just present a definition.

I find the section 2.2 talk about Saltzer very distracting. Why not just
explain the difference of the E2EE endpoints (user/user agent) versus the
data communication endpoints (eg the SMTP TLS example given). And maybe
explain transport security vs data origin security more clearly. Explain
hop to hop security vs message security vs packet security? (With
security I also mean confidentiality.) The use of [saltzer] citations
feels like we are arguing the definition instead of defining it based
on regular IETF considerations.

    Permission of data manipulation or
    pseudo-identities for third parties to allow access under the user's
    identity are against the intention of E2EE.

I don't understand this sentence. It feels like it says pseudo-identities
are not E2EE features, but they are.

The use of the word "adversary" in section 2.4 is problematic. Is Apple or
Signal or Google an "adversary" in their role of application code? The
whole idea of E2EE to me is that there are roles that claim to have the
user interest at heart by preventing or limiting or breaking E2EE and
believe they do so against "adversaries" (CSAM scanning, DNS filtering, web
proxy filtering, anti-virus scanning are examples of roles a user might
deem "adversarial" and others feel "beneficial").

I find Section 3.1 confusing. It defines E2EE as "end to end encrypted" and
then talks about "enhancing" e2ee, but then that redefines what is e2ee ?

     messages are encrypted by the sender such that
     only the intended recipient(s) can decrypt them.

I would say "only the recipient(s) shared and agreed by all recipients".

This avoids the case of one recipient having a "backdoor" or "helper" that
sees cleartext. One infamous example is the Chinese input helper plugin of
Signal that sends cleartext to third party keyboard services for translating
input gestures to characters to send.

In Section 3.1.2 (Availability), the "i.e." sentence confuses "not online at
the same time" with "more than one device". It also introduces "if they have
been offline for a long time", which I think is a different issue from
"not required to be online at the same time". (Eg Wire does
ratcheting I believe, but if I don't talk to someone for months, I might
lose messages sent to me.)

    by protocol designers of end-to-end encrypted systems

This should really say E2EE systems and not end-to-end encrypted systems.
Same for its continuous use in Section 3.2. (see my introduction comment
for why).

    both for users and implementers (see previous section),

I understand neither "both for users and implementers" nor how "(see
previous section)" applies. How could the users / implementers bit
be different? What is really meant by "previous section"? The two
subsections above this one? Or section 2? And I feel no section could
explain the "both for users and implementers".

    2) in some way antithetical to the goals of end-to-end encrypted systems.

Could it be useful to list some of these? Eg like the keyboard plugin
example mentioned above?

    Users of E2EE systems should be able to communicate with any medium
    of their choice, from text to large files, however there is often a
    resource problem because there are no open protocols to allow users
    to securely share the same resource in an end-to-end encrypted
    system.  Client-side, e.g. end-point, activities like URL unfurling
    scanning.

I don't understand how the last sentence starting with "Client-side"
relates to the entire paragraph ("bullet") it appears in ?

Section 4.3 talks about third party access. Perhaps it should be extended
to talk about CSAM type scanning, eg third parties getting fingerprints of
data to check against a forbidden list of fingerprints. It is also confusing
where this type of third party access falls with respect to "without formally
interfering with channel confidentiality". What is formal and what is informal
interference?

Also the "expectation of that security property" is ever changing. Again look
at the CSAM example. If users of iMessage expect Apple to do CSAM scanning,
does this now no longer violate the expectation of the security property of
Confidentiality ? How can we even write down user expectations in this
document? Do most users actually object to CSAM scanning or DNS
filtering or virus scanning of content by their other participants? I
honestly don't know the answer.

    Analyses such as traffic fingerprinting or other (encrypted or
    unencrypted) data analysis techniques should be considered outside
    the scope of an E2EE system's goals of providing secure
    communications to end users.

Why should this be out of scope? What is wrong with designing a padding
system to obfuscate message size and even adding padding streams that
make it harder to determine if the users have a real conversation or not?
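
A minimal sketch of such a padding scheme follows. It is an illustration
only, not taken from the draft or from any deployed protocol: the bucket
sizes, the 4-byte length prefix, and the names pad/unpad are assumptions
made for this example. Padding the plaintext to one of a few fixed sizes
before encryption means the ciphertext length reveals only the bucket.

```python
import struct

# Hypothetical bucket sizes in bytes; a real system would tune these.
BUCKETS = (256, 1024, 4096, 16384)
# 4-byte big-endian prefix that records the real message length.
HEADER = struct.Struct(">I")


def pad(message: bytes) -> bytes:
    """Prefix the real length, then zero-fill to the smallest fitting bucket."""
    needed = HEADER.size + len(message)
    for size in BUCKETS:
        if needed <= size:
            filler = b"\x00" * (size - needed)
            return HEADER.pack(len(message)) + message + filler
    raise ValueError("message too large for the largest bucket")


def unpad(padded: bytes) -> bytes:
    """Recover the original message from a padded plaintext."""
    (length,) = HEADER.unpack_from(padded)
    if length > len(padded) - HEADER.size:
        raise ValueError("corrupt length prefix")
    return padded[HEADER.size:HEADER.size + length]
```

With these assumed buckets, a 2-byte and a 200-byte message both become
256 bytes. This hides size only; hiding whether a conversation is taking
place at all would additionally need cover traffic.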

    Not only should an E2EE system value user data privacy by not
    enabling pattern inference,

So it is in scope after all ??

Ahh, Section 4.5 does talk about the expectation of not being compromised
after all. Good. Perhaps this is the place to talk about CSAM ?





COMMENTS:

Section 2.3 just drops:

    The more common end-to-end technique
    for encrypting uses the double-ratchet algorithm with an
    authenticated encryption scheme

It feels wrong to drop terms like double-ratchet without explaining them.

I find the use of "we" a little odd for an RFC. It sounds like academic
speak which we tend to not do in RFC documents.

I really don't like the use of the word succinct throughout the document.
I think it is not a good word to use within an international context where
many speakers' first language is not English. (I recognise it might be
acceptable in academia, but I think this document is supposed to bridge
academia with software engineers). In Section 2.4, the "succinct definition"
constitutes a large (not succinct!) example as definition.

    E2EE systems are unique in
    providing features of confidentiality, integrity and authenticity for
    users.

The word "unique" is a bit odd. I guess it needs to be read in the context
of "messaging" but that's not obvious. Instead of stating E2EE systems are
unique, I would focus on explaining the uniqueness we are talking about
more clearly. How about:

    E2EE systems focus on providing a communication system for users that
    does not depend on trusted third parties for its confidentiality and
    integrity.

I left out authenticity on purpose, most services do depend on
authenticity of a (pseudo-)identity, eg jabber name, email, handle, gpg
id, etc.

or perhaps:

    E2EE systems focus on providing a communication system for users that
    prevents anyone from eavesdropping on those users' communication.


    their right to whisper.

I would not use whisper but state "communicate privately with a guarantee
that no one can eavesdrop on the conversation". Whisper to me still has
the connotation that someone nearby could maybe hear the message. I
think whisper might be a term used elsewhere in academia that I (and
presumably other IETFers) am not familiar with.

    In the tradition of cryptography

I don't understand what the "tradition of cryptography" is ?

Section 3.1.2 introduces the term "participant" while before (end)user
was used. Is there a difference meant ? Why use this new term now?

    with a record of the transcript

Should that clarify this is the decrypted message ?

    As demonstrated by the Signal and OTR protocols

I'm not sure the protocols need to be named/advertised here.

    and older ones are immediately deleted after used.

I would say:

    and old keys no longer required to encrypt or decrypt any new
    messages are immediately destroyed.
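
The suggested wording describes what a symmetric key ratchet does. A
minimal sketch, assuming an HMAC-SHA256 chain: the names ratchet and
SendingChain and the labels 0x01/0x02 are choices made for this example
and are not taken from the draft. Each step derives a message key and a
new chain key, then discards the old chain key.

```python
import hashlib
import hmac
from typing import Tuple


def ratchet(chain_key: bytes) -> Tuple[bytes, bytes]:
    """Derive (message_key, next_chain_key) from the current chain key."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain_key


class SendingChain:
    """Hypothetical holder of a single, ever-advancing chain key."""

    def __init__(self, initial_chain_key: bytes) -> None:
        self._chain_key = bytearray(initial_chain_key)

    def next_message_key(self) -> bytes:
        message_key, next_chain_key = ratchet(bytes(self._chain_key))
        # Best-effort wipe of the old key. Python cannot guarantee that
        # no copies remain in memory; real implementations need more care.
        for i in range(len(self._chain_key)):
            self._chain_key[i] = 0
        self._chain_key = bytearray(next_chain_key)
        return message_key
```

Because HMAC is one-way, someone who obtains the current chain key cannot
compute earlier message keys from it, provided the earlier keys really
were destroyed.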

I think the "Post-compromise security" is confusing. I think it should
more clearly say the compromise was found and countered.
I am also confused how adding new ephemeral keys would help if the
user's long term identity private key was compromised. And further,
how does the remote participant know there might be compromised messages
in transit? I think I agree with the description of the term, but the
sentence that starts with "It is usually" is not enough of a description
of a "fix" for compromised situations.

    Also because "the IETF
    is a place for state-of-the-art producing high quality, relevant
    technical documents that influence the way people design, use, and
    manage the Internet" we can be confident that current deployments of
    end-to-end encrypted technologies in the IETF indicate the cutting
    edge of their developments

This is rather circular reasoning. The quotation marks are also confusing
as it seems to be quoting something, but throwing it at Google does not give
a hint about the source for this statement. It smells very much like "Wij van
WC Eend, adviseren WC Eend!" ("We at WC Eend recommend WC Eend!") (Ask Olaf :)

    Below is an exhaustive, yet vaguely summarised, list [...]

Is it exhaustive as in "very surely entirely complete", or do you mean it
is a "tiresome list"? I think neither is what you mean to say :)
Maybe say "comprehensive" list? Or perhaps more in line with the internet
spirit, call it a "best effort list" :P

It is unclear this paragraph and the following are at a different level,
eg "part of the list" vs "not part of the list". Why not make a real list
with nice "o" letter dots, or via subsections if headers would be helpful?

    Therefore solving the problem of verification of
    public keys is a major concern for any end-to-end encrypted system
    design.

Again, E2EE vs end-to-end encrypted system

    and to move the private key to another device
    compromises the security of one of the end-points of the system.

I do not think this is correct. It might reduce the security to the
security of the weakest device, but if done properly, it does not
compromise the security. Eg me adding an ipad to my Apple ID and
installing Signal does not compromise the security of my Signal client
on my iphone nor does it compromise the past or future communications
with other participants. What is this sentence trying to say?

    Meta-data is difficult to obfuscate efficiently.

Obfuscating to me is a term for making something just a little harder
but not really impossible. It is not an asset to an E2EE system. I think
what is meant is more something like "Meta-data needs to be minimized as
much as possible, but one cannot eliminate all of it".

I also do not understand what the word "efficiently" does here? Why does
efficiency matter here?

    but this presents major problems of scale for end-to-end encryption

Why is this a "major problem" but the next bullet "forwards secrecy" is
only an unquantified "problem" ? I would simply remove "major".

    Usability considerations are sometimes in conflict with security
    considerations, such as message read status, typing indicators, URL/
    link previews.

I would definitely add the Chinese virtual keyboard issue to this list as
it is an example of a real life huge problem created in Signal and ignored
for years. (See Naomi Wu's comments on this.)

I think the 3.2 list of Challenges is missing an important issue; how does
the user know they can trust the software? Most of these E2EE systems are
not opensource, or even open protocols. How do these E2EE systems prove
they are not backdoor'ed or weak? Eg Signal tried to do something like
run server-side code in a CPU Secure Enclave (Intel SGX) and publish the
source code for verification (but that does not mean it is opensource).
This is later touched on in Section 4, but why is it not listed here?

And another item missing from the list is the typical E2EE feature of
"disappearing images" that completely depends on client-side security
which cannot exist with opensource E2EE clients.


    Section 4.2 Providers are trustworthy

Do users really expect this? I agree some users do but some users don't.
And for most users there is a big difference between "at home" or "using
my telco" versus "coffeeshop" or "hotel" networks.

Trustworthy  A system is completely trustworthy if and only if it is
       completely resilient, reliable, accountable, and secure in a way
       that consistently meets users' expectations.  The opposite of
       trustworthy is untrustworthy.

This seems like a recursive definition. It also seems to throw in the
word "completely" to undo whatever definition it is trying to make as
now we have "Trustworthy" and "completely trustworthy" but only a
definition of the latter.

I find "users' expectations" a difficult concept to use too. Some users
expect their ISP to filter out malicious domains or emails. Are those ISPs
trustworthy or untrustworthy? It depends on which user you ask.

I now realise that perhaps I misread "Providers" as ISPs, but I don't think
it changes anything.

I like the closing definition in Section 4.2 but it feels disjointed from
the section title. Perhaps change "the set of functions" to "the provider
of the set of functions" ?



NITS:

communications systems -> communication systems  ?

direction of travel -> development

it's also -> it is also

succinct -> clear ?

we -> this document ?

I cannot parse "with amongst end points."

I personally don't like the word "holistically". Might be an "English is not my first language" thing.

"such as [...] and more"  seems redundant.

       Steps should be taken to
       minimize metadata leakage such as user obfuscating IP addresses,
       reducing non-routing metadata, and avoiding extraneous message
       headers can enhance the confidentiality and security features of
       E2EE systems.

I would remove the last bit of the sentence starting at "can enhance ..."
for more consistency (and less redundancy) with the other items in the list.
