[Captive-portals] Benjamin Kaduk's Discuss on draft-ietf-capport-architecture-08: (with DISCUSS and COMMENT)

Benjamin Kaduk via Datatracker <noreply@ietf.org> Tue, 09 June 2020 05:30 UTC

From: Benjamin Kaduk via Datatracker <noreply@ietf.org>
To: The IESG <iesg@ietf.org>
Cc: draft-ietf-capport-architecture@ietf.org, capport-chairs@ietf.org, captive-portals@ietf.org, Martin Thomson <mt@lowentropy.net>, mt@lowentropy.net
Reply-To: Benjamin Kaduk <kaduk@mit.edu>
Message-ID: <159168063615.8302.17239207964322081612@ietfa.amsl.com>
Date: Mon, 08 Jun 2020 22:30:36 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/captive-portals/X-4Qx4gKFWyPD8_UMHhI7Y64v3Q>
Subject: [Captive-portals] Benjamin Kaduk's Discuss on draft-ietf-capport-architecture-08: (with DISCUSS and COMMENT)

Benjamin Kaduk has entered the following ballot position for
draft-ietf-capport-architecture-08: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-capport-architecture/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

(1) and (2) should be easy to fix; (3) may well be "fixed" by telling me
I'm too naive :)

(1) Given that section 1 describes other options, the abstract should not
limit to just DHCP and RA as options for provisioning the API URL.

(2) Section 4.1 says that:

   5.  The Captive Portal API server indicates to the Enforcement Device
       that the User Equipment is allowed to access the external
       network.

but I believe this should be the "Captive Portal Server" (or, as the
previous point has it, the "web portal").

(3) Probably a "discuss discuss", but ... in Section 1 we have:

   *  Solutions SHOULD NOT require the forging of responses from DNS or
      HTTP servers, or any other protocol.  In particular, solutions
      SHOULD NOT require man-in-the-middle proxy of TLS traffic.

I'd like to understand the motivation for this one a little better.
Naively, it seems like we could get away with "MUST NOT require" while
still allowing it to be done.  Am I missing something obvious?


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

I'd like to see some more discussion of which signals are authenticated
and how, and what kind of authorization checks are possible.  In
well-run networks DHCP and RA signals should be relatively trustworthy,
but clients don't always have a good indicator for whether a given
network falls into that category.  Are there (other) mechanisms that can
be used to give trust in the authenticity of a given Captive Portal API
URI and that that API is authorized to provide unconstrained access
for the network in question?  We require TLS for accessing the API
server, but (as I note inline) there are more details that can be given
about this TLS usage.  What can be done to authenticate and authorize
the Captive Portal Server?  Most importantly (and most appropriately for
an architecture document), which of these properties are strictly
required vs. merely optional?  These are not Discuss-level points
because an architecture does not strictly-speaking need to specify all
of them, but having some indication of how we plan to achieve them would
give greater confidence that this architecture will be a useful one.

I'm happy to see the response to the genart reviewer's comment regarding
"a" vs. "the" capport architecture; thanks!

Abstract

   This document describes a CAPPORT architecture.  DHCP or Router
   Advertisements, an optional signaling protocol, and an HTTP API are
   used to provide the solution.  The role of Provisioning Domains

nit: there's perhaps a bit of a lack of parallelism in the list
structure, where we talk about specific mechanisms for provisioning
without describing the more abstract concept of provisioning, and list
that alongside an abstract mention of "a signaling protocol" and the
both-abstract-and-concrete "HTTP API".

Section 1

   Implementations generally require a web server, some method to allow/
   block traffic, and some method to alert the user.  Common methods of

nit: I'd suggest clarifying that this is "implementations of captive
portals" (or is it "captive networks"?).

   alerting the user involve modifying HTTP or DNS traffic.

nit: perhaps "at present" or "prior to this work"?  If I understand
correctly one of the goals of this work is to shift the balance of
captive portals away from these practices (while acknowledging that
fully eliminating them is not feasible in the near future).

   *  Solutions MAY allow a device to be alerted that it is in a captive
      network when attempting to use any application on the network.

I'm also not sure I understand this one, especially in light of the
following (paraphrased) "SHOULD allow learning of captivity before
application attempts to use the network".  What's the alternative to
"MAY allow": not allowing such detection at all?

   *  The architecture MUST provide a path of incremental migration,
      acknowledging a huge variety of portals and end-user device
      implementations and software versions.

nit: "preexisting" or similar would go a long way here.

   *  Network provisioning protocols provide end-user devices with a

side note: using the word "provisioning" to describe things like DHCP
and RA feels odd to me, presumably due to my background and what I
expect provisioning to be.  I can see why it makes sense to use the term
for this purpose, though.  Perhaps an additional adjective could help
clarify what is meant, though I don't have a suggestion at hand.

      for this purpose are available in [RFC7710bis].  Other protocols
      (such as RADIUS), Provisioning Domains [I-D.pfister-capport-pvd],
      or static configuration may also be used.  A device MAY query this

side note: personally, I'd expand to "may also be used to convey this
API URI", though it's probably not required for clarity.

      The device MAY take immediate action to satisfy the portal
      (according to its configuration/policy).

side note: it's not entirely clear to me that we need a normative MAY
for this.

Section 2.1

   have Internet access).  The User Equipment communication is typically
   restricted by the Enforcement Device, described in Section 2.4, until
   site-specific requirements have been met.

It seems like these "site-specific requirements" must be the "Captive
Portal Conditions" that we just defined.

   *  SHOULD have a mechanism for notifying the user of the Captive
      Portal

It is pretty important that this mechanism be non-spoofable by, e.g.,
untrusted websites.  I think we should mention something about
"non-spoofable" here.

   *  MAY prevent applications from using networks that do not grant
      full network access.  E.g., a device connected to a mobile network
      may be connecting to a captive WiFi network; the operating system
      MAY avoid updating the default route until network access
      restrictions have been lifted (excepting access to the Captive

nit: maybe say in which direction the update would go and/or something
about why the move to wifi is desirable?

   None of the above requirements are mandatory because (a) we do not
   wish to say users or devices must seek full access to the captive
   network, (b) the requirements may be fulfilled by manually visiting
   the captive portal web application, and (c) legacy devices must
   continue to be supported.

side note: in my opinion, it's possible to support legacy devices in
practice without baking their limitations into the spec.

   If User Equipment supports the Captive Portal API, it MUST validate
   the API server's TLS certificate (see [RFC2818]).  An Enforcement

We should probably cite RFC 6125 here and say something about how the UE
gets a name to validate the server's certificate against (and what name
type to use).
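
As a rough illustration of the question (a sketch only, not something the
draft specifies): if the UE derives the reference identity from the host
part of the provisioned API URI, the name type follows from whether that
host is a DNS name or an IP literal.  The function names and the
"dns-id"/"ip-address" labels below are assumptions made for illustration;
"dns-id" borrows the RFC 6125 term, and IP literals are handled separately
on the assumption that they need their own matching rule.

```python
import ipaddress
import ssl
from typing import Tuple
from urllib.parse import urlsplit


def reference_identity(api_uri: str) -> Tuple[str, str]:
    """Return (name_type, name) to match the API server certificate against.

    Hypothetical helper: assumes the name comes from the provisioned API URI.
    """
    parts = urlsplit(api_uri)
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError("API URI must be an https URI with a host")
    host = parts.hostname  # urlsplit lowercases the host and strips [ ]
    try:
        ipaddress.ip_address(host)
        return ("ip-address", host)
    except ValueError:
        return ("dns-id", host)


def strict_tls_context() -> ssl.SSLContext:
    """TLS client context that requires a valid chain and a matching name."""
    ctx = ssl.create_default_context()
    # These are already the defaults; restated so the intent is explicit.
    ctx.check_hostname = True
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx
```

Whether the provisioned URI is itself trustworthy is a separate question;
the sketch only shows where the name to validate would come from.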

   [I-D.ietf-capport-api] for more information.  If certificate
   validation fails, User Equipment MUST NOT proceed with any of the
   behavior described above.

I'm not sure which behavior the "behavior described above" is.
"[accessing...] OCSP responders, CRLs, and NTP servers" doesn't seem
quite right since that's *how* you determine that certificate validation
fails, but the bits further up about "navigate [to] the Captive Portal
user interface" do not seem to clearly call out a single behavior or set
of behaviors by the UE.

Section 2.2.2

   Although still a work in progress, [I-D.pfister-capport-pvd] proposes
   a mechanism for User Equipment to be provided with PvD Bootstrap
   Information containing the URI for the JSON-based API described in
   Section 2.3.

I don't think "JSON-based" is supported by the text of § 2.3 (and isn't
really appropriate for an architecture doc in most cases, anyway).

Section 2.3

   The purpose of a Captive Portal API is to permit a query of Captive
   Portal state without interrupting the user.  This API thereby removes
   the need for User Equipment to perform clear-text "canary" HTTP
   queries to check for response tampering.

nit: probably don't need to be specific about HTTP, here.

   At minimum, the API MUST provide: (1) the state of captivity and (2)
   a URI for the Captive Portal Server.

Is there anything useful to say about the URI scheme for the captive
portal server URI?  I guess I could probably (grudgingly) come up with a
case where http-not-s would be tolerable, but given that we admit the
possibility of "payment" as a captive portal condition, I don't want us
to encourage sending payment or other sensitive information over schemes
inappropriate for such information.
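
One way to picture the kind of restriction being asked about is a
client-side policy check on the Captive Portal Server URI.  This is a
minimal sketch under the assumption that https is the default and that
cleartext http is accepted only on explicit opt-in; the function name and
the allow_cleartext flag are hypothetical and do not come from the draft.

```python
from urllib.parse import urlsplit


def portal_uri_is_acceptable(uri: str, allow_cleartext: bool = False) -> bool:
    """Accept https portal URIs; accept http only when explicitly allowed."""
    parts = urlsplit(uri)
    if not parts.hostname:
        # No authority component, e.g. mailto: or a relative reference.
        return False
    if parts.scheme == "https":
        return True
    return allow_cleartext and parts.scheme == "http"
```

With the default policy, portal_uri_is_acceptable("http://portal.example.net/")
is False, so a payment page would not be fetched over cleartext unless
the caller deliberately relaxes the check.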

   A caller to the API needs to be presented with evidence that the
   content it is receiving is for a version of the API that it supports.

What about evidence that the content it is receiving is intended to be
used with, and authorized to speak for, the network it is joining?

   When User Equipment receives Captive Portal Signals, the User
   Equipment MAY query the API to check the state.  The User Equipment

nit: we seem to use "the state of its captivity" most places.

   The API MUST use TLS to ensure server authentication.  The
   implementation of the API MUST ensure both confidentiality and
   integrity of any information provided by or required by it.

It's a little weird to split the TLS requirements between here and
Section 2.1, though I guess if we're splitting things by role it's
probably unavoidable.  (I made my RFC 6125 comment in Section 2.1 and it
probably doesn't need to appear in both places.)

Section 2.4

   *  May signal User Equipment using the Captive Portal Signaling
      protocol if certain traffic is blocked.

nit: I think that "optionally signals" might be a better fit for the
list structure as used in the other bullet points.

Section 2.5

   When User Equipment first connects to a network, or when there are
   changes in status, the Enforcement Device could generate a signal
   toward the User Equipment.  This signal indicates that the User
   Equipment might need to contact the API Server to receive updated
   information.  For instance, this signal might be generated when the
   end of a session is imminent, or when network access was denied.

Would this signal also be used when the UE has successfully met the
Captive Portal Conditions?

Section 2.6

   *  The User Equipment queries the API to learn of its state of
      captivity.  If captive, the User Equipment presents the portal
      user interface from the Web Portal Server to the user.

[we previously discussed this UE behavior as optional.  I don't mind
having the text be descriptive like this, since it's describing the
diagram, and the diagram is not binding on all UEs, but it seemed worth
noting just in case.]

Section 3.1

   An Identifier is a characteristic of the User Equipment used by the
   components of a Captive Portal to uniquely determine which specific
   User Equipment is interacting with them.  An Identifier MAY be a

Do we want to say anything about the scope within which the uniqueness
must hold?  ("No" is probably fine.)

Section 3.2.1

   Each instance of User Equipment interacting with the Captive Network
   MUST be given an identifier that is unique among User Equipment
   interacting at that time.

side note: "MUST be given" gets a knee-jerk "by whom?" response from me.
It's probably okay for this document to not specify, though, as it may
depend on the nature of the Captive Network.

   Over time, the User Equipment assigned to an identifier value MAY
   change.  Allowing the identified device to change over time ensures
   that the space of possible identifying values need not be overly
   large.

Is the identifier assigned to a given UE on the same network expected to
be able to change as well?  This may have some privacy considerations...
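
To make the reuse property concrete, here is a minimal sketch of an
identifier pool in which values are unique among currently attached UEs
but are recycled over time.  The class and its method names are
hypothetical and are not taken from the draft.

```python
class IdentifierPool:
    """Hands out identifiers that are unique among currently attached UEs."""

    def __init__(self, size: int) -> None:
        self._free = list(range(size))  # identifier values not in use
        self._active = {}               # identifier -> UE label

    def attach(self, ue: str) -> int:
        """Assign the oldest free identifier to a newly attached UE."""
        if not self._free:
            raise RuntimeError("identifier space exhausted")
        ident = self._free.pop(0)
        self._active[ident] = ue
        return ident

    def detach(self, ident: int) -> None:
        """Release an identifier so it can be assigned to another UE later."""
        del self._active[ident]
        self._free.append(ident)

    def owner(self, ident: int) -> str:
        """Return the UE currently holding this identifier."""
        return self._active[ident]
```

In this sketch a UE that detaches and re-attaches may or may not get the
same value back, which is exactly the behavior the privacy question is
about.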

Section 3.2.2

   are active at the same time.  This property is particularly important
   when the User Equipment is extended externally to devices such as
   billing systems, or where the identity of the User Equipment could
   imply liability.

nit(?): is it the UE that is extended externally or the identifier
thereof?

Section 3.4.2

   In some situations, the User Equipment may have multiple IP
   addresses, while still satisfying all of the recommended properties.

nit: as written, "while still satisfying all of the recommended
properties" is describing the UE, but the context of Section 3.4
suggests that we want to be talking about the recommended properties for
identifiers.

Section 3.5

   Accessing the API MAY depend on contextual information.  However, the
   URIs provided in the API SHOULD be unique to the UE and not dependent
   on contextual information to function correctly.

Should the per-UE APIs and/or the mapping between UE and per-UE API be
unguessable?  (Do we want to reference Capability URLs
[https://www.w3.org/TR/capability-urls/]?)
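
For concreteness, an unguessable per-UE URI in the capability-URL style
could be minted along these lines.  This is a sketch under assumptions:
the base URI and function name are hypothetical, and 32 random bytes is an
illustrative choice rather than a recommendation from the draft.

```python
import secrets

# Hypothetical base; a real deployment would use its own API server URI.
BASE = "https://portal.example.net/api/ue/"


def per_ue_api_uri(base: str = BASE) -> str:
    """Return a per-UE API URI whose last path segment is a random token."""
    # 32 bytes from the OS CSPRNG, encoded as URL-safe base64 text.
    return base + secrets.token_urlsafe(32)
```

The server would still need to keep the mapping from token to UE, and the
URI would need to be delivered to the UE over a channel others on the
network cannot observe for the unguessability to be worth anything.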

Section 4

I might consider explicitly saying "non-normative" somewhere in here.

Section 4.1

   4.  If necessary, the User navigates the web portal to gain access to
       the external network.

nit: "navigates to"

Section 4.2

   3.  The User Equipment's UI indicates that the length of time left
       for its access has fallen below a threshold

   4.  The User Equipment visits the API again to validate the expiry
       time

side note: I feel like there's implicitly some User action in here,
though I don't know that we need to actually say anything about it.
(Otherwise we wouldn't have the UI indicating things.)

Section 4.3

   Whenever a new Portal URI is received by end User Equipment, it
   SHOULD discard the old URI and use the new one for future requests to
   the API.

What kind of validation/authorization checks need to be applied to the
new Portal URI?

(nit: we probably should check the terminology in this section; the
Section 1.2 lexicon would call this information the "Captive Portal API
Server URI" and not a "Portal URI".)

Section 7

This mechanism rather inherently requires having multiple entities track
the UE's identity (and, thus, likely be tracking a proxy for the user's
identity).  It seems appropriate to include some discussion of the
privacy considerations of this tracking, and whether/what kind of
anonymity support is appropriate!

Section 7.1

   Given that a user chooses to visit a Captive Portal URI, the URI
   location SHOULD be securely provided to the user's device.  E.g., the
   DHCPv6 AUTH option can sign this information.

I'm not sure that I understand the intent behind the "Given that"
construction here.  Is it trying to emphasize user choice, and thus the
need for informed choice?

Section 7.2

[In the vein of my previous remarks, there are many ways to use TLS, and
usually we provide more details on how we expect TLS to be used.]

Section 7.3

   The API MUST ensure the integrity of this information, as well as its
   confidentiality.

Who or what are the attackers that we need to preserve confidentiality from?

Section 7.4

   *  Accesses to the API Server are rate limited, limiting the impact
      of a repeated attack.

One might consider a flooding attack that tries to get the UE to use all
its (rate-limited) connections to get some information that is not the
information that it's most important for the UE to have.  If there's
only a single operation that can be performed at the API Server (which I
believe is the intent?) there is no such attack, but it may be worth
mentioning that there is no such attack.

Section 8.1

Interestingly, none of the places where we reference 7710bis have
surrounding text that clearly incurs a normative dependency.

Appendix A

We explain the use of the "canary" term here, but have already used it
twice (with no forward-reference) in the body of the document.

   Another test that can be performed is a DNS lookup to a known address
   with an expected answer.  If the answer differs from the expected
   answer, the equipment detects that a captive portal is present.  DNS
   queries over TCP or HTTPS are less likely to be modified than DNS
   queries over UDP due to the complexity of implementation.

Is the reader supposed to draw the conclusion that DoTCP/DoH provide
less-reliable captive-portal detection than Do53?  (I assume "TCP" is
not a typo for "TLS", here, though am unsure enough to want to check.)

   Malicious or misconfigured networks with a captive portal present may
   not intercept these requests and choose to pass them through or
   decide to impersonate, leading to the device having a false negative.

nit: I suggest "these 'canary' requests" to clarify which requests we're
talking about.