[apps-discuss] APPSDIR review of draft-ietf-decade-arch-04

Carsten Bormann <cabo@tzi.org> Mon, 23 January 2012 01:44 UTC

From: Carsten Bormann <cabo@tzi.org>
Date: Mon, 23 Jan 2012 02:43:47 +0100
Message-Id: <A7D68D42-9FCC-4C84-ADC4-62F03696558B@tzi.org>
To: IETF Apps Discuss <apps-discuss@ietf.org>, draft-ietf-decade-arch-04.all@tools.ietf.org
Cc: decade@ietf.org, SM <sm+ietf@elandsys.com>
Subject: [apps-discuss] APPSDIR review of draft-ietf-decade-arch-04

I have been selected as the Applications Area Directorate reviewer for
this draft (for background on APPSDIR, please see
http://trac.tools.ietf.org/area/app/trac/wiki/ApplicationsAreaDirectorate).

Please resolve these comments along with any other Last Call comments
you may receive. Please wait for direction from your document shepherd
or AD before posting a new version of the draft.

Greetings, Carsten
---------------------------------

Document: draft-ietf-decade-arch-04
Title: DECADE Architecture
Reviewer: Carsten Bormann
Review Date: 2012-01-22

** Summary: This draft is not ready for publication as an
Informational RFC and should be revised before publication.

Note: I decided to review this by reading the architecture document
only, to see whether it is able to stand alone. Note that this
implies that the review is likely incomplete. Given the cluster of
entangled documents this is a part of, I recommend a concerted review
of the next version(s).

** Major Issues:

A1) General:

Although this is not explicitly said in the introduction, the
objective of this document appears to be both:
-- to provide an architecture that will constrain and guide the
further work of DECADE;
-- to present the architecture in an introductory, reasonably
accessible way, which will facilitate understanding the specific
protocol specifications envisaged.

These two (prescriptive vs. descriptive) objectives of this document
do conflict, and the conflict is not always managed.

In particular, the document goes into considerable detail in describing
the protocols, but it is not clear whether this is just illustrating
the architecture (as I would expect in an architecture document) or
actually constraining the protocol design. E.g.,
-- for the write-through PUT (section 7.1), it is specified that just
one target server can be given to the intermediary. Is this an
accident or deliberate?
-- For GET, returning the data is optional (section 7.1)?
-- "DRP is specified as being carried through extension fields within
an SDT (e.g., HTTP headers)." (section 6). Is it always extension
fields or is it sometimes the body? (Well, the HTTP body could be
called an extension field of HTTP, too.) I think the point is that
the DRP data are mostly piggy-backed on SDT. Why not say that?

A1a)
There are a number of places where the architecture is not yet
explicit about the role of entities and data objects that it requires
to function. Again, the document needs to decide for itself whether
these entities and objects are illustrative only or part of the
prescriptive elements of the architecture.

E.g.,
-- is the "abstract specification of ... operation" in 6.2.1 and
6.2.2 only provided for illustration, or is the architecture limiting
itself to exactly these two operations?
-- There appear to be some implicit parameters such as application
context?
-- Or, for a PUT, how are metadata such as the expiration time
established?
-- Is the introductory sentence of 7 intended to limit the
server-to-server interaction to a pull model ("download")?
-- What are the semantics of a third-party (client-to-server-to-server)
GET with respect to the middle server? Is the initiating server
supposed to execute a local PUT with the result? Or what is its
role?

A2) Terminology:

The architecture defines a number of terms quite deliberately (section
2), but misses out on a few important ones.
Some important roles in the architecture (such as the ticket
generating server) are only introduced cursorily, without considering
the implications of their existence for the architecture.

A2a)
"user" (4.5.2) appears to be a central concept of the architecture,
but is fleshed out only very thinly. A related concept might or might
not be "account", which is only touched on, or "principal" (used in
the appendices only).

A2b)
4.5.2 introduces an "Application Provider" that is used nowhere else.
What is that? Is that an important functional entity?

A2c)
The capability architecture (the "token" as a data structure, and its
interaction with various functional elements) is a central element of
the DECADE architecture.
-- See RFC 4949 with respect to the usage of the term "token".
-- The "token generating server" appears to be important, but is not
called out in the list of functional elements in 2.
How does a client select/find one?
-- The document repeatedly (5.4, 6.1.2) states that a DECADE client
must trust the token generating server, but never indicates why.
-- Obviously, the DECADE servers need to trust the tokens. This is
not discussed.
-- The token is said to contain data object names, but then it is also
meant to be useful for a "batch of operations", some of which may
concern data the names of which we don't know yet.
-- How is it useful to "allow a DECADE Server to detect when a token
is used multiple times" (what is the server supposed to do when it has
detected that?)?
-- Do tokens need a revocation mechanism?
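
To make the questions above concrete, here is a minimal sketch of a
capability token and a server-side check. Nothing in it is taken from
the draft: the field names, the shared-key MAC (HMAC-SHA-256), and the
result strings are all assumptions for illustration. It shows that
the server must hold a key it shares with the token generator (the
trust relationship that is not discussed), that the permitted names
must be known when the token is issued, and that detecting reuse only
helps once the architecture says what the server does next.

```python
import hashlib
import hmac
import json


def make_token(key: bytes, objects, expires: int, nonce: str) -> dict:
    """Hypothetical token: permitted object names, expiration time, nonce, MAC."""
    body = {"objects": sorted(objects), "expires": expires, "nonce": nonce}
    payload = json.dumps(body, sort_keys=True).encode()
    body["mac"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return body


class Server:
    """Hypothetical DECADE server that shares `key` with the token generator."""

    def __init__(self, key: bytes):
        self.key = key
        self.seen = set()  # nonces of tokens already accepted

    def check(self, token: dict, obj: str, now: int) -> str:
        body = {k: token[k] for k in ("objects", "expires", "nonce")}
        payload = json.dumps(body, sort_keys=True).encode()
        expected = hmac.new(self.key, payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, token["mac"]):
            return "bad-mac"
        if now >= token["expires"]:
            return "expired"
        if obj not in token["objects"]:
            return "not-permitted"
        if token["nonce"] in self.seen:
            return "reused"  # what happens next is a policy the draft must state
        self.seen.add(token["nonce"])
        return "ok"
```

Note that a token for a "batch of operations" would need either a
longer list of names fixed in advance or some other scoping rule; the
sketch deliberately does not pick one. The MAC here is directed at a
specific key holder and is not a digital signature.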

A2d)
Sections 4.3 and 6.1.3 use a concept called "application context".
Apparently application contexts are quite important for DECADE
operations (e.g., 6.1.3 makes clear that "objects" are always
associated with an application context); what are these application
contexts? Who creates, deletes them? Resource control, access
control for them? (Some operations seem to have an application
context as an implicit parameter. Assumptions like these need to be
spelled out.)

A2e)
3.1: "Let S(A) denote A's DECADE storage server." This concept of
ownership is never explained. Is it important?

A3)
The appendices provide relatively raw existence proofs that are likely
to be overtaken by events within a year from now. Many of these are
(overly) brief mini-tutorials for the relevant protocols. Appendix
A.3 is about a protocol that itself does not seem to be fully cooked
at this point.
This is certainly useful material to collect for the WG, but it is not
clear that these should be part of this document. There are lots of
additional issues in these appendices, e.g.:

A.1.1.1)
HTTPS (where is the reference?) is a security protocol, but does not
provide access control.
A.1.1.3)
This would need to (at least briefly) examine the interactions between
HTTP caching and DECADE protocol operation.
A.1.3)
This specifies (?) "In the reply, the hash is sent in an ETAG header."
What kind of response are we talking about? 304? Is this really
part of the architecture?
A.1.5)
Why should the transfer protocol provide the complete access control
mechanism? Access control is a local function. Transfer protocols
just have to make sure the necessary parameters are in place (and/or
may be used for transferring the parameters in the first place).
When talking about OAUTH 2, add the relevant reference(s).

I have not undertaken to review the appendices in any detail.

A4)
Is the architectural thinking converged enough on issues of naming?
E.g.,
-- 4.3 seems to imply "resource identifiers" are being used that are
the same between different servers.
-- 5.3.1 seems to support this by building names in a predictable way
out of hashes. In particular, "a DECADE client knows the
name of a data object before it is completely stored at the DECADE
server."
-- However, if DECADE is to be used for real-time interactions, some
thought needs to be given to the point in time when hash-based data
identifiers/names can be generated. A DECADE client that PUTs
video to a DECADE server may not have the complete byte-string of a
slice in hand when it starts sending, so it can't send a hash-based
name at the start. This is likely to have some impact on the
protocol mappings possible. (It also makes it less clear that
there is a good reason not to support name generation by the
server right from the start.)
-- A.2.3 says "DECADE may find the concept of collections to be
useful if there is a need to support directory like structures in
DECADE." It also discusses WebDAV's MOVE and COPY operations.
What is the point when the name uniquely follows from the content?
-- 6.1.2 says tokens include "Permitted objects (e.g., names of data
objects that may be read or written)" and "It is possible for DRP
to allow tokens to apply to a batch of operations to reduce
communication overhead required between DECADE Clients." Does this
require prescience on what the hash values of future slices will
be?
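
The timing problem with hash-based names can be sketched as follows.
This is an illustration only: the use of SHA-256, the hex encoding,
and the function names are assumptions, not something the draft
specifies. The point is that an incremental hash yields the name only
after the last chunk has been consumed, by which time the earlier
chunks may already have been sent.

```python
import hashlib


def name_for(data: bytes) -> str:
    """Hypothetical hash-based name: hex SHA-256 of the complete byte string."""
    return hashlib.sha256(data).hexdigest()


def stream_and_name(chunks):
    """Consume chunks as they are produced; return (name, bytes consumed).

    The name is available only after the loop has finished.
    """
    h = hashlib.sha256()
    sent = 0
    for chunk in chunks:
        h.update(chunk)  # in a real sender, the chunk would go on the wire here
        sent += len(chunk)
    return h.hexdigest(), sent


chunks = [b"frame-1", b"frame-2", b"frame-3"]
name, sent = stream_and_name(chunks)
```

A protocol mapping that requires the name in the request line or in a
leading header field cannot be driven this way; one that allows the
name in a trailer, or lets the server generate it, can.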

A5)
Authorization based on IP addresses (6.1.2 "permitted clients") is
rarely appropriate.

A6)
Much of the information discussed in 6.1.3 will be PII. The
architecture must discuss how the protocols will provide the
flexibility to cope with different data protection and surveillance
regulations. For instance, the level of logging performed by a server
may be an important parameter that must be indicated to the client
before it starts operation, or some of it may conversely be
clandestine.

A7)
Please rewrite section 9 from scratch. There is no need to explain
fundamentals of cryptographic data structures (assuming that the next
version will use terms that can be referenced properly). Instead,
actual security considerations of the DECADE architecture must be
discussed, e.g., the cache discovery attack mentioned above. More
importantly, there needs to be a discussion of the threat model, the
trust relationships envisaged, etc. Please see RFC 3552.

** Minor Issues:

M1) Terminology

Beyond the problems listed above, the draft needs an overhaul in its
terminology. E.g.:

-- it uses "TTL" as a term for an object expiration time, without ever
explaining the term. (What is actually meant is an expiration
time, *NOT* a lifetime/duration or hop count that would be
analogous to IPv4's use of the abbreviation.)

-- using "data object" as the term for the things saved in a DECADE
server is highly confusing. It is not always clear whether the raw
byte string or the combination of this and certain metadata is
meant. Do NOT use "contents" in its plural form as a synonym for
"data objects" (4.2). Indeed, the document would improve by using
"content" very sparingly, only in the overview sections, and being
precise about data objects otherwise. (It would be preferable to
have a name for the "data objects" that is distinct from the
plain English meaning of that term. E.g., slice.)
E.g., while we learn about data objects that they are immutable and
not all of the same size, we need consistent terms for the various
kinds of metadata used, such as the DECADE metadata that are used in
managing the localized storage vs. those metadata that would be
visible in the SDT.
"If an application wishes to store such metadata persistently
within DECADE, it can be stored within data objects
themselves." (What does that mean? New, separate objects? Within
the existing ones? In the slice byte-string itself?)

-- "data transport protocol" contains the term "transport protocol"
which means something different in the IETF. We tend to use
"transfer protocol" for the purpose intended.

-- 4.4 introduces a "location". What is that? A DECADE server?

-- "Traffic De-duplication" is a seriously misleading term for
validated cache access. The whole point of the validation protocol
in 8.2.1.2 appears to be to protect the cache at S against a
colluding pair of A and R, under the assumption that A is not
authorized to access S' copy of the object but compensates by being
authorized to access R's copy. Since R can (1) indicate
authorization and (2) prove to S it does have the data, both using
the challenge-response protocol, S can fulfill the request for R.
If that is the point, please say that. Please note that, from this
exchange, A and R can still extract the fact that S had a copy.
Discuss security implications of this discovery.
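
The proof-of-possession step and the resulting leak can be sketched
as follows. This does not reproduce the exchange in 8.2.1.2; the
keyed hash over the object (HMAC-SHA-256 with the nonce as key) and
all names are assumptions made for illustration. S can verify the
response only if it already holds the object, so the outcome of the
exchange tells the requester whether S had a copy.

```python
import hashlib
import hmac
import os


def proof(nonce: bytes, data: bytes) -> str:
    """Hypothetical proof of possession: HMAC of the object keyed by the nonce."""
    return hmac.new(nonce, data, hashlib.sha256).hexdigest()


class CacheServer:
    """Hypothetical server S holding cached objects by name."""

    def __init__(self):
        self.store = {}

    def challenge(self) -> bytes:
        return os.urandom(16)  # fresh nonce per exchange

    def verify(self, name: str, nonce: bytes, response: str) -> bool:
        data = self.store.get(name)
        if data is None:
            return False  # S has no copy, so it cannot check the proof
        return hmac.compare_digest(proof(nonce, data), response)
```

A party that holds the object and gets a negative answer learns that
S has no copy; a positive answer reveals that it does. That
difference is the discovery channel mentioned above.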

4.1)
"However, the architecture may allow for more-than-one data transport
protocols to be used."
This *is* the architecture. It either allows it or not.
(BTW, shouldn't the architecture also say something about
negotiation/capability discovery?)

4.5.1)
"The Storage Provider delegates the management of the resources at a
DECADE server to one or more applications."
What does that really mean? (And are the latter "Content Distribution
Applications"?)

5.4)
Is this really a digital signature?
(Please reserve the term "digitally signed" for actual signatures, as
opposed to including a kind of peer entity authentication that is
directed towards a specific recipient. See RFC 4949.)

6.1)
"...DRP allows one instance of such an application, e.g., an
application endpoint, to apply access control and resource sharing
policies on each of them." (them = DECADE servers.) That last
sentence is rather ominous. Is this completely trivial, or does it
actually mean anything? Is DRP maybe a reliable multicast protocol
for control data?

6.1.4)
The term "MIME type" has been superseded by "media type" (please also
reference the relevant RFCs here). It is also not clear to me what
that media type means in case of a slice of a larger resource
representation. Why is a media type not copied with the object?

7.1)
"It is also assumed that the operation performed at the remote server
is the same as the operation in the original request."
Explain "the same" -- are all parameters identical? Or is it just GET
vs. PUT?

** Nits:

1)
"Content Distribution Applications" in the first sentence is not
defined. Point to 2.6.

4.2)
"are referred as" -> "are referred to as"

4.3)
"Objects that are stored in a DECADE storage server can be accessed by
DECADE content consumers by a resource identifier"
second by -> via

4.3)
" Because a DECADE content consumer can access more than one storage
server within a single application context, a data object that is
replicated across different storage servers managed by a DECADE
storage provider, can be accessed by a single identifier."
Non sequitur.
Change to:
>>
A DECADE content consumer may be able to access more than one storage
server within a single application context. A data object that is
replicated across different storage servers managed by a DECADE
storage provider can still be accessed by a single identifier.
<<
[Now, it is still not quite clear from that sentence whether that is a
MUST (i.e., whether the architecture mandates that all replicated
copies MUST have the same identifier).]

4.5.2)
"applications granted resources"?
applications being granted resources?
resources granted by applications?

5)
s/principals/principles/
(Just once in the first paragraph; otherwise, principle vs. principal
has been used correctly.)

6.2.2)
defered -> deferred

7.1)
"Note that when a DECADE client invokes a request a DECADE server with
these additional parameters" -- syntax.

8.2.1.1)
"When a DECADE client (A) indicates its DECADE account on a DECADE
server (S) to fetch an object from a remote entity (R) (a DECADE
server or DECADE client)..." What? The "account" is asked to fetch
from a "client"?

Ceterum censeo)
RFCs, as any kind of formal technical publication, should use units in
accordance with ISO/IEC 80000, in particular IEC 80000-13.
Replace Mbps by Mbit/s, KB by KiB.
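
A small worked example of why the prefixes matter (the object size
and link rate are made-up numbers, not taken from the draft): the
same "512 K" object differs by about 2.4 % in size, and hence in
transfer time, depending on whether K means 1000 or 1024.

```python
KIB = 1024        # bytes per KiB (binary prefix, IEC 80000-13)
KB = 1000         # bytes per kB (SI prefix)
MBIT = 1_000_000  # bits per Mbit (SI prefix)


def transfer_seconds(size_bytes: int, rate_mbit_s: float) -> float:
    """Time to move size_bytes over a link of rate_mbit_s Mbit/s."""
    return size_bytes * 8 / (rate_mbit_s * MBIT)


t_kib = transfer_seconds(512 * KIB, 8)  # 512 KiB at 8 Mbit/s
t_kb = transfer_seconds(512 * KB, 8)    # 512 kB at 8 Mbit/s
```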

** Random observations:

O1)
The proto writeup says:

> The document was reviewed by DECADE WG members, the WG Chairs, and
> key non-WG contributors, particularly by David E Mcdysan, Borje
> Ohlman, Akbar Rahman, Ning Zong and Dirk Kutscher.

Akbar Rahman and Dirk Kutscher are co-authors of this document, so I
sure hope they have reviewed this document.

O2)
The architecture does not give an argument why multiple SDTs are
needed when all of them are just HTTP anyway. (Binding the SDT to
multiple underlying protocols creates a lot of headaches that may be
completely unnecessary. At least they aren't motivated.)
But maybe it is not the job of the architecture document to actually
motivate this highly complexity-inducing generality.

E.g., A.2 alludes to a mapping to WebDAV, but then seems to go on
suggesting modifications to WebDAV to enable that layering. This
doesn't seem consistent. Indeed, it seems unlikely that DECADE can
layer cleanly on top of either WebDAV or CDMI. A more productive view
of these protocols may be as a toolkit from which to take certain
parts that HTTP does not have and that DECADE does not want to re-invent.
Special care must be taken not to create a chimera, though.
