[Gen-art] Gen-art last call review of draft-ietf-scim-core-schema-17

Elwyn Davies <elwynd@dial.pipex.com> Fri, 17 April 2015 21:27 UTC

Message-ID: <55317A7E.7050707@dial.pipex.com>
Date: Fri, 17 Apr 2015 22:26:22 +0100
From: Elwyn Davies <elwynd@dial.pipex.com>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0
MIME-Version: 1.0
To: General area reviewing team <gen-art@ietf.org>, draft-ietf-scim-core-schema.all@tools.ietf.org
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Transfer-Encoding: 7bit
Archived-At: <http://mailarchive.ietf.org/arch/msg/gen-art/xlZYoX8vBeQNkBMirJ6pWt6DQ2E>
Subject: [Gen-art] Gen-art last call review of draft-ietf-scim-core-schema-17
Precedence: list

I am the assigned Gen-ART reviewer for this draft. For background on
Gen-ART, please see the FAQ at

<http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.

Please resolve these comments along with any other Last Call comments
you may receive.

Document: draft-ietf-scim-core-schema-17.txt
Reviewer: Elwyn Davies
Review Date: 2015/04/09
IETF LC End Date: 2015/04/20
IESG Telechat date: (if known) -

Summary: Not ready.  The 'major' issue identified is really political 
rather than strictly technical although the proposed syntax does limit 
the applicability (or at least the easy applicability) of the scheme. 
Making the schemas more aware of practice outside the basic English 
speaking world should be an aim of IETF work, IMO.  The minor issues are 
mostly  only just more than editorial nits - and there are quite a few 
of these also.

Major issues:
===========
s4.1.1, "name" attribute:  The definition of this attribute is 
culturally insensitive.  The
collection of name sub-attribute terms are North American/UK/Aussie/NZ 
English -speaking biased.  The authors might wish to consider 
http://www.w3.org/International/questions/qa-personal-names.  To a 
lesser extent this also applies to the definition of the addresses 
attribute in s4.1.2.  The issue of the representation of postal 
addresses incorporated in I-Ds and RFCs in the xml2rfc schema has been 
debated at length on the rfc-interest mailing list.  The new (v3) 
vocabulary replaces the specific sub-attributes with an ordered  list of 
"postalLine" elements (see 
https://tools.ietf.org/html/draft-hoffman-xml2rfc-16#section-2.39). 
Further, the use of country codes in RFCs has been dropped some time 
ago.  It might be better to represent the address in a less specific way 
and leave display up to user interfaces that can consider the relevant 
locale.  My suggestion, FWIW, would be to have a country, possibly a 
code field plus an ordered array of postalLines that can contain any of 
the additional components and cater for any locale specific format.

======================================================

Minor issues:
===========
Reference to SCIM Protocol document:  At a bare minimum a normative 
reference to the SCIM protocol document (currently 
draft-ietf-scim-api-16) is needed in s1.2 where the protocol is referred 
to in the first two definitions.  In my opinion, this document would be 
improved by the addition of a brief overview of the operation of the 
SCIM protocol and the implications for the design of the schema.   For 
example, s2 talks about 'replacement of a resource':  Knowing in advance 
that one of the operations anticipated in the protocol is replacement 
makes this clearer.

s1.1, Use of OPTIONAL and REQUIRED:  These terms are overloaded in this 
document.  The majority of uses are not specifying features of the 
protocol as per RFC 2119 but indicating the necessity or otherwise of 
the presence of particular attributes in resource types.  AFAICS the 
only RFC 2119 usages are one place in  s2.2.7 for OPTIONAL  and two 
adjacent places in s10.3.1 for REQUIRED .   To avoid the overloading it 
would be easy to omit OPTIONAL and REQUIRED from the RFC 2119 list, use 
the alternative RFC 2119 terminology (MAY in s2.2.7 and MUST in s10.3.1) 
and provide a separate note on the usage of OPTIONAL and REQUIRED in s1.1.

s2.1, Syntax of attribute names:  I am confused by the constraints 
suggested here.
(1)   "Attribute names SHOULD be camel-cased":  AFAICS this has no 
impact on the specification or protocol.  My guess is that the 
specification has adopted the convention normally used in JavaScript.  
This is merely a representation of the convention used in SCIM schemas 
and RFC 2119 language is inappropriate.  I suggest replacing this with
"This document uses the camel-casing convention for attribute names 
(e.g., "camelCase").
(2) "nameChar   = "-" / "_" / DIGIT / ALPHA": Given the close 
association with JavaScript, it seems inappropriate to allow hyphen (-) 
as a character in attribute names as this is illegal in JavaScript.
(3) The definition should say whether attribute names are case sensitive.
(4) Even though there is ABNF, it would be useful to note explicitly 
that names are limited to a subset of ASCII rather than the much wider 
JSON string or JavaScript variable character sets.

s2.2.7, $ref:  In s2.2.7, $ref is defined as a sub-attribute name but 
does not match the attribute name syntax discussed in the previous 
comment for s2.1.  Does the attribute name syntax apply to sub 
-attributes?  Or are they just JSON member names?

s2.3, next to last para:  To ensure that the service provider knows what 
it ought to do to canonicalize a given value, the schema specification 
needs to specify what canonicalization means for each type of 
attribute.  Having read further on, I see that this is done in most 
cases for relevant attributes defined in this draft.   A note that this 
should be done generally when defining new schemas is needed here.  This 
is particularly important for strings that might have 
internationalization issues (c.f., the discussion of string comparison 
in filtering in section 5 of draft-ietf-scim-api-16.)

s7, canonicalValues:  The wording here
>          When
>          applicable service providers MUST specify the canonical types
>          specified in the core schema specification; e.g., "work",
>          "home".
seems to imply that the possible canonicalValues mentioned in the 
definitions of User, Group etc.  schemas earlier in the draft are 
actually normative minimum requirements that could, at least in some 
cases, be extended.  The wording used in the earlier sections is rather 
less definitive and appears to indicate that the suggested values are 
examples that a service provider might possible want to replace if they 
considered alternative values better suited to their application, e.g.
>    userType
>       Used to identify the organization to user relationship. Typical
>       values used might be "Contractor", "Employee", "Intern", "Temp",
>       "External", and "Unknown" but any value may be used.
and
>    phoneNumbers
>       Phone numbers for the user.   ...  The "display" sub-attribute
>       MAY be used to return the canonicalized representation of the
>       phone number value.  The sub-attribute "type" often has typical
>       values of "work", "home", "mobile", "fax", "pager", and "other",
>       and MAY allow more types to be defined by the SCIM clients.
The wording used in the earlier sections seems to need 'tightening up' 
to make it clear what minimum set of canonicalValues is required for 
conformance, if indeed that is what is wanted.

s7, caseExact:  I think you may need to clarify what case insensitivity 
means for languages other than unaccented English.  It may be sufficient 
to provide a note and a pointer to the discussion of filtering and 
normalization in the protocol draft.

s10.3:  The registration procedure seems overly complex.  If, as stated, 
an RFC is required in all cases, then the standard (RFC 7035) IETF 
Review registration policy would seem to fill the bill and there is no 
need for a designated expert.  Alternatively, Specification Required 
(with a designated expert as is standard for this case) could be used if 
other types of specification could be countenanced.  I suspect the 
requirement for a standards track RFC as a way of modifying an existing 
value is going to come back to bite us if the original specification was 
not standards track.  I am not sure this attempt to provide a higher 
hurdle for modifications is the best way to go about this - In general, 
IETF Review would, I think, give enough pushback against inappropriate 
updates without requiring standards track in all cases.  Overall, I 
recommend that the authors consult your AD and IANA to determine how 
best to structure the registration procedure.

===========================================================

Nits/editorial comments:
=====================
Global: s/e.g. /e.g., /

The term 'endpoint': The term '(network) endpoint' has a particular 
technical meaning in W3C/HTTP jargon although it the usage in (e.g.) 
http://www.w3.org/TR/wsdl.html seems rather self-referential.  It would 
be useful to provide a definition.  Perhaps something like:
(Network) endpoint:  Also known as a 'port' (see 
http://www.w3.org/TR/wsdl.html).  A  port has a 'port type' that 
identifies a set of operations invoked by HTTP methods.  Each port is 
identified by a URI  typically constructed from the base URI identifying 
the server implementing the operation and a relative URI bound to the 
port type.  The methods are associated with abstract data types, such as 
the schema specified in this document.  HTTP messages carry data 
structured according to the abstract data types.

Canonicalized URLs:   Presumably URLs should be canonicalized in line 
with Section 6 of RFC 3986.  An appropriate global place to say this 
would be s2.3 I believe.   However RFC 6986 offers a 'ladder' of 
canonicalizations and it would be desirable to say what rung on this 
ladder should be used.  Presumably either 6.2.3 or 6.2.4.

s1, para 1, last sentence:  The phrase 'redundantly integrated' is not 
felicitous.  Suggestion:

OLD:
    Similarly, cloud services
    providers seeking to inter-operate with multiple application
    marketplaces or cloud identity providers must be redundantly
    integrated.
NEW:
    Similarly, cloud services
    providers seeking to inter-operate with multiple application
    marketplaces or cloud identity providers would require pairwise
    integration.
END

s1, para 2: Worth adding a reference to [PortableContacts] since you 
have it already and its not a 'well-known' item.
I fear that LDAP is not a well-known abbreviation within the meaning of 
the act, and needs expanding.
Maybe add a ref to RFC 6350 for vCards.

s2, para 1, 1st sentence: s/contents of which/allowable contents of which/
s2, para 1, 4th sentence: s/alidation/Validation/

s2, para 1, last sentence: s/the attributed defined schema/its 
characteristics as defined in the relevant schema/

s2, para 2: s/extend schema/ extend a schema/ [or "extend schemas"]

s2.1, para 1: s/For each attribute, SCIM schema/For each attribute, a 
SCIM schema/

s2.2:  The list of characteristics and their default values is not 
associated with the data type of the attribute but is another set of 
attributes of each attribute defined.  This would be clearer if the list 
of defaults and examples was separated out into a new section (probably 
after s2.2).  It would be helpful to point out explicitly that these 
defaults apply to all the attributes defined in the draft - I found the 
tacit assumption of default characteristics in later definitions of 
attributes had me asking myself whether certain characteristics ought to 
have been defined whereas they were actually covered by the defaults.

s2.2, 1st bullet: For consistency, s/required/REQUIRED/

s2.2, bullet 5:
OLD:
o  have no canonical values (e.g. type is "home" or "work"),
NEW:
o  have no canonical values (for example, the "type" sub-attribute in 
Section 2.3),
END

s2.2.6, Base 64 URL encoding:  Presumably the trailing padding 
characters can be omitted here - this should be mentioned whether or not 
they are needed.

s2.2.8: Presumably, in line with s2.3 and the JSON specification, the 
order of component attributes is not significant.  If this is so, it 
should be mentioned here:  Perhaps add:
     The order of the component attributes is not significant. Servers and
     clients MUST NOT require or expect attributes to be in
     any specific order when an object is either generated or analyzed.

s2.3, 1st para:  I found this difficult to parse.  Suggest:
OLD:
    Multi-valued attributes contain a list of value or may contain sub-
    attributes and MAY also be considered complex attributes.  The order
    of values returned by the server SHOULD NOT be guaranteed.  The sub-
    attributes below are considered normative and when specified SHOULD
    be used as defined.
NEW:
     Multi-valued attributes contain a list of elements, using the JSON 
array format
     defined in Section 5 of [RFC7159].  Elements can be either
     o   primitive values, or
     o   objects with a set of sub-attributes and values, using the JSON 
object format
          defined in Section 4 of [RFC7159], in which case they MAY also 
be considered
          to be complex attributes.  As with complex attributes, the 
order of sub-attributes
          is not significant.  The pre-defined sub-attributes listed in 
this section can be
         used with multi-valued attribute objects but these 
sub-attributes should only be used
        with the meanings as defined here.

s2.3: Question: Can sub-attributes have sub-sub-attributes?  I don't 
think I see any examples and maybe the definition in s1.2 effectively 
excludes them.  Might be worth being explicit.

s2.3, "primary" sub-attribute: Should this be specified as assumed to be 
"false" if not present in a relevant object?  I don't think this is 
covered by the defaults anywhere.

s2.3, $ref: I guess this ought always to be canonicalized  - this can be 
noted in the following paragraph where canonicalization is discussed.  
This would be a good place to specify a reference for URL 
canonicalization as mentioned above.

s2.3, last para: Suggest being a little more explicit about the scope of 
this paragraph.  I suggest:
OLD:
    Service providers MAY return the same value more than once with
    different types (e.g. the same e-mail address may used for work and
    home), but SHOULD NOT return the same (type, value) combination more
    than once per Attribute, as this complicates processing by the
    Consumer.
NEW:
    Service providers MAY return element objects with the same "value" 
sub-attribute
    more than once with a  different "type" sub-attribute (e.g., the 
same e-mail address
    may used for work and home), but SHOULD NOT return the same (type, 
value)
    combination more than once per Attribute, as this complicates 
processing by the
    consumer.
END
Note "Consumer" replaced by "consumer" - there is no definition of a 
specific meaning for this term.

s3, Resource Type:  s/("meta.resourceType")/("meta.resourceType", see 
Section 3.1)/

s3, Schemas Attribute:  I think s/the namespace of SCIM schema that 
defines/the namespaces of the SCIM schemas that define/; s/All 
representations of SCIM schema MUST include a non-zero value array/All 
representations of SCIM schemas MUST include a non-empty array/

s3, name used in example:  I don't know if the RFC Editor has a policy 
on suitable fictitious names equivalent to example.com for domains.  
Apparently Jane Roe and Mary Major have been used in US legal practice  
as female alternatives to the ubiquitous Mr John Doe.  Probably good to 
check with the RFC Editor.

s3.1, id, externalId, meta.version, meta.resourceType:  I suspect these 
ought to be caseExact?

s3.1, externalId:  The concepts of "provisioning domain" and a "client's 
tenant" need to be defined.  The externalId attribute is not explicitly 
defined as REQUIRED or OPTIONAL.

s3.1.1, meta.resource:  I got the impression from s3 that 
meta.resourceType was REQUIRED rather than being optional as noted in 
the first para of s3.1.1.

s3.1.1, meta.location: Should the value of this sub-attribute be the 
same as Content-Location rather than Location?  Is it intended that the 
request should be redirected (or that the resource was newly created?  
If not it seems Content-Location would be more appropriate.  A normative 
reference to the relevant HTTP RFC (probably RFC 7231) ought to be included.

s3.1.1, meta.version:  Would one expect a weak or strong ETag?  A 
normative reference to the relevant HTTP RFC (probably RFC 7232) ought 
to be included.

s3.2, last sentence: s/Section 6and/Section 6 and/ (missing space).

s3.3, 1st para:    s/used in LDAP/are used in LDAP/;
                             s/Each "schemas" value indicates additive 
schema/Each value in the "schemas" attribute  indicates an additive schema/;
                             s/See Figure 5 for an example JSON 
representation/See Figure 5 for an example of the JSON representation/

s3.3, para 2: s/"schemas" URI value/URI value in the "schemas" attribute/

s4.1.1, userName:  Having said that each User MUST have  include a 
non-empty userName value, why is this attribute RECOMMENDED rather than 
REQUIRED?  I guess it ought to be caseExact also.

s4.1.1, profileUrl: Needs a canonicalization mechanism specified.

s4.1.1, preferredLanguage:  There is potentially more than one preferred 
language (as per Accept-Languages) so this presumably this ought to be a 
Multi-valued attribute.  The Accept-Language header syntax also has an 
optional, per language, weight to assist with selection.  Should this be 
catered for here as well?  This would presumably mean that it should 
have sub-attributes (e.g.) using "value" for the name and "weight" or 
some such.   Also s/localized User interface/localized user interface/

s4.1.1, password:  I *hope* there is a discussion of the security 
implications of this field later.  A pointer to this discussion would be 
highly desirable.

s4.1.2, photos: A reference to the canonicalization mechanism is needed 
(see previous comment).

s4.1.2, entitlements, roles: There doesn't seem to be any good reason 
for capitalizing 'NO' here: s/NO/no/, 2 places.

s4.2, para 2: s/by the service provider are considered/by the service 
provider, and are considered/

s4.3, employeeNumber:  Maybe this might be better called an 
"employeeIdentifier" since it can be alphanumeric.  Is there any reason 
why this can't just be any old string?

s5, patch: A pointer to the SCIM protocol draft PATCH operation would be 
helpful.

s5, bulk: A pointer to the SCIM protocol draft Bulk operations section 
would be helpful.  I note that the capitalized form is not used in the 
protocol draft: suggest s/BULK/Bulk/ (total of 2 places)

s5, filter: A pointer to some appropriate part of the SCIM protocol 
draft  (maybe s3.4.2.2) would be helpful.

s6, endpoint:  (1)The endpoint is defined to be a relative URI.   It is 
therefore inappropriate that the example here is "/Users".  I guess it 
ought to be "Users".  There are a number of example of relative URIs 
starting with / in the examples in Section 8 that also ought to be 
corrected.

s6, endpoint: (2) Please bear with me, this is a bit long winded... I 
initially thought that the 'endpoint' mechanism was a possible 
contravention of BCP 190/RFC 7320:  Quoting s2.3 of RFC 7320:
>     Scheme definitions define the presence, format, and semantics of a
>     path component in URIs; all other specifications MUST NOT constrain,
>     or define the structure or the semantics for any path component.
>
>     ....
>
>     For example, an application ought not specify a fixed URI path
>     "/myapp", since this usurps the host's control of that space.
>
>     Specifying a fixed path relative to another (e.g., {whatever}/myapp)
>     is also bad practice (even if "whatever" is discovered as suggested
>     in Section 3); while doing so might prevent collisions, it does not
>     avoid the potential for operational difficulties (for example, an
>     implementation that prefers to use query processing instead, because
>     of implementation constraints).
In Section 6, the definition of the endpoint attribute specifies that 
each schema has to declare a relative URI or path component that gives 
access to schema instances.  My initial thinking was that  the endpoint 
value was standardized for Users and Groups in the draft.  My 
interpretation of s2.3 of RFC 7320 was that this technique is deprecated 
as bad practice.  After sleeping on it, I think I understand that the 
endpoint value is *not* standardized and potentially each service 
provider can use a different endpoint name if they really have to 
(although I guess in this case it would be good to go with the 
defaults.)  So I am happy that this isn't flagrantly contravening BCP 
190, although I am not sure about the query processing bit at the end of 
the quoted section.  Conclusion: I think it would be useful to add a 
note to the definition of endpoint to indicate that it is at the choice 
of the service delivering the resources and is not a fixed value, maybe 
saying that this is intended to avoid infringing BCP 190.

s7, mutability:
OLD:
mutability  A single keyword indicating what types of
                     modifications an attribute MAY accept as follows:
This 'MAY' is not about the 'protocol'.. Suggest:
NEW:
mutability  A single keyword indicating the circumstances under
                     which the value of the attribute can be (re)defined:
END

s9: s/personally identifiable information/personally identifying 
information/g

s9, 1st bullet: s/mulitple/multiple/

s9: para 1:  It would be sensible to also forbid the carrying of 
passwords in requests that are not encrypted.

s9:  It would be worth emphasizing that privacy issues should be 
considered whenever resource extensions are defined.

s10.1:  This is a request for a new entry in the  'URN Sub-namespace for 
Registered Protocol Parameter Identifiers' ...
OLD:
    IANA has created a registry for new IETF URN sub-namespaces,
    "urn:ietf:params:scim:", per [RFC3553].  The registration request is
    as follows:

    Per [RFC3553], IANA has registered a new URN sub-namespace,
    "urn:ietf:params:scim".
NEW:
    IANA is requested to add an entry to the 'IETF URN Sub-namespace for 
Registered Protocol Parameter Identifiers'
    registry and create a sub-namespace for the Registered Parameter 
Identifier as per [RFC3553]:
    "urn:ietf:params:scim:".
    The registration request is as follows:
END

s10.2:   This section is lacking a specification of exactly what is 
recorded in the new SCIM registry - the template tells how to apply and 
considerations to be used in granting the request.  See Section 8.4 of 
RFC 7035, for example, to see what is needed here.


s11.1: Needs a reference to the SCIM protocol document.

s11.2, [Olson-TZ] is  incomplete - I suspect it needs a reference to the 
IANA TZ database http://www.iana.org/time-zones

[Gen-art] Gen-art last call review of draft-ietf-… Elwyn Davies
Re: [Gen-art] Gen-art last call review of draft-i… Leif Johansson
Re: [Gen-art] Gen-art last call review of draft-i… Phil Hunt
Re: [Gen-art] Gen-art last call review of draft-i… Leif Johansson
Re: [Gen-art] Gen-art last call review of draft-i… Kelly Grizzle
Re: [Gen-art] Gen-art last call review of draft-i… Erik Wahlström neXus
Re: [Gen-art] Gen-art last call review of draft-i… Elwyn Davies
Re: [Gen-art] Gen-art last call review of draft-i… Phil Hunt
Re: [Gen-art] Gen-art last call review of draft-i… Morteza Ansari (moransar)
Re: [Gen-art] Gen-art last call review of draft-i… Elwyn Davies
Re: [Gen-art] Gen-art last call review of draft-i… Phil Hunt
Re: [Gen-art] Gen-art last call review of draft-i… Elwyn Davies
Re: [Gen-art] Gen-art last call review of draft-i… Phil Hunt