[weirds] Ted Lemon's Discuss on draft-ietf-weirds-rdap-query-16: (with DISCUSS and COMMENT)

"Ted Lemon" <ted.lemon@nominum.com> Wed, 29 October 2014 18:47 UTC

From: Ted Lemon <ted.lemon@nominum.com>
To: The IESG <iesg@ietf.org>
Message-ID: <20141029184749.10576.92440.idtracker@ietfa.amsl.com>
Date: Wed, 29 Oct 2014 11:47:49 -0700
Archived-At: http://mailarchive.ietf.org/arch/msg/weirds/uFSrXBux5wDfSmWGpmq44Uak49o
Cc: weirds-chairs@tools.ietf.org, weirds@ietf.org, draft-ietf-weirds-rdap-query@tools.ietf.org
Subject: [weirds] Ted Lemon's Discuss on draft-ietf-weirds-rdap-query-16: (with DISCUSS and COMMENT)

Ted Lemon has entered the following ballot position for
draft-ietf-weirds-rdap-query-16: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to http://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
http://datatracker.ietf.org/doc/draft-ietf-weirds-rdap-query/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

Point 1:

In section 3.1.1, it's not clear which of three alternative things you
may have been specifying: (1) the ASCII representation of the IP address
is intended to be converted to binary before use, (2) the ASCII
representation is intended to be used directly, or (3) whether the IP
address is converted to binary prior to use is an implementation choice.
If (1) is what's intended, you should say so explicitly.   If either (2)
or (3) is intended, then the representation that's used needs to be in
some canonical form, and the text as written does not ensure that it will
be.

So to address this discuss, you should either update the text to say that
(1) is what's intended, and it won't work otherwise, or you need to
require, not suggest, a canonical representation.   If you go with the
second option, your reference to RFC 5952 isn't adequate: you need to
specifically reference section 4, and specifically exclude section 5.   I
have no preference as to how you resolve this.
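To make the difference between option (1) and a required canonical text form concrete, here is a minimal Python sketch using only the standard ipaddress module. The function names are hypothetical, not taken from the draft. For ordinary IPv6 addresses, str() on an ipaddress object appears to follow the RFC 5952 section 4 rules (lowercase hex, no leading zeros, '::' for the longest run of two or more zero fields, the first such run on a tie); it has not been checked here against section 5 forms such as IPv4-mapped addresses, which is exactly the part that would need to be excluded.

```python
import ipaddress

def binary_key(path_segment: str) -> bytes:
    """Option (1): parse the ASCII address and match on its binary form."""
    return ipaddress.ip_address(path_segment).packed

def canonical_text(path_segment: str) -> str:
    """Options (2)/(3): match on text, but only after forcing one spelling.
    Assumption: Python's str() compresses ordinary IPv6 addresses in the
    RFC 5952 section 4 style; section 5 forms are not covered here."""
    return str(ipaddress.ip_address(path_segment))

# Three spellings of the same documentation-prefix address.
spellings = ["2001:DB8:0:0:0:0:0:1", "2001:db8::1", "2001:0db8:0000::0001"]

# Naive string comparison sees three different queries...
print(len(set(spellings)))                        # 3
# ...while the binary form and the canonical text each see one.
print(len({binary_key(s) for s in spellings}))    # 1
print({canonical_text(s) for s in spellings})     # {'2001:db8::1'}
# Two equal zero runs: only the first one is replaced by '::'.
print(canonical_text("2001:db8:0:0:1:0:0:1"))     # 2001:db8::1:0:0:1
```

Either approach interoperates; what does not interoperate is comparing the raw strings a client happened to send.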

Point 2:

In 3.1.3, the instructions as given don't really guide implementations
toward interoperability.   Why not just say that servers SHOULD either
convert a-labels to u-labels, or u-labels to a-labels, and be done with
it?   It's not like it's a difficult conversion, and they have to do the
conversion to do the comparison, unless for some reason they store both
A- and U-labels as keys in their database, which seems really unlikely.  
The recommendations as written are so vague that I would expect
interoperability to be likely only in the case of queries that contain no
IDN labels.   I wouldn't mind this in an Informational document, but it's
pretty sketchy in a standards-track document.
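As an illustration of how small the single-form approach is, here is a Python sketch that converts every label of a query, and every stored name, to A-label form before an exact-match comparison. It is deliberately simplified and rests on assumptions: it uses the standard library's raw Punycode codec (RFC 3492) plus lowercasing, and does none of the IDNA2008 validity checks, NFC checks or mapping a real server would need. The names to_a_label() and normalize_domain() are hypothetical.

```python
def to_a_label(label: str) -> str:
    # ASCII labels (LDH labels and existing A-labels) are only lowercased.
    if label.isascii():
        return label.lower()
    # Non-ASCII labels are treated as U-labels and Punycode-encoded.
    # Simplification: no IDNA2008 validation or mapping is performed.
    return "xn--" + label.lower().encode("punycode").decode("ascii")

def normalize_domain(name: str) -> str:
    """Convert every label to A-label form so one string comparison suffices."""
    return ".".join(to_a_label(label) for label in name.rstrip(".").split("."))

stored = normalize_domain("xn--caf-dma.example")
for query in ["café.example", "XN--CAF-DMA.example", "café.Example."]:
    print(query, normalize_domain(query) == stored)   # True for all three
```

A server could equally pick U-labels as its single form; the point is only that one form is chosen and applied to every label, including queries that mix A-labels and U-labels.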

It's particularly weird that in one place you say "An RDAP server that
receives a query string with a mixture of A-labels and U-labels MAY
convert all the U-labels to A-labels, perform IDNA processing, and
proceed with exact-match lookup" and then in the next paragraph you say
"The server MAY perform the match using either the A-label or U-label
form.  Using one consistent form for matching every label is likely to be
more reliable."

I'm not sure how to resolve this.   I think there's some underlying
decision the working group has made that led to this language, but the
lack of consistency I mention above leads me to wonder whether that
decision wasn't very clear or the document failed to reflect it.
Interestingly, 3.1.4 is written as if 3.1.3 gives clear guidance as to
how IDNs are represented.

In 3.2.1 and 3.2.2, the main use cases I can see for nameserver searches
are for spammers to identify related domains, and for eavesdroppers to do
the same.   What's the motivation for including these capabilities?   At
a minimum, the privacy considerations for this search ought to be
discussed in a privacy considerations section or in the security
considerations section.

Point 3:

In 4.1, you don't actually specify the syntax for partial string matches.
I think I can intuit what you mean, but you should be explicit, and you
shouldn't suggest using POSIX regex searches unless that's what you
really mean; if that's what you really mean, then you need to do a lot
more work than you've done.   I _think_ what you intend is to simply say
that if one or more asterisks appear in a label, then when the search is
done, any label consisting of the non-asterisk characters, in sequence,
with zero or more characters in place of each asterisk would match.
But you don't actually say that, so I'm not sure what you actually
intended.   The text could be read to mean that any arbitrary search
mechanism is allowed, but if that's the case then you need to go into a
lot more detail about the caveats, such as the '.' character in POSIX
regexps and the start and end anchors ('^' and '$').
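Under the reading suggested above (which is an assumption about the intent of 4.1, not something the draft states), the matching rule is small enough to write down as code. The Python sketch below escapes every non-asterisk character, so '.' and other regex metacharacters are literal, anchors the match to the whole label with re.fullmatch, and lets each asterisk stand for zero or more characters within a single label. The function names and the lowercasing are illustrative assumptions.

```python
import re

def label_matches(pattern_label: str, label: str) -> bool:
    # Everything except '*' is literal; each '*' matches zero or more characters.
    regex = ".*".join(re.escape(part) for part in pattern_label.split("*"))
    # fullmatch anchors at both ends, so no '^'/'$' handling is needed.
    return re.fullmatch(regex, label) is not None

def name_matches(pattern: str, name: str) -> bool:
    # Compare label by label, so an asterisk never spans a '.' separator.
    pattern_labels = pattern.lower().split(".")
    name_labels = name.lower().split(".")
    return len(pattern_labels) == len(name_labels) and all(
        label_matches(p, n) for p, n in zip(pattern_labels, name_labels)
    )

print(name_matches("exam*.com", "example.com"))   # True
print(name_matches("exam*.com", "exam.com"))      # True (zero characters)
print(name_matches("exam*.com", "examplexcom"))   # False, unlike regex exam.*.com
print(name_matches("*.com", "a.b.com"))           # False (one label per '*')
```

Written this way, none of the POSIX regex caveats arise, which is one argument for specifying the asterisk semantics directly rather than pointing at regular expressions.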

The text on partial matches for Unicode also seems unnecessarily
complicated, and likely not implementable without extensive knowledge of
all the various Unicode character sets, which I would not expect an
implementation to bother with.   It seems relatively harmless to allow
searches on combining characters, particularly since it would be a lot of
work to prevent it.   For example, a search for *ཀ་ would not find all
instances of a syllable that starts with the sound "ka" because ka can be
either a head letter or subjoined, but a user might search first using
the head letter and then the subjoined letter if (as is not unlikely) he
or she were unsure of the correct spelling.   It would be a shame if such
a search were rendered impossible by an overly-thorough implementation of
the current text.
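A plain code-point comparison, with no script-specific knowledge at all, already behaves the way the example above would want: the head letter KA (U+0F40) and the subjoined KA (U+0F90, general category Mn, i.e. a combining mark) are simply different characters, so a user can search for each in turn. The Python sketch below assumes the asterisk semantics from the reading of 4.1 given earlier; the two labels are chosen only for illustration and have not been checked against IDNA2008 validity rules.

```python
import re
import unicodedata

HEAD_KA = "\u0F40"       # TIBETAN LETTER KA
SUBJOINED_KA = "\u0F90"  # TIBETAN SUBJOINED LETTER KA (a combining mark)
TSHEG = "\u0F0B"         # TIBETAN MARK INTERSYLLABIC TSHEG
SA = "\u0F66"            # TIBETAN LETTER SA

def label_matches(pattern_label: str, label: str) -> bool:
    # Code-point matching: '*' is zero or more code points, the rest is literal.
    regex = ".*".join(re.escape(part) for part in pattern_label.split("*"))
    return re.fullmatch(regex, label) is not None

# Illustrative labels: "ka" as a head letter, and "ska" with a subjoined ka.
labels = [HEAD_KA + TSHEG, SA + SUBJOINED_KA + TSHEG]

print(unicodedata.category(SUBJOINED_KA))                               # Mn
print([label_matches("*" + HEAD_KA + TSHEG, l) for l in labels])        # [True, False]
print([label_matches("*" + SUBJOINED_KA + TSHEG, l) for l in labels])   # [False, True]
```

Nothing here needs to know anything about Tibetan; refusing searches that begin with a combining mark would take more code, not less.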

As I say, I don't really know what was intended here, so I can't make a
definite suggestion as to what the right thing to do is to fix this; I
think we just need to discuss it, hence the discuss point.


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

Point A:

In 3.1.2:

   For example, the following URL would be used to find information
   describing autonomous system number 12 (a number within a range of
   registered blocks):

   http://example.com/rdap/autnum/12

   The following URL would be used to find information describing 4-byte
   autonomous system number 65538:

   http://example.com/rdap/autnum/65538

The examples don't really seem to illustrate what is said in the text.  
The syntax for both examples is the same, and we don't see any return
results.   So to say that one example is an example of a number within a
range of registered blocks, while the other is a 4-byte number, doesn't
make sense because the query formats are identical.   The preceding text
is clear, if I understand it correctly: a number in the path element
following 'autnum' is an autonomous system number, which references a
block of one or more numbers.   If there are additional semantics, e.g.,
byte size boundaries for AS numbers, that are also encoded in the query,
those should be described explicitly.   But IMHO that doesn't make sense:
in _this_ context, these are just decimal numbers of arbitrary precision,
and how many bytes they are, or whether any particular number references
a block or just a single number, is not something that should be
mentioned in the examples.
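To illustrate the point, here is a Python sketch in which both example URLs go through exactly one parsing rule, a decimal number, and whether that number falls inside a multi-number block is decided only by the registry data. The path pattern, the helper names and the block table are hypothetical; the ranges are invented for illustration and are not real allocations.

```python
import re

# One rule for every autnum query: a decimal number, nothing else.
# [0-9] rather than \d, because \d also matches non-ASCII digits in Python.
AUTNUM_PATH = re.compile(r"/rdap/autnum/([0-9]+)")

# Hypothetical registry data: (first ASN, last ASN, handle). Invented ranges.
BLOCKS = [(1, 23, "EXAMPLE-BLOCK-1"), (65536, 65551, "EXAMPLE-BLOCK-2")]

def parse_autnum_path(path: str) -> int:
    match = AUTNUM_PATH.fullmatch(path)
    if match is None:
        raise ValueError(f"not an autnum query: {path!r}")
    return int(match.group(1))

def find_block(asn: int):
    # Block membership comes from the data, not from the query syntax.
    for first, last, handle in BLOCKS:
        if first <= asn <= last:
            return handle
    return None

for path in ["/rdap/autnum/12", "/rdap/autnum/65538"]:
    asn = parse_autnum_path(path)
    print(asn, find_block(asn))
```

Neither the byte size of the number nor the fact that it lies in a block shows up anywhere in the query handling, which is why the examples can't meaningfully distinguish those cases.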

Point B:

Section 6 appears to be confusing presentation and representation.   A
client might well present a UI that shows U-labels, but use A-labels on
the back end.   The text as written here really doesn't make sense.  
What are you trying to say?   I think that you shouldn't confuse what's
presented to the user with what's sent on the wire.   Possibly this is
related to the confusion over A-labels and U-labels that I called out in
Point 2 of the discuss.
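A minimal Python sketch of the separation described above: the client shows the user U-labels, but always puts A-labels in the query URL. The helper names are hypothetical, the base URL is the draft's example.com placeholder, and the conversion is the standard library's raw Punycode codec without the IDNA2008 validation a real client would need.

```python
BASE_URL = "http://example.com/rdap"  # placeholder base URL, as in the draft's examples

def to_a_labels(name: str) -> str:
    # Wire form: non-ASCII labels become "xn--" + Punycode (no IDNA2008 checks).
    return ".".join(
        label.lower() if label.isascii()
        else "xn--" + label.lower().encode("punycode").decode("ascii")
        for label in name.split(".")
    )

def to_u_labels(name: str) -> str:
    # Display form: decode labels carrying the "xn--" prefix back to Unicode.
    return ".".join(
        label[4:].encode("ascii").decode("punycode")
        if label.lower().startswith("xn--") else label
        for label in name.split(".")
    )

def domain_query_url(user_input: str) -> str:
    return f"{BASE_URL}/domain/{to_a_labels(user_input)}"

typed = "bücher.example"
print(to_u_labels(to_a_labels(typed)))   # bücher.example
print(domain_query_url(typed))           # http://example.com/rdap/domain/xn--bcher-kva.example
```

What the user sees and what goes on the wire are decided independently, so the specification only needs to constrain the latter.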