Re: [regext] Benjamin Kaduk's Discuss on draft-ietf-regext-rdap-sorting-and-paging-17: (with DISCUSS and COMMENT)

Benjamin Kaduk <kaduk@mit.edu> Fri, 02 October 2020 02:20 UTC

Date: Thu, 01 Oct 2020 19:20:20 -0700
From: Benjamin Kaduk <kaduk@mit.edu>
To: Mario Loffredo <mario.loffredo@iit.cnr.it>
Cc: regext-chairs@ietf.org, Tom Harrison <tomh@apnic.net>, The IESG <iesg@ietf.org>, draft-ietf-regext-rdap-sorting-and-paging@ietf.org, regext@ietf.org
Message-ID: <20201002022020.GR89563@kduck.mit.edu>
References: <160089722480.18312.1611285341459635513@ietfa.amsl.com> <78b84143-eecd-ea03-a4db-077dd9920dc4@iit.cnr.it> <20200929001425.GN89563@kduck.mit.edu> <c76db7f1-8c88-142b-5242-49e0a62e9b38@iit.cnr.it>
In-Reply-To: <c76db7f1-8c88-142b-5242-49e0a62e9b38@iit.cnr.it>
User-Agent: Mutt/1.12.1 (2019-06-15)
Archived-At: <https://mailarchive.ietf.org/arch/msg/regext/7ig7YcWunn3wIlPVyKMoNoReWZo>
Subject: Re: [regext] Benjamin Kaduk's Discuss on draft-ietf-regext-rdap-sorting-and-paging-17: (with DISCUSS and COMMENT)

Hi Mario,

Not a whole lot left to say, but what there is is inline.

On Wed, Sep 30, 2020 at 01:28:53PM +0200, Mario Loffredo wrote:
> Hi Ben,
> 
> thanks a lot for your quick reply to my responses. My comments are inline.
> 
> Il 29/09/2020 02:14, Benjamin Kaduk ha scritto:
> > Hi Mario,
> >
> > Also inline.
> >
> > On Sat, Sep 26, 2020 at 03:56:03PM +0200, Mario Loffredo wrote:
> >> Hi Benjamin,
> >>
> >> thanks a lot for your extensive review. I apologize for the delay in
> >> replying but I have been very busy the last two days and your feedback
> >> is very detailed.
> >>
> >> Please find my comments inline.
> >>
> >> Il 23/09/2020 23:40, Benjamin Kaduk via Datatracker ha scritto:
> >>> Benjamin Kaduk has entered the following ballot position for
> >>> draft-ietf-regext-rdap-sorting-and-paging-17: Discuss
> >>>
> >>> When responding, please keep the subject line intact and reply to all
> >>> email addresses included in the To and CC lines. (Feel free to cut this
> >>> introductory paragraph, however.)
> >>>
> >>>
> >>> Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
> >>> for more information about IESG DISCUSS and COMMENT positions.
> >>>
> >>>
> >>> The document, along with other ballot positions, can be found here:
> >>> https://datatracker.ietf.org/doc/draft-ietf-regext-rdap-sorting-and-paging/
> >>>
> >>>
> >>>
> >>> ----------------------------------------------------------------------
> >>> DISCUSS:
> >>> ----------------------------------------------------------------------
> >>>
> >>> Should we say something about which order the sorting criteria are
> >>> applied (first to last vs last to first) when multiple sortItems are
> >>> specified in a query?
> >> [ML] The common interpretation is from left to right so I don't think we
> >> need to clarify this concept.
> > I think I can accept not saying more on this subject, but I am curious:
> > when you say left to right, that means that the leftmost parameter is
> > higher priority?  So that, to give a totally contrived example, if I had
> > pairs of (name, id), a query with &sort=name;&sort=id; would give:
> >
> > ("alpha", 10)
> > ("alpha", 20)
> > ("beta", 10)
> > ("beta", 20)
> 
> [ML]  Exactly.
> 
> One minor comment: the right notation would be &sort=name,id

This is what I get for writing email without consulting the document;
thanks for spotting it.
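
A minimal sketch of the left-to-right priority confirmed above, assuming ascending order only and hypothetical in-memory records (the helper name sort_results is an illustration, not something defined by the draft):

```python
# Hypothetical (name, id) records, deliberately out of order.
records = [
    {"name": "beta", "id": 20},
    {"name": "alpha", "id": 20},
    {"name": "beta", "id": 10},
    {"name": "alpha", "id": 10},
]

def sort_results(rows, sort_param):
    """Sort rows by a comma-separated property list such as "name,id".

    The leftmost property has the highest priority; later properties
    only break ties among rows that agree on the earlier ones.
    """
    props = sort_param.split(",")
    return sorted(rows, key=lambda row: tuple(row[p] for p in props))

for row in sort_results(records, "name,id"):
    print(row["name"], row["id"])
```

Building a tuple key in property order is what makes the first property dominate: tuples compare element by element, left to right.
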

> >
> >>> I recognize that in the HATEOAS model, the actual JSONPaths reported by
> >>> the server should be used by the client to determine what a given sort
> >>> property does, but it also seems like it would be confusing for this
> >>> document to specify (e.g.) an "email" property with specific JSONPath,
> >>> and then have a server go off and use "email" to mean something else,
> >>> even if that is just the addition of "pref" as discussed at the end of
> >>> Section 2.3.1.  Do we want to try to have the properties defined by this
> >>> document be universally defined and encourage the use of new/different
> >>> property names for variations on them?  (The answer may well be "no",
> >>> but the answer is not intuitively clear to me.)  To put it another way,
> >>> is the list in Section 2.3.1 normative, or just an example?
> >> [ML] I would say "normative" just to facilitate interoperability and
> >> avoid ambiguities. Maybe it could be enough to say that the sorting
> >> properties defined in the document are considered reserved so an RDAP
> >> server MUST NOT map them onto other RDAP response values.
> >>
> >> Does it work for you?
> > Yes, that would work for me.  Thanks!
> [ML] Perfect.
> >
> >>> ----------------------------------------------------------------------
> >>> COMMENT:
> >>> ----------------------------------------------------------------------
> >>>
> >>> Section 1
> >>>
> >>>      However, there are some drawbacks associated with the use of the HTTP
> >>>      header.  First, the header properties cannot be set directly from a
> >>>      web browser.  Moreover, in an HTTP session, the information on the
> >>>      status (i.e. the session identifier) is usually inserted in the
> >>>      header or a cookie, while the information on the resource
> >>>      identification or the search type is included in the query string.
> >>>      The second approach is therefore not compliant with the HTTP standard
> >>>      [RFC7230].  As a result, this document describes a specification
> >>>      based on the use of query parameters.
> >>>
> >>> A few more words (section number from 7230?) on why the second approach
> >>> is not compliant with HTTP might help the reader, though it isn't
> >>> strictly necessary (we're not using it, after all).
> >> [ML] Could it be better to replace RFC7230 with RFC7231 and put a
> >> reference to Section 8.3.1
> >> (https://tools.ietf.org/html/rfc7231#section-8.3.1) ?
> > It might be, but I think I still don't understand why using the HTTP header
> > field for sorting and paging information is not compliant with HTTP -- the
> > linked section says that header fields can be used to communicate
> > information about the target resource, which IIUC includes the resource as
> > qualified by the query string.  (But note that I am not an HTTP expert...)
> 
> [ML]  Would it be more appropriate to update the sentence as in the 
> following?
> 
> OLD
> 
> The second approach is therefore not compliant with the HTTP standard 
> [RFC7230]
> 
> NEW
> 
> The second approach is therefore not compliant with the most common 
> practices about the usage of the HTTP headers [RFC7231]

That's better, though it does not help the reader find out what those
"common practices" are.  (It suffices to justify the choice, though, so I
won't press the point further.)

> >>> Section 2.1
> >>>
> >>>         *  "jsonPath": "String" (OPTIONAL) the JSONPath of the RDAP field
> >>>            corresponding to the property;
> >>>
> >>> What is this path relative to?  (Does the client have to know from the
> >>> other context what type of object it refers to?)
> >> [ML]  All the JSONPath expressions defined in the document are relative
> >> to the root of an RDAP response.  The sorting_metadata object is
> >> included in the same response so I think that the context is clear and
> >> no further clarification is needed.
> > Now that you mention it, I do recall other discussion of paths being
> > relative to the root; my apologies for the noise.
> [ML] You are welcome.
> >
> >>>         *  "links": "Link[]" (OPTIONAL) an array of links as described in
> >>>            [RFC8288] containing the query string that applies the sort
> >>>            criterion.
> >>>
> >>> Just to check: this is going to have the same structure for a Link
> >>> object that draft-ietf-regext-rdap-partial-response does?  (I am not
> >>> coming up with a great way to deduplicate the definitions, off the top
> >>> of my head.)
> >> [ML] Yes. The sorting links have the same structure as the subsetting
> >> links (see Section 2.3.2.).
> >>>      o  "pageSize": "Numeric" (OPTIONAL) a numeric value representing the
> >>>         number of objects returned in the current page.  It MUST be
> >>>         provided if and only if the total number of objects exceeds the
> >>>         page size.  This property is redundant for RDAP clients because
> >>>         the page size can be derived from the length of the search results
> >>>         array but, it can be helpful if the end user interacts with the
> >>>         server through a web browser;
> >>>
> >>> If it's redundant, we should probably say something about error handling
> >>> for when the things that are supposed to be identical have different
> >>> values.
> >> [ML]  I think this situation is very unlikely.  Anyway, in this case,
> >> the length of the results array really counts. Obviously, it is a bit
> >> more likely that the totalCount value might be different from the sum
> >> of  the number of results in each page. In fact, even if the
> >> registration data can't be considered real-time data, it might happen
> >> that the count parameter is present in the initial query and it might
> >> take time to scroll the result set, so there could be a small likelihood
> >> that the initial totalCount value is obsolete because the result set is
> >> changed in the meantime. Also in this case, the sum of each result array
> >> length really counts.
> >>
> >> Should I write something about this?
> > Thanks for the additional explanation.  I think that if we were to write
> > anything more, it would just be a few words in the description here to
> > indicate that it's just for convenience, e.g., "representing the number of
> > objects that should have been returned in the current page".  But it is
> > probably okay to leave it unchanged, too.
> [ML] Changed.
> >>> Section 2.3
> >>>
> >>>      Except for sorting IP addresses, servers MUST implement sorting
> >>>      according to the JSON value type of the RDAP field the sorting
> >>>      property refers to.  That is, JSON strings MUST be sorted
> >>>      lexicographically and JSON numbers MUST be sorted numerically.  If IP
> >>>      addresses are represented as JSON strings, they MUST be sorted based
> >>>      on their numeric conversion.
> >>>
> >>> There are more JSON types than string and number; are those other types
> >>> guaranteed to not appear in sortable RDAP fields?  (I can't see how such
> >>> a guarantee could be made, given that servers can define their own
> >>> sorting properties.)
> >> [ML] The other primitive JSON type remaining is boolean but I don't
> >> think it makes sense to sort by a boolean property. Instead, I missed
> > I think I also had JSON maps in mind, but I guess it is not exactly defined
> > to sort by a map itself, only a primitive type, so my question was a bit
> > silly.
> [ML] Yes. Only values with primitive types.
> >> that those values denoting dates and times MUST be sorted in
> >> chronological order even if they are strings. I'll update the sentence
> >> as in the following:
> >>
> >> Except for sorting IP addresses and values denoting dates and times, servers MUST implement sorting
> >>      according to the JSON value type of the RDAP field the sorting
> >>      property refers to.  That is, JSON strings MUST be sorted
> >>      lexicographically and JSON numbers MUST be sorted numerically.
> >>      Values denoting dates and times MUST be sorted in chronological order.  If IP
> >>      addresses are represented as JSON strings, they MUST be sorted based
> >>      on their numeric conversion.
> >>
> >> Does it work for you?
> > I think so; thanks.
> [ML] Good.
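
A sketch of why the revised text singles out IP addresses and date/time values: both are JSON strings, but plain lexicographic ordering gets them wrong. The key functions below are hypothetical helpers, and ordering IPv4 before IPv6 in mixed lists is an assumption of this sketch, not something stated in the draft:

```python
import ipaddress
from datetime import datetime

def ip_key(value):
    """Sort key based on the numeric conversion of an IP address string."""
    addr = ipaddress.ip_address(value)
    # Assumption: group by IP version first so IPv4 and IPv6 never interleave.
    return (addr.version, int(addr))

def date_key(value):
    """Sort key placing date-time strings in chronological order."""
    # Replace a trailing "Z" so older Python versions can parse the offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

ips = ["192.0.2.10", "192.0.2.9", "10.0.0.1"]
dates = ["2020-10-01T19:20:20-07:00", "2020-10-02T02:20:00Z"]

print(sorted(ips))                  # lexicographic: 192.0.2.10 before 192.0.2.9
print(sorted(ips, key=ip_key))      # numeric: 192.0.2.9 before 192.0.2.10
print(sorted(dates))                # lexicographic: ignores the UTC offsets
print(sorted(dates, key=date_key))  # chronological
```
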
> >
> >>>      If the "sort" parameter reports an allowed sorting property, it MUST
> >>>      be provided in the "currentSort" field of the "sorting_metadata"
> >>>      element.
> >>>
> >>> nit: is "reports" the best word to describe this behavior (which, IIUC,
> >>> is "present in the query component of the request URL")?
> >> [ML] Sounds better.
> >>> Section 2.3.1
> >>>
> >>>      In the "sort" parameter ABNF syntax, property-ref represents a
> >>>      reference to a property of an RDAP object.  Such a reference could be
> >>>      expressed by using a JSONPath.  The JSONPath in a JSON document
> >>>
> >>> nit: is there a missing word here ("a JSONPath expression")?
> >> [ML] Just for conciseness, may I use "jsonpath" to mean "JSONPath
> >> expression" and keep "JSONPath" to refer to the specification?
> >>
> >> I could write something like: "JSONPath expression (named "jsonpath" in
> >> the following)"
> > That's fine from my perspective, sure.
> [ML] Perfect.
> >
> >>>      o  Note that some of the object specific properties are also defined
> >>>         as query paths.  The object specific properties include:
> >>>
> >>> nit: the list structure in this item does not seem parallel to the
> >>> structure of the first item.
> >> [ML] OK. I'll change the sentence as in the following:
> >>
> >> Object specific properties.  Note that some of these properties
> >>         are also defined as query paths.  These properties include:
> >>
> >>>         as two representations of the same value.  By default, the
> >>>         unicodeName value MUST be used while sorting.  When the
> >>>         unicodeName is unavailable, the value of the ldhName MUST be used
> >>>         instead;
> >>>
> >>> I'm not entirely sure how much value "by default" adds here.  Would the
> >>> meaning be different if we said "The unicodeName value MUST be used
> >>> while sorting if it is present; when the unicodeName is unavailable, the
> >>> value of the ldhName is used instead"?
> >> [ML] No, it wouldn't. I'll change the sentence as you suggest.
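
The fallback rule agreed above can be sketched as a sort key. The sample objects are hypothetical, and treating an empty unicodeName as unavailable is an assumption of this sketch:

```python
def name_sort_key(obj):
    """Use unicodeName when present; otherwise fall back to ldhName."""
    return obj.get("unicodeName") or obj["ldhName"]

# Hypothetical domain objects; only the IDN carries a unicodeName.
domains = [
    {"ldhName": "dog.example"},
    {"ldhName": "xn--caf-dma.example", "unicodeName": "café.example"},
    {"ldhName": "bar.example"},
]

for d in sorted(domains, key=name_sort_key):
    print(name_sort_key(d))
```

Sorting on ldhName alone would place the IDN last (its A-label starts with "xn--"), whereas the rule above places it by its Unicode form.
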
> >>>      o  The jCard "sort-as" parameter MUST be ignored for the sorting
> >>>         capability described in this document;
> >>>
> >>> It's a little bit of a juxtaposition to refer to jCard here in the prose
> >>> but vcard in the table.
> >> [ML] I would keep it as is. Instead, I would replace all the "vcard"
> >> occurrences with "jCard". Being jCard a transliteration of vCard in
> >> JSON, it seems appropriate to me to keep the references to RFC6350
> >> sections and  to use the corresponding jCard elements for the mapping
> >> between the sorting properties and the RDAP response elements. Besides,
> >> I would write a sentence about the fact that jCard is the JSON format of
> >> vCard, add a link to RFC7095 and insert RFC7095 among the Normative
> >> References.
> >>
> >> Do you agree?
> > Yes, thanks.
> [ML] OK.
> >
> >>>      o  Even if a nameserver can have multiple IPv4 and IPv6 addresses,
> >>>         the most common configuration includes one address for each IP
> >>>         version.  Therefore, the assumption of having a single IPv4 and/or
> >>>         IPv6 value for a nameserver cannot be considered too stringent.
> >>>
> >>> I disagree with the flat assertion that it "cannot be considered too
> >>> stringent".  It can be so considered, as a matter of difference of
> >>> opinion; what is appropriate to do here is to say that this
> >>> document/protocol makes the assumption (especially since we go on to
> >>> describe the exception-handling procedure when the assumption is
> >>> violated).
> >> [ML] May I update that sentence as in the following?
> >>
> >> OLD
> >>
> >> Therefore, the assumption of having a single IPv4 and/or
> >>         IPv6 value for a nameserver cannot be considered too stringent.
> >>
> >> NEW
> >>
> >> Therefore, this specification makes the assumption that nameservers have a single IPv4 and/or
> >>         IPv6 value.
> > Yes, please!
> [ML] Done.
> >
> >>>      o  Multiple events with a given action on an object might be
> >>>         returned.  If this occurs, sorting MUST be applied to the most
> >>>         recent event;
> >>>
> >>> This makes a lot of sense as the default and I don't propose changing it
> >>> now, but I do wonder how hard it would be to add support later for
> >>> sorting on (say) the oldest event instead.
> >> [ML] Well, I wrote that sentence because some RDAP events can appear
> >> multiple times. For example, a domain might be locked-unlocked
> >> repeatedly. The purpose of that sentence is just to avoid ambiguities
> >> and implicitly suggest RDAP providers to arrange events with the same
> >> type in descending chronological order.
> >>>      The "jsonPath" field in the "sorting_metadata" element is used to
> >>>      clarify the RDAP field the sorting property refers to.  The mapping
> >>>      between the sorting properties and the JSONPaths of the RDAP fields
> >>>      is shown below:
> >>>      [...]
> >>>         name
> >>>
> >>>            $.domainSearchResults[*].unicodeName
> >>>
> >>> This seems to ignore the subtlety regarding unicodeName vs ldhName.  Is
> >>> there a way it could be expressed in JSONPath?
> >> [ML] If unicodeName and ldhName were alternative, the JSONPath union
> >> operator would fit (i.e.
> >> $.domainSearchResults[*].[unicodeName,ldhName]). Currently, RFC7483
> >> contains no assumption about when they should/must be present but
> >> examples seem to recommend to present unicodeName only for IDNs. When
> >> both the properties are present, the union operator doesn't fit exactly
> >> and I haven't still found the right JSONPath expression based only on
> >> the basic operators. However, since the "jsonPath" member is only for
> >> documentation, the aforesaid JSONPath expression could be the most
> >> suitable for conveying that sorting is applied on a kind of
> >> <unicodeName, ldhName> combination.
> > I have to defer to your expertise here; thank you for thinking about it.
> [ML] Thanks. Maybe this is the only case where the JSONPath WG outcomes 
> might be helpful :-)
> >
> >>>      o  Nameserver
> >>>
> >>>         name
> >>>
> >>>            $.domainSearchResults[*].unicodeName
> >>>
> >>> Presumably this is supposed to be nameserverSearchResults?
> >> [ML] Absolutely. It's a cut-and-paste typo :-)
> >>> Section 2.4
> >>>
> >>> I think we want another introductory paragraph like:
> >>>
> >>> % The cursor parameter is used by the server to preserve information
> >>> % about the pagination state of a given query's results across calls to
> >>> % the search API, so that successive requests by the client can return
> >>> % page N, N+1, N+2, etc.  Its value is only required to be interpretable
> >>> % by the server and could be implemented, for example, as an opaque
> >>> % database lookup key.  If a server does use a method for generating
> >>> % cursor values that involves internal structure, such as the one
> >>> % described below, the server needs to recognize that the value supplied
> >>> % by a client could have been modified (maliciously), and implement
> >>> % appropriate bounds-checking and similar measures when parsing received
> >>> % values.
> >>>
> >>> The current wording strongly suggests that base64-encoding a meaningful
> >>> value that the client could inspect or even construct is required, and I
> >>> do not think that is very maintainable or what was intended, given the
> >>> current second paragraph ("servers can change the method over time
> >>> without announcing anything to clients").
> >>>
> >>> (side note) I'm also pretty partial to the way JMAP discusses returning
> >>> (paginated, but non-uniformly) changes to a given data stream, e.g., at
> >>> https://www.rfc-editor.org/rfc/rfc8620.html#section-5.2 -- any given
> >>> state is named, and you can get "stuff starting at <named state>" and
> >>> the name to use for the state as of the current reply.
> >> [ML] Maybe I didn't make myself clear.
> >>
> >> The Base64 encoding is a simple (unrecommended) transformation to make
> >> the cursor value opaque to the client. It just seemed suitable to me for
> >> being used in some examples. But if you take a look at the example of
> >> Figure 6, you may note that you can't obtain a meaningful result by
> >> simply Base64-decoding the cursor value. Definitely, the method to
> >> encrypt the cursor value must be more complex than a mere Base64 encoding.
> >>
> >> Regarding the sentence between brackets, it means that servers can
> >> change the underlying pagination strategy without having an impact on
> >> clients. A server can initially implement offset pagination and then
> >> switch to keyset pagination, but this has no effect on clients' features.
> >>
> >> The same concepts about the checks that servers should make in order to
> >> check the cursor value are reported both in the "Negative Answer"
> >> section and in Appendix C.3. "Paging"
> >>
> >> Anyway, I'll try to integrate your text in the current document and add
> >> a sentence with the purpose of discouraging the use of the
> >> Base64-encoding in the cursor implementations.
> > Thank you; I did not know that you wanted to discourage the use of plain
> > base64 encoding, but that is reassuring to know.
> > (I did notice that the example in Figure 6 did not decode to a meaningful
> > result, but did not make much of a conclusion from that.)
> [ML] OK. I will add some text to clarify that a mere Base64 encoding is 
> not recommended to encrypt the cursor value.

Okay, thank you.
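
One possible way to go beyond plain Base64, sketched under assumptions: the server attaches an HMAC tag to the cursor so that any modification by the client is detected. The key, the state fields, and the function names are hypothetical. Note that this gives integrity only; the payload is still readable by the client, so a server wanting true opacity would also need to encrypt it.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"hypothetical server-side key"  # assumption: never sent to clients
TAG_LEN = hashlib.sha256().digest_size    # 32 bytes for SHA-256

def make_cursor(state):
    """Serialize pagination state and prepend an HMAC-SHA256 tag."""
    payload = json.dumps(state, sort_keys=True).encode()
    tag = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(tag + payload).decode()

def read_cursor(cursor):
    """Verify the tag before trusting anything inside the cursor."""
    raw = base64.urlsafe_b64decode(cursor.encode())
    tag, payload = raw[:TAG_LEN], raw[TAG_LEN:]
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("cursor failed integrity check")
    return json.loads(payload)

cursor = make_cursor({"offset": 100, "limit": 50})
print(read_cursor(cursor))
```
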

> >
> >>> Section 4
> >>>
> >>> If the server doesn't have access to an efficient (e.g.) counting
> >>> operation on the backend, would we recommend that the server not support
> >>> sorting/pagination, since there's not much benefit from having the
> >>> server pull up all the results and count them just to be able to return
> >>> the total count value back to the client, and then go do the same work again
> >>> when the client asks for the next page of results?
> >> [ML] In my implementation the RDAP server doesn't present the count
> >> operator in the sorting and paging links. The number of results doesn't
> >> change at all if the result set is sorted by a property rather than
> >> another. The same generally occurs (as I wrote above) if the client is
> >> scrolling the result set pages. So why repeat the count parameter in
> >> the links? The totalCount value is returned in the response to the
> >> initial query and, as it is no longer repeated in the links, the counting
> >> operation is not executed. Therefore, we don't need to make particular
> >> assumptions about the performance of the counting operation.
> >>> Section 7
> >>>
> >>> I suggest noting that (encoded) structured "cursor" values present a new
> >>> attack surface on the server that needs to be protected.
> >> [ML] Sorry, could you further explain this concept? AFAIK, it is
> >> possible to protect REST API endpoints but not query parameters.
> > I think this was assuming that the server was going to just base64-encode
> > something like "offset=100,limit=50" -- in that case a client could pass in
> > the base64'd version of "offset=1000000000,limit=50".  The server would
> > need to sanity-check the results of base64 decoding and reject the
> > too-large offset.  If the server is expecting to do a fancier
> > self-encrypted-token scheme for the cursor, the integrity check associated
> > with the encryption takes care of this protection inherently, and we may
> > not need to mention anything about sanitizing these values.
> [ML] OK.
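
The sanity check described above, sketched for the naive case where the cursor is only the Base64 form of "offset=100,limit=50". The bounds MAX_OFFSET and MAX_LIMIT and the function name are hypothetical values chosen for illustration:

```python
import base64

MAX_OFFSET = 10_000  # assumption: largest offset this server will serve
MAX_LIMIT = 100      # assumption: largest page size this server will serve

def parse_cursor(cursor):
    """Decode a naive Base64 cursor and reject malformed or oversized values."""
    try:
        text = base64.urlsafe_b64decode(cursor).decode("ascii")
        fields = dict(item.split("=", 1) for item in text.split(","))
        offset, limit = int(fields["offset"]), int(fields["limit"])
    except (ValueError, KeyError):
        raise ValueError("malformed cursor") from None
    if not (0 <= offset <= MAX_OFFSET and 1 <= limit <= MAX_LIMIT):
        raise ValueError("cursor out of range")
    return offset, limit

good = base64.urlsafe_b64encode(b"offset=100,limit=50").decode()
print(parse_cursor(good))
```

A cursor carrying "offset=1000000000,limit=50" decodes cleanly but is rejected by the range check, which is the case the server has to anticipate.
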
> >
> >>>      results in a response.  However, this last security policy can result
> >>>      in a higher inefficiency if the RDAP server does not provide any
> >>>      functionality to return the truncated results.
> >>>
> >>> I'm not sure I understand (or agree with) this last sentence -- it seems
> >>> that unilateral silent truncation of results by the server leads to not
> >>> just inefficiency but also potential security considerations in its own
> >>> right, with the client not knowing that it has incomplete results.
> >>> Also, if the server is truncating the results, by definition it "has
> >>> functionality to return the truncated results" -- that's what it's
> >>> doing!  So I assume the intent was to say something about negotiating or
> >>> indicating that the results are truncated, not actually doing the
> >>> truncation.
> >> [ML] I think that servers legitimately truncate the result sets to
> >> mitigate the risk of resource exhaustion and consequent denial of
> >> service. The implementation of the capabilities described in this
> >> document lets servers keep managing sustainable result sets and,
> >> at the same time, increases clients' chances of avoiding truncation and
> >> finding relevant results.
> > I agree with the paragraph you just wrote.  However, I think that the state
> > of affairs prior to this document, with unilateral truncation by the
> > server, can lead not just to "inefficiency" but also to security risks.  So
> > I was hoping to see something like "can result in higher inefficiency or
> > risk due to acting on incomplete information".
> [ML] OK. I will change the sentence as you suggest.
> > My second point ("Also, if the server [...]") was intending to suggest that
> > the last sentence say something like "if the RDAP Server does not provide
> > any functionality to return sorted results or iterate through the full
> > result set".
> 
> [ML] You are right. I misworded the sentence. Is it fine with you if I 
> change the sentence as in the following?
> 
> OLD
> 
> if the RDAP server does not provide any
>      functionality to return the truncated results
> 
> NEW
> 
> if the RDAP server does not provide any
>      functionality to return results removed by truncation

Yes, perfect!

> >
> >>>      The new parameters presented in this document provide RDAP operators
> >>>      with a way to implement a server that reduces inefficiency risks.
> >>>
> >>> [same question about "inefficiency" being the right word]
> >> [ML] Maybe I can replace the phrase "that reduces inefficiency risks."
> >> with the phrase "that reduces the risk of resource exhaustion and
> >> consequent denial of service".
> >>
> >> Are you ok with it?
> > Denial of service is only one of the risks I have in mind; another is that
> > if a server silently truncates, a client will have incomplete data and
> > might derive a conclusion ("domain X does not satisfy property Y") that is
> > false.
> [ML] OK. As I wrote above, I'll change the sentence to outline the risk 
> due to acting on incomplete information.
> >
> >>> Appendix B
> >>>
> >>>      o  It does not allow direct navigation to arbitrary pages because the
> >>>         result set must be scrolled in sequential order starting from the
> >>>         initial page;
> >>>
> >>> (side note) I didn't follow the references, so maybe this was covered
> >>> there, but I don't quite follow why direct navigation is impossible.  If
> >>> you use a key field for seeking, can't you just start in the middle from
> >>> some known value for that key field?
> >> [ML]  Especially when you know the total count of a result set, you can
> >> directly jump to a specific point in the result set through offset
> >> pagination but you can't do the same through keyset pagination because
> >> you don't know the key value at that point in advance. One can wonder:
> >> what is jumping in the result set used for? Well, for example, if you are
> >> looking for a specific item in an ordered collection of items, you could
> >> find it through the binary search algorithm.
> > I don't want to press this topic very much, so let me just try a brief
> > example.  Suppose I have a sorted set of ASCII strings, and I want to see if the
> > string "koala" is present.  I could start at the beginning and look at each
> > one in turn until I get to something that sorts after "koala", or I could
> > ask the database "give me the first thing you have that is after 'k'", which
> > gets me some of the way there.  Is the problem that you need an actual
> > value in the dataset to start from, and since "k" isn't guaranteed to be in
> > the set you are forced to start from the beginning?
> 
> [ML] Offset pagination is based on the positions of the results within 
> the result set while keyset pagination is based on a unique combination 
> of values of the results. You can always skip the first K results of a 
> result set by specifying offset=K but you can't do the same through 
> keyset pagination because you don't know what is the combination of 
> values placed at position K.
> 
> Let me give you an example that can clarify.
> 
> Let's suppose that the query "k*" returns N=1000 results and the length 
> of result page is 100. The fastest method to find if "koala" is present 
> is to jump to the middle (i.e. offset=500) of the result set and look 
> at the first result. If it is lexicographically lower than "koala", then 
> "koala" might be in the second half; otherwise, "koala" might be in 
> the first half. Let's suppose that the first item is greater than 
> "koala"; then you can jump to the middle of the first half (offset=250) 
> and apply the above dichotomy in turn until you find out whether "koala" 
> is present or not. This process takes at most log2(N) steps.
> 
> You can't do the same through keyset pagination because you don't know 
> how the values are distributed in the result set. You can only scroll 
> the result set from the first page to the last and this process takes N 
> steps maximum.
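
The halving procedure above is a binary search driven by offsets. A minimal sketch, assuming a hypothetical fetch(offset) callable that returns the single result at that position of the sorted result set (standing in for a query with offset=K), and counting one probe per request:

```python
def find_by_offset(fetch, total, target):
    """Binary search over a sorted result set reachable only by offset."""
    lo, hi, probes = 0, total, 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        value = fetch(mid)
        if value == target:
            return True, probes
        if value < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid       # target can only be in the lower half
    return False, probes

# Hypothetical sorted result set of N=1000 names matching "k*".
names = sorted([f"k{i:04d}" for i in range(999)] + ["koala"])

found, probes = find_by_offset(lambda k: names[k], len(names), "koala")
print(found, probes)
```

With N=1000 the loop needs at most about log2(N), i.e. 10, probes, whereas scrolling sequentially can take up to N steps.
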
> 
> However, in general, one is interested in all the results (or in a 
> subset) returned by a query rather than a single result. In this case, 
> provided that the results can always be sorted according to a unique 
> index, keyset pagination is more efficient than offset pagination. Let's 
> take the example above. With offset pagination, the underlying DBMS 
> always selects 1000 results and then returns the current page so this 
> means that, in the worst case, the DBMS will select 1000 results for 10 
> times (from offset=0 to offset=900). By keyset pagination the number of 
> results the underlying DBMS selects decreases by 100 results each step. 
> At the beginning, it selects 1000 results and returns the first page, 
> then it selects 900 results and returns the second page.
> 
> Note also two facts:
> 
> - the time needed to scroll the result set could be significant when the 
> result set is huge
> 
> - for all the RDAP searchable objects, it is always possible to build 
> more or less easily a combination of properties acting as unique index 
> and this is true regardless of whether the search includes the sort 
> parameter or not. For example, for the entity object class "handle" is a 
> unique property but if the search includes "sort=registrationDate", the 
> combination <registrationDate, handle> is unique.

Thank you for the additional explanation; I don't think we need to spend
more time on this topic.
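
For reference, a sketch of keyset pagination over the <registrationDate, handle> combination mentioned above, using an in-memory SQLite table. The table name, column names, and sample rows are hypothetical, and this is only one way a server might implement it:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE entity (handle TEXT PRIMARY KEY, registrationDate TEXT)")
db.executemany("INSERT INTO entity VALUES (?, ?)", [
    ("E3", "2020-01-01"), ("E1", "2020-01-01"), ("E2", "2019-06-30"),
    ("E5", "2020-03-15"), ("E4", "2020-01-01"),
])

def keyset_page(last, size):
    """Return the page following `last`, a (registrationDate, handle) pair."""
    if last is None:
        rows = db.execute(
            "SELECT registrationDate, handle FROM entity "
            "ORDER BY registrationDate, handle LIMIT ?", (size,))
    else:
        # handle breaks ties, so the pair identifies a unique position.
        rows = db.execute(
            "SELECT registrationDate, handle FROM entity "
            "WHERE registrationDate > ? OR (registrationDate = ? AND handle > ?) "
            "ORDER BY registrationDate, handle LIMIT ?",
            (last[0], last[0], last[1], size))
    return rows.fetchall()

def all_pages(size):
    """Scroll the whole result set page by page, starting from the first."""
    pages, last = [], None
    while True:
        page = keyset_page(last, size)
        if not page:
            return pages
        pages.append(page)
        last = page[-1]

for page in all_pages(2):
    print(page)
```

Each request needs the last row of the previous page, which is why this approach can only scroll forward from a known position rather than jump to an arbitrary offset.
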

I'm looking forward to the -18!

Thanks again,

Ben

> >>> Appendix C.2
> >>>
> >>>      total count.  Therefore, as "totalCount" is an optional response
> >>>      information, fetching always the total number of rows has been
> >>>
> >>> I'm not entirely sure in what sense "optional response information" is
> >>> intended -- my reading of Section 2.1 is that it's mandatory to return
> >>> totalCount if the client included the 'count' query parameter.
> >> [ML] Exactly but it isn't returned always. For this reason, it is an
> >> optional member of the paging_metadata object.
> > Okay.  (I think your reply to another ballot comment also helped clarify
> > this for me.)
> [ML] Good.
> >> Looking forward to your reply to my questions/comments.
> > Thanks a lot for the explanations and updates; hopefully I have clarified
> > anything that was unclear.
> >
> > -Ben
> 
> -- 
> Dr. Mario Loffredo
> Systems and Technological Development Unit
> Institute of Informatics and Telematics (IIT)
> National Research Council (CNR)
> via G. Moruzzi 1, I-56124 PISA, Italy
> Phone: +39.0503153497
> Mobile: +39.3462122240
> Web: http://www.iit.cnr.it/mario.loffredo
>