Re: [regext] Benjamin Kaduk's Discuss on draft-ietf-regext-rdap-sorting-and-paging-17: (with DISCUSS and COMMENT)

Mario Loffredo <mario.loffredo@iit.cnr.it> Wed, 30 September 2020 11:32 UTC

To: Benjamin Kaduk <kaduk@mit.edu>
Cc: regext-chairs@ietf.org, Tom Harrison <tomh@apnic.net>, The IESG <iesg@ietf.org>, draft-ietf-regext-rdap-sorting-and-paging@ietf.org, regext@ietf.org
From: Mario Loffredo <mario.loffredo@iit.cnr.it>
Date: Wed, 30 Sep 2020 13:28:53 +0200
Archived-At: <https://mailarchive.ietf.org/arch/msg/regext/iYY4gN5ZrDJLzGJz0JJgEYDqjhQ>

Hi Ben,

Thanks a lot for your quick reply to my responses. My comments are inline.

On 29/09/2020 02:14, Benjamin Kaduk wrote:
> Hi Mario,
>
> Also inline.
>
> On Sat, Sep 26, 2020 at 03:56:03PM +0200, Mario Loffredo wrote:
>> Hi Benjamin,
>>
>> thanks a lot for your extensive review. I apologize for the delay in
>> replying, but I have been very busy for the last two days and your
>> feedback is very detailed.
>>
>> Please find my comments inline.
>>
>> On 23/09/2020 23:40, Benjamin Kaduk via Datatracker wrote:
>>> Benjamin Kaduk has entered the following ballot position for
>>> draft-ietf-regext-rdap-sorting-and-paging-17: Discuss
>>>
>>>
>>> Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
>>> for more information about IESG DISCUSS and COMMENT positions.
>>>
>>>
>>> The document, along with other ballot positions, can be found here:
>>> https://datatracker.ietf.org/doc/draft-ietf-regext-rdap-sorting-and-paging/
>>>
>>>
>>>
>>> ----------------------------------------------------------------------
>>> DISCUSS:
>>> ----------------------------------------------------------------------
>>>
>>> Should we say something about which order the sorting criteria are
>>> applied (first to last vs last to first) when multiple sortItems are
>>> specified in a query?
>> [ML] The common interpretation is from left to right so I don't think we
>> need to clarify this concept.
> I think I can accept not saying more on this subject, but I am curious:
> when you say left to right, that means that the leftmost parameter is
> higher priority?  So that, to give a totally contrived example, if I had
> pairs of (name, id), a query with &sort=name;&sort=id; would give:
>
> ("alpha", 10)
> ("alpha", 20)
> ("beta", 10)
> ("beta", 20)

[ML]  Exactly.

One minor comment: the right notation would be &sort=name,id.
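For illustration only (not text from the draft), a minimal Python sketch of that left-to-right rule. It assumes a hypothetical in-memory list of objects with "name" and "id" fields and a bare comma-separated sort value; sort direction and validation are left out. The leftmost property becomes the first element of the sort key tuple, so it has the highest priority and later properties only break ties:

```python
def parse_sort(value: str) -> list:
    """Split a sort value such as "name,id" into property names, leftmost first."""
    return [p.strip() for p in value.split(",") if p.strip()]


def sort_objects(objects: list, props: list) -> list:
    # Tuples compare element by element: the leftmost property decides first,
    # and each following property only breaks ties left by the previous ones.
    return sorted(objects, key=lambda obj: tuple(obj[p] for p in props))


objects = [
    {"name": "beta", "id": 20},
    {"name": "alpha", "id": 20},
    {"name": "beta", "id": 10},
    {"name": "alpha", "id": 10},
]
ordered = sort_objects(objects, parse_sort("name,id"))
print([(o["name"], o["id"]) for o in ordered])
# [('alpha', 10), ('alpha', 20), ('beta', 10), ('beta', 20)]
```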

>
>>> I recognize that in the HATEOAS model, the actual JSONPaths reported by
>>> the server should be used by the client to determine what a given sort
>>> property does, but it also seems like it would be confusing for this
>>> document to specify (e.g.) an "email" property with specific JSONPath,
>>> and then have a server go off and use "email" to mean something else,
>>> even if that is just the addition of "pref" as discussed at the end of
>>> Section 2.3.1.  Do we want to try to have the properties defined by this
>>> document be universally defined and encourage the use of new/different
>>> property names for variations on them?  (The answer may well be "no",
>>> but the answer is not intuitively clear to me.)  To put it another way,
>>> is the list in Section 2.3.1 normative, or just an example?
>> [ML] I would say "normative" just to facilitate interoperability and
>> avoid ambiguities. Maybe it could be enough to say that the sorting
>> properties defined in the document are considered reserved, so an RDAP
>> server MUST NOT map them onto other RDAP response values.
>>
>> Does it work for you?
> Yes, that would work for me.  Thanks!
[ML] Perfect.
>
>>> ----------------------------------------------------------------------
>>> COMMENT:
>>> ----------------------------------------------------------------------
>>>
>>> Section 1
>>>
>>>      However, there are some drawbacks associated with the use of the HTTP
>>>      header.  First, the header properties cannot be set directly from a
>>>      web browser.  Moreover, in an HTTP session, the information on the
>>>      status (i.e. the session identifier) is usually inserted in the
>>>      header or a cookie, while the information on the resource
>>>      identification or the search type is included in the query string.
>>>      The second approach is therefore not compliant with the HTTP standard
>>>      [RFC7230].  As a result, this document describes a specification
>>>      based on the use of query parameters.
>>>
>>> A few more words (section number from 7230?) on why the second approach
>>> is not compliant with HTTP might help the reader, though it isn't
>>> strictly necessary (we're not using it, after all).
>> [ML] Would it be better to replace RFC7230 with RFC7231 and add a
>> reference to Section 8.3.1
>> (https://tools.ietf.org/html/rfc7231#section-8.3.1)?
> It might be, but I think I still don't understand why using the HTTP header
> field for sorting and paging information is not compliant with HTTP -- the
> linked section says that header fields can be used to communicate
> information about the target resource, which IIUC includes the resource as
> qualified by the query string.  (But note that I am not an HTTP expert...)

[ML] Would it be more appropriate to update the sentence as follows?

OLD

The second approach is therefore not compliant with the HTTP standard 
[RFC7230]

NEW

The second approach is therefore not compliant with the most common
practices regarding the use of HTTP headers [RFC7231]
>>> Section 2.1
>>>
>>>         *  "jsonPath": "String" (OPTIONAL) the JSONPath of the RDAP field
>>>            corresponding to the property;
>>>
>>> What is this path relative to?  (Does the client have to know from the
>>> other context what type of object it refers to?)
>> [ML] All the JSONPath expressions defined in the document are relative
>> to the root of an RDAP response. The sorting_metadata object is
>> included in the same response, so I think that the context is clear and
>> no further clarification is needed.
> Now that you mention it, I do recall other discussion of paths being
> relative to the root; my apologies for the noise.
[ML] No problem.
>
>>>         *  "links": "Link[]" (OPTIONAL) an array of links as described in
>>>            [RFC8288] containing the query string that applies the sort
>>>            criterion.
>>>
>>> Just to check: this is going to have the same structure for a Link
>>> object that draft-ietf-regext-rdap-partial-response does?  (I am not
>>> coming up with a great way to deduplicate the definitions, off the top
>>> of my head.)
>> [ML] Yes. The sorting links have the same structure as the subsetting
>> links (see Section 2.3.2).
>>>      o  "pageSize": "Numeric" (OPTIONAL) a numeric value representing the
>>>         number of objects returned in the current page.  It MUST be
>>>         provided if and only if the total number of objects exceeds the
>>>         page size.  This property is redundant for RDAP clients because
>>>         the page size can be derived from the length of the search results
>>>         array, but it can be helpful if the end user interacts with the
>>>         server through a web browser;
>>>
>>> If it's redundant, we should probably say something about error handling
>>> for when the things that are supposed to be identical have different
>>> values.
>> [ML] I think this situation is very unlikely. Anyway, in this case,
>> the length of the results array is what counts. It is somewhat more
>> likely that the totalCount value differs from the sum of the number of
>> results in each page. Even if registration data can't be considered
>> real-time data, the count parameter might be present in the initial
>> query and scrolling through the result set might take time, so there
>> is a small chance that the initial totalCount value is stale because
>> the result set has changed in the meantime. In this case too, the sum
>> of the result array lengths is what counts.
>>
>> Should I write something about it?
> Thanks for the additional explanation.  I think that if we were to write
> anything more, it would just be a few words in the description here to
> indicate that it's just for convenience, e.g., "representing the number of
> objects that should have been returned in the current page".  But it is
> probably okay to leave it unchanged, too.
[ML] Changed.
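A minimal client-side sketch in Python of the behaviour described above, under stated assumptions: a hypothetical list of already-fetched domain search responses, each possibly carrying a "paging_metadata" object. The client trusts the length of each results array rather than "pageSize", and treats the initial "totalCount" only as a hint that may be stale:

```python
from typing import Optional, Tuple


def page_length(response: dict, results_member: str) -> int:
    # The results array is authoritative; paging_metadata.pageSize is only a
    # convenience copy and is deliberately not consulted here.
    return len(response.get(results_member, []))


def reconcile(pages: list, results_member: str) -> Tuple[int, Optional[int]]:
    """Return (objects actually retrieved, totalCount reported initially)."""
    retrieved = sum(page_length(p, results_member) for p in pages)
    first_meta = pages[0].get("paging_metadata", {}) if pages else {}
    # totalCount reflects the result set at the time of the initial query,
    # so it may be stale by the time the last page is fetched.
    return retrieved, first_meta.get("totalCount")


pages = [
    {"paging_metadata": {"totalCount": 5, "pageSize": 3},
     "domainSearchResults": [{"ldhName": n} for n in ("a.example", "b.example", "c.example")]},
    # pageSize disagrees with the array here; the array length wins.
    {"paging_metadata": {"pageSize": 2},
     "domainSearchResults": [{"ldhName": "d.example"}]},
]
print(reconcile(pages, "domainSearchResults"))  # (4, 5)
```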
>>> Section 2.3
>>>
>>>      Except for sorting IP addresses, servers MUST implement sorting
>>>      according to the JSON value type of the RDAP field the sorting
>>>      property refers to.  That is, JSON strings MUST be sorted
>>>      lexicographically and JSON numbers MUST be sorted numerically.  If IP
>>>      addresses are represented as JSON strings, they MUST be sorted based
>>>      on their numeric conversion.
>>>
>>> There are more JSON types than string and number; are those other types
>>> guaranteed to not appear in sortable RDAP fields?  (I can't see how such
>>> a guarantee could be made, given that servers can define their own
>>> sorting properties.)
>> [ML] The other primitive JSON type remaining is boolean but I don't
>> think it makes sense to sort by a boolean property. Instead, I missed
> I think I also had JSON maps in mind, but I guess it is not exactly defined
> to sort by a map itself, only a primitive type, so my question was a bit
> silly.
[ML] Yes. Only values with primitive types.
>> that those values denoting dates and times MUST be sorted in
>> chronological order even if they are strings. I'll update the sentence
>> as in the following:
>>
>> Except for sorting IP addresses and values denoting dates and times, servers MUST implement sorting
>>      according to the JSON value type of the RDAP field the sorting
>>      property refers to.  That is, JSON strings MUST be sorted
>>      lexicographically and JSON numbers MUST be sorted numerically.
>>      Values denoting dates and times MUST be sorted in chronological order.  If IP
>>      addresses are represented as JSON strings, they MUST be sorted based
>>      on their numeric conversion.
>>
>> Does it work for you?
> I think so; thanks.
[ML] Good.
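As an illustration of the proposed text (a sketch, not normative), the Python sort keys below convert IP address strings to numbers and RFC 3339 date strings to datetimes before comparing them. Two assumptions are mine, not the draft's: IPv4 addresses are placed before IPv6 addresses, and every date carries a UTC offset:

```python
import ipaddress
from datetime import datetime


def ip_key(value: str) -> tuple:
    # Numeric conversion of the address. The version comes first because
    # IPv4 and IPv6 address objects cannot be compared with each other.
    ip = ipaddress.ip_address(value)
    return (ip.version, int(ip))


def date_key(value: str) -> datetime:
    # fromisoformat() before Python 3.11 rejects a trailing "Z", so it is
    # rewritten as "+00:00"; aware datetimes then compare chronologically.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


ips = ["10.0.0.10", "10.0.0.9", "2001:db8::1", "192.0.2.1"]
print(sorted(ips))
# ['10.0.0.10', '10.0.0.9', '192.0.2.1', '2001:db8::1']  (string order)
print(sorted(ips, key=ip_key))
# ['10.0.0.9', '10.0.0.10', '192.0.2.1', '2001:db8::1']  (numeric order)

dates = ["2020-01-01T00:30:00Z", "2020-01-01T01:00:00+02:00"]
print(sorted(dates, key=date_key))
# ['2020-01-01T01:00:00+02:00', '2020-01-01T00:30:00Z']  (23:00 UTC comes first)
```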
>
>>>      If the "sort" parameter reports an allowed sorting property, it MUST
>>>      be provided in the "currentSort" field of the "sorting_metadata"
>>>      element.
>>>
>>> nit: is "reports" the best word to describe this behavior (which, IIUC,
>>> is "present in the query component of the request URL")?
>> [ML] Sounds better.
>>> Section 2.3.1
>>>
>>>      In the "sort" parameter ABNF syntax, property-ref represents a
>>>      reference to a property of an RDAP object.  Such a reference could be
>>>      expressed by using a JSONPath.  The JSONPath in a JSON document
>>>
>>> nit: is there a missing word here ("a JSONPath expression")?
>> [ML] Just for conciseness, may I use "jsonpath" to mean "JSONPath
>> expression" and keep "JSONPath" to refer to the specification?
>>
>> I could write something like: "JSONPath expression (referred to as
>> "jsonpath" in the following)"
> That's fine from my perspective, sure.
[ML] Perfect.
>
>>>      o  Note that some of the object specific properties are also defined
>>>         as query paths.  The object specific properties include:
>>>
>>> nit: the list structure in this item does not seem parallel to the
>>> structure of the first item.
>> [ML] OK. I'll change the sentence as in the following:
>>
>> Object specific properties.  Note that some of these properties
>>         are also defined as query paths.  These properties include:
>>
>>>         as two representations of the same value.  By default, the
>>>         unicodeName value MUST be used while sorting.  When the
>>>         unicodeName is unavailable, the value of the ldhName MUST be used
>>>         instead;
>>>
>>> I'm not entirely sure how much value "by default" adds here.  Would the
>>> meaning be different if we said "The unicodeName value MUST be used
>>> while sorting if it is present; when the unicodeName is unavailable, the
>>> value of the ldhName is used instead"?
>> [ML] No, it wouldn't. I'll change the sentence as you suggest.
>>>      o  The jCard "sort-as" parameter MUST be ignored for the sorting
>>>         capability described in this document;
>>>
>>> It's a little bit of a juxtaposition to refer to jCard here in the prose
>>> but vcard in the table.
>> [ML] I would keep it as is. Instead, I would replace all the "vcard"
>> occurrences with "jCard". Since jCard is a transliteration of vCard into
>> JSON, it seems appropriate to me to keep the references to RFC6350
>> sections and to use the corresponding jCard elements for the mapping
>> between the sorting properties and the RDAP response elements. In
>> addition, I would write a sentence explaining that jCard is the JSON
>> format of vCard, add a link to RFC7095, and add RFC7095 to the
>> Normative References.
>>
>> Do you agree?
> Yes, thanks.
[ML] OK.
>
>>>      o  Even if a nameserver can have multiple IPv4 and IPv6 addresses,
>>>         the most common configuration includes one address for each IP
>>>         version.  Therefore, the assumption of having a single IPv4 and/or
>>>         IPv6 value for a nameserver cannot be considered too stringent.
>>>
>>> I disagree with the flat assertion that it "cannot be considered too
>>> stringent".  It can be so considered, as a matter of difference of
>>> opinion; what is appropriate to do here is to say that this
>>> document/protocol makes the assumption (especially since we go on to
>>> describe the exception-handling procedure when the assumption is
>>> violated).
>> [ML] May I update that sentence as follows?
>>
>> OLD
>>
>> Therefore, the assumption of having a single IPv4 and/or
>>         IPv6 value for a nameserver cannot be considered too stringent.
>>
>> NEW
>>
>> Therefore, this specification makes the assumption that nameservers have a single IPv4 and/or
>>         IPv6 value.
> Yes, please!
[ML] Done.
>
>>>      o  Multiple events with a given action on an object might be
>>>         returned.  If this occurs, sorting MUST be applied to the most
>>>         recent event;
>>>
>>> This makes a lot of sense as the default and I don't propose changing it
>>> now, but I do wonder how hard it would be to add support later for
>>> sorting on (say) the oldest event instead.
>> [ML] Well, I wrote that sentence because some RDAP events can appear
>> multiple times. For example, a domain might be locked and unlocked
>> repeatedly. The purpose of that sentence is just to avoid ambiguities
>> and implicitly suggest that RDAP providers arrange events of the same
>> type in descending chronological order.
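A minimal Python sketch of "sort on the most recent event", assuming RFC 7483 event objects with "eventAction" and "eventDate" members and a made-up sample domain. It takes the maximum date rather than relying on the order in which the server lists the events:

```python
from datetime import datetime
from typing import Optional


def parse_rdap_date(value: str) -> datetime:
    # RFC 3339 timestamp; "Z" is normalised for older fromisoformat() versions.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def latest_event_date(obj: dict, action: str) -> Optional[datetime]:
    """Return the most recent eventDate for the given eventAction, if any."""
    dates = [
        parse_rdap_date(e["eventDate"])
        for e in obj.get("events", [])
        if e.get("eventAction") == action
    ]
    return max(dates) if dates else None


domain = {
    "ldhName": "example.com",
    "events": [
        {"eventAction": "locked", "eventDate": "2019-03-01T10:00:00Z"},
        {"eventAction": "unlocked", "eventDate": "2019-06-01T10:00:00Z"},
        {"eventAction": "locked", "eventDate": "2020-02-15T08:30:00Z"},
    ],
}
print(latest_event_date(domain, "locked"))  # 2020-02-15 08:30:00+00:00
```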
>>>      The "jsonPath" field in the "sorting_metadata" element is used to
>>>      clarify the RDAP field the sorting property refers to.  The mapping
>>>      between the sorting properties and the JSONPaths of the RDAP fields
>>>      is shown below:
>>>      [...]
>>>         name
>>>
>>>            $.domainSearchResults[*].unicodeName
>>>
>>> This seems to ignore the subtlety regarding unicodeName vs ldhName.  Is
>>> there a way it could be expressed in JSONPath?
>> [ML] If unicodeName and ldhName were mutually exclusive, the JSONPath
>> union operator would fit (i.e.,
>> $.domainSearchResults[*].[unicodeName,ldhName]). Currently, RFC7483
>> says nothing about when they should or must be present, but its
>> examples seem to suggest including unicodeName only for IDNs. When
>> both properties are present, the union operator doesn't fit exactly,
>> and I still haven't found the right JSONPath expression based only on
>> the basic operators. However, since the "jsonPath" member is only for
>> documentation, the aforementioned JSONPath expression could be the most
>> suitable for conveying that sorting is applied to a kind of
>> <unicodeName, ldhName> combination.
> I have to defer to your expertise here; thank you for thinking about it.
[ML] Thanks. Maybe this is the only case where the JSONPath WG outcomes 
might be helpful :-)
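Since no plain JSONPath expression captures the fallback, the intended semantics can be sketched as a Python sort key instead (illustrative only; the sample domains are made up). Note that Python compares strings by code point, which is one reading of "lexicographically" and not a locale-aware collation:

```python
def domain_name_key(domain: dict) -> str:
    """Sort value for the "name" property of a domain object.

    unicodeName is used when present; otherwise ldhName is used.
    """
    return domain.get("unicodeName") or domain["ldhName"]


results = [
    {"ldhName": "xn--caf-dma.example", "unicodeName": "café.example"},
    {"ldhName": "beta.example"},
    {"ldhName": "alpha.example"},
]
print([domain_name_key(d) for d in sorted(results, key=domain_name_key)])
# ['alpha.example', 'beta.example', 'café.example']
```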
>
>>>      o  Nameserver
>>>
>>>         name
>>>
>>>            $.domainSearchResults[*].unicodeName
>>>
>>> Presumably this is supposed to be nameserverSearchResults?
>> [ML] Absolutely. It's a cut-and-paste typo :-)
>>> Section 2.4
>>>
>>> I think we want another introductory paragraph like:
>>>
>>> % The cursor parameter is used by the server to preserve information
>>> % about the pagination state of a given query's results across calls to
>>> % the search API, so that successive requests by the client can return
>>> % page N, N+1, N+2, etc.  Its value is only required to be interpretable
>>> % by the server and could be implemented, for example, as an opaque
>>> % database lookup key.  If a server does use a method for generating
>>> % cursor values that involves internal structure, such as the one
>>> % described below, the server needs to recognize that the value supplied
>>> % by a client could have been modified (maliciously), and implement
>>> % appropriate bounds-checking and similar measures when parsing received
>>> % values.
>>>
>>> The current wording strongly suggests that base64-encoding a meaningful
>>> value that the client could inspect or even construct is required, and I
>>> do not think that is very maintainable or what was intended, given the
>>> current second paragraph ("servers can change the method over time
>>> without announcing anything to clients").
>>>
>>> (side note) I'm also pretty partial to the way JMAP discusses returning
>>> (paginated, but non-uniformly) changes to a given data stream, e.g., at
>>> https://www.rfc-editor.org/rfc/rfc8620.html#section-5.2 -- any given
>>> state is named, and you can get "stuff starting at <named state>" and
>>> the name to use for the state as of the current reply.
>> [ML] Maybe I didn't make myself clear.
>>
>> The Base64 encoding is a simple (not recommended) transformation to make
>> the cursor value opaque to the client. It just seemed suitable for use
>> in some examples. But if you take a look at the example in Figure 6, you
>> will note that you can't obtain a meaningful result by simply
>> Base64-decoding the cursor value. In any case, the method used to
>> encrypt the cursor value must be more complex than a mere Base64
>> encoding.
>>
>> Regarding the sentence in parentheses, it means that servers can
>> change the underlying pagination strategy without any impact on
>> clients. A server can initially implement offset pagination and later
>> switch to keyset pagination without any effect on clients' features.
>>
>> The checks that servers should make on the cursor value are described
>> both in the "Negative Answer" section and in Appendix C.3 ("Paging").
>>
>> Anyway, I'll try to integrate your text in the current document and add
>> a sentence with the purpose of discouraging the use of the
>> Base64-encoding in the cursor implementations.
> Thank you; I did not know that you wanted to discourage the use of plain
> base64 encoding, but that is reassuring to know.
> (I did notice that the example in Figure 6 did not decode to a meaningful
> result, but did not make much of a conclusion from that.)
[ML] OK. I will add some text to clarify that a mere Base64 encoding is 
not recommended to encrypt the cursor value.
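To show why a mere Base64 encoding is not recommended, here is a Python sketch of a hypothetical cursor that just Base64-encodes the pagination state (the "offset"/"limit" fields are illustrative assumptions). Any client can decode it, and can just as easily build a cursor the server never issued:

```python
import base64
import json


def naive_cursor(offset: int, limit: int) -> str:
    # Hypothetical and NOT recommended: structured state that is merely
    # Base64-encoded, with no secrecy and no integrity protection.
    state = json.dumps({"offset": offset, "limit": limit}, separators=(",", ":"))
    return base64.urlsafe_b64encode(state.encode()).decode()


cursor = naive_cursor(100, 50)
# Any client can reverse the encoding...
print(json.loads(base64.urlsafe_b64decode(cursor)))  # {'offset': 100, 'limit': 50}
# ...and forge a cursor that the server never issued.
forged = naive_cursor(1_000_000_000, 50)
```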
>
>>> Section 4
>>>
>>> If the server doesn't have access to an efficient (e.g.) counting
>>> operation on the backend, would we recommend that the server not support
>>> sorting/pagination, since there's not much benefit from having the
>>> server pull up all the results and count them just to be able to return
>>> the total count value back to the client, and then go do the same work again
>>> when the client asks for the next page of results?
>> [ML] In my implementation, the RDAP server doesn't include the count
>> parameter in the sorting and paging links. The number of results doesn't
>> change at all if the result set is sorted by one property rather than
>> another. The same generally holds (as I wrote above) while the client is
>> scrolling through the result set pages. So why repeat the count parameter
>> in the links? The totalCount value is returned in the response to the
>> initial query and, since the count parameter is not repeated in the links,
>> the counting operation is not executed again. Therefore, we don't need to
>> make particular assumptions about the performance of the counting operation.
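The link construction described above could look roughly like the following Python sketch. The helper name, the example URL, and the "count=true" form are assumptions for illustration; the point is only that "count" is dropped from the generated link while the other query parameters are kept:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def next_page_link(request_url: str, cursor: str) -> str:
    """Build a paging link from the request URL, dropping "count".

    The total was already returned for the initial query, so repeating
    "count" in the link would only make the server count the rows again.
    """
    parts = urlsplit(request_url)
    params = [(k, v) for k, v in parse_qsl(parts.query) if k not in ("count", "cursor")]
    params.append(("cursor", cursor))
    return urlunsplit(parts._replace(query=urlencode(params)))


print(next_page_link("https://rdap.example.com/domains?name=k*&sort=name&count=true", "abc"))
# https://rdap.example.com/domains?name=k%2A&sort=name&cursor=abc
```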
>>> Section 7
>>>
>>> I suggest noting that (encoded) structured "cursor" values present a new
>>> attack surface on the server that needs to be protected.
>> [ML] Sorry, could you explain this concept further? AFAIK, it is
>> possible to protect REST API endpoints but not query parameters.
> I think this was assuming that the server was going to just base64-encode
> something like "offset=100,limit=50" -- in that case a client could pass in
> the base64'd version of "offset=1000000000,limit=50".  The server would
> need to sanity-check the results of base64 decoding and reject the
> too-large offset.  If the server is expecting to do a fancier
> self-encrypted-token scheme for the cursor, the integrity check associated
> with the encryption takes care of this protection inherently, and we may
> not need to mention anything about sanitizing these values.
[ML] OK.
>
>>>      results in a response.  However, this last security policy can result
>>>      in a higher inefficiency if the RDAP server does not provide any
>>>      functionality to return the truncated results.
>>>
>>> I'm not sure I understand (or agree with) this last sentence -- it seems
>>> that unilateral silent truncation of results by the server leads to not
>>> just inefficiency but also potential security considerations in its own
>>> right, with the client not knowing that it has incomplete results.
>>> Also, if the server is truncating the results, by definition it "has
>>> functionality to return the truncated results" -- that's what it's
>>> doing!  So I assume the intent was to say something about negotiating or
>>> indicating that the results are truncated, not actually doing the
>>> truncation.
>> [ML] I think that servers legitimately truncate result sets to
>> mitigate the risk of resource exhaustion and consequent denial of
>> service. Implementing the capabilities described in this document
>> allows servers to keep managing sustainable result sets and, at the
>> same time, increases clients' ability to avoid truncation and find
>> relevant results.
> I agree with the paragraph you just wrote.  However, I think that the state
> of affairs prior to this document, with unilateral truncation by the
> server, can lead not just to "inefficiency" but also to security risks.  So
> I was hoping to see something like "can result in higher inefficiency or
> risk due to acting on incomplete information".
[ML] OK. I will change the sentence as you suggest.
> My second point ("Also, if the server [...]") was intending to suggest that
> the last sentence say something like "if the RDAP Server does not provide
> any functionality to return sorted results or iterate through the full
> result set".

[ML] You are right. I worded the sentence badly. Is it fine with you if I
change it as follows?

OLD

if the RDAP server does not provide any
     functionality to return the truncated results

NEW

if the RDAP server does not provide any
     functionality to return results removed by truncation

>
>>>      The new parameters presented in this document provide RDAP operators
>>>      with a way to implement a server that reduces inefficiency risks.
>>>
>>> [same question about "inefficiency" being the right word]
>> [ML] Maybe I can replace the phrase "that reduces inefficiency risks."
>> with the phrase "that reduces the risk of resource exhaustion and
>> consequent denial of service".
>>
>> Are you ok with it?
> Denial of service is only one of the risks I have in mind; another is that
> if a server silently truncates, a client will have incomplete data and
> might derive a conclusion ("domain X does not satisfy property Y") that is
> false.
[ML] OK. As I wrote above, I'll change the sentence to outline the risk
due to acting on incomplete information.
>
>>> Appendix B
>>>
>>>      o  It does not allow direct navigation to arbitrary pages because the
>>>         result set must be scrolled in sequential order starting from the
>>>         initial page;
>>>
>>> (side note) I didn't follow the references, so maybe this was covered
>>> there, but I don't quite follow why direct navigation is impossible.  If
>>> you use a key field for seeking, can't you just start in the middle from
>>> some known value for that key field?
>> [ML] Especially when you know the total count of a result set, you can
>> jump directly to a specific point in the result set through offset
>> pagination, but you can't do the same through keyset pagination because
>> you don't know the key value at that point in advance. One may wonder:
>> what is jumping within the result set useful for? Well, for example, if
>> you are looking for a specific item in an ordered collection of items,
>> you could find it through binary search.
> I don't want to press this topic very much, so let me just try a brief
> example.  Suppose I have a sorted set of ASCII strings, and I want to see if the
> string "koala" is present.  I could start at the beginning and look at each
> one in turn until I get to something that sorts after "koala", or I could
> ask the database "give me the first thing you have that is after "k", which
> gets me some of the way there.  Is the problem that you need an actual
> value in the dataset to start from, and since "k" isn't guaranteed to be in
> the set you are forced to start from the beginning?

[ML] Offset pagination is based on the positions of the results within
the result set, while keyset pagination is based on a unique combination
of values of the results. You can always skip the first K results of a
result set by specifying offset=K, but you can't do the same through
keyset pagination because you don't know which combination of values
is at position K.

Let me give you an example that can clarify.

Let's suppose that the query "k*" returns N=1000 results and the page
size is 100. The fastest way to find out whether "koala" is present is
to jump to the middle of the result set (i.e., offset=500) and look at
the first result. If it is lexicographically lower than "koala", then
"koala" can only be in the second half; otherwise, it can only be in
the first half. Let's suppose that the first item is greater than
"koala": then you can jump to the middle of the first half (offset=250)
and apply the same dichotomy in turn until you find out whether "koala"
is present or not. This process takes at most about log2(N) steps
(about 10 for N=1000).
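The dichotomy above is a binary search driven by offsets. A Python sketch, where fetch_at(k) is a hypothetical stand-in for requesting offset=k and reading the first result of the returned page, and the 1000 sorted results are synthetic:

```python
def find_with_offsets(fetch_at, total: int, target: str) -> tuple:
    """Binary search over a sorted result set that supports offset=K.

    Returns (found, number_of_requests).
    """
    lo, hi, requests = 0, total - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        value = fetch_at(mid)  # one request with offset=mid
        requests += 1
        if value == target:
            return True, requests
        if value < target:
            lo = mid + 1  # target can only be in the second half
        else:
            hi = mid - 1  # target can only be in the first half
    return False, requests


# 1000 sorted results; "koala" sorts after every "k0..." string.
results = sorted(["k%04d" % i for i in range(999)] + ["koala"])
print(find_with_offsets(results.__getitem__, len(results), "koala"))  # (True, 10)
```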

You can't do the same through keyset pagination because you don't know
how the values are distributed in the result set. You can only scroll
through the result set from the first page to the last, and this process
takes at most N steps.

However, in general, one is interested in all the results (or a subset of
them) returned by a query rather than in a single result. In this case,
provided that the results can always be sorted according to a unique
index, keyset pagination is more efficient than offset pagination. Take
the example above. With offset pagination, the underlying DBMS always
selects 1000 results and then returns the current page, so, in the worst
case, the DBMS selects 1000 results 10 times (from offset=0 to
offset=900). With keyset pagination, the number of results the underlying
DBMS selects decreases by 100 at each step: at the beginning, it selects
1000 results and returns the first page, then it selects 900 results and
returns the second page, and so on.
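Under the simplified cost model described above (each offset request selects all N rows; each keyset request selects only the rows after the last key seen), the totals for N=1000 and a page size of 100 work out as follows. This is arithmetic on that model, not a benchmark:

```python
N, PAGE = 1000, 100
pages = N // PAGE  # 10 requests to scroll the whole result set

# Offset pagination: every request selects all N rows, then returns one page.
offset_rows = sum(N for _ in range(pages))

# Keyset pagination: request k selects only the rows after the last key
# seen, i.e. N - k * PAGE rows (1000, 900, ..., 100).
keyset_rows = sum(N - k * PAGE for k in range(pages))

print(offset_rows, keyset_rows)  # 10000 5500
```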

Note also two facts:

- the time needed to scroll through the result set could be significant
when the result set is huge;

- for all the RDAP searchable objects, it is always possible to build,
more or less easily, a combination of properties acting as a unique
index, regardless of whether the search includes the sort parameter or
not. For example, for the entity object class, "handle" is a unique
property; if the search includes "sort=registrationDate", the
combination <registrationDate, handle> is unique.
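A Python sketch of keyset pagination over the composite key <registrationDate, handle>, assuming hypothetical in-memory entity records whose dates all use the same string format (so string order matches chronological order). Using "handle" as a tie-breaker is what keeps rows that share a registrationDate from being skipped or repeated across pages:

```python
def keyset_page(rows: list, sort_prop: str, after=None, size: int = 2) -> tuple:
    """Return one page ordered by (sort_prop, handle) and the key to resume from.

    rows is a hypothetical in-memory list of entity records; a real server
    would push the same comparison into its database query.
    """
    def key(row: dict) -> tuple:
        return (row[sort_prop], row["handle"])

    ordered = sorted(rows, key=key)
    if after is not None:
        # Seek strictly past the last (sort value, handle) pair already returned.
        ordered = [r for r in ordered if key(r) > after]
    page = ordered[:size]
    next_key = key(page[-1]) if len(ordered) > size else None
    return page, next_key


entities = [
    {"handle": "E3", "registrationDate": "2020-01-02"},
    {"handle": "E1", "registrationDate": "2020-01-01"},
    {"handle": "E4", "registrationDate": "2020-01-02"},
    {"handle": "E2", "registrationDate": "2020-01-01"},
    {"handle": "E5", "registrationDate": "2020-01-03"},
]

after, seen = None, []
while True:
    page, after = keyset_page(entities, "registrationDate", after)
    seen += [e["handle"] for e in page]
    if after is None:
        break
print(seen)  # ['E1', 'E2', 'E3', 'E4', 'E5']
```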

>>> Appendix C.2
>>>
>>>      total count.  Therefore, as "totalCount" is an optional response
>>>      information, fetching always the total number of rows has been
>>>
>>> I'm not entirely sure in what sense "optional response information" is
>>> intended -- my reading of Section 2.1 is that it's mandatory to return
>>> totalCount if the client included the 'count' query parameter.
>> [ML] Exactly, but it isn't always returned. For this reason, it is an
>> optional member of the paging_metadata object.
> Okay.  (I think your reply to another ballot comment also helped clarify
> this for me.)
[ML] Good.
>> Looking forward to your reply to my questions/comments.
> Thanks a lot for the explanations and updates; hopefully I have clarified
> anything that was unclear.
>
> -Ben
>

-- 
Dr. Mario Loffredo
Systems and Technological Development Unit
Institute of Informatics and Telematics (IIT)
National Research Council (CNR)
via G. Moruzzi 1, I-56124 PISA, Italy
Phone: +39.0503153497
Mobile: +39.3462122240
Web: http://www.iit.cnr.it/mario.loffredo