Re: [regext] Benjamin Kaduk's Discuss on draft-ietf-regext-rdap-sorting-and-paging-17: (with DISCUSS and COMMENT)
Mario Loffredo <mario.loffredo@iit.cnr.it> Fri, 02 October 2020 06:41 UTC
To: Benjamin Kaduk <kaduk@mit.edu>
Cc: regext-chairs@ietf.org, draft-ietf-regext-rdap-sorting-and-paging@ietf.org, The IESG <iesg@ietf.org>, Tom Harrison <tomh@apnic.net>, regext@ietf.org
References: <160089722480.18312.1611285341459635513@ietfa.amsl.com> <78b84143-eecd-ea03-a4db-077dd9920dc4@iit.cnr.it> <20200929001425.GN89563@kduck.mit.edu> <c76db7f1-8c88-142b-5242-49e0a62e9b38@iit.cnr.it> <20201002022020.GR89563@kduck.mit.edu>
From: Mario Loffredo <mario.loffredo@iit.cnr.it>
Message-ID: <69b025f9-89c7-932f-5707-b6b31e492565@iit.cnr.it>
Date: Fri, 02 Oct 2020 08:37:42 +0200
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.12.0
MIME-Version: 1.0
In-Reply-To: <20201002022020.GR89563@kduck.mit.edu>
Content-Type: text/plain; charset="iso-8859-15"; format="flowed"
Content-Transfer-Encoding: 8bit
Content-Language: it
Archived-At: <https://mailarchive.ietf.org/arch/msg/regext/nuRZMusTGON1CDYUFMX-ekQxZ9A>
Subject: Re: [regext] Benjamin Kaduk's Discuss on draft-ietf-regext-rdap-sorting-and-paging-17: (with DISCUSS and COMMENT)
Hi Ben,

thanks again for your careful feedback. I'll publish -18 as soon as possible.

Best,
Mario

On 02/10/2020 04:20, Benjamin Kaduk wrote:
> Hi Mario,
>
> Not a whole lot left to say, but what there is is inline.
>
> On Wed, Sep 30, 2020 at 01:28:53PM +0200, Mario Loffredo wrote:
>> Hi Ben,
>>
>> thanks a lot for your quick reply to my responses. My comments are inline.
>>
>> On 29/09/2020 02:14, Benjamin Kaduk wrote:
>>> Hi Mario,
>>>
>>> Also inline.
>>>
>>> On Sat, Sep 26, 2020 at 03:56:03PM +0200, Mario Loffredo wrote:
>>>> Hi Benjamin,
>>>>
>>>> thanks a lot for your extensive review. I apologize for the delay in
>>>> replying but I have been very busy the last two days and your feedback
>>>> is very detailed.
>>>>
>>>> Please find my comments inline.
>>>>
>>>> On 23/09/2020 23:40, Benjamin Kaduk via Datatracker wrote:
>>>>> Benjamin Kaduk has entered the following ballot position for
>>>>> draft-ietf-regext-rdap-sorting-and-paging-17: Discuss
>>>>>
>>>>> When responding, please keep the subject line intact and reply to all
>>>>> email addresses included in the To and CC lines. (Feel free to cut this
>>>>> introductory paragraph, however.)
>>>>>
>>>>> Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
>>>>> for more information about IESG DISCUSS and COMMENT positions.
>>>>>
>>>>> The document, along with other ballot positions, can be found here:
>>>>> https://datatracker.ietf.org/doc/draft-ietf-regext-rdap-sorting-and-paging/
>>>>>
>>>>> ----------------------------------------------------------------------
>>>>> DISCUSS:
>>>>> ----------------------------------------------------------------------
>>>>>
>>>>> Should we say something about which order the sorting criteria are
>>>>> applied (first to last vs last to first) when multiple sortItems are
>>>>> specified in a query?
>>>> [ML] The common interpretation is from left to right so I don't think we
>>>> need to clarify this concept.
>>> I think I can accept not saying more on this subject, but I am curious:
>>> when you say left to right, that means that the leftmost parameter is
>>> higher priority? So that, to give a totally contrived example, if I had
>>> pairs of (name, id), a query with &sort=name;&sort=id; would give:
>>>
>>> ("alpha", 10)
>>> ("alpha", 20)
>>> ("beta", 10)
>>> ("beta", 20)
>> [ML] Exactly.
>>
>> One minor comment: the right notation would be &sort=name,id
> This is what I get for writing email without consulting the document;
> thanks for spotting it.
>
>>>>> I recognize that in the HATEOAS model, the actual JSONPaths reported by
>>>>> the server should be used by the client to determine what a given sort
>>>>> property does, but it also seems like it would be confusing for this
>>>>> document to specify (e.g.) an "email" property with specific JSONPath,
>>>>> and then have a server go off and use "email" to mean something else,
>>>>> even if that is just the addition of "pref" as discussed at the end of
>>>>> Section 2.3.1. Do we want to try to have the properties defined by this
>>>>> document be universally defined and encourage the use of new/different
>>>>> property names for variations on them? (The answer may well be "no",
>>>>> but the answer is not intuitively clear to me.) To put it another way,
>>>>> is the list in Section 2.3.1 normative, or just an example?
>>>> [ML] I would say "normative" just to facilitate interoperability and
>>>> avoid ambiguities. Maybe it could be enough to say that the sorting
>>>> properties defined in the document are considered reserved so an RDAP
>>>> server MUST NOT map them onto other RDAP response values.
>>>>
>>>> Does it work for you?
>>> Yes, that would work for me. Thanks!
>> [ML] Perfect.
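
The left-to-right priority confirmed above (leftmost property is the highest-priority key) can be sketched as follows. This is only an illustration, assuming results are plain dictionaries and that the "sort" value is a comma-separated list of property names as in &sort=name,id; `sort_results` is a hypothetical helper, not something defined by the draft.

```python
def sort_results(rows, sort_param):
    """Sort rows by a comma-separated 'sort' value, leftmost property first.

    Hypothetical helper: rows are dicts keyed by sorting property name.
    """
    props = sort_param.split(",")
    # A tuple key compares element by element, so the first (leftmost)
    # property dominates and later ones only break ties.
    return sorted(rows, key=lambda row: tuple(row[p] for p in props))


rows = [
    {"name": "beta", "id": 20},
    {"name": "alpha", "id": 20},
    {"name": "beta", "id": 10},
    {"name": "alpha", "id": 10},
]
for row in sort_results(rows, "name,id"):
    print((row["name"], row["id"]))
```

With the contrived (name, id) pairs from the thread, this yields the order Ben wrote out: both "alpha" rows first, each group ordered by id.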
>>>>> ----------------------------------------------------------------------
>>>>> COMMENT:
>>>>> ----------------------------------------------------------------------
>>>>>
>>>>> Section 1
>>>>>
>>>>>    However, there are some drawbacks associated with the use of the HTTP
>>>>>    header. First, the header properties cannot be set directly from a
>>>>>    web browser. Moreover, in an HTTP session, the information on the
>>>>>    status (i.e. the session identifier) is usually inserted in the
>>>>>    header or a cookie, while the information on the resource
>>>>>    identification or the search type is included in the query string.
>>>>>    The second approach is therefore not compliant with the HTTP standard
>>>>>    [RFC7230]. As a result, this document describes a specification
>>>>>    based on the use of query parameters.
>>>>>
>>>>> A few more words (section number from 7230?) on why the second approach
>>>>> is not compliant with HTTP might help the reader, though it isn't
>>>>> strictly necessary (we're not using it, after all).
>>>> [ML] Could it be better to replace RFC7230 with RFC7231 and put a
>>>> reference to Section 8.3.1
>>>> (https://tools.ietf.org/html/rfc7231#section-8.3.1) ?
>>> It might be, but I think I still don't understand why using the HTTP header
>>> field for sorting and paging information is not compliant with HTTP -- the
>>> linked section says that header fields can be used to communicate
>>> information about the target resource, which IIUC includes the resource as
>>> qualified by the query string. (But note that I am not an HTTP expert...)
>> [ML] Would it be more appropriate to update the sentence as in the
>> following?
>>
>> OLD
>>
>> The second approach is therefore not compliant with the HTTP standard
>> [RFC7230]
>>
>> NEW
>>
>> The second approach is therefore not compliant with the most common
>> practices about the usage of the HTTP headers [RFC7231]
> That's better, though it does not help the reader find out what those
> "common practices" are.
> (It suffices to justify the choice, though, so I
> won't press the point further.)
>
>>>>> Section 2.1
>>>>>
>>>>>    * "jsonPath": "String" (OPTIONAL) the JSONPath of the RDAP field
>>>>>      corresponding to the property;
>>>>>
>>>>> What is this path relative to? (Does the client have to know from the
>>>>> other context what type of object it refers to?)
>>>> [ML] All the JSONPath expressions defined in the document are relative
>>>> to the root of an RDAP response. The sorting_metadata object is
>>>> included in the same response so I think that the context is clear and
>>>> no further clarification is needed.
>>> Now that you mention it, I do recall other discussion of paths being
>>> relative to the root; my apologies for the noise.
>> [ML] You are welcome.
>>>>>    * "links": "Link[]" (OPTIONAL) an array of links as described in
>>>>>      [RFC8288] containing the query string that applies the sort
>>>>>      criterion.
>>>>>
>>>>> Just to check: this is going to have the same structure for a Link
>>>>> object that draft-ietf-regext-rdap-partial-response does? (I am not
>>>>> coming up with a great way to deduplicate the definitions, off the top
>>>>> of my head.)
>>>> [ML] Yes. The sorting links have the same structure as the subsetting
>>>> links (see Section 2.3.2.).
>>>>>    o "pageSize": "Numeric" (OPTIONAL) a numeric value representing the
>>>>>      number of objects returned in the current page. It MUST be
>>>>>      provided if and only if the total number of objects exceeds the
>>>>>      page size. This property is redundant for RDAP clients because
>>>>>      the page size can be derived from the length of the search results
>>>>>      array but, it can be helpful if the end user interacts with the
>>>>>      server through a web browser;
>>>>>
>>>>> If it's redundant, we should probably say something about error handling
>>>>> for when the things that are supposed to be identical have different
>>>>> values.
>>>> [ML] I think this situation is very unlikely.
>>>> Anyway, in this case,
>>>> the length of the results array really counts. Obviously, it is a bit
>>>> more likely that the totalCount value might be different from the sum
>>>> of the number of results in each page. In fact, even if the
>>>> registration data can't be considered real-time data, it might happen
>>>> that the count parameter is present in the initial query and it might
>>>> take time to scroll the result set, so there could be a small likelihood
>>>> that the initial totalCount value is obsolete because the result set has
>>>> changed in the meantime. Also in this case, the sum of each result array
>>>> length really counts.
>>>>
>>>> Should I write something about it?
>>> Thanks for the additional explanation. I think that if we were to write
>>> anything more, it would just be a few words in the description here to
>>> indicate that it's just for convenience, e.g., "representing the number of
>>> objects that should have been returned in the current page". But it is
>>> probably okay to leave it unchanged, too.
>> [ML] Changed.
>>>>> Section 2.3
>>>>>
>>>>>    Except for sorting IP addresses, servers MUST implement sorting
>>>>>    according to the JSON value type of the RDAP field the sorting
>>>>>    property refers to. That is, JSON strings MUST be sorted
>>>>>    lexicographically and JSON numbers MUST be sorted numerically. If IP
>>>>>    addresses are represented as JSON strings, they MUST be sorted based
>>>>>    on their numeric conversion.
>>>>>
>>>>> There are more JSON types than string and number; are those other types
>>>>> guaranteed to not appear in sortable RDAP fields? (I can't see how such
>>>>> a guarantee could be made, given that servers can define their own
>>>>> sorting properties.)
>>>> [ML] The other primitive JSON type remaining is boolean but I don't
>>>> think it makes sense to sort by a boolean property.
>>>> Instead, I missed
>>> I think I also had JSON maps in mind, but I guess it is not exactly defined
>>> to sort by a map itself, only a primitive type, so my question was a bit
>>> silly.
>> [ML] Yes. Only values with primitive types.
>>>> that those values denoting dates and times MUST be sorted in
>>>> chronological order even if they are strings. I'll update the sentence
>>>> as in the following:
>>>>
>>>>    Except for sorting IP addresses and values denoting dates and times,
>>>>    servers MUST implement sorting
>>>>    according to the JSON value type of the RDAP field the sorting
>>>>    property refers to. That is, JSON strings MUST be sorted
>>>>    lexicographically and JSON numbers MUST be sorted numerically.
>>>>    Values denoting dates and times MUST be sorted in chronological order.
>>>>    If IP addresses are represented as JSON strings, they MUST be sorted
>>>>    based on their numeric conversion.
>>>>
>>>> Does it work for you?
>>> I think so; thanks.
>> [ML] Good.
>>>>>    If the "sort" parameter reports an allowed sorting property, it MUST
>>>>>    be provided in the "currentSort" field of the "sorting_metadata"
>>>>>    element.
>>>>>
>>>>> nit: is "reports" the best word to describe this behavior (which, IIUC,
>>>>> is "present in the query component of the request URL"?
>>>> [ML] Sounds better.
>>>>> Section 2.3.1
>>>>>
>>>>>    In the "sort" parameter ABNF syntax, property-ref represents a
>>>>>    reference to a property of an RDAP object. Such a reference could be
>>>>>    expressed by using a JSONPath. The JSONPath in a JSON document
>>>>>
>>>>> nit: is there a missing word here ("a JSONPath expression")?
>>>> [ML] Just for conciseness, may I use "jsonpath" to mean "JSONPath
>>>> expression" and keep "JSONPath" to refer to the specification?
>>>>
>>>> I could write something like: "JSONPath expression (named "jsonpath" in
>>>> the following)"
>>> That's fine from my perspective, sure.
>> [ML] Perfect.
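
The Section 2.3 rule discussed above (strings sort lexicographically, but IP addresses sort by numeric value and dates sort chronologically even though both travel as JSON strings) can be illustrated with a short sketch. The helper names `ip_key` and `date_key` are hypothetical, and the sample values are invented for the illustration; the sketch assumes timestamps in an ISO 8601 / RFC 3339 style.

```python
import ipaddress
from datetime import datetime


def ip_key(value):
    """Sort key for an IP address string: (IP version, numeric value)."""
    addr = ipaddress.ip_address(value)
    return (addr.version, int(addr))


def date_key(value):
    """Sort key for a timestamp string, compared as an instant in time."""
    # datetime.fromisoformat() in older Python versions rejects a trailing
    # "Z", so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


ips = ["192.0.2.10", "192.0.2.9", "10.0.0.1"]
dates = ["2020-10-02T06:41:00Z", "2020-10-02T08:37:42+02:00"]

print(sorted(ips))                  # plain string order
print(sorted(ips, key=ip_key))      # numeric order
print(sorted(dates))                # plain string order
print(sorted(dates, key=date_key))  # chronological order
```

In both samples the plain string order differs from the required order: "192.0.2.10" sorts before "192.0.2.9" as a string, and a timestamp with a +02:00 offset can sort after a UTC timestamp that is actually later.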
>>>>>    o Note that some of the object specific properties are also defined
>>>>>      as query paths. The object specific properties include:
>>>>>
>>>>> nit: the list structure in this item does not seem parallel to the
>>>>> structure of the first item.
>>>> [ML] OK. I'll change the sentence as in the following:
>>>>
>>>>    Object specific properties. Note that some of these properties
>>>>    are also defined as query paths. These properties include:
>>>>
>>>>>    as two representations of the same value. By default, the
>>>>>    unicodeName value MUST be used while sorting. When the
>>>>>    unicodeName is unavailable, the value of the ldhName MUST be used
>>>>>    instead;
>>>>>
>>>>> I'm not entirely sure how much value "by default" adds here. Would the
>>>>> meaning be different if we said "The unicodeName value MUST be used
>>>>> while sorting if it is present; when the unicodeName is unavailable, the
>>>>> value of the ldhName is used instead"?
>>>> [ML] No, it wouldn't. I'll change the sentence as you suggest.
>>>>>    o The jCard "sort-as" parameter MUST be ignored for the sorting
>>>>>      capability described in this document;
>>>>>
>>>>> It's a little bit of a juxtaposition to refer to jCard here in the prose
>>>>> but vcard in the table.
>>>> [ML] I would keep it as is. Instead, I would replace all the "vcard"
>>>> occurrences with "jCard". Since jCard is a transliteration of vCard in
>>>> JSON, it seems appropriate to me to keep the references to RFC6350
>>>> sections and to use the corresponding jCard elements for the mapping
>>>> between the sorting properties and the RDAP response elements. Besides,
>>>> I would write a sentence about the fact that jCard is the JSON format of
>>>> vCard, add a link to RFC7095 and insert RFC7095 among the Normative
>>>> References.
>>>>
>>>> Do you agree?
>>> Yes, thanks.
>> [ML] OK.
>>>>>    o Even if a nameserver can have multiple IPv4 and IPv6 addresses,
>>>>>      the most common configuration includes one address for each IP
>>>>>      version.
>>>>>      Therefore, the assumption of having a single IPv4 and/or
>>>>>      IPv6 value for a nameserver cannot be considered too stringent.
>>>>>
>>>>> I disagree with the flat assertion that it "cannot be considered too
>>>>> stringent". It can be so considered, as a matter of difference of
>>>>> opinion; what is appropriate to do here is to say that this
>>>>> document/protocol makes the assumption (especially since we go on to
>>>>> describe the exception-handling procedure when the assumption is
>>>>> violated).
>>>> [ML] May I update that sentence as in the following?
>>>>
>>>> OLD
>>>>
>>>> Therefore, the assumption of having a single IPv4 and/or
>>>> IPv6 value for a nameserver cannot be considered too stringent.
>>>>
>>>> NEW
>>>>
>>>> Therefore, this specification makes the assumption that nameservers
>>>> have a single IPv4 and/or IPv6 value.
>>> Yes, please!
>> [ML] Done.
>>>>>    o Multiple events with a given action on an object might be
>>>>>      returned. If this occurs, sorting MUST be applied to the most
>>>>>      recent event;
>>>>>
>>>>> This makes a lot of sense as the default and I don't propose changing it
>>>>> now, but I do wonder how hard it would be to add support later for
>>>>> sorting on (say) the oldest event instead.
>>>> [ML] Well, I wrote that sentence because some RDAP events can appear
>>>> multiple times. For example, a domain might be locked-unlocked
>>>> repeatedly. The purpose of that sentence is just to avoid ambiguities
>>>> and implicitly suggest that RDAP providers arrange events with the same
>>>> type in descending chronological order.
>>>>>    The "jsonPath" field in the "sorting_metadata" element is used to
>>>>>    clarify the RDAP field the sorting property refers to. The mapping
>>>>>    between the sorting properties and the JSONPaths of the RDAP fields
>>>>>    is shown below:
>>>>> [...]
>>>>>    name
>>>>>
>>>>>    $.domainSearchResults[*].unicodeName
>>>>>
>>>>> This seems to ignore the subtlety regarding unicodeName vs ldhName.
>>>>> Is there a way it could be expressed in JSONPath?
>>>> [ML] If unicodeName and ldhName were alternative, the JSONPath union
>>>> operator would fit (i.e.
>>>> $.domainSearchResults[*].[unicodeName,ldhName]). Currently, RFC7483
>>>> contains no assumption about when they should/must be present but
>>>> examples seem to recommend to present unicodeName only for IDNs. When
>>>> both the properties are present, the union operator doesn't fit exactly
>>>> and I haven't still found the right JSONPath expression based only on
>>>> the basic operators. However, since the "jsonPath" member is only for
>>>> documentation, the aforesaid JSONPath expression could be the most
>>>> suitable for conveying that sorting is applied on a kind of
>>>> <unicodeName, ldhName> combination.
>>> I have to defer to your expertise here; thank you for thinking about it.
>> [ML] Thanks. Maybe this is the only case where the JSONPath WG outcomes
>> might be helpful :-)
>>>>>    o Nameserver
>>>>>
>>>>>      name
>>>>>
>>>>>      $.domainSearchResults[*].unicodeName
>>>>>
>>>>> Presumably this is supposed to be nameserverSearchResults?
>>>> [ML] Absolutely. It's a cut-and-paste typo :-)
>>>>> Section 2.4
>>>>>
>>>>> I think we want another introductory paragraph like:
>>>>>
>>>>> % The cursor parameter is used by the server to preserve information
>>>>> % about the pagination state of a given query's results across calls to
>>>>> % the search API, so that successive requests by the client can return
>>>>> % page N, N+1, N+2, etc. Its value is only required to be interpretable
>>>>> % by the server and could be implemented, for example, as an opaque
>>>>> % database lookup key.
>>>>> % If a server does use a method for generating
>>>>> % cursor values that involves internal structure, such as the one
>>>>> % described below, the server needs to recognize that the value supplied
>>>>> % by a client could have been modified (maliciously), and implement
>>>>> % appropriate bounds-checking and similar measures when parsing received
>>>>> % values.
>>>>>
>>>>> The current wording strongly suggests that base64-encoding a meaningful
>>>>> value that the client could inspect or even construct is required, and I
>>>>> do not think that is very maintainable or what was intended, given the
>>>>> current second paragraph ("servers can change the method over time
>>>>> without announcing anything to clients").
>>>>>
>>>>> (side note) I'm also pretty partial to the way JMAP discusses returning
>>>>> (paginated, but non-uniformly) changes to a given data stream, e.g., at
>>>>> https://www.rfc-editor.org/rfc/rfc8620.html#section-5.2 -- any given
>>>>> state is named, and you can get "stuff starting at <named state>" and
>>>>> the name to use for the state as of the current reply.
>>>> [ML] Maybe I didn't make myself clear.
>>>>
>>>> The Base64 encoding is a simple (not recommended) transformation to make
>>>> the cursor value opaque to the client. It just seemed suitable to me for
>>>> being used in some examples. But if you take a look at the example of
>>>> Figure 6, you may note that you can't obtain a meaningful result by
>>>> simply Base64-decoding the cursor value. Definitely, the method to
>>>> encrypt the cursor value must be more complex than a mere Base64 encoding.
>>>>
>>>> Regarding the sentence between brackets, it means that servers can
>>>> change the underlying pagination strategy without having an impact on
>>>> clients. A server can initially implement offset pagination and then
>>>> switch to keyset pagination but this has no effect on clients' features.
>>>>
>>>> The same concepts about the checks that servers should make in order to
>>>> check the cursor value are reported both in the "Negative Answer"
>>>> section and in Appendix C.3. "Paging"
>>>>
>>>> Anyway, I'll try to integrate your text in the current document and add
>>>> a sentence with the purpose of discouraging the use of the
>>>> Base64-encoding in the cursor implementations.
>>> Thank you; I did not know that you wanted to discourage the use of plain
>>> base64 encoding, but that is reassuring to know.
>>> (I did notice that the example in Figure 6 did not decode to a meaningful
>>> result, but did not make much of a conclusion from that.)
>> [ML] OK. I will add some text to clarify that a mere Base64 encoding is
>> not recommended to encrypt the cursor value.
> Okay, thank you.
>
>>>>> Section 4
>>>>>
>>>>> If the server doesn't have access to an efficient (e.g.) counting
>>>>> operation on the backend, would we recommend that the server not support
>>>>> sorting/pagination, since there's not much benefit from having the
>>>>> server pull up all the results and count them just to be able to return
>>>>> the total count value back to the client, and then go do the same work again
>>>>> when the client asks for the next page of results?
>>>> [ML] In my implementation the RDAP server doesn't present the count
>>>> operator in the sorting and paging links. The number of results doesn't
>>>> change at all if the result set is sorted by a property rather than
>>>> another. The same generally occurs (as I wrote above) if the client is
>>>> scrolling the result set pages. So why repeat the count parameter in
>>>> the links? The totalCount value is returned in the response to the
>>>> initial query and, as it is no longer repeated in the links, the counting
>>>> operation is not executed. Therefore, we don't need to make particular
>>>> assumptions about the performance of counting operation.
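
The bounds-checking that Ben's proposed Section 2.4 text asks for can be sketched as below. This is a minimal sketch under stated assumptions, not the draft's method: it assumes, purely for illustration, a cursor that is the Base64 encoding of a string of the form "offset=N,limit=M" (a plain Base64 cursor is exactly what the thread says is not recommended), and `parse_cursor`, `MAX_OFFSET` and `MAX_LIMIT` are hypothetical names and limits.

```python
import base64

# Hypothetical server-side limits, chosen only for this illustration.
MAX_OFFSET = 100_000
MAX_LIMIT = 100


def parse_cursor(cursor):
    """Decode a client-supplied cursor and reject malformed or out-of-range values."""
    try:
        raw = base64.urlsafe_b64decode(cursor.encode("ascii")).decode("ascii")
        fields = dict(part.split("=", 1) for part in raw.split(","))
        offset = int(fields["offset"])
        limit = int(fields["limit"])
    except (ValueError, KeyError):
        # Covers bad Base64, non-ASCII bytes, missing fields, non-integer values.
        raise ValueError("malformed cursor") from None
    # The client may have altered the value, so never trust the decoded numbers.
    if not (0 <= offset <= MAX_OFFSET and 1 <= limit <= MAX_LIMIT):
        raise ValueError("cursor out of bounds")
    return offset, limit


good = base64.urlsafe_b64encode(b"offset=100,limit=50").decode("ascii")
print(parse_cursor(good))
```

A server using an integrity-protected (for example encrypted and authenticated) cursor would instead reject any modified value when the integrity check fails, so a range check of this kind matters most for cursors with client-readable structure.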
>>>>> Section 7
>>>>>
>>>>> I suggest noting that (encoded) structured "cursor" values present a new
>>>>> attack surface on the server that needs to be protected.
>>>> [ML] Sorry, could you explain this concept further? AFAIK, it is
>>>> possible to protect REST API endpoints but not query parameters.
>>> I think this was assuming that the server was going to just base64-encode
>>> something like "offset=100,limit=50" -- in that case a client could pass in
>>> the base64'd version of "offset=1000000000,limit=50". The server would
>>> need to sanity-check the results of base64 decoding and reject the
>>> too-large offset. If the server is expecting to do a fancier
>>> self-encrypted-token scheme for the cursor, the integrity check associated
>>> with the encryption takes care of this protection inherently, and we may
>>> not need to mention anything about sanitizing these values.
>> [ML] OK.
>>>>>    results in a response. However, this last security policy can result
>>>>>    in a higher inefficiency if the RDAP server does not provide any
>>>>>    functionality to return the truncated results.
>>>>>
>>>>> I'm not sure I understand (or agree with) this last sentence -- it seems
>>>>> that unilateral silent truncation of results by the server leads to not
>>>>> just inefficiency but also potential security considerations in its own
>>>>> right, with the client not knowing that it has incomplete results.
>>>>> Also, if the server is truncating the results, by definition it "has
>>>>> functionality to return the truncated results" -- that's what it's
>>>>> doing! So I assume the intent was to say something about negotiating or
>>>>> indicating that the results are truncated, not actually doing the
>>>>> truncation.
>>>> [ML] I think that servers legitimately truncate the result sets to
>>>> mitigate the risk of resource exhaustion and consequent denial of
>>>> service.
>>>> The implementation of the capabilities described in this
>>>> document lets servers keep managing sustainable result sets and,
>>>> at the same time, increases clients' chances to avoid truncation and
>>>> find relevant results.
>>> I agree with the paragraph you just wrote. However, I think that the state
>>> of affairs prior to this document, with unilateral truncation by the
>>> server, can lead not just to "inefficiency" but also to security risks. So
>>> I was hoping to see something like "can result in higher inefficiency or
>>> risk due to acting on incomplete information".
>> [ML] OK. I will change the sentence as you suggest.
>>> My second point ("Also, if the server [...]") was intending to suggest that
>>> the last sentence say something like "if the RDAP Server does not provide
>>> any functionality to return sorted results or iterate through the full
>>> result set".
>> [ML] You are right. I misworded the sentence. Is it fine for you if I
>> change the sentence as in the following?
>>
>> OLD
>>
>> if the RDAP server does not provide any
>> functionality to return the truncated results
>>
>> NEW
>>
>> if the RDAP server does not provide any
>> functionality to return results removed by truncation
> Yes, perfect!
>
>>>>>    The new parameters presented in this document provide RDAP operators
>>>>>    with a way to implement a server that reduces inefficiency risks.
>>>>>
>>>>> [same question about "inefficiency" being the right word]
>>>> [ML] Maybe I can replace the phrase "that reduces inefficiency risks."
>>>> with the phrase "that reduces the risk of resource exhaustion and
>>>> consequent denial of service".
>>>>
>>>> Are you ok with it?
>>> Denial of service is only one of the risks I have in mind; another is that
>>> if a server silently truncates, a client will have incomplete data and
>>> might derive a conclusion ("domain X does not satisfy property Y") that is
>>> false.
>> [ML] OK. As I wrote above, I'll change the sentence to outline the risk
>> due to acting on incomplete information.
>>>
>>>>> Appendix B
>>>>>
>>>>>    o It does not allow direct navigation to arbitrary pages because the
>>>>>      result set must be scrolled in sequential order starting from the
>>>>>      initial page;
>>>>>
>>>>> (side note) I didn't follow the references, so maybe this was covered
>>>>> there, but I don't quite follow why direct navigation is impossible. If
>>>>> you use a key field for seeking, can't you just start in the middle from
>>>>> some known value for that key field?
>>>> [ML] Especially when you know the total count of a result set, you can
>>>> directly jump to a specific point in the result set through offset
>>>> pagination but you can't do the same through keyset pagination because
>>>> you don't know the key value at that point in advance. One can wonder:
>>>> what is jumping in the result set useful for? Well, for example, if you are
>>>> looking for a specific item in an ordered collection of items, you could
>>>> find it through the binary search algorithm.
>>> I don't want to press this topic very much, so let me just try a brief
>>> example. Suppose I have a sorted set of ASCII strings, and I want to see if the
>>> string "koala" is present. I could start at the beginning and look at each
>>> one in turn until I get to something that sorts after "koala", or I could
>>> ask the database "give me the first thing you have that is after "k", which
>>> gets me some of the way there. Is the problem that you need an actual
>>> value in the dataset to start from, and since "k" isn't guaranteed to be in
>>> the set you are forced to start from the beginning?
>> [ML] Offset pagination is based on the positions of the results within
>> the result set while keyset pagination is based on a unique combination
>> of values of the results.
>> You can always skip the first K results of a
>> result set by specifying offset=K but you can't do the same through
>> keyset pagination because you don't know what is the combination of
>> values placed at position K.
>>
>> Let me give you an example that can clarify.
>>
>> Let's suppose that the query "k*" returns N=1000 results and the length
>> of a result page is 100. The fastest method to find if "koala" is present
>> is to jump to the middle (i.e. offset=500) of the result set and look at
>> the first result. If it is lexicographically lower than "koala" then
>> "koala" might be in the second half; otherwise, "koala" might be in
>> the first half. Let's suppose that the first item is greater than
>> "koala", then you can jump to the middle of the first half (offset=250)
>> and apply the above dichotomy in turn until you can find if "koala" is
>> present or not. This process takes log2(N) steps maximum.
>>
>> You can't do the same through keyset pagination because you don't know
>> how the values are distributed in the result set. You can only scroll
>> the result set from the first page to the last and this process takes N
>> steps maximum.
>>
>> However, in general, one is interested in all the results (or in a
>> subset) returned by a query rather than a single result. In this case,
>> provided that the results can always be sorted according to a unique
>> index, keyset pagination is more efficient than offset pagination. Let's
>> take the example above. By offset pagination, the underlying DBMS
>> always selects 1000 results and then returns the current page so this
>> means that, in the worst case, the DBMS will select 1000 results for 10
>> times (from offset=0 to offset=900). By keyset pagination the number of
>> results the underlying DBMS selects decreases by 100 results each step.
>> At the beginning, it selects 1000 results and returns the first page,
>> then it selects 900 results and returns the second page.
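
The offset-based dichotomy described above can be sketched as a binary search in which each probe stands for one request at a chosen offset. This is only an illustration: the sorted list `results` stands in for the server's result set, `fetch_at_offset` is a hypothetical stand-in for a query that returns the first result at a given offset, and the sample names are generated for the example.

```python
def fetch_at_offset(results, offset):
    """Hypothetical stand-in for a request returning the first result at an offset."""
    return results[offset]


def contains(results, target):
    """Binary search by offset; returns (found, number_of_probes)."""
    lo, hi, probes = 0, len(results), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        item = fetch_at_offset(results, mid)
        if item == target:
            return True, probes
        if item < target:
            lo = mid + 1  # target can only be in the upper half
        else:
            hi = mid      # target can only be in the lower half
    return False, probes


# 1000 sorted names standing in for the results of a "k*" query.
results = sorted(f"k{i:04d}" for i in range(1000))
print(contains(results, "k0123"))
print(contains(results, "koala"))
```

With N=1000 the loop needs at most about log2(N), that is roughly 10, probes, whereas a keyset cursor can only be advanced page by page because the key value at an arbitrary position is not known in advance.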
>>
>> Note also two facts:
>>
>> - the time needed to scroll the result set could be significant when the
>> result set is huge
>>
>> - for all the RDAP searchable objects, it is always possible to build
>> more or less easily a combination of properties acting as unique index
>> and this is true regardless of whether the search includes the sort
>> parameter or not. For example, for the entity object class "handle" is a
>> unique property but if the search includes "sort=registrationDate", the
>> combination <registrationDate, handle> is unique.
> Thank you for the additional explanation; I don't think we need to spend
> more time on this topic.
>
> I'm looking forward to the -18!
>
> Thanks again,
>
> Ben
>
>>>>> Appendix C.2
>>>>>
>>>>>    total count. Therefore, as "totalCount" is an optional response
>>>>>    information, fetching always the total number of rows has been
>>>>>
>>>>> I'm not entirely sure in what sense "optional response information" is
>>>>> intended -- my reading of Section 2.1 is that it's mandatory to return
>>>>> totalCount if the client included the 'count' query parameter.
>>>> [ML] Exactly but it isn't returned always. For this reason, it is an
>>>> optional member of the paging_metadata object.
>>> Okay. (I think your reply to another ballot comment also helped clarify
>>> this for me.)
>> [ML] Good.
>>>> Looking forward to your reply to my questions/comments.
>>> Thanks a lot for the explanations and updates; hopefully I have clarified
>>> anything that was unclear.
>>>
>>> -Ben
>>>
>>> _______________________________________________
>>> regext mailing list
>>> regext@ietf.org
>>> https://www.ietf.org/mailman/listinfo/regext
>> --
>> Dr. Mario Loffredo
>> Systems and Technological Development Unit
>> Institute of Informatics and Telematics (IIT)
>> National Research Council (CNR)
>> via G. Moruzzi 1, I-56124 PISA, Italy
>> Phone: +39.0503153497
>> Mobile: +39.3462122240
>> Web: http://www.iit.cnr.it/mario.loffredo
>>
> _______________________________________________
> regext mailing list
> regext@ietf.org
> https://www.ietf.org/mailman/listinfo/regext

--
Dr. Mario Loffredo
Systems and Technological Development Unit
Institute of Informatics and Telematics (IIT)
National Research Council (CNR)
via G. Moruzzi 1, I-56124 PISA, Italy
Phone: +39.0503153497
Mobile: +39.3462122240
Web: http://www.iit.cnr.it/mario.loffredo