Re: [urn] new urn PWID draft (7) with corrections

Eld Zierau <elzi@kb.dk> Tue, 04 June 2019 08:11 UTC

From: Eld Zierau <elzi@kb.dk>
To: "urn@ietf.org" <urn@ietf.org>
Thread-Topic: new urn PWID draft (7) with corrections
Thread-Index: AdUA5eI8i0YmDbN8SwiZDxbA6TaDHAZw4AkQ
Date: Tue, 04 Jun 2019 08:11:21 +0000
Message-ID: <8b275ab6aa8d48359113497209261c63@kb.dk>
Archived-At: <https://mailarchive.ietf.org/arch/msg/urn/yyKd4qUQUIX8bQ55ZHWrlQTEbT0>

I just submitted a version with a minor correction to one of the references (it had the wrong title due to a copy/paste error).
Can it be accepted as it is now?
Best regards, Eld

-----Original Message-----
From: Eld Zierau 
Sent: Thursday, May 2, 2019 2:56 PM
To: 'Peter Saint-Andre' <stpeter@stpeter.im>; urn@ietf.org
Subject: new urn PWID draft (7) with corrections

Thanks again for your comments.
I have uploaded draft version 7 and described how I have addressed the comments in the mail from Peter below. Does this cover what is needed?

Best regards, Eld

-----Original Message-----
From: Peter Saint-Andre <stpeter@stpeter.im>
Sent: Tuesday, April 30, 2019 5:14 AM
To: Eld Zierau <elzi@kb.dk>
Cc: urn@ietf.org
Subject: Re: [urn] Comments on PWID -05 - now PWID -06

Hello Eld,

Your proposed syntax (with "~") looks fine to me.
> Eld: :)

The ABNF definition of your proposed syntax does not conform to RFC 5234. You can check the ABNF using this tool:

https://tools.ietf.org/tools/bap/abnf.cgi

> Eld: it conforms now - thank you so much for providing the link 
> to the syntax checker - that was very helpful

In particular, it's not clear to me what a rule like this is intended to
mean:

   registered-archive-id = +( unreserved )

Do you mean that a registered-archive-id can include one or more instances of characters from the `unreserved` rule? If so, change "+" to "*".

> Eld: I meant one or more characters - but I found out that it should 
> then be 1*unreserved, and likewise for the other occurrences
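
For readers unfamiliar with ABNF repetition, the difference between "1*unreserved" (one or more) and "*unreserved" (zero or more) can be sketched with regular expressions. This is only an illustration: it assumes that `unreserved` is the character set defined in RFC 3986, and the function name below is hypothetical, not taken from the draft.

```python
import re

# Assumption: "unreserved" as in RFC 3986: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = r"[A-Za-z0-9\-._~]"

# ABNF "1*unreserved" means one or more characters (regex "+").
ONE_OR_MORE = re.compile(rf"{UNRESERVED}+")
# ABNF "*unreserved" means zero or more characters (regex "*").
ZERO_OR_MORE = re.compile(rf"{UNRESERVED}*")


def is_registered_archive_id(text: str) -> bool:
    """Hypothetical check: the whole string must match 1*unreserved."""
    return ONE_OR_MORE.fullmatch(text) is not None
```

With this sketch, an empty string is rejected by the "1*" form but would be accepted by the bare "*" form, which is why "1*unreserved" expresses "one or more characters".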


To simplify the ABNF, you could use the datetime rules from RFC 3339.

> Eld: I used to in an earlier version, but Dale noticed that there was a difference (in a mail on 28 February 2019): "But comparing that to W3CDTF, I see no single nonterminal which corresponds to the set of formats allowed in W3CDTF.  I suggest you make a more rigid specification as to what is allowed for archival-time." - so I think I had better stick to the rigid version in order to be sure.

Please don't use `URI` as the name of an ABNF rule because that's already defined in RFC 3986 and could cause confusion. Perhaps call it `uri-string`.

>Eld: Done

Personally I found the `precision-spec` categories difficult to understand and sometimes ambiguous. For instance:

* A precision level of "part" seems to be an HTML file only (at least in the case when "it refers to an html web element"), however a URI can point to many file types other than HTML files. Perhaps "single" (as in a single file) would be clearer; it would also be good to specify how this is handled in the case of file types other than HTML.

* Does a precision level of "page" apply only to HTML pages with all "referenced web parts"? (By the latter term I think you mean what the HTML 5.2 specification defines as "embedded content"; in general it would be good to align terminology.)

>Eld: I have rephrased this to make it clearer - it was explained in two 
>steps before, so I have also restructured it a bit 
>(pages 11-13)

As to the registration, instead of version 6 it should be version 1 because this is the initial registration (i.e., whenever we are finished with this process it will be the initial version, whereas if you update the entire registration in the future that would be version 2).

>Eld: got it - I changed it and left the details to the change log comment. I 
>have also changed the version at the top of the template, since I guess that is the same thing - is that correct?

The security considerations strike me as underspecified. An archived web page or part could be just as dangerous as a "live" page or part; for instance, it could include insecure scripts, malware, trackers, etc.
Furthermore, an archived page could in fact be more dangerous, because it could include outdated scripts with known vulnerabilities that can never be patched because the script is archived for all time in a vulnerable state (an attack of this sort was recently discovered in the wild).

>Eld: You are quite right - I have taken the liberty of rephrasing your 
>comment and adding it to the section - I hope that is ok

Best Regards,

Peter

On 4/29/19 6:10 AM, Eld Zierau wrote:
> Did any of you have comments to my previous mail?
> Is there any action you want me to take in order to get it accepted?
> Best Regards, Eld
> 
> -----Original Message-----
> From: Eld Zierau
> Sent: Friday, March 1, 2019 1:29 PM
> To: 'Martin J. Dürst' <duerst@it.aoyama.ac.jp>; 'Dale R. Worley' 
> <worley@ariadne.com>
> Cc: 'urn@ietf.org' <urn@ietf.org>; 'L.Svensson@dnb.de' 
> <L.Svensson@dnb.de>
> Subject: [urn] Comments on PWID -05 - now PWID -06
> 
> I have now uploaded a new version: draft-pwid-urn-specification-06
>  - and thanks again for comments and suggestions
> 
> Regarding the suggestion from Martin (included below), as a computer scientist I can certainly see that the reasoning is quite obvious. However, my experience from presenting the PWID is that users find syntax based on computational reasoning illogical, e.g. that the archived-item-id (usually a URI) is placed at the end of the PWID. I believe that adding a "~" for identifiers that are registered separately is acceptable to such users, but I am also convinced that a "+" before a domain would confuse (non-computer-science) users a lot. 
> Also, as said in my previous mail, it is highly unlikely that there will ever be a case where "~" is the first character in a domain for a web archive. Therefore, the "+" should not be necessary. 
> A further minor point is that all existing PWIDs (and the tools that provide and resolve PWIDs) would no longer comply, whereas they otherwise would (none of these use registered identifiers yet, only domains and URIs).
> In other words: I would be very sorry to add a "+" to domains, and I believe it is not necessary.
> 
> The uploaded version does not include a "+" for domains. If 
> required, I will of course add it (although I would be sorry to do so).
> 
> Please let me know if it is acceptable, and I will act accordingly.
> 
> Best regards, Eld
> 
> 
> On 2019/03/01 11:31, Dale R. Worley wrote:
>> Martin J. Duerst <duerst@it.aoyama.ac.jp> writes:
>>>> [...]  E.g., one could require that any archive-id that is not 
>>>> intended to be interpreted as a DNS name to start with one of "-", 
>>>> ".", "_", "~".
>>>
>>> I haven't looked into the details, but in general, I think this is a 
>>> bad idea. It is much better to have an explicit distinction than to 
>>> rely on some syntax restrictions. Such syntax restrictions may or 
>>> may not actually hold in practice. It's very easy to create a DNS 
>>> name starting with '-' or '_', for example, even though officially, that's not allowed.
>>
>> I may agree with you ... But what do you mean by "an explicit 
>> distinction"?  E.g., I would tend to consider "archive-ids starting 
>> with '~' are registered archive names, and archive-ids that do not 
>> are considered DNS names" to be an "explicit" distinction, but you 
>> mean something else.
> 
> Well, the explicit distinction would be "if it starts with '~', what follows is a registered archive name, and if it starts with '+', what follows is a DNS name" or some such. This would not exclude any leading characters in either archive names or DNS names.
> 
> Regards,   Martin.
> 
>> Or maybe the right question is, What do you propose as an alternative?
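
The convention discussed in this thread (an archive-id starting with "~" is a registered archive name, anything else is treated as a DNS name, with no "+" prefix) could be sketched roughly as follows. The function name and the returned labels are hypothetical illustrations and are not part of the draft; the sketch also does no validation of the domain or the registered name.

```python
from typing import Tuple


def classify_archive_id(archive_id: str) -> Tuple[str, str]:
    """Hypothetical helper: split an archive-id into (kind, value).

    Assumes the "~" convention from the thread: a leading "~" marks a
    registered archive name; everything else is read as a DNS name.
    """
    if archive_id.startswith("~"):
        # Strip the marker so only the registered name remains.
        return ("registered", archive_id[1:])
    return ("domain", archive_id)
```

Under Martin's alternative, the second branch would instead require a leading "+" for DNS names, which is the variant Eld argues against because existing PWIDs would no longer comply.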