Re: [urn] Comments on PWID -05 - now PWID -06

Eld Zierau <elzi@kb.dk> Tue, 30 April 2019 10:22 UTC

From: Eld Zierau <elzi@kb.dk>
To: "urn@ietf.org" <urn@ietf.org>
Thread-Topic: [urn] Comments on PWID -05 - now PWID -06
Thread-Index: AdTQKXi5cerx0k6zRXuaa/w3TPMeTguTXB/AAB7BlQAAExCmsA==
Date: Tue, 30 Apr 2019 10:22:05 +0000
Message-ID: <e22d64c5d5164f0b8eef78dd391440f8@kb.dk>
References: <2870fa7971294156b2e2ad240c9584c3@kb.dk> <9dcb0c25-a48e-e206-da26-a588f07d3dce@stpeter.im>
In-Reply-To: <9dcb0c25-a48e-e206-da26-a588f07d3dce@stpeter.im>
Archived-At: <https://mailarchive.ietf.org/arch/msg/urn/MlRvqV_RiS5B--p-b08NpnOd7dg>
Subject: Re: [urn] Comments on PWID -05 - now PWID -06

Thank you all.

I will make a new draft soon.

Best regards, Eld

-----Original Message-----
From: Peter Saint-Andre <stpeter@stpeter.im> 
Sent: Tuesday, April 30, 2019 5:14 AM
To: Eld Zierau <elzi@kb.dk>
Cc: urn@ietf.org
Subject: Re: [urn] Comments on PWID -05 - now PWID -06

Hello Eld,

Your proposed syntax (with "~") looks fine to me.

The ABNF definition of your proposed syntax does not conform to RFC 5234. You can check the ABNF using this tool:

https://tools.ietf.org/tools/bap/abnf.cgi

In particular, it's not clear to me what a rule like this is intended to
mean:

   registered-archive-id = +( unreserved )

Do you mean that a registered-archive-id can include one or more instances of characters from the `unreserved` rule? If so, change "+" to "1*" (a bare "*" would also allow zero instances).
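
As a rough illustration (not taken from the draft): in RFC 5234 ABNF, "one or more unreserved characters" is written `1*unreserved`, and RFC 3986 defines `unreserved` as ALPHA / DIGIT / "-" / "." / "_" / "~". A minimal Python sketch of that check could look like the following; the function name `is_registered_archive_id` is hypothetical and chosen only for this example.

```python
import re

# unreserved per RFC 3986: ALPHA / DIGIT / "-" / "." / "_" / "~"
# The trailing "+" in the regex plays the role of ABNF's "1*" (one or more).
_UNRESERVED_RUN = re.compile(r"[A-Za-z0-9._~-]+")


def is_registered_archive_id(text: str) -> bool:
    """Return True if text consists of one or more unreserved characters.

    Hypothetical helper for illustration; it mirrors the ABNF
    `1*unreserved` and nothing more.
    """
    return _UNRESERVED_RUN.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["archive.org", "my_archive-1", "", "has space"]:
        print(repr(candidate), is_registered_archive_id(candidate))
```

Note that the empty string is rejected here, which is the practical difference between `1*unreserved` and `*unreserved`.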

To simplify the ABNF, you could use the datetime rules from RFC 3339.
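
For reference, the RFC 3339 `date-time` production is full-date "T" full-time, where full-time ends in "Z" or a numeric offset. The sketch below is only a syntax check written against that production, assuming Python's standard `re` module; the name `looks_like_rfc3339` is made up for this example, and the code does not check value ranges (for instance month 13 would pass).

```python
import re

# Syntax-only pattern following the RFC 3339 date-time ABNF.
_RFC3339_DATE_TIME = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2}"      # full-date: year "-" month "-" mday
    r"[Tt]"                            # date/time separator
    r"[0-9]{2}:[0-9]{2}:[0-9]{2}"      # hour ":" minute ":" second
    r"(\.[0-9]+)?"                     # optional time-secfrac
    r"([Zz]|[+-][0-9]{2}:[0-9]{2})"    # time-offset: "Z" or numeric offset
)


def looks_like_rfc3339(text: str) -> bool:
    """Return True if text matches the RFC 3339 date-time syntax.

    Hypothetical helper; it validates shape only, not calendar validity.
    """
    return _RFC3339_DATE_TIME.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["2019-04-30T10:22:05Z", "2019-04-30"]:
        print(candidate, looks_like_rfc3339(candidate))
```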

Please don't use `URI` as the name of an ABNF rule because that's already defined in RFC 3986 and could cause confusion. Perhaps call it `uri-string`.

Personally I found the `precision-spec` categories difficult to understand and sometimes ambiguous. For instance:

* A precision level of "part" seems to be an HTML file only (at least in the case when "it refers to an html web element"), however a URI can point to many file types other than HTML files. Perhaps "single" (as in a single file) would be clearer; it would also be good to specify how this is handled in the case of file types other than HTML.

* Does a precision level of "page" apply only to HTML pages with all "referenced web parts"? (By the latter term I think you mean what the HTML 5.2 specification defines as "embedded content"; in general it would be good to align terminology.)

As to the registration, instead of version 6 it should be version 1 because this is the initial registration (i.e., whenever we are finished with this process it will be the initial version, whereas if you update the entire registration in the future that would be version 2).

The security considerations strike me as underspecified. An archived web page or part could be just as dangerous as a "live" page or part; for instance, it could include insecure scripts, malware, trackers, etc.
Furthermore, an archived page could in fact be more dangerous, because it could include outdated scripts with known vulnerabilities that can never be patched because the script is archived for all time in a vulnerable state (an attack of this sort was recently discovered in the wild).

Best Regards,

Peter

On 4/29/19 6:10 AM, Eld Zierau wrote:
> Did any of you have comments to my previous mail?
> Is there any action you want me to take in order to get it accepted?
> Best Regards, Eld
> 
> -----Original Message-----
> From: Eld Zierau
> Sent: Friday, March 1, 2019 1:29 PM
> To: 'Martin J. Dürst' <duerst@it.aoyama.ac.jp>; 'Dale R. Worley' 
> <worley@ariadne.com>
> Cc: 'urn@ietf.org' <urn@ietf.org>; 'L.Svensson@dnb.de' 
> <L.Svensson@dnb.de>
> Subject: [urn] Comments on PWID -05 - now PWID -06
> 
> I have now uploaded a new version: draft-pwid-urn-specification-06
>  - and thanks again for the comments and suggestions
> 
> Regarding the suggestion from Martin (included below): as a computer scientist, I can certainly see that the reasoning is quite obvious. However, my experience from presenting the PWID is that users find syntax based on computational reasoning illogical, e.g. that the archived-item-id (usually a URI) is placed at the end of the PWID. I believe that adding a "~" for identifiers that are registered separately is acceptable for such users, but I am also convinced that a "+" before a domain would confuse (non-computer-science) users a lot. 
> Also, as said in my previous mail, it is highly unlikely that there will ever be a case where "~" is the first character in a domain for a web archive. Therefore, the "+" should not be necessary. 
> A further minor point is that all existing PWIDs (and the tools providing and resolving PWIDs) would no longer comply, which they otherwise would (none of these use registered identifiers yet, only domains and URIs).
> In other words: I would be very sorry to add a "+" to domains, and I believe it is not necessary.
> 
> The uploaded version does not include a "+" before domains. If 
> required, I will of course add it (although I would be sorry to do so).
> 
> Please let me know if it is acceptable, and I will act accordingly.
> 
> Best regards, Eld
> 
> 
> On 2019/03/01 11:31, Dale R. Worley wrote:
>> Martin J. Duerst <duerst@it.aoyama.ac.jp> writes:
>>>> [...]  E.g., one could require that any archive-id that is not 
>>>> intended to be interpreted as a DNS name to start with one of "-", 
>>>> ".", "_", "~".
>>>
>>> I haven't looked into the details, but in general, I think this is a 
>>> bad idea. It is much better to have an explicit distinction than to 
>>> rely on some syntax restrictions. Such syntax restrictions may or 
>>> may not actually hold in practice. It's very easy to create a DNS 
>>> name starting with '-' or '_', for example, even though officially, that's not allowed.
>>
>> I may agree with you ... But what do you mean by "an explicit 
>> distinction"?  E.g., I would tend to consider "archive-ids starting 
>> with '~' are registered archive names, and archive-ids that do not 
>> are considered DNS names" to be an "explicit" distinction, but you 
>> mean something else.
> 
> Well, the explicit distinction would be "if it starts with '~', what follows is a registered archive name, and if it starts with '+', what follows is a DNS name" or some such. This would not exclude any leading characters in either archive names or DNS names.
> 
> Regards,   Martin.
> 
>> Or maybe the right question is, What do you propose as an alternative?