Re: [core] "ni" URIs in SenML names (and no "; " in allowed characters)

Cullen Jennings <fluffy@iii.ca> Fri, 28 July 2017 21:04 UTC

From: Cullen Jennings <fluffy@iii.ca>
In-Reply-To: <3D8076B6-615F-4A86-89A4-539DCBC1BBB2@ericsson.com>
Date: Fri, 28 Jul 2017 15:04:25 -0600
Content-Transfer-Encoding: quoted-printable
Message-Id: <8BF7198C-8D5D-42DA-8AC4-9295C753689B@iii.ca>
References: <988B5CDC-8709-4BF1-AB6F-C5B16D45E563@ericsson.com> <0fdb5c19-5edb-01e2-d2cc-776f285f6770@filament.com> <3D8076B6-615F-4A86-89A4-539DCBC1BBB2@ericsson.com>
To: core <core@ietf.org>
X-Mailer: Apple Mail (2.3259)
Archived-At: <https://mailarchive.ietf.org/arch/msg/core/O8ci-tgsHnDuMVWnqy0KR0Wh8aA>

TL;DR

What I suggest we do in this case is add text to the draft which says something along the lines of:

When encoding a "Named Information" URI into a SenML name, it is RECOMMENDED that the ';' between the digest algorithm and digest value be converted to a '.' 

------------------------------------

If you want the longer version ..... 

People in the core WG often think about what it means to implement SenML on the constrained device side, and that is important ... but equally important are the servers that receive data from millions of devices and have to process it in a cost-effective way. Right now they check the data once when it comes in to ensure it meets the required syntax; after that, the internal systems do not have to constantly check whether the data is safe for the things it is used for. One of the big concerns is database attacks, best summarized by https://xkcd.com/327/ . If the database does not allow one to safely use character X and character X is allowed in a SenML name, then it needs to be escaped everywhere it is used. It is preferable to have a very reduced set of characters so they don't have to be escaped.
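
To make the "check once at the door" idea concrete, here is a minimal sketch in Python. The pattern is an assumption based on the restricted SenML name character set (letters, digits, "-", ":", ".", "/" and "_", starting with a letter or digit), and the function name is hypothetical, not taken from any real server.

```python
import re

# Assumed SenML name rule: the first character is a letter or digit, the rest
# come from A-Z a-z 0-9 - : . / _  (so no ";" and no quote characters).
SENML_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9\-:./_]*")


def accept_name_at_ingress(name: str) -> bool:
    """Hypothetical ingress check, run once when a record arrives."""
    return SENML_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in (
        "urn:dev:ow:10e2073a01080063/temp",   # only allowed characters
        "sha-256;UyaQV",                      # contains ";"
        "Robert'); DROP TABLE Students;--",   # the xkcd 327 style input
    ):
        print(candidate, accept_name_at_ingress(candidate))
```

Anything that passes this one check contains no quote, semicolon, or whitespace characters, which is why the internal systems can skip per-use escaping. If ";" were added to the allowed set, that guarantee would no longer hold.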

Similarly, SenML names are often used in query parameters in an HTTP URL.

It is impossible to define a syntax for names that can both be a URL and be used as a segment of a URL with no escaping. Early on we chose to have the SenML names be such that we could use them as parts of URLs. This means we end up needing some amount of encoding or escaping when we try to fit a full URL into a name.

Changing to allow "/" was a huge change at this late stage, but no one could come up with a concrete security vulnerability it introduced to existing systems. However, introducing ";" would definitely introduce security vulnerabilities to running code, so at the same time that the input check was relaxed to allow ";", every place that used the name would have to check and/or escape the use of ";" to make it safe. This slows down the code, raising costs, and introduces more bugs and security vulnerabilities.

Every version of the draft since https://tools.ietf.org/html/draft-jennings-senml-00 published in June 2010 has had roughly the words

   This restricted character set was chosen so that
   these names can be directly used as in other types of URI including
   segments of an HTTP path with no special encoding and can
   be directly used in many databases and analytic systems.

The most recent version added "/", which is likely to cause interoperability problems given that it was only added in 2017. But I do not think we can ask people with high speed deployed server software to add ";" at this point. The gains of adding this do not offset the cost and security concerns, compared with just using simple names that do not require escaping.

To get to the specifics of the NI URI and the ";" that is causing problems ... the text after it is constrained to the base64 URL alphabet (which is fine in a name) and the text before it is the hash algorithm and will most likely be letters, numbers, and the minus sign. I think using the "." instead of a ";" would be totally safe from a parsing point of view here, in that anything parsing this that needed to find the end of the algorithm and the start of the base64 hash could safely assume that the "." was the separator of the two. Alternatively, just the hash could be used for the name, with the algorithm carried in metadata about the sensor. I agree with Peter's point about avoiding bespoke encoding rules, but given the lack of a clear use case for using NI names here at all, perhaps it might be better to ask: even if using hash-based names, why use a URL syntax for them?
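
As a rough illustration of the ";" to "." mapping, here is a small Python sketch. The function names are hypothetical, and the name pattern is my assumption of the restricted SenML character set (letters, digits, "-", ":", ".", "/", "_"). The example digest is only sample input in the "ni" URI shape, and "ni" URIs carrying a query part are simply rejected here.

```python
import re

# Assumed SenML name rule: starts with a letter or digit, then only
# A-Z a-z 0-9 - : . / _ are allowed.
SENML_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9\-:./_]*")


def ni_to_senml_name(ni_uri: str) -> str:
    """Replace the single ';' between algorithm and digest with '.'."""
    if not ni_uri.startswith("ni:") or ni_uri.count(";") != 1:
        raise ValueError("expected an ni URI with exactly one ';'")
    name = ni_uri.replace(";", ".")
    if not SENML_NAME.fullmatch(name):
        raise ValueError("result is outside the assumed SenML name set")
    return name


def split_alg_and_digest(name: str) -> tuple[str, str]:
    """Recover (algorithm, digest) from the last path segment of the name.

    Neither the algorithm (letters, digits, '-') nor the base64url digest
    can contain '.', so the '.' in that segment is the separator.
    """
    last_segment = name.rsplit("/", 1)[-1]
    alg, sep, digest = last_segment.partition(".")
    if not sep or not alg or not digest or "." in digest:
        raise ValueError("no unambiguous '.' separator found")
    return alg, digest


if __name__ == "__main__":
    name = ni_to_senml_name(
        "ni:///sha-256;UyaQV-Ev4rdLoHyJJWCi11OHfrYv9E1aGQAlMO2X_-Q")
    print(name)
    print(split_alg_and_digest(name))
```

Only the last path segment is split, so an authority that contains dots (if one is kept in the name at all) does not confuse the parser.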

Anyways, we can define names such that they can be used in a URI with no escaping, or such that they can be a URI with no escaping, but we can't get both at the same time. We decided to allow them to be used in URIs, and I think that is the right decision.


PS ... Sorry about the top posting... some guy called Peter once told me 


:)

A: Because it messes up the order in which people normally read text.

Q: Why is top-posting so bad?

A: Top-posting.

Q: What is the most annoying behavior on email discussion lists?


> On Jul 24, 2017, at 2:19 AM, Ari Keränen <ari.keranen@ericsson.com> wrote:
> 
> 
>> On 22 Jul 2017, at 20.53, Peter Saint-Andre - Filament <peter@filament.com> wrote:
>> 
>> On 7/22/17 7:34 AM, Ari Keränen wrote:
>>> Hi all,
>>> 
>>> In the Friday CoRE session we discussed that adding ";" to the
>>> allowed characters of SenML names could be useful for accommodating
>>> for certain kinds of URIs, like "ni". However, as I mentioned during
>>> the meeting, whether the ";" character is safe for names needs to be
>>> checked with experts on the topic. We got now feedback that it is
>>> *not* safe to use that character and should not be included in the
>>> character set.
>> 
>> For the edification of working group participants, can you elaborate on
>> what the safety issue is? For instance, does using ';' introduce a
>> security vulnerability in some constrained systems?
> 
> Apparently it's more of a problem for the back-end systems, but I'll let Cullen elaborate on the details since he raised the concerns on this.
> 
>>> Therefore, instead of allowing ";" in names, we could consider
>>> documenting simple translation rule for "ni" URIs: just switch the
>>> ";" into ":". In the ni-URIs the ";" character is used just once:
>>> between base64 encoded hash and the algorithm used to generate the
>>> hash value [1]. And to avoid issues with encoding the authority part,
>>> we could use just the alg-val part of the URI as name.
>> 
>> I'm sure we'd all like to avoid yet more bespoke encoding rules.
> 
> That would be nice. In case of "ni" URIs I don't think the translation rule is a big issue, but in general the restricted character set does restrict the usability of URIs as names. However, having such capability is not one of the design goals of SenML, but just the other way around: being able to use names *in* URIs. 
> 
>>> This could be done in the SenML base spec, or we could leave all such
>>> translation / name-mapping rules for future spec(s) since they are
>>> just a special case for name generation and not a requirement for
>>> SenML as such. I'm perhaps leaning towards the latter option to get
>>> SenML shipped now.
>> 
>> Leaving this out of SenML for now makes sense, but (at the risk of
>> sounding like a broken record) I'll reiterate that we need to make it
>> very clear that many URI schemes and URN namespaces can't be used as
>> SenML names because of the restricted character set.
> 
> Yes, I'm planning to incorporate the text you proposed for the updated revision to address this. All, please comment if you think that's a bad idea. It's not a technical change, just clarification of existing rules.
> 
> 
> Cheers,
> Ari
>