[Jmap] Re: [art] Artart telechat review of draft-ietf-jmap-contacts-09
Steffen Nurpmeso <steffen@sdaoden.eu> Mon, 20 May 2024 20:12 UTC
Date: Mon, 20 May 2024 22:12:10 +0200
Author: Steffen Nurpmeso <steffen@sdaoden.eu>
From: Steffen Nurpmeso <steffen@sdaoden.eu>
To: worley@ariadne.com
Message-ID: <20240520201210.rDZHCaLk@steffen%sdaoden.eu>
In-Reply-To: <87ikz9e0jq.fsf@hobgoblin.ariadne.com>
References: <87ikz9e0jq.fsf@hobgoblin.ariadne.com>
Mail-Followup-To: (Dale R. Worley) <worley@ariadne.com>, sayrer@gmail.com, tbray@textuality.com, art@ietf.org, draft-ietf-jmap-contacts.all@ietf.org, jmap@ietf.org, last-call@ietf.org, Steffen Nurpmeso <steffen@sdaoden.eu>
User-Agent: s-nail v14.9.24-621-g0d1e55f367
OpenPGP: id=EE19E1C1F2F7054F8D3954D8308964B51883A0DD; url=https://ftp.sdaoden.eu/steffen.asc; preference=signencrypt
BlahBlahBlah: Any stupid boy can crush a beetle. But all the professors in the world can make no bugs.
CC: sayrer@gmail.com, tbray@textuality.com, art@ietf.org, draft-ietf-jmap-contacts.all@ietf.org, jmap@ietf.org, last-call@ietf.org, Steffen Nurpmeso <steffen@sdaoden.eu>
Archived-At: <https://mailarchive.ietf.org/arch/msg/jmap/ShGgkPJJBqhN_DOJEQVTtGMJtNs>

Hello.

Dale R. Worley wrote in
 <87ikz9e0jq.fsf@hobgoblin.ariadne.com>:
|Steffen Nurpmeso <steffen@sdaoden.eu> writes:
|
|I don't know what the larger problems might be with
|draft-ietf-jmap-contacts-09, but I think there is less trouble with this
|particular point than first appears:
|
|> I myself wonder whether that innocent RFC 9553 sentence
|>
|>   any valid sequence of Unicode characters encoded as a JSON string
|>
|> excludes surrogates?
|
|It definitely does, because within the Unicode lexicon, a "surrogate" is
|a code point, but not a code point that is assigned to a "character".
|Thus surrogates are not "characters" and cannot be members of a "valid
|sequence of Unicode characters". I haven't found a really definite
|statement of this, but that is clear from both
|https://en.wikipedia.org/wiki/Unicode#Architecture_and_terminology and
|https://www.unicode.org/versions/Unicode15.1.0/ch02.pdf

But wait! A surrogate pair is valid Unicode when "unfolded" to the
plain Unicode code point it was before being split into surrogates.
Nothing in the words of this RFC prevents anyone from taking a valid
Unicode string and stuffing it, surrogatized, into a JSON string,
which allows exactly this surrogate syntax.

|Note that Unicode's "character" can be a bit messy. E.g. "lower case a
|with umlaut" can be either a single "character" U+00E4 or two
|"characters", U+0061 followed by the combining dieresis U+0308. Or for
|a particularly hairy ligature in one of the Brahmic scripts, see figure
|2-3 in the Unicode document I linked to above, which combines no less
|than 6 "characters" into one rendered glyph.

Decomposing, normalizing, etc.

|> It should, but it then actively changes the
|> meaning of "JSON string" to be a dedicated "sub-profile" of what
|> "JSON string" normally means, and then to me the sentence is not
|> clear enough.
|
|In principle, you don't need to *define* a profile (sub-specification) of
|JSON to say e.g. "the thing must be a JSON string encoding of a sequence
|of ASCII letters", though of course in that case the set of "things"
|*will be* only a subset of JSON string encodings.
|
|But in this case, looking at RFC 4627 sec. 2.5, "Strings", it's clear
|(though not directly stated) that a JSON string representation will be a
|sequence of ASCII characters that represent a sequence of Unicode
|characters. So the limitation in this draft to "Unicode characters"
|matches what the definition of JSON allows, and as such there is no
|subsetting.
|
|> This seems not to mean entire grapheme clusters. And this seems ...
|> does not make sense at all.
|
|I think that's incorrect because there's no requirement that a Unicode
|character passes an "isprint" test. And the Unicode "general category"
|attribute for characters/code points has values like "other, control"
|and "other, format" that are specified as "characters" but they're not
|"printable" in the ordinary sense. See
|https://en.wikipedia.org/wiki/Unicode#General_Category_property

To be very honest, i will now tell you what has happened.
I have no idea. But i tell you what has happened.
It is only a fiction, mind you.

So back when this RFC was being developed, there suddenly appeared
that BIDI (bi-directional text) security advisory all over the
software world, in compilers for example, but also text editors --
everywhere! (To recall: via directional Unicode controls a user
would see a visual sentence "A", but the software would first work
on a sentence "B" that was "bytewise first".)

So now the IETF started squealing!, lost its towel!, and then
started running -- nude as it was!! -- to the Unicode consortium,
which, even though commercial, in practice, and different to the
IETF, to which money is the rust on its noble sentiment, has the
necessary competence, because it has character set experts who
designed this over thirty years. (And most of the elders are still
in there...)
Ie i think of it as either [1] or [2], or ..
even both!! (Both! Both!!)

  [1] https://www.youtube.com/watch?v=jVWDNq558AM
  [2] https://www.youtube.com/watch?v=5U319VzSqEU

This resulted in the following text

  1.6.1. Free-Form Text

    Properties having free-form text values MAY contain any valid
    sequence of Unicode characters encoded as a JSON string. Such values
  ==start of BIDI
    can contain unidirectional left-to-right and right-to-left text, as
    well as bidirectional text using Unicode Directional Formatting
    Characters as described in Section 2 of [UBiDi]. Implementations
    setting bidirectional text MUST make sure that each property value
    complies with the requirements of the Unicode Bidirectional
    Algorithm. Implementations MUST NOT assume that text values of
  ==end of BIDI
    adjacent properties are processed or displayed as a combined string;
    for example, the values of a given name component and a surname
    component may or may not be rendered together.

As can be seen, even though there is such a wide area of
problematic fields (say, for example, control characters and their
misinterpretation, the devilish C0 control characters!, and, well,
there are many more), the JSContact RFC definition covers only
a small fraction of it.

My very personal view on all this is plain: the IETF should keep
its hands off Unicode. This started with the IDNA that i "hate",
and, eh, goes on. If you mean "it can be Unicode text", then just
refer to Unicode. Find some definition that is "complete",
meaning grapheme or word boundary, make that an RFC maybe, and
then only point to that. And if you do not want control
characters, not even the visual representation that Unicode has
for control characters (just add U+2400 for C0 controls), then
define what "printable" means, and use that.

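
A minimal sketch of the two mechanisms mentioned above, assuming Python 3. It maps C0 controls to their Unicode Control Pictures by adding U+2400, and it reports where directional formatting characters occur in a string. The names BIDI_CONTROLS, show_c0 and bidi_controls_in are hypothetical and come from neither RFC 9553 nor the draft. The character set covers only the explicit formatting characters and implicit marks; it is not an implementation of the Unicode Bidirectional Algorithm.

```python
# Hypothetical helper names; nothing here is defined by RFC 9553 or the draft.

# Assumption: the explicit directional formatting characters and the
# implicit directional marks. This is a plain lookup set, not the
# Unicode Bidirectional Algorithm itself.
BIDI_CONTROLS = frozenset(
    "\u061c\u200e\u200f"              # ALM, LRM, RLM
    "\u202a\u202b\u202c\u202d\u202e"  # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"        # LRI, RLI, FSI, PDI
)


def show_c0(text: str) -> str:
    """Replace C0 controls (and DEL) with their visible Control Pictures."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x20:
            # U+2400..U+241F mirror U+0000..U+001F: "just add U+2400".
            out.append(chr(0x2400 + cp))
        elif cp == 0x7F:
            # DEL has its own picture, U+2421 SYMBOL FOR DELETE.
            out.append("\u2421")
        else:
            out.append(ch)
    return "".join(out)


def bidi_controls_in(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for every directional control in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]
```

For example, show_c0("a\tb") yields "a\u2409b", that is, the tab becomes the visible SYMBOL FOR HORIZONTAL TABULATION, and bidi_controls_in("ab\u202ecd") reports the RLO override at index 2.
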
P.S.: as far as i know most of the BIDI-vulnerable software stacks
(compilers, text editors, etc) still do not comply with the very
complicated (last i looked) Unicode BIDI algorithm, but they only
track the directional attribute of code points, and the general
directional marks, and count character cells.
(This, i would think, is not possible with ISO C alone.)

|Dale
--End of <87ikz9e0jq.fsf@hobgoblin.ariadne.com>

Ciao from Germany,

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)