Re: [ldapext] UTF-8 full support in LDIF / LDIF v2

Jim Willeke <jim@willeke.com> Thu, 04 June 2009 00:54 UTC

In-Reply-To: <245BF18B-2066-4E36-9502-16F4A3140D9E@Isode.com>
References: <49C497F9.7010200@zioup.com> <49C870C6.4010803@zioup.com> <E94B7389-9A6D-4CB6-BB2C-649CCD3FD15B@Isode.com> <49CB192E.5050105@zioup.com> <49CB211C.6070108@eb2bcom.com> <49CB87FE.1050809@zioup.com> <49CC01DE.6040506@eb2bcom.com> <4A24557D.7030006@zioup.com> <4A26A05D.8040105@zioup.com> <245BF18B-2066-4E36-9502-16F4A3140D9E@Isode.com>
Date: Wed, 03 Jun 2009 20:54:19 -0400
Message-ID: <b662a94e0906031754n217f96c8t55e1e0c34f11bb86@mail.gmail.com>
From: Jim Willeke <jim@willeke.com>
Cc: ldapext@ietf.org
Subject: Re: [ldapext] UTF-8 full support in LDIF / LDIF v2

I would agree with Kurt's and Howard's comments.
Perhaps what is really required is a good tool to display and edit
existing LDIF files in a view that is deemed acceptable.

-jim
Jim Willeke


On Wed, Jun 3, 2009 at 8:44 PM, Kurt Zeilenga <Kurt.Zeilenga@isode.com> wrote:

>
> On Jun 3, 2009, at 9:10 AM, Yves Dorfsman wrote:
>
>> Is the idea of a here document syntax too ridiculous ?
>>
>
> There are a number of problems with it.  Personally, I think what Steven
> already offered (and likely implemented) is better, though I am concerned
> about line separators.  As Howard's comments suggest, when you have a
> value which is multi-lined, it's the syntax that controls what line
> separators are used, not the LDIF.  For instance, in some syntaxes, a $ is
> used as a line separator.
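As an illustration of a syntax that defines its own line separator: the Postal Address syntax of RFC 4517 joins lines with "$" and escapes a literal "$" as "\24" and a literal "\" as "\5C". The helpers below are a minimal Python sketch of that encoding; the function names are hypothetical and not taken from any LDAP library.

```python
import re
from typing import List

# Hypothetical helpers sketching the RFC 4517 Postal Address syntax:
# lines are joined with "$"; "\" is escaped as "\5C" and "$" as "\24".
_ESCAPE = re.compile(r"\\(5C|24)", re.IGNORECASE)


def encode_postal_address(lines: List[str]) -> str:
    """Join address lines into one Postal Address value."""
    # Escape backslashes first so the "\" introduced for "$" is not re-escaped.
    escaped = [ln.replace("\\", "\\5C").replace("$", "\\24") for ln in lines]
    return "$".join(escaped)


def decode_postal_address(value: str) -> List[str]:
    """Split a Postal Address value back into its lines, undoing the escapes."""
    def unescape(m):
        return "\\" if m.group(1).upper() == "5C" else "$"

    # After escaping, every remaining "$" is a line separator, so split is safe.
    return [_ESCAPE.sub(unescape, part) for part in value.split("$")]


address = ["1 Main St", "Costs $5 \\ day"]
encoded = encode_postal_address(address)
print(encoded)  # 1 Main St$Costs \245 \5C day
assert decode_postal_address(encoded) == address
```

The encoded value contains no LF or CR at all, so an LDIF tool that rewrote "$" as a newline for display would be changing the value's octets.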
>
> The problem with your proposal, and Steven's, is that it treats LDIF line
> separators and value line separators as one and the same thing.  While that
> might occasionally be the case, it cannot be expected to be the case in general.
>
> LDIF is first and foremost an interchange format.  Conversion from LDAP
> PDU -> LDIF record -> LDAP PDU MUST reproduce the input, octet for octet,
> for every "data" component (the DN, every attribute description and its
> associated values, etc.).
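To make the round-trip requirement concrete, here is a minimal Python sketch of writing a single attribute value as an RFC 2849 (LDIF v1) line and reading it back. It assumes the SAFE-STRING rules of RFC 2849 and, following that RFC's note on trailing spaces, base64-encodes values ending in a space; line folding, URL values ("attr:<") and attribute options are left out. The function names are hypothetical.

```python
import base64
from typing import Tuple

# RFC 2849: SAFE-CHAR is any octet 0x01-0x7F except LF and CR;
# SAFE-INIT-CHAR additionally excludes SPACE, ":" and "<".
_UNSAFE = {0x00, 0x0A, 0x0D}
_UNSAFE_INIT = _UNSAFE | {0x20, 0x3A, 0x3C}


def is_safe_string(value: bytes) -> bool:
    """True if the value may be written as-is after "attr: "."""
    if not value:
        return True
    if value[0] in _UNSAFE_INIT:
        return False
    if value[-1] == 0x20:  # RFC 2849 says values ending in SPACE SHOULD be base64
        return False
    return all(b < 0x80 and b not in _UNSAFE for b in value)


def to_ldif_line(attr: str, value: bytes) -> str:
    """Write one unfolded attrval-spec line."""
    if is_safe_string(value):
        return f"{attr}: {value.decode('ascii')}"
    return f"{attr}:: {base64.b64encode(value).decode('ascii')}"


def from_ldif_line(line: str) -> Tuple[str, bytes]:
    """Read one unfolded attrval-spec line back into (attr, octets)."""
    attr, _, rest = line.partition(":")
    if rest.startswith(":"):  # "attr:: <base64>"
        return attr, base64.b64decode(rest[1:].lstrip(" "))
    return attr, rest.lstrip(" ").encode("ascii")


# Round trip: every value comes back octet for octet, including UTF-8.
for v in [b"Babs Jensen", "Barbara Jensen \u00e9".encode("utf-8"), b"a\nb", b""]:
    assert from_ldif_line(to_ldif_line("cn", v)) == ("cn", v)
```

Any octet string survives the trip unchanged; UTF-8 values are already interchangeable in LDIF v1, just base64-encoded rather than displayed.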
>
>  Is UTF-8 support in LDIF not that important ?
>>
>
> LDIF being a proper interchange format is important.  UTF-8 support (other
> than being able to interchange values whose syntax is UTF-8 encoded) is
> cosmetic.
>
> Adding UTF-8 support does not appear to be aimed at improving LDIF as a
> proper interchange format.  It seems to be driven by other goals, such as
> trying to make LDIF files displayable.  Given that LDAP does not constrain
> attribute value syntaxes (even directory strings can contain arbitrary
> sequences of Unicode code points), the goal of making LDIF files displayable
> is not terribly feasible.
>
> I note that even today, ASCII LDIF files might not display properly without
> special handling, such as for line separators.  But with UTF-8, line
> separators are only the tip of the iceberg of display problems.
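A small Python sketch of why display is hard: a directory string may legally contain code points that render invisibly, reorder the surrounding text, or break lines. The helper below is hypothetical, and the set of "hazardous" Unicode general categories is an assumption chosen for illustration, not a standard list.

```python
import unicodedata
from typing import List, Tuple

# Assumed, illustrative set of general categories that tend to cause display
# trouble: controls, format characters (e.g. bidi overrides, zero-width space),
# line/paragraph separators, private-use and unassigned code points.
HAZARDOUS = {"Cc", "Cf", "Zl", "Zp", "Co", "Cn"}


def display_hazards(value: str) -> List[Tuple[int, str, str]]:
    """Return (index, code point, category) for each hazardous character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.category(ch))
        for i, ch in enumerate(value)
        if unicodedata.category(ch) in HAZARDOUS
    ]


print(display_hazards("Babs Jensen"))       # []
print(display_hazards("abc\u202edef"))      # [(3, 'U+202E', 'Cf')]
print(display_hazards("line1\u2028line2"))  # [(5, 'U+2028', 'Zl')]
```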
>
> I'm not convinced that removing the ASCII restrictions will be a good
> thing.  Not only do I doubt it will have a net positive impact on the
> displayability of LDIF for those who have a displayability goal (I don't
> have this goal), I think it will have a net negative impact on
> interoperability and will increase user confusion, such as when the user
> creates a file using one Unicode normalization form but is trying to set
> values which require a different normalization form.
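The normalization problem can be seen in a few lines of Python using the standard unicodedata module: the same visible name has two different UTF-8 encodings depending on the normalization form the text was saved in. Which form a given editor produces is an assumption for illustration; the helper name is hypothetical.

```python
import unicodedata


def utf8_octets(value: str, form: str) -> bytes:
    """UTF-8 octets of `value` after Unicode normalization `form`."""
    return unicodedata.normalize(form, value).encode("utf-8")


name = "Ren\u00e9"  # "René" with precomposed U+00E9
print(utf8_octets(name, "NFC"))  # b'Ren\xc3\xa9'
print(utf8_octets(name, "NFD"))  # b'Rene\xcc\x81'
# Both display as "René", yet the octets differ, so a file saved in one form
# does not carry the octets a user expecting the other form intended to set.
print(utf8_octets(name, "NFC") == utf8_octets(name, "NFD"))  # False
```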
>
>  Am I the only one thinking xml is not a good replacement for LDIF,
>>
>
> There already exist a number of XML replacements for LDIF, such as DSML...
> so I guess at least some do think XML is a good replacement for LDIF.
>
>  if so, should we help Steven with the xmled RFC ?
>>
>
> What Steven and Andrew have done is define an extension for LDIF to allow
> XML values to be represented in a human-readable format instead of requiring
> the use of base64.  Unfortunately, the proposal has interchange issues
> (see the I-D's security considerations section).  This, I think, is a fatal
> problem with this extension.
>
> -- Kurt
>
>
>
>>
>> Thanks.
>>
>>
>> Yves Dorfsman wrote:
>>
>>> Steven Legg wrote:
>>>
>>>> See http://www.xmled.info/drafts/draft-sciberras-xed-eldif-05.txt
>>>>
>>> I did look at it; personally I find it difficult for humans, for
>>> diff'ing, etc. XML has its place, but so does pure text.
>>>
>>>>> Yes, I was wondering about that: do we need multi-line values as a
>>>>> workaround because schemas aren't precise enough?
>>>>>
>>>>
>>>> No, we need them because sheets of paper, computer screens and RFCs are
>>>> not infinitely wide. :-) Human-readability, line breaks and indenting
>>>> tend to go hand-in-hand.
>>>>
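For comparison, LDIF v1 (RFC 2849) already addresses line width by folding: a logical line may be continued by inserting a line separator followed by a single SPACE, and a reader removes that pair when unfolding. Because the fold is not part of the value, it cannot express a line break inside a value. A minimal Python sketch, assuming LF separators on output and an arbitrary width of 76 characters (the function names are hypothetical):

```python
from typing import List


def fold(line: str, width: int = 76) -> str:
    """Fold one logical LDIF line so no physical line exceeds `width` chars.

    Continuation lines start with a single SPACE, which is not part of the data.
    LDIF v1 lines are ASCII, so characters and octets coincide here.
    """
    if width < 2:
        raise ValueError("width must leave room for the leading SPACE")
    physical = [line[:width]]
    rest = line[width:]
    while rest:
        physical.append(" " + rest[: width - 1])
        rest = rest[width - 1 :]
    return "\n".join(physical)


def unfold(text: str) -> List[str]:
    """Rejoin folded lines: a line starting with SPACE continues the previous one."""
    logical: List[str] = []
    for physical in text.replace("\r\n", "\n").split("\n"):
        if physical.startswith(" ") and logical:
            logical[-1] += physical[1:]
        else:
            logical.append(physical)
    return logical


long_line = "description: " + "x" * 200
assert unfold(fold(long_line)) == [long_line]
```

Unfolding always restores the original line exactly, whereas in a here document the separator becomes part of the value.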
>>> I've been thinking about this and trying a few things. My conclusion is
>>> that the best solution would be the good old here document.
>>> objectclass: inetOrgPerson
>>> organizationName:<<EOT
>>> The two line
>>>  company
>>> EOT
>>> sn: Jensen
>>> With the following specifications:
>>> - Any of the following characters (or sequence, in the case of CR+LF) can
>>>   be used as a separator (<SEP>): LF (U+000A), CR (U+000D), CR+LF (U+000D
>>>   followed by U+000A), NEL (U+0085), FF (U+000C), LS (U+2028),
>>>   PS (U+2029).
>>> - Any sequence of characters can be used instead of EOT, but it cannot
>>>   include a separator character. The same sequence has to be used at the
>>>   beginning and the end.
>>> - Any UTF-8 character, except separators, can be used on each line.
>>> - Any separator can be used to separate the lines.
>>> - The text starts after EOT<SEP> and finishes with the last character
>>>   before <SEP>EOT. The organization name in the example above is exactly
>>>   two lines; the last separator is not part of the text.
>>> - There is no need or possibility to escape characters, and no
>>>   possibility of folding lines.
>>>
>>
>>
>> --
>> Yves.
>> http://www.sollers.ca/
>>
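A minimal Python sketch of a parser for the here-document syntax as specified in Yves's proposal: an "attr:<<TAG" line, then the value up to a line consisting only of TAG, with any listed separator allowed and kept verbatim inside the value. Attribute options (e.g. ";lang-en") are not handled, and the names are hypothetical.

```python
import re
from typing import Tuple

# Separators from the proposal: CR+LF, LF, CR, FF, NEL, LS, PS.
# CR+LF is listed first so it is preferred over a lone CR.
SEP = r"(?:\r\n|[\n\r\x0c\x85\u2028\u2029])"
NOT_SEP = r"[^\n\r\x0c\x85\u2028\u2029]"

HEREDOC = re.compile(
    r"(?P<attr>[A-Za-z][A-Za-z0-9-]*):<<(?P<tag>" + NOT_SEP + r"+)" + SEP
    + r"(?P<body>.*?)" + SEP + r"(?P=tag)(?=" + SEP + r"|\Z)",
    re.DOTALL,
)


def parse_heredoc(text: str, pos: int = 0) -> Tuple[str, str, int]:
    """Parse one here-document attribute starting at text[pos].

    Returns (attribute, value, index just past the closing tag). The value
    excludes the separator that precedes the closing tag.
    """
    m = HEREDOC.match(text, pos)
    if m is None:
        raise ValueError("no here-document attribute at this position")
    return m.group("attr"), m.group("body"), m.end()


record = "organizationName:<<EOT\nThe two line\n company\nEOT\nsn: Jensen"
attr, value, end = parse_heredoc(record)
print(repr(value))        # 'The two line\n company'
print(repr(record[end:]))  # '\nsn: Jensen'
```

Which separator ends up inside the value depends on whatever editor last saved the file, so the same record can yield different octets; that is the interchange concern raised earlier in the thread.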
>> _______________________________________________
>> Ldapext mailing list
>> Ldapext@ietf.org
>> https://www.ietf.org/mailman/listinfo/ldapext
>>
>
>