Re: [ldapext] UTF-8 full support in LDIF / LDIF v2

I would agree with Kurt's and Howard's comments.
Perhaps what is really required is a good tool to display and edit
existing LDIF-formatted files in a view that is deemed acceptable.

-jim
Jim Willeke

On Wed, Jun 3, 2009 at 8:44 PM, Kurt Zeilenga <Kurt.Zeilenga@isode.com> wrote:

>
> On Jun 3, 2009, at 9:10 AM, Yves Dorfsman wrote:
>
>> Is the idea of a here document syntax too ridiculous ?
>>
>
> There are a number of problems with it.  Personally, I think what Steven
> already offered (and likely implemented) is better, though I am concerned
> about line separators.  As Howard's comments kind of suggest, when you have
> a value which is multi-lined, it's the syntax that controls what line
> separators are used, not the LDIF.  For instance, in some syntaxes, a $ is
> used as a line separator.
>
> The problem with your proposal, and Steven's, is that LDIF line separators
> and value line separators are treated as one and the same thing.  While that
> might be the case occasionally, it cannot be expected to be generally the case.
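
To illustrate the distinction drawn above, here is a minimal sketch in
Python. It assumes the syntax being alluded to is Postal Address from
RFC 4517, where "$" separates the lines of a value and a literal "$" or
"\" is escaped as \24 or \5C. The function name is hypothetical and not
taken from any library.

```python
import re
from typing import List


def postal_address_lines(value: str) -> List[str]:
    """Split a Postal Address value into its logical lines.

    Assumes RFC 4517 conventions: '$' separates lines, while '\\24' and
    '\\5C' (hex digits case-insensitive) stand for a literal '$' and '\\'.
    """
    def unescape(line: str) -> str:
        # Single pass, so that '\5C24' becomes '\24' and is not unescaped twice.
        return re.sub(
            r"\\(24|5[Cc])",
            lambda m: "$" if m.group(1) == "24" else "\\",
            line,
        )

    return [unescape(line) for line in value.split("$")]


# One LDAP value, three logical lines, and no LDIF line separator involved.
print(postal_address_lines("1234 Main St.$Anytown, CA 12345$USA"))
```

The point of the sketch is only that the line structure lives inside the
value and is defined by the attribute syntax, so the LDIF record itself
needs just one physical line for it.
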
>
> LDIF is first and foremost an interchange format.  Conversion from LDAP
> PDU->LDIF Record->LDAP PDU MUST produce as output the input, octet for octet
> for every "data" component (the DN, every attribute description and
> associated values, etc.).
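
The round-trip requirement above can be sketched as follows. This is a
simplified illustration, not a full LDIF implementation: it assumes the
SAFE-STRING rules of RFC 2849 for deciding when base64 is required,
ignores line folding, and uses hypothetical function names.

```python
import base64
from typing import Tuple


def is_safe_string(value: bytes) -> bool:
    """Rough RFC 2849 SAFE-STRING check (simplified, no folding)."""
    if not value:
        return True
    if value[0] in b"\x00\n\r :<":      # not a SAFE-INIT-CHAR
        return False
    if value.endswith(b" "):            # trailing space should be base64 encoded
        return False
    return all(b < 0x80 and b not in (0x00, 0x0A, 0x0D) for b in value)


def encode_attrval(attr: str, value: bytes) -> str:
    """Render one attribute value as a single, unfolded LDIF line."""
    if is_safe_string(value):
        return f"{attr}: {value.decode('ascii')}"
    return f"{attr}:: {base64.b64encode(value).decode('ascii')}"


def decode_attrval(line: str) -> Tuple[str, bytes]:
    """Inverse of encode_attrval for a single, unfolded LDIF line."""
    attr, _, rest = line.partition(":")
    if rest.startswith(":"):            # "attr:: <base64>"
        return attr, base64.b64decode(rest[1:].strip())
    return attr, rest.lstrip(" ").encode("ascii")


# Octet-for-octet round trip, including a value with an embedded line feed.
original = b"The two line\n company"
line = encode_attrval("o", original)
assert decode_attrval(line) == ("o", original)
print(line)
```

Under these assumptions the multi-line value survives only because it is
base64 encoded; the LDIF line separator never becomes part of the data.
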
>
>> Is UTF-8 support in LDIF not that important ?
>>
>
> LDIF being a proper interchange format is important.  UTF-8 support (other
> than being able to interchange values whose syntax is UTF-8 encoded) is
> cosmetic.
>
> Adding UTF-8 support does not appear to be in support of improving LDIF as
> a proper interchange format.  It seems to be driven by other goals, such as
> trying to make LDIF files displayable.  Given that LDAP does not constrain
> attribute value syntaxes (even directory strings can contain arbitrary
> sequences of Unicode code points), the goal of making LDIF files displayable
> is not terribly feasible.
>
> I note that even today, ASCII LDIF files might not display properly without
> special handling, such as for line separators.  But with UTF-8, line
> separators are only the tip of the iceberg of display problems.
>
> I'm not convinced that removing the ASCII restrictions will be a good
> thing.  Not only do I doubt it will have a net positive effect on the
> displayability of LDIF for those who have a displayability goal (I don't
> share this goal), I think it will have a net negative impact on
> interoperability and add to user confusion, such as when the user creates a
> file using one Unicode normalization form, but is trying to set values
> which require a different Unicode normalization form.
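
The normalization concern above can be made concrete with a short
sketch using Python's standard unicodedata module. The sample name is
an arbitrary choice for illustration.

```python
import unicodedata

# The same visible text in two Unicode normalization forms.
name = "Zo\u00eb"  # "Zoë"
nfc = unicodedata.normalize("NFC", name).encode("utf-8")
nfd = unicodedata.normalize("NFD", name).encode("utf-8")

# Both typically render identically in an editor, yet the octets differ,
# so a raw UTF-8 LDIF file could carry either one without the user noticing.
print(nfc.hex(), nfd.hex())
print(nfc == nfd)
```

An editor that silently saves one form while the target attribute
expects the other would change the octets being interchanged, which is
the kind of confusion described above.
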
>
>> Am I the only one thinking xml is not a good replacement for LDIF,
>>
>
> There already exist a number of XML replacements for LDIF, such as DSML...
> so I guess at least some do think XML is a good replacement for LDIF.
>
>> if so, should we help Steven with the xmled RFC ?
>>
>
> What Steven and Andrew have done is define an extension for LDIF to allow
> XML values to be represented in a human-readable format instead of requiring
> the use of base64.  Unfortunately, their proposal has interchange issues
> (see the I-D's security considerations section).  This, I think, is a fatal
> problem with this extension.
>
> -- Kurt
>
>
>
>>
>> Thanks.
>>
>>
>> Yves Dorfsman wrote:
>>
>>> Steven Legg wrote:
>>>
>>>> See http://www.xmled.info/drafts/draft-sciberras-xed-eldif-05.txt
>>>>
>>> I did look at it, personally I find it difficult for humans, for
>>> diff'ing etc... XML has its place, but so does pure text.
>>>
>>>>> Yes I was wondering about that, do we need multi-line values as a
>>>>> workaround because schemas aren't precise enough ?
>>>>>
>>>>
>>>> No, we need them because sheets of paper, computer screens and RFCs are
>>>> not infinitely wide. :-) Human-readability, line breaks and indenting
>>>> tend to go hand-in-hand.
>>>>
>>> I've been thinking about this and trying a few things. My conclusion is
>>> that the best solution would be the good old here document.
>>>
>>> objectclass: inetOrgPerson
>>> organizationName:<<EOT
>>> The two line
>>>  company
>>> EOT
>>> sn: Jensen
>>>
>>> With the following specifications:
>>>
>>> Any of the following characters (or sequence, in the case of CR+LF) can be
>>> used as a separator (<SEP>):
>>> LF (U+000A), CR (U+000D), CR+LF (U+000D followed by U+000A), NEL
>>> (U+0085), FF (U+000C), LS (U+2028), PS (U+2029)
>>> Any sequence of characters can be used instead of EOT, but it cannot
>>> include a separator character. The same sequence has to be used at the
>>> beginning and the end.
>>> Any UTF-8 character, except separators, can be used on each line.
>>> Any separator can be used to separate the lines.
>>> The text starts after EOT<SEP>, and finishes with the last character
>>> before <SEP>EOT. The organization name in the example above is exactly two
>>> lines, the last separator is not part of the text.
>>> No need or possibility to escape characters, no possibility of folding
>>> lines.
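
For discussion purposes, the here-document rules quoted above can be
sketched as a small Python reader. This is only one reading of the
proposal, not an agreed format: the function name is hypothetical, the
empty-value case (the end tag directly after the header) is an
assumption, and the interior separators are kept exactly as found in the
file, which is where the interchange concern raised earlier in the thread
applies.

```python
import re

# Separators listed in the proposal; CR+LF is tried before a lone CR or LF.
SEP = r"(?:\r\n|[\n\r\x85\x0c\u2028\u2029])"
NOT_SEP = r"[^\n\r\x85\x0c\u2028\u2029]"
HEADER = re.compile(
    rf"(?P<attr>[^:\n\r\x85\x0c\u2028\u2029]+):<<(?P<tag>{NOT_SEP}+)(?P<sep>{SEP})"
)


def read_heredoc(text: str, pos: int = 0):
    """Return (attr, value, next_pos) for a here-document starting at pos."""
    m = HEADER.match(text, pos)
    if m is None:
        raise ValueError("no here-document header at this position")
    # The end tag must fill a whole line: <SEP>tag, then <SEP> or end of input.
    closing = re.compile(rf"{SEP}{re.escape(m.group('tag'))}(?={SEP}|\Z)")
    # Start at the header's own separator so an empty value is recognised.
    e = closing.search(text, m.start("sep"))
    if e is None:
        raise ValueError("unterminated here-document")
    # The separator before the end tag is not part of the value.
    value = text[m.end():e.start()] if e.start() >= m.end() else ""
    return m.group("attr"), value, e.end()


sample = "organizationName:<<EOT\nThe two line\n company\nEOT\nsn: Jensen\n"
attr, value, next_pos = read_heredoc(sample)
print(attr, repr(value))
```

Note that the value returned contains whatever separator the file
happened to use between its lines, so the same record saved with CR+LF
instead of LF would yield different octets.
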
>>>
>>
>>
>> --
>> Yves.
>> http://www.sollers.ca/
>>
>> _______________________________________________
>> Ldapext mailing list
>> Ldapext@ietf.org
>> https://www.ietf.org/mailman/listinfo/ldapext
>>