[Ltru] Re: Review of 4646bis-10, sections 1 to 3.4

John Cowan wrote:

> I suggest adding this to 2.2.6 (extensions) after the first sentence:
>| They are intended to identify information which is commonly used
>| in association with language tags but is not part of language
>| identification.

The intro says that language tags are about identifying languages, so
I don't see why extensions should be limited to "non-identification".

> In 2.2, the term "code" is defined to refer to a value defined in an
> external standard.  We should, pervasively throughout the document,
> change this to "code element", which is proper ISO terminology.

It was already "code" in RFC 1766.  I see no reason to change
this to "proper ISO terminology" when the shorthand is unambiguous.

> For ISO, "code" means the whole list.

For the IETF it doesn't necessarily, so far.

> 2.2.4(3)(C) uses the phrase "with ambiguous ISO 3166 alpha-2 codes".
> This is not very clear to those who don't know the history, and should
> be expanded to something like "whose alpha-2 code was formerly (since
> <Date B>) associated with a different country".  Someone will have to
> look up what the cutoff date was -- I forget.

2005-10-16 is the Filedate in the last 4645 draft; was that "date B"?

I think 2.2.4(3)(C) is about codes deprecated after "date A" and later
reused for a *different* country.  Maybe without the "different"
qualifier, as Doug read it.

BTW, 2.2.4(3)(E) talks about 4645 instead of 4645bis, and I think it's
no longer needed now that Debbie has registered GG and JE.

> 3.1.1: change the definition of folding to "Folding is always
> done on Unicode default grapheme boundaries".

AFAIK the normal (2822upd) definition of folding is "at places
where WSP is allowed", and unfolding means "replace FWS by SP".

No grapheme boundaries involved.  If 4646bis tries to invent
its own folding rules, this should be explained better.  4646bis
shouldn't expose such Unicode oddities.  Checking my own code:

| #### unfold field body ############################################
| /^[\t ]/ { BODY = BODY " " STRIP( $0 )
|            next
|          }

That replaces FWS by one SP; I'm almost certain that I don't want
an SP between the "graphemes" of a "folded word".
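
To make the point concrete, here is a minimal sketch of the same
unfolding rule in Python.  It is not the awk script above: the name
unfold() is made up for this example, and it assumes that STRIP
simply removes leading and trailing SP and HTAB, which the excerpt
does not show.

```python
def unfold(lines):
    """Unfold one header field given as a list of physical lines.

    Assumption: a continuation line starts with SP or HTAB, and the
    fold is replaced by exactly one SP (as in the awk excerpt).
    """
    body = ""
    for line in lines:
        if line.startswith((" ", "\t")):
            # Continuation line: drop the surrounding whitespace and
            # join it to the previous text with a single SP.
            body += " " + line.strip(" \t")
        else:
            # First line of the field.
            body += line.rstrip(" \t")
    return body


# Folded where WSP is allowed: the original text comes back.
at_wsp = unfold(["Comments: a folded", "\tword"])

# Folded inside a word: unfolding leaves a stray SP in the word.
mid_word = unfold(["Comments: a fol", " ded word"])
```

With this rule at_wsp is "Comments: a folded word", while mid_word
is "Comments: a fol ded word".  So a fold that is not at WSP cannot
be undone by "replace FWS by SP", no matter which boundary it was
placed on.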

> also prohibits folding in the middle of a Hangul syllable
> written as separate jamo.

Do they have "WSP separated words", or is there a serious
chance of more than 72-1 adjacent bytes belonging to one
or more graphemes?  Talking about "bytes" in conjunction
with UTF-8 makes me nervous.
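
A rough sketch of why bytes and characters differ here, in Python.
It assumes the limit is 71 bytes (my "72-1" above) and that it is
counted on the UTF-8 form; the sample syllable U+D55C and the
variable names are picked only for illustration.

```python
import unicodedata

LIMIT = 71  # assumption: "72-1" bytes per line

syllable = "\ud55c"  # one precomposed Hangul syllable
jamo = unicodedata.normalize("NFD", syllable)  # same syllable as separate jamo

syllable_bytes = len(syllable.encode("utf-8"))
jamo_chars = len(jamo)
jamo_bytes = len(jamo.encode("utf-8"))

# How many such decomposed syllables fit into LIMIT bytes
# if no WSP appears between them.
fits = LIMIT // jamo_bytes
```

For this syllable the precomposed form takes 3 bytes, the jamo form
is 3 characters and 9 bytes, and only 7 such syllables fit below the
limit.  Whether runs of that length without WSP really occur is
still the open question.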

> Hands up all those who can say the official name of the
> U.S.  without looking it up.

The CIA World Factbook claims I hit the "conventional long
form" - maybe "official" isn't the same as "conventional".

 Frank

_______________________________________________
Ltru mailing list
Ltru@ietf.org
https://www1.ietf.org/mailman/listinfo/ltru