Re: [Slim] IETF last call for draft-ietf-slim-negotiating-human-language (Section 5.4)

Bernard Aboba <bernard.aboba@gmail.com> Wed, 15 February 2017 23:53 UTC

From: Bernard Aboba <bernard.aboba@gmail.com>
Date: Wed, 15 Feb 2017 15:52:42 -0800
Message-ID: <CAOW+2dsQuWnF8r_1LMsKFf9WLa=r5vN=oZfQHZdLz2c9E8xkgQ@mail.gmail.com>
To: Gunnar Hellström <gunnar.hellstrom@omnitor.se>
Archived-At: <https://mailarchive.ietf.org/arch/msg/slim/PiiSUMZZ4mu7IVQ72yyszhJRbYs>
Cc: "slim@ietf.org" <slim@ietf.org>, ietf@ietf.org
Subject: Re: [Slim] IETF last call for draft-ietf-slim-negotiating-human-language (Section 5.4)

Gunnar Hellstrom said:

"The SDP Lang attribute in RFC 4566, where you (Randall) say it is intended
for specifying a set of languages that all must be used in a session, while
I say that it is intended for negotiation of at least one initial language."

[BA] At IETF 96 in Berlin, we had a discussion of the history of the SDP
Lang attribute within the MMUSIC WG.

The Lang attribute was originally specified in RFC 2327, which was
published in April 1998, more than four years prior to the publication of
Offer/Answer RFC 3264 (June 2002), and three years prior to publication of
the initial draft-rosenberg-mmusic-sdp-offer-answer-00 (April 26, 2001).

As a result, the Lang attribute could not have been designed for use in
Offer/Answer negotiation, but instead was intended for use in the
declarative SDP of multicast conferencing.  Note that the Lang attribute
was not mentioned in RFC 3264, and no one at the MMUSIC WG session was aware
of a subsequent SIP Offer/Answer implementation of it.
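
To illustrate the declarative usage, here is a minimal sketch (not taken from
any implementation) that simply collects the "a=lang:" attributes that RFC 4566
allows at session and media level. Nothing is negotiated; a receiver just reads
what the announcement declares. The function name, the returned layout, and the
sample SDP are assumptions made up for this example.

```python
# Sketch only: read "a=lang:" attributes from a declarative SDP description.
# The sample SDP and the helper name are hypothetical.

EXAMPLE_SDP = """\
v=0
o=- 0 0 IN IP4 192.0.2.1
s=Example conference
c=IN IP4 233.252.0.1/127
t=0 0
a=lang:en
m=audio 49170 RTP/AVP 0
a=lang:sv
a=lang:en
"""


def lang_attributes(sdp: str) -> dict:
    """Map 'session' or a media type to the language tags declared for it.

    Simplification: media sections of the same type are merged.
    """
    result = {"session": []}
    section = "session"
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            # "m=audio 49170 RTP/AVP 0" -> "audio"
            section = line[2:].split()[0]
            result.setdefault(section, [])
        elif line.startswith("a=lang:"):
            result[section].append(line[len("a=lang:"):].strip())
    return result


print(lang_attributes(EXAMPLE_SDP))
```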


On Wed, Feb 15, 2017 at 1:41 AM, Gunnar Hellström <
gunnar.hellstrom@omnitor.se> wrote:

> On 2017-02-15 at 01:39, Randall Gellens wrote:
>
>> At 4:21 PM -0800 2/14/17, Randy Presuhn wrote:
>>
>>  Hi -
>>>
>>>  On 2/14/2017 2:43 PM, Randall Gellens wrote:
>>>
>>>>  At 8:59 PM +0100 2/14/17, Gunnar Hellström wrote:
>>>>
>>>>   On 2017-02-14 at 19:05, Randy Presuhn wrote:
>>>>>
>>>>>   Hi -
>>>>>>
>>>>>>   On 2/14/2017 9:40 AM, Randall Gellens wrote:
>>>>>>
>>>>>>>   At 11:01 AM +0100 2/14/17, Gunnar Hellström wrote:
>>>>>>>
>>>>>>>    My proposal for a reworded section 5.4 is:
>>>>>>>>
>>>>>>>>    5.4.  Unusual language indications
>>>>>>>>
>>>>>>>>    It is possible to specify an unusual indication where the language
>>>>>>>>    specified may look unexpected for the media type.
>>>>>>>>
>>>>>>>>    For such cases the following guidance SHALL be applied for the
>>>>>>>>    humintlang attributes used in these situations.
>>>>>>>>
>>>>>>>>    1.    A view of a speaking person in the video stream SHALL, when it
>>>>>>>>    has relevance for speech perception, be indicated by a Language-Tag
>>>>>>>>    for spoken/written language with the "Zxxx" script subtag to indicate
>>>>>>>>    that the content is not written.
>>>>>>>>
>>>>>>>>    2.    Text captions included in the video stream SHALL be indicated
>>>>>>>>    by a Language-Tag for spoken/written language.
>>>>>>>>
>>>>>>>>    3.    Any approximate representation of sign language or
>>>>>>>>    fingerspelling in the text media stream SHALL be indicated by a
>>>>>>>>    Language-Tag for a sign language in text media.
>>>>>>>>
>>>>>>>>    4.    When sign language related audio from a person using sign
>>>>>>>>    language is of importance for language communication, this SHALL be
>>>>>>>>    indicated by a Language-Tag for a sign language in audio media.
>>>>>>>>
>>>>>>>
>>>>>>>   [RG] As I said, I think we should avoid specifying this until we
>>>>>>>   have deployment experience.
>>>>>>>
>>>>>>   ...
>>>>>>
>>>>>>   From a process perspective, it's far easier to remove constraints
>>>>>>   as a specification advances than it is to add them.
>>>>>>
>>>>>   I agree. It is often better to specify normatively as far as you can
>>>>>  imagine, so that interoperability and good functionality are achieved.
>>>>>  Stopping halfway and having MAY in the specifications creates
>>>>>  uncertainty and less useful specifications.
>>>>>
>>>>
>>>>  My reading of what Randy says is the opposite of Gunnar's. In my
>>>>  reading, Randy points out that it is easier to remove the SHOULD NOT in
>>>>  the future than it is to change the meaning of the combinations or
>>>>  switch to a different mechanism.
>>>>
>>>>  In my experience, it's better to specify only what we know we need and
>>>>  what we know we understand.  Speculative specifications "as far as you
>>>>  can imagine" more often lead to interoperability problems, unnecessary
>>>>  complexity, limitations on what's needed in the future, and divergent
>>>>  implementations.
>>>>
>>>
>>>  I think the difference in your positions comes down to
>>>
>>>    (1) your respective notions of "what we know we need and what we
>>>        know we understand";
>>>
>>>    (2) whether you believe that the interoperability and conformance
>>>        consequences of removing a "SHOULD NOT" could be the same
>>>        as those of merely retaining a "MUST" or "SHALL" - this determines
>>>        whether Randy G.'s proposal provides a path for some future
>>>        revision to mandate (if deployment experience substantiates the
>>>        need/understanding) the behavior proposed by Gunnar. That path
>>>        is not at all obvious to me.
>>>
>>
>> The purpose of the draft is to enable the two endpoints of a real-time
>> communication session to agree which languages and media to use for
>> interactive communication.  We have a mechanism of adding language tags to
>> media stream negotiations.  In most cases, the language and media modality
>> are an obvious fit.  There are combinations of media and language where the
>> meaning is not so obvious, specifically, signed language tags with audio
>> or text, and non-signed language tags with video.  My proposal is that we
>> say offerer SHOULD NOT send such combinations and answerer MAY ignore
>> language. This allows future specifications for the underlying uses Gunnar
>> wants (such as real-time subtitles in video and signed equivalents in
>> text).  Such future specifications could define a use for the language and
>> media combinations and remove the SHOULD NOT send and MAY ignore, or could
>> define a new mechanism.  I don't think we know enough now to dictate what
>> the solution should be.
>>
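
The combinations described in the paragraph above can be written down as a
small check. This is only a sketch under stated assumptions: the set of sign
language subtags is a tiny hypothetical sample (a real implementation would
consult the IANA language subtag registry), and both function names are
invented for illustration.

```python
# Sketch only: flag media/language combinations whose meaning is not obvious.
# SIGN_LANGUAGE_SUBTAGS is a hypothetical, non-exhaustive sample:
# ase = American, bfi = British, swl = Swedish Sign Language.
SIGN_LANGUAGE_SUBTAGS = {"ase", "bfi", "swl"}


def is_sign_language(tag: str) -> bool:
    """True if the primary language subtag denotes a sign language."""
    primary = tag.split("-")[0].lower()
    # "sgn" is the collective code for sign languages.
    return primary == "sgn" or primary in SIGN_LANGUAGE_SUBTAGS


def is_unusual_combination(media: str, tag: str) -> bool:
    """Signed tag with audio or text, or non-signed tag with video."""
    signed = is_sign_language(tag)
    if media in ("audio", "text"):
        return signed
    if media == "video":
        return not signed
    return False


for media, tag in [("audio", "en"), ("audio", "ase"), ("video", "ase"), ("video", "en")]:
    print(media, tag, is_unusual_combination(media, tag))
```

Under the SHOULD NOT / MAY wording, an offerer would avoid sending the flagged
pairs and an answerer would be free to ignore the language on them.
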
> We have a fresh example from our own discussions in the SLIM group of how
> unfortunate it is to not be sufficiently explicit in the first edition of a
> standard. The SDP Lang attribute in RFC 4566, where you (Randall) say it is
> intended for specifying a set of languages that all must be used in a
> session, while I say that it is intended for negotiation of at least one
> initial language. Having that uncertainty in a published specification
> makes it very hard to sharpen up the specification afterwards because it
> would possibly make some implementations non-conformant. And it makes
> potential implementors hesitant to use the current specifications, as it
> was with the SLIM work.
>
> For 5.4.
>
> I am OK with modifying from my latest proposal, but we need to be specific.
> I am also OK with reducing the SHALLs to SHOULDs as Addison requested.
>
> The situation is not that we lack knowledge. Here is what we know about
> the 4 cases of "unusual" indications:
>
> 1. View of the speaker in video. Very important for speech perception.
> Quality requirements are documented in ITU-T H-series Supplement 1. Of real
> use only as a complement to the same spoken language in audio. Now, when we
> know about the Zxxx notation for non-written, we also have a good way of
> specifying it precisely.
> This case was also described in section 5.2 already.
>
> 2. Text captions in the video stream.
> This can be either text merged into video and communicated as a true part of
> the video image, or it can be a text component of a multimedia system, such
> as MPEG-4, declared in SDP as m=video.
> It has been used in some videophone products, but I have not seen it used
> lately.
> It is a clearly defined case, and we can specify coding for it, but we do
> not at the moment know if it will be important to specify it.
>
> 3. Sign language or fingerspelling in the text stream.
> I have seen a product using it for claimed sign language conversation. It
> is also in use in the simple text form with words in capitals approximately
> representing signs between persons involved in preparation of sign language
> productions and translations. But in that case it is in a session where
> they agree in other ways to start using the text stream for that purpose.
> So I think we can say that this is rare, and its use can be agreed by other
> means between the users. Still it is a clearly defined case.
>
> 4. Audio from signing person related to sign language. This is more vague
> than the others.  It may be a person signing in video and adding spoken
> words in audio to signing, but influenced by the word order and grammar of
> sign language with some ambition to make it reasonably understandable for
> both deaf and hearing participants. There are even some spoken words
> created from sign language that are commonly used by hearing persons in
> such situations. But even in that case I think it is better to define
> the audio part as the spoken language it is derived from, because of its
> intention to be understandable for hearing persons. All other variants I
> can imagine are even closer to the spoken language and should be specified
> with spoken language tag. If we only want to have the audio stream
> established to hear the background in the signing situation, then we should
> not specify language use of the audio stream.
> Even if we know what sign language tag in audio stream would be, it may be
> just as good to leave it undefined.
> ------------------------------------------------------------
> So, new proposal:
>
> 5.4.  Unusual language indications
>
>    It is possible to specify an unusual indication where the language
>    specified may look unexpected for the media type.
>
>    For such cases the following guidance SHOULD be applied for the
>    humintlang attributes used in these situations.
>
>    1.    A view of a speaking person in the video stream SHOULD, when it
>    has relevance for speech perception, be indicated by a humintlang
>    attribute with a Language-Tag for a spoken/written language with the
>    "Zxxx" script subtag to indicate that the content is not written.
>
>    2.    Text captions included in the video stream SHOULD be indicated
>    by a humintlang attribute with a Language-Tag for spoken/written
>    language.
>
>    3.    A Language-Tag for a sign language specified in a humintlang
>    attribute for a text stream MAY be interpreted as use of an approximate
>    representation of sign language or fingerspelling in the text media
>    stream. The use of such representation is rare and usually conveniently
>    agreed by other means between the users during an established session.
>    Common support of this indication SHOULD NOT be assumed or required.
>
>    4.    A Language-Tag for a sign language specified in a humintlang
>    attribute for an audio stream SHOULD NOT be indicated and MAY be ignored
>    on reception. Any use of spoken words or spoken language in the audio
>    stream SHOULD, when it can be of importance for language communication,
>    be indicated by the corresponding Language-Tag for spoken language in a
>    humintlang attribute for the audio stream.
>
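
For readers who find rules easier to follow as code, the four items of the
revised 5.4 text above could be sketched roughly as follows. This is an
illustration, not part of the proposal: the function names, the returned
strings, and the small set of sign language subtags are all hypothetical, and
the "Zxxx" test is a simplification that does not fully parse a Language-Tag.

```python
# Sketch only: map a (media type, Language-Tag) pair to the reading suggested
# by the four items of the proposed Section 5.4. All names are hypothetical.
SIGN_LANGUAGE_SUBTAGS = {"ase", "bfi", "swl"}  # non-exhaustive sample


def is_sign_language(tag: str) -> bool:
    primary = tag.split("-")[0].lower()
    return primary == "sgn" or primary in SIGN_LANGUAGE_SUBTAGS


def has_zxxx_script(tag: str) -> bool:
    # Simplified: looks for "Zxxx" among the subtags after the primary one.
    return "zxxx" in (subtag.lower() for subtag in tag.split("-")[1:])


def interpret(media: str, tag: str) -> str:
    signed = is_sign_language(tag)
    if media == "video" and not signed:
        if has_zxxx_script(tag):
            return "view of speaker"        # item 1
        return "text captions in video"     # item 2
    if media == "text" and signed:
        return "possible sign representation in text"  # item 3 (MAY)
    if media == "audio" and signed:
        return "ignore language indication"  # item 4 (SHOULD NOT / MAY ignore)
    return "usual combination"


for media, tag in [("video", "en-Zxxx"), ("video", "en"), ("text", "ase"),
                   ("audio", "ase"), ("audio", "en")]:
    print(media, tag, "->", interpret(media, tag))
```
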
>
>
>
> Gunnar
>
>
> --
> -----------------------------------------
> Gunnar Hellström
> Omnitor
> gunnar.hellstrom@omnitor.se
> +46 708 204 288
>
> _______________________________________________
> SLIM mailing list
> SLIM@ietf.org
> https://www.ietf.org/mailman/listinfo/slim
>