Re: [Slim] Issue 43: How to know the modality of a language indication?
Bernard Aboba <bernard.aboba@gmail.com> Sun, 15 October 2017 23:22 UTC
From: Bernard Aboba <bernard.aboba@gmail.com>
Date: Sun, 15 Oct 2017 16:21:34 -0700
Message-ID: <CAOW+2dvPSUGA_7tye+KqR1TGs1kYL43TdxBCDOHVEmWOFHud0Q@mail.gmail.com>
To: Gunnar Hellström <gunnar.hellstrom@omnitor.se>
Cc: Paul Kyzivat <pkyzivat@alum.mit.edu>, slim@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/slim/nud9Rtu7qO210bFMl0I18jrrntg>
Paul said:

"- can the UA use this information to change how to render the media?"

[BA] If the video is used for signing, an application might infer an encoder preference for frame rate over resolution (e.g. in WebRTC, RTCRtpParameters.degradationPreference = "maintain-framerate").

See: https://rawgit.com/w3c/webrtc-pc/master/webrtc.html#dom-rtcrtpparameters-degradationpreference

On Sun, Oct 15, 2017 at 2:22 PM, Gunnar Hellström <gunnar.hellstrom@omnitor.se> wrote:

> On 2017-10-15 at 21:27, Paul Kyzivat wrote:
>
>> On 10/15/17 1:49 PM, Bernard Aboba wrote:
>>
>>> Paul said:
>>>
>>> "For the software to know must mean that it will behave differently for a tag that represents a sign language than for one that represents a spoken or written language. What is it that it will do differently?"
>>>
>>> [BA] In terms of behavior based on the signed/non-signed distinction, in -17 the only reference appears to be in Section 5.4, stating that certain combinations are not defined in the document (but that definition of those combinations was out of scope):
>>
>> I'm asking whether this is a distinction without a difference. I'm not asking whether this makes a difference in the *protocol*, but whether in the end it benefits the participants in the call in any way.
>
> <GH> Good point, I was on my way to make a similar comment earlier today. The difference it makes for applications to "know" what modality a language tag represents in its used position seems to be only for imagined functions that are out of scope for the protocol specification.
>
>> For instance:
>>
>> - does it help the UA to decide how to alert the callee, so that the callee can better decide whether to accept the call or instruct the UA about how to handle the call?
>
> <GH> Yes, for a regular human user-to-user call, the result of the negotiation must be presented to the participants, so that they can start the call with a language and modality that is agreed.
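[Editor's sketch] The encoder preference Bernard mentions could look roughly like this in TypeScript, assuming a browser WebRTC endpoint that already knows the negotiated video language tag. The `isSignLanguageTag` helper and its hard-coded tag set are hypothetical stand-ins for a real IANA Language Subtag Registry lookup:

```typescript
// Hypothetical helper: decide whether a BCP 47 tag denotes a sign language.
// A real implementation would consult the IANA Language Subtag Registry;
// this hard-coded set is only for illustration.
const SIGN_LANGUAGE_TAGS = new Set(["ase", "bfi", "sgn", "ssw"]);

function isSignLanguageTag(tag: string): boolean {
  // Compare only the primary language subtag, case-insensitively.
  const primary = tag.toLowerCase().split("-")[0];
  return SIGN_LANGUAGE_TAGS.has(primary);
}

// If the video stream carries sign language, prefer frame rate over
// resolution when the encoder has to degrade quality.
function degradationPreferenceFor(
  videoLangTag: string
): "maintain-framerate" | "balanced" {
  return isSignLanguageTag(videoLangTag) ? "maintain-framerate" : "balanced";
}

// In a browser, the value would then be applied to the RTP sender:
//   const params = sender.getParameters();
//   params.degradationPreference = degradationPreferenceFor(tag);
//   await sender.setParameters(params);
```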
> That presentation could be exactly the description from the language tag registry, and then no "knowledge" is needed from the application. But it is more likely that the application has its own string for presentation of the negotiated language and modality, so that will be presented. It is still found by a table lookup between language tag and string for a language name, so no real knowledge is needed.
>
> We have said many times that the way the application tells the user the result of the negotiation is out of scope for the draft, but it is good to discuss and know that it can be done.
>
> A similar mechanism is also needed for configuration of the user's language preference profile, discussed further below.
>
>> - does it allow the UA to make a decision whether to accept the media?
>
> <GH> No, the media should be accepted regardless of the result of the language negotiation.
>
>> - can the UA use this information to change how to render the media?
>
> <GH> Yes, for the specialized text notation of sign language that we have discussed but currently placed out of scope, a very special rendering application is needed. The modality would be recognized by a script subtag on a sign language tag used in text media. However, I think it would be best to also use it with a specific text subtype, so that the rendering can be controlled by invocation of a "codec" for that rendering.
>
>> And if there is something like this, will the UA be able to do this generically based on whether the media is sign language or not, or will the UA need to already understand *specific* sign language tags?
>
> <GH> Applications will need to have localized versions of the names for the different sign languages, and also for spoken and written languages, to be used in setting of preferences and announcing the results of the negotiation.
> It might be overkill to have such localized names for all languages in the IANA language registry, so it will need to be able to handle localized names of a subset of the registry. With good design, however, this is just an automatic translation between a language tag and a corresponding name, so it does in fact not require any "knowledge" of what modality is used with each language tag.
>
> The application can ask for the configuration:
>
> "Which languages do you want to offer to send in video?"
> "Which languages do you want to offer to send in text?"
> "Which languages do you want to offer to send in audio?"
> "Which languages do you want to be prepared to receive in video?"
> "Which languages do you want to be prepared to receive in text?"
> "Which languages do you want to be prepared to receive in audio?"
>
> And for each question provide a list of language names to select from. When the selection is made, the corresponding language tag is placed in the profile for negotiation.
>
> If the application provides the whole IANA language registry to the user for each question, then there is a possibility that the user by mistake selects a language that requires another modality than the question was about. If the application is to limit the lists provided for each question, then it will need a kind of knowledge about which language tags suit each modality (and media).
>
>> E.g., a UA serving a deaf person might automatically introduce a sign language interpreter into an incoming audio-only call. If the incoming call has both audio and video, then the video *might* be for conveying sign language, or not. If not, then the UA will still want to bring in a sign language interpreter. But is knowing that the call generically contains sign language sufficient to decide against bringing in an interpreter? Or must that depend on it being a sign language that the user can use?
>> If the UA is configured for all the specific sign languages that the user can deal with, then there is no need to recognize other sign languages generically.
>
> <GH> We are talking about specific language tags here and knowing what modality they are used for. The user needs to specify which sign languages they prefer to use. The callee application can be made to look for gaps between what the caller offers and what the callee can accept, and from that deduce which type and languages of conversion are needed, and invoke that as a relay service. That invocation can be made completely table driven, with corresponding translation profiles for available relay services. But it is more likely that it is done by having some knowledge about which languages are sign languages and which are spoken languages, and sending the call to the relay service to try to sort out whether it can handle the translation.
>
> So, the answer is - no, the application does not really have any knowledge about which modality a language tag represents in its used position. If the user selects very rare language tag indications for a media, then a match will just become very unlikely.
>
> Where does this discussion take us? Should we modify section 5.4 again?
>
> Thanks
> Gunnar
>
>> Thanks,
>> Paul
>>
>>> 5.4 <https://tools.ietf.org/html/draft-ietf-slim-negotiating-human-language-17#section-5.4>. Undefined Combinations
>>>
>>> The behavior when specifying a non-signed language tag for a video media stream, or a signed language tag for an audio or text media stream, is not defined in this document.
>>>
>>> The problem of knowing which language tags are signed and which are not is out of scope of this document.
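[Editor's sketch] The "knowledge" that Section 5.4 leaves out of scope can in practice be the kind of table lookup Gunnar describes: a map from language tag to modality, used both to filter the configuration lists per media type and to present results. A minimal TypeScript sketch; the table contents are illustrative, not taken from the IANA registry:

```typescript
type Modality = "spoken" | "written" | "signed";
type Media = "audio" | "text" | "video";

// Illustrative lookup table; a real application would derive this from the
// IANA Language Subtag Registry rather than hard-coding it.
const MODALITIES: Record<string, Modality[]> = {
  en:  ["spoken", "written"],   // English
  sv:  ["spoken", "written"],   // Swedish
  ase: ["signed"],              // American Sign Language
  bfi: ["signed"],              // British Sign Language
};

// Map each media type to the modality this thread's "simple rule" assigns
// it: spoken in audio, written in text, signed in video.
function modalityFor(media: Media): Modality {
  return media === "audio" ? "spoken" : media === "text" ? "written" : "signed";
}

// Filter the tags offered to the user for one media type, so that e.g. the
// "languages to send in video" list contains only sign languages.
function tagsForMedia(media: Media): string[] {
  const wanted = modalityFor(media);
  return Object.keys(MODALITIES).filter((t) => MODALITIES[t].includes(wanted));
}
```

As Gunnar notes, nothing here is real "knowledge" of modality; it is an automatic translation between a tag and a stored property, and the open question is only who maintains the table.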
>>> On Sun, Oct 15, 2017 at 10:13 AM, Paul Kyzivat <pkyzivat@alum.mit.edu> wrote:
>>>
>>> On 10/15/17 2:24 AM, Gunnar Hellström wrote:
>>>
>>> Paul,
>>> On 2017-10-15 at 01:19, Paul Kyzivat wrote:
>>>
>>> On 10/14/17 2:03 PM, Bernard Aboba wrote:
>>>
>>> Gunnar said:
>>>
>>> "Applications not implementing such specific notations may use the following simple deductions.
>>>
>>> - A language tag in audio media is supposed to indicate spoken modality.
>>>
>>> [BA] Even a tag with "Sign Language" in the description??
>>>
>>> - A language tag in text media is supposed to indicate written modality.
>>>
>>> [BA] If the tag has "Sign Language" in the description, can this document really say that?
>>>
>>> - A language tag in video media is supposed to indicate visual sign language modality, except for the case when it is supposed to indicate a view of a speaking person mentioned in section 5.2, characterized by the exact same language tag also appearing in an audio media specification.
>>>
>>> [BA] It seems like an over-reach to say that a spoken language tag in video media should instead be interpreted as a request for Sign Language. If this were done, would it always be clear which Sign Language was intended? And could we really assume that both sides, if negotiating a spoken language tag in video media, were really indicating the desire to sign? It seems like this could easily result in interoperability failure.
>>>
>>> IMO the right way to indicate that two (or more) media streams are conveying alternative representations of the same language content is by grouping them with a new grouping attribute. That can tie together an audio with a video and/or text. A language tag for sign language on the video stream then clarifies to the recipient that it is sign language.
>>> The grouping attribute by itself can indicate that these streams are conveying language.
>>>
>>> <GH> Yes, and that is proposed in draft-hellstrom-slim-modality-grouping with two kinds of grouping: one kind of grouping to tell that two or more languages in different streams are alternatives with the same content, with a priority order assigned to them to guide the selection of which one to use during the call; the other kind of grouping telling that two or more languages in different streams are desired together, with the same language content but different modalities (such as the use for captioned telephony with the same content provided in both speech and text, or sign language interpretation where you see the interpreter, or possibly spoken language interpretation with the languages provided in different audio streams). I hope that that draft can be progressed. I see it as a needed complement to the pure language indications per media.
>>>
>>> Oh, sorry. I did read that draft but forgot about it.
>>>
>>> The discussion in this thread is more about how an application would easily know that e.g. "ase" is a sign language and "en" is a spoken (or written) language, and also a discussion about what kinds of languages are allowed and indicated by default in each media type. It was not at all about falsely using language tags in the wrong media type, as Bernard understood my wording. It was rather a limitation to what modalities are used in each media type, and how to know the modality in cases that are not evident, e.g. "application" and "message" media types.
>>>
>>> What do you mean by "know"? Is it for the *UA* software to know, or for the human user of the UA to know? Presumably a human user that cares will understand this if presented with the information in some way. But typically this isn't presented to the user.
>>> For the software to know must mean that it will behave differently for a tag that represents a sign language than for one that represents a spoken or written language. What is it that it will do differently?
>>>
>>> Thanks,
>>> Paul
>>>
>>> Right now we have returned to a very simple rule: we define only use of spoken language in audio media, written language in text media, and sign language in video media.
>>>
>>> We have discussed other uses, such as a view of a speaking person in video, text overlay on video, a sign language notation in text media, written language in message media, written language in WebRTC data channels, and signed, written and spoken language in bucket media, maybe declared as application media. We do not define these cases. They are just not defined, not forbidden. They may be defined in the future.
>>>
>>> My proposed wording in section 5.4 got too many misunderstandings, so I gave up on it. I think we can live with 5.4 as it is in version -16.
>>>
>>> Thanks,
>>> Gunnar
>>>
>>> (IIRC I suggested something along these lines a long time ago.)
>>>
>>> Thanks,
>>> Paul
>>>
>>> _______________________________________________
>>> SLIM mailing list
>>> SLIM@ietf.org
>>> https://www.ietf.org/mailman/listinfo/slim

--
-----------------------------------------
Gunnar Hellström
Omnitor
gunnar.hellstrom@omnitor.se
+46 708 204 288
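[Editor's sketch] For reference, the negotiation discussed in this thread is carried in SDP via the draft's hlang-send and hlang-recv media-level attributes. Under the simple rule above (spoken in audio, written in text, signed in video), an offer from a user preferring American Sign Language in video alongside written English in text might look like this (assuming the attribute syntax defined in draft-ietf-slim-negotiating-human-language; port numbers and payload types are arbitrary):

```
m=audio 49200 RTP/AVP 0
a=hlang-send:en
a=hlang-recv:en
m=video 49202 RTP/AVP 99
a=hlang-send:ase
a=hlang-recv:ase
m=text 49204 RTP/AVP 98
a=hlang-send:en
a=hlang-recv:en
```

Nothing in the offer itself marks "ase" as signed or "en" as spoken; the modality is implied only by the media type each tag appears in, which is exactly the point at issue in this thread.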