Re: [clue] I-D Action: draft-ietf-clue-telepresence-requirements-04.txt

On 7/18/13 9:23 PM, Christian Groves wrote:
> Hello Paul,
>
> There was a bounding attribute "the scene area" which was removed. As I
> mentioned at the time the idea was to give an indication of how the
> capture areas were located in the room.

Yeah. But it didn't say enough to help with the question I am 
asking below. So it probably made sense to remove it.

More below.

> With regards to audio I think we need to revisit that with respect to
> the spatial attributes.
>
> Regards, Christian
>
> On 19/07/2013 12:42 AM, Paul Kyzivat wrote:
>> Christian,
>>
>> You raise a point that has bothered me for some time:

Note: this point wasn't raised to Christian. It is really addressed to 
all the telepresence experts.

>> We describe the spatial attributes of a capture by point of capture
>> and area of capture (a quadrilateral). Implicitly this describes an
>> unbounded pyramid. In reality, it is bounded by the geometry of the
>> room and the stuff in it, but we don't have that info. This pyramid
>> may be meaningful in some sense for video, though we lack other
>> information such as depth of field. But as John replied, I guess it
>> means very little for audio.
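
To make the "unbounded pyramid" concrete, here is a minimal sketch. It 
assumes a convex, planar capture area and a point of capture that is not 
in that plane. The function name, coordinates and units are made up for 
illustration and are not taken from the framework.

```python
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def in_capture_pyramid(apex: Vec, quad: Sequence[Vec], p: Vec) -> bool:
    """Return True if p lies in the unbounded pyramid whose apex is the
    point of capture and whose cross-section is the capture area.

    Hypothetical helper: quad holds the four corners of the capture area,
    listed in order around its perimeter.
    """
    centroid = tuple(sum(v[i] for v in quad) / 4.0 for i in range(3))
    for i in range(4):
        # Side plane through the apex and one edge of the capture area.
        normal = cross(sub(quad[i], apex), sub(quad[(i + 1) % 4], apex))
        side_of_centroid = dot(normal, sub(centroid, apex))
        side_of_p = dot(normal, sub(p, apex))
        # p must be on the same side of every side plane as the centroid.
        if side_of_centroid * side_of_p < 0:
            return False
    return True


# Made-up example: camera at the origin (0.5 high), capture area 2 units away.
APEX: Vec = (0.0, 0.0, 0.5)
QUAD = [(-1.0, 2.0, 0.0), (1.0, 2.0, 0.0), (1.0, 2.0, 1.0), (-1.0, 2.0, 1.0)]
```

With these values a point on the axis 10 units out, or 1000 units out, 
is still inside. Nothing in the two attributes says where the scene 
ends, which is the gap I am describing.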

So I think the following is still an important question to answer:

>> The fundamental question is: does this provide sufficient information
>> for the consumer to properly adapt the received media to his available
>> equipment? I don't know the answer to that, but I suspect that it will
>> allow some approximation, but probably not enough to do the best
>> possible job. Is there more that we ought to be providing to help with
>> that job? Or does doing more provide diminishing returns?

I don't have any specific requirement to offer to address it.
I just ask everybody with knowledge of a real telepresence 
implementation: Will you be able to do a satisfactory job of this from 
an arbitrary received advertisement? (Without making further assumptions 
about constraints on the advertiser.)

If the answer is NO, then we should ask what we would need to fix that.
I don't think this calls for any change in requirements, because what I 
am asking is what is called for in requirements #1,2.

	Thanks,
	Paul

>>     Thanks,
>>     Paul
>>
>> On 7/17/13 11:01 PM, Christian Groves wrote:
>>> Hello Mary,
>>>
>>> I did a review of the requirements and Appendix to see if they are met.
>>> My comments are below. Sorry for the length of the email but I figured
>>> it would be easier for people to read my comment [CNG] along with the
>>> requirement.
>>>
>>> REQMT-1: The solution MUST support a description of the spatial
>>> arrangement of source video images sent in video streams
>>> which enables a satisfactory reproduction at the receiver
>>> of the original scene. This applies to each site in a
>>> point to point or a multipoint meeting and refers to the
>>> spatial ordering within a site, not to the ordering of
>>> images between sites.
>>>
>>> [CNG] Supported – via Capture Area attribute.
>>>
>>> Use case point to point symmetric, and all other use cases.
>>>
>>> REQMT-1a: The solution MUST support a means of allowing
>>> the preservation of the order of images in the
>>> captured scene. For example, if John is to
>>> Susan's right in the image capture, John is
>>> also to Susan's right in the rendered image.
>>>
>>> [CNG] Supported – via Capture Area attribute.
>>>
>>> REQMT-1b: The solution MUST support a means of allowing
>>> the preservation of order of images in the
>>> scene in two dimensions - horizontal and
>>> vertical.
>>>
>>> [CNG] Supported – via Capture Area attribute.
>>>
>>> REQMT-1c: The solution MUST support a means to identify
>>> the point of capture of individual video
>>> captures in three dimensions.
>>>
>>> [CNG] Supported – via Point of capture attribute.
>>>
>>> REQMT-1d: The solution MUST support a means to identify
>>> the extent of individual video captures in
>>> three dimensions.
>>>
>>> [CNG] Partial – Capture area attribute allows the specification of a
>>> plane of capture in 3 dimensions. However depth of the capture (needed
>>> for a 3D area) is not supported.
>>>
>>> REQMT-2: The solution MUST support a description of the spatial
>>> arrangement of captured source audio sent in audio streams
>>> which enables a satisfactory reproduction at the receiver
>>> in a spatially correct manner. This applies to each site
>>> in a point to point or a multipoint meeting and refers to
>>> the spatial ordering within a site, not the ordering of
>>> channels between sites.
>>>
>>> [CNG] Supported – Via Capture Area attribute.
>>>
>>> Use case point to point symmetric, and all use cases,
>>> especially heterogeneous.
>>>
>>> REQMT-2a: The solution MUST support a means of preserving
>>> the spatial order of audio in the captured
>>> scene. For example, if John sounds as if he is
>>> at Susan's right in the captured audio, John
>>> voice is also placed at Susan's right in the
>>> rendered image.
>>>
>>> [CNG] Supported – Via Capture Area attribute.
>>>
>>> REQMT-2b: The solution MUST support a means to identify
>>> the number and spatial arrangement of audio
>>> channels including monaural, stereophonic
>>> (2.0), and 3.0 (left, center, right) audio
>>> channels.
>>>
>>> [CNG] Partial – the Audio Channel Format attribute currently only
>>> supports “mono” and “stereo”. It doesn’t support the 3.0 format.
>>>
>>> REQMT-2c: The solution MUST NOT preclude the use of
>>> binaural audio. [Edt. This is an outstanding
>>> issue. Text will be changed when the issue is
>>> resolved.]
>>>
>>> [CNG] Partial? – Binaural isn’t mentioned in the framework so isn’t
>>> precluded as such. There is no indication in CLUE regarding a binaural
>>> capture.
>>>
>>> REQMT-2d: The solution MUST support a means to identify
>>> the point of capture of individual audio
>>> captures in three dimensions.
>>>
>>> [CNG] Supported – via Point of capture attribute.
>>>
>>> REQMT-2e: The solution MUST support a means to identify
>>> the extent of individual audio captures in
>>> three dimensions.
>>>
>>> [CNG] Partial – The Capture area attribute is based on capturing a plane
>>> in Cartesian space. There is no depth parameter. Whether the plane
>>> concept holds for audio like it does video needs further thought.
>>>
>>> REQMT-3: The solution MUST support a mechanism to enable a
>>> satisfactory spatial matching between audio and video
>>> streams coming from the same endpoints.
>>>
>>> [CNG] Supported - Given that an Advertiser places captures in a
>>> particular scene, it is possible to indicate that audio and video
>>> streams are from the same endpoint.
>>>
>>> Use case is point to point symmetric, and all use cases.
>>>
>>> REQMT-3a: The solution MUST enable individual audio
>>> streams to be associated with one or more video
>>> image captures, and individual video image
>>> captures to be associated with one or more
>>> audio captures, for the purpose of rendering
>>> proper position.
>>>
>>> [CNG] Supported (Partial?) – It is possible to deduce that audio and
>>> video relate to the same capture area in a scene by analysing the
>>> Capture Area parameters. It is possible to relate individual audio
>>> captures to video captures if the Advertisement is constructed in a
>>> limited way. However, it’s not possible to deduce the relationships
>>> between individual captures by utilising the Scene, CSE and capture
>>> concepts. CSEs only relate to one media type. So it’s not possible to
>>> link audio CSEs to video CSEs.
>>>
>>> REQMT-3b: The solution MUST enable individual audio
>>> streams to be rendered in any desired spatial
>>> position.
>>>
>>> [CNG] Supported? – I think it’s assumed that a Consumer can do whatever
>>> it likes with respect to rendering decisions. Ultimately that’s local
>>> policy.
>>>
>>> [Edt: Rendering is an open issue. Text will
>>> be changed when it is resolved.]
>>>
>>> REQMT-4: The solution MUST enable interoperability between
>>> endpoints that have a different number of similar devices.
>>> For example, one endpoint may have 1 screen, 1 speaker, 1
>>> camera, 1 mic, and another endpoint may have 3 screens, 2
>>> speakers, 3 cameras and 2 mics. Or, in a multi-point
>>> conference, one endpoint may have one screen, another may
>>> have 2 screens and a third may have 3 screens. This
>>> includes endpoints where the number of devices of a given
>>> type is zero.
>>>
>>> Use case is asymmetric point to point and multipoint.
>>>
>>> [CNG] Supported – CLUE enables different capture capabilities to be
>>> signalled. It makes no assumption as to the rendering.
>>>
>>> REQMT-5: The solution MUST support means of enabling
>>> interoperability between telepresence endpoints where
>>> cameras are of different picture aspect ratios.
>>>
>>> [CNG] Supported – CLUE doesn’t describe “aspect ratio” but allows
>>> different capture areas to be defined. This is a means of describing
>>> aspect ratio.
>>>
>>> REQMT-6: The solution MUST provide scaling information which
>>> enables rendering of a video image at the actual size of
>>> the captured scene.
>>>
>>> [CNG] Supported - Capture Area, Capture Point, Point on Line of Capture
>>> and Scene Scale allow this.
>>>
>>> REQMT-7: The solution MUST support means of enabling
>>> interoperability between telepresence endpoints where
>>> displays are of different resolutions.
>>>
>>> [CNG] Supported? – CLUE allows encoding parameters to be associated with
>>> captures which could state the resolution of the captured image. However,
>>> there is no way to indicate “display” resolution. Perhaps the
>>> requirement needs to be reworded?
>>>
>>> REQMT-8: The solution MUST support methods for handling different
>>> bit rates in the same conference.
>>>
>>> [CNG] Supported – CLUE allows encoding parameters to be associated with
>>> captures which could state the bit rate.
>>>
>>> REQMT-9: The solution MUST support means of enabling
>>> interoperability between endpoints that send and receive
>>> different numbers of media streams.
>>>
>>> Use case heterogeneous and multipoint.
>>>
>>> [CNG] Supported – CLUE allows different numbers of captures to be sent.
>>>
>>> REQMT-10: The solution MUST make it possible for endpoints without
>>> support for telepresence extensions to participate in a
>>> telepresence session with those that do.
>>>
>>> [CNG] Supported – The support of a CLUE channel will be negotiated
>>> between endpoints. From the current signalling draft it seems a basic
>>> set of media can be established without CLUE. Endpoints that don’t
>>> support CLUE obviously won’t be able to determine spatial information
>>> etc…
>>>
>>> REQMT-11: The solution MUST support a mechanism for determining
>>> whether or not an endpoint or MCU is capable of
>>> telepresence extensions.
>>>
>>> [CNG] Supported? – The negotiation of a CLUE channel via SDP is one
>>> method in the signalling draft. Another indication at the SIP level may
>>> be beneficial such as a feature tag. This is yet to be documented.
>>>
>>> REQMT-12: The solution MUST support a means to enable more than two
>>> sites to participate in a teleconference.
>>>
>>> Use case multipoint.
>>>
>>> [CNG] Partial – This is on the agenda and there’s basic support i.e. via
>>> the composed attribute and “scene-switch-policy” CSE attribute. However
>>> no method is yet defined to keep the source information.
>>>
>>> REQMT-13: The solution MUST support both transcoding and switching
>>> approaches to providing multipoint conferences.
>>>
>>> [CNG] Supported – CLUE supports these two methods. However further work
>>> is needed on the details.
>>>
>>> REQMT-14: The solution MUST support mechanisms to make possible for
>>> either or both site switching or segment switching. [Edt:
>>> This needs rewording. Deferred until layout discussion is
>>> resolved.]
>>>
>>> [CNG] Partial – The “scene-switch-policy” CSE attribute supports this to
>>> some level but further work is needed in this area.
>>>
>>> REQMT-15: The solution MUST support mechanisms for presentations in
>>> such a way that:
>>>
>>> * Presentations can have different sources
>>> * Presentations can be seen by all
>>> * There can be variation in placement, number and size of
>>>   Presentations
>>>
>>> [CNG] Partial – CLUE allows an endpoint to indicate whether a capture is
>>> related to a presentation through the use of the presentation attribute.
>>> The spatial attributes (i.e. capture area) allow the placement and size
>>> to be indicated. The number can be inferred from the number of captures.
>>> CLUE doesn’t have any policy on who can see the presentations. The CLUE
>>> Advertisement mechanism may include presentation captures to all people
>>> within the conference. Perhaps the 2nd bullet can be removed, or
>>> clarified to say that it’s not related to policy?
>>>
>>> REQMT-16: The solution MUST include extensibility mechanisms.
>>>
>>> [CNG] Supported? – whilst not explicitly documented I think there’s
>>> general agreement this is needed.
>>>
>>> REQMT-17: The solution must support a mechanism for allowing
>>> information about media captures to change during a
>>> conference.
>>>
>>> [CNG] Supported? – Keeping the CLUE signalling channel open would allow
>>> for this and people seem in agreement this should be possible. Further
>>> detailed work is needed in the signalling draft to describe this, e.g.
>>> incremental vs full updates, interaction with bearer signalling (i.e.
>>> SDP O/A), etc.
>>>
>>> REQMT-18: The solution MUST provide a mechanism for the secure
>>> exchange of information about the media captures.
>>>
>>> [CNG] Partial? – It is assumed that CLUE will operate over DTLS/SCTP;
>>> however, there’s no documentation regarding CLUE security.
>>>
>>> Appendix A. open issues
>>>
>>> OPEN-1 Binaural Audio [REQMT-2C] The need to support of binaural
>>> audio is unresolved, and the "MUST NOT preclude" language in
>>> this requirement is problematic. The authors believe this
>>> requirement needs to be either changed or withdrawn,
>>> depending on how the issue is resolved.
>>>
>>> [CNG] Not supported - This is still unresolved.
>>>
>>> OPEN-2 Reference to Rendering [REQMT-3b] This is the only
>>> requirement which refers to rendering. It may also be empty,
>>> since receivers can rendering audio captures as they wish.
>>> This is deferred until broader discussion on rendering
>>> requirements is concluded.
>>>
>>> [CNG] I think we should frame the requirements with regards to captures
>>> as that’s the way the framework is written. Perhaps add some text to say
>>> that rendering is based on the receiver’s local policy.
>>>
>>> OPEN-3 Conference modes [REQMT-14] This wording of this requirement
>>> is problematic in part because the conference modes (site
>>> switching and segment switching) are not defined. It at
>>> least needs rewording. This is deferred until broader
>>> discussion on layout is concluded.
>>>
>>> [CNG] Yes this is still open.
>>>
>>> OPEN-4 Need to capture requirement that attributes can change at any
>>> time during the call.
>>>
>>> [CNG] Yes although it seems REQMT-17 covers this.
>>>
>>> OPEN-5 Need to add requirement for three dimensions in the right
>>> place
>>>
>>> [CNG] Requirements 1d and 2e do capture an aspect of 3D.
>>>
>>> OPEN-6 Multi-view, is there a requirement needed?
>>>
>>> [CNG] Even if there’s no requirement we appear to support it because we
>>> allow both the capture area and capture point to be sent for captures.
>>> An Advertiser could describe multiple capture points capturing the same
>>> area of capture.
>>>
>>>
>>> Regards, Christian
>>>
>>> On 17/07/2013 6:07 AM, Mary Barnes wrote:
>>>> There really were no changes other than to refresh this draft. I was
>>>> going to extend the security section with more detail, but I really
>>>> can't add much more detail without referencing things that are defined
>>>> in the framework. So, I think the next step is for me to work with
>>>> Mark on the security for the framework.
>>>>
>>>> There are some open issues identified in the appendix of this document
>>>> that we need to figure out whether they are issues that need
>>>> resolution for this document. I will open issues in the tracker and we
>>>> can discuss each one and perhaps make some decisions before Berlin.
>>>>
>>>> We also need to consider whether the current framework meets these
>>>> requirements. If some requirements are not met, we need to decide
>>>> whether it's because they'll be met by the solution documents or won't
>>>> be met at all. In which case, we need to decide whether we actually
>>>> need them for the solution.
>>>>
>>>> Regards,
>>>> Mary.
>>>>
>>>>
>>>> On Tue, Jul 16, 2013 at 11:29 AM, <internet-drafts@ietf.org> wrote:
>>>>
>>>>
>>>>     A New Internet-Draft is available from the on-line Internet-Drafts
>>>>     directories.
>>>>     This draft is a work item of the ControLling mUltiple streams for
>>>>     tElepresence Working Group of the IETF.
>>>>
>>>>     Title : Requirements for Telepresence Multi-Streams
>>>>     Author(s) : Allyn Romanow
>>>>     Stephen Botzko
>>>>     Mary Barnes
>>>>     Filename : draft-ietf-clue-telepresence-requirements-04.txt
>>>>     Pages : 14
>>>>     Date : 2013-07-15
>>>>
>>>>     Abstract:
>>>>     This memo discusses the requirements for a specification that enables
>>>>     telepresence interoperability, by describing the relationship between
>>>>     multiple RTP streams. In addition, the problem statement and
>>>>     definitions are also covered herein.
>>>>
>>>>
>>>>     The IETF datatracker status page for this draft is:
>>>>     https://datatracker.ietf.org/doc/draft-ietf-clue-telepresence-requirements
>>>>
>>>>     There's also a htmlized version available at:
>>>>     http://tools.ietf.org/html/draft-ietf-clue-telepresence-requirements-04
>>>>
>>>>     A diff from the previous version is available at:
>>>>     http://www.ietf.org/rfcdiff?url2=draft-ietf-clue-telepresence-requirements-04
>>>>
>>>>     Internet-Drafts are also available by anonymous FTP at:
>>>>     ftp://ftp.ietf.org/internet-drafts/
>>>>
>>>>     _______________________________________________
>>>>     I-D-Announce mailing list
>>>>     I-D-Announce@ietf.org
>>>>     https://www.ietf.org/mailman/listinfo/i-d-announce
>>>>     Internet-Draft directories: http://www.ietf.org/shadow.html
>>>>     or ftp://ftp.ietf.org/ietf/1shadow-sites.txt
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> clue mailing list
>>>> clue@ietf.org
>>>> https://www.ietf.org/mailman/listinfo/clue