Re: [Wish] WG Last Call for draft-ietf-wish-whip

Bernard Aboba <bernard.aboba@gmail.com> Fri, 24 February 2023 21:56 UTC

Here is my (belated) review. 

Section 1

 While WebRTC has been very successful in a wide range of scenarios,
   its adoption in the broadcasting/streaming industry is lagging 
   behind.

[BA] I recently saw a survey indicating that WHIP is now the second most implemented ingestion protocol, behind only RTMP.  So while this sentence may have been correct at one time, it seems out of date now.  Can we delete this sentence?

Also, overall Section 1 seems like it could be shortened considerably by highlighting the major points. Here is my suggestion: 

 The IETF RTCWEB working group standardized JSEP ([RFC8829]), a
   mechanism used to control the setup, management, and teardown of a
   multimedia session.  JSEP also describes how to negotiate media flows
   using the Offer/Answer Model with the Session Description Protocol
   (SDP) [RFC3264] as well as the formats for data sent over the wire
   (e.g., media types, codec parameters, and encryption).  WebRTC
   intentionally does not specify a signaling transport protocol at 
   application level. 

Unfortunately, the lack of a standardized signaling mechanism in WebRTC has been an obstacle to adoption as an ingestion protocol within the broadcast/streaming industry, where a streamlined production pipeline is taken for granted: plug in cables carrying raw media to hardware encoders, then push the encoded media to any streaming service or Content Delivery Network (CDN) ingest using an ingestion protocol.
While WebRTC can be integrated with standard signaling protocols like SIP [RFC3261] or XMPP [RFC6120], they are not designed to be used in broadcasting/streaming services and there is no sign of adoption in that industry.  RTSP [RFC7826], which is based on RTP, is not compatible with the SDP offer/answer model [RFC3264].

This document therefore proposes a simple protocol for supporting WebRTC as a media ingestion method which:

   *  Is easy to implement.

   *  Is as easy to use as popular IP-based broadcast protocols.

   *  Is fully compliant with WebRTC and RTCWEB specs.

   *  Allows for ingest both in traditional media platforms and in
      WebRTC end-to-end platforms with the lowest possible latency.

   *  Lowers the requirements on both hardware encoders and broadcasting
      services to support WebRTC.

   *  Is usable both in web browsers and in native encoders.

Section 2

I do not see a definition of “track”.  I think this is important to clarify. 

Section 4.2

  While this version of the specification only supports a single audio
   and video track, in order to ensure forward compatibility, if the
   number of audio and or video tracks or number streams is not
   supported by the WHIP Endpoint, it MUST reject the HTTP POST request
   with a "406 Not Acceptable" error response.

[BA] Support for stereo and surround-sound is becoming increasingly popular.  Can you clarify whether this is supported in this version of the specification?  I assume it can be (e.g. it is possible to have multiple channels per track) but the lack of a definition for “track” creates some uncertainty. 
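
To make the ambiguity concrete, here is a minimal sketch of how an endpoint might apply the rule above. It assumes (this is exactly the undefined part) that a "track" corresponds to one m= section in the SDP offer, and that channels are a property of the codec inside that section. The function names and limits are hypothetical and do not come from the draft.

```python
import re

def summarize_offer(sdp: str) -> dict:
    """Count audio/video m= sections and collect audio channel counts from a=rtpmap."""
    counts = {"audio": 0, "video": 0}
    audio_channels = []
    current = None
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            current = line[2:].split(" ", 1)[0]
            if current in counts:
                counts[current] += 1
        elif current == "audio" and line.startswith("a=rtpmap:"):
            # a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
            m = re.match(r"a=rtpmap:\d+ [^/]+/\d+(?:/(\d+))?$", line)
            if m:
                audio_channels.append(int(m.group(1) or 1))
    return {"counts": counts, "audio_channels": audio_channels}

def post_status(sdp: str, max_audio: int = 1, max_video: int = 1) -> int:
    """Hypothetical endpoint policy: 406 when the offer has more tracks than supported."""
    counts = summarize_offer(sdp)["counts"]
    if counts["audio"] > max_audio or counts["video"] > max_video:
        return 406  # Not Acceptable, as required by the quoted text
    return 201      # Created, assuming the rest of the offer is acceptable

OFFER = (
    "v=0\r\n"
    "m=audio 9 UDP/TLS/RTP/SAVPF 111\r\n"
    "a=rtpmap:111 opus/48000/2\r\n"
    "m=video 9 UDP/TLS/RTP/SAVPF 96\r\n"
    "a=rtpmap:96 VP8/90000\r\n"
)
```

Under this reading, a two-channel audio codec still counts as a single audio track, so the sample offer would be accepted, while an offer with a second m=video section would get a 406. Whether that is the intended meaning is what a definition of "track" would settle.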

4.6.  Simulcast and scalable video coding

   Both Simulcast [RFC8853] and Scalable Video Coding (SVC), including
   K-SVC (also known as "S modes", in which multiple encodings are sent
   on the same SSRC), MAY be supported by both the Media Servers and
   WHIP clients through negotiation in the SDP offer/answer.

[BA] K-SVC and “S” modes are different.  K-SVC denotes the KEY and KEY_SHIFT modes, whereas “S” modes denote the encapsulation of multiple encodings within a single SSRC, as is supported in VP9 and AV1.  Diagrams of the various modes are included in the WebRTC-SVC specification: https://w3c.github.io/webrtc-svc/
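
The distinction shows up directly in the scalabilityMode names. As a rough sketch (not a complete parser of the WebRTC-SVC mode table; the function name and the family labels are my own), the names can be split like this:

```python
import re

# Pattern for names such as L1T3, L2T1h, L3T3_KEY, L2T2_KEY_SHIFT, S2T1.
# This is an approximation of the naming scheme, not the normative mode list.
_MODE = re.compile(
    r"(?P<kind>[LS])(?P<spatial>[1-3])T(?P<temporal>[1-3])"
    r"(?P<half>h)?(?P<key>_KEY(?:_SHIFT)?)?"
)

def classify(mode: str) -> dict:
    """Split a scalabilityMode name and label it as temporal-only, full SVC, K-SVC or S mode."""
    m = _MODE.fullmatch(mode)
    if m is None or (m["kind"] == "S" and m["key"]):
        raise ValueError(f"unrecognized scalability mode: {mode!r}")
    spatial = int(m["spatial"])
    temporal = int(m["temporal"])
    if m["kind"] == "S":
        family = "S mode"          # multiple encodings within a single SSRC
    elif m["key"]:
        family = "K-SVC"           # the KEY and KEY_SHIFT modes
    elif spatial > 1:
        family = "full SVC"
    else:
        family = "temporal only"
    return {"family": family, "spatial_layers": spatial, "temporal_layers": temporal}
```

For example, `classify("L3T3_KEY")` lands in the K-SVC family and `classify("S2T1")` in the S-mode family, which is why the draft should not describe one as an alias for the other.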

Also, SVC is *not* negotiated within Offer/Answer in WebRTC.  It is something that the encoder can just turn on.  At least for temporal modes (L1T2, L1T3) ingester support can often be taken for granted.  While the VP9 and AV1 specifications require a compliant decoder to be able to decode any mode that an encoder can encode, in practice there are VP9 and AV1 hardware decoders that cannot decode spatial scalability because they do not support spatial references (e.g. a P-frame at a higher resolution than the P or I frame that it references).

This nasty little “wrinkle” required us to define the “spatialScalability” attribute in Media Capabilities, to allow applications to discover whether a (hardware) decoder supports spatial scalability or not: 

Not sure if you want to get into this in the specification.  Issues are most likely to be encountered where hardware-based transcoders are used (but in that scenario, spatial scalability probably wouldn’t be relevant anyway),
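
If the draft did want to say something here, one conservative client-side policy could look like the sketch below. It assumes the sender somehow knows whether the receiving decoder handles spatial scalability (the draft defines no such signal), and the function name is hypothetical. It only touches "L" modes and leaves everything else unchanged.

```python
import re

def fallback_mode(requested: str, decoder_spatial_ok: bool) -> str:
    """Return the requested mode, or a temporal-only equivalent when spatial layers are unsafe."""
    m = re.match(r"L(\d)T(\d)", requested)
    if m is None:
        return requested              # not an "L" mode: out of scope for this sketch
    if decoder_spatial_ok or m.group(1) == "1":
        return requested              # already temporal-only, or spatial layers are fine
    return f"L1T{m.group(2)}"         # keep the temporal layers, drop the spatial ones
```

With this policy, a request for L3T3 against a decoder without spatial support degrades to L1T3, which matches the observation that temporal modes can usually be taken for granted.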