[hybi] Questions about RFC6455 (implementing websockets)

"A. Rothman" <amichai2@amichais.net> Wed, 04 September 2024 17:30 UTC

List-Id: Server-Initiated HTTP <hybi.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/hybi/ul72pUH_zBaEumN9uJN0hihPsYY>
List-Archive: <https://mailarchive.ietf.org/arch/browse/hybi>

Hi,

I'm currently working on adding websocket support to the JLHTTP server. 
I've already read RFC6455 in full before starting to implement it, and 
once again at the end to make sure nothing was missed (plus lots of 
re-reading specific sections in the middle), but still have some open 
questions. I was forwarded here from the ietf@ietf.org mailing list. I 
hope this is the right place for this discussion; if not, apologies, 
and please direct me to the correct venue :-)

Along the way I also took note of things that were unclear or confusing 
(whether due to the wording of the spec or my misunderstanding of it). I 
wanted to share the issues I came across, both to get some answers so 
that I can write a correct implementation, and because some of them may 
need to be reported as errata or considered for future updates or 
clarifications to the spec (any input on whether that is appropriate, 
and for which issues, would be welcome). Here are the issues, in order 
of appearance:


1: 1.8

    "At the time of writing of this
    specification, it should be noted that connections on ports 80 and
    443 have significantly different success rates, with connections on
    port 443 being significantly more likely to succeed, though this may
    change with time."

    I could neither understand what this is trying to say (Is this a 
general statistic about port-scanning? Or a fun fact specific to 
websocket deployments?) nor why "it should be noted" - how is it 
relevant to the websocket spec? Would removing this sentence entirely 
make any difference in understanding, implementing or using websockets? 
I'm not sure if I'm missing an important point or this is really redundant.

2: 4.2.1.3

    "An |Upgrade| header field containing the value "websocket""

    Does "containing" imply it can have other elements as well (as the 
Upgrade field is normally defined), or does it mean the entire value 
must exactly equal "websocket"? The former makes more sense, being 
standard, but I've seen interpretations both ways, and mention of 
clients that reject anything but the single exact value. Perhaps this 
can be clarified in the spec.
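
    To make the two readings concrete, here is a minimal Java sketch. 
The class and method names are hypothetical and not taken from JLHTTP or 
any other implementation; the case-insensitive comparison follows the 
"ASCII case-insensitive" wording of 4.2.1.

```java
final class UpgradeHeader {

    /** Lenient reading: some comma-separated element is "websocket". */
    static boolean containsWebSocket(String headerValue) {
        if (headerValue == null) {
            return false;
        }
        for (String element : headerValue.split(",")) {
            if (element.trim().equalsIgnoreCase("websocket")) {
                return true;
            }
        }
        return false;
    }

    /** Strict reading: the entire value must be exactly "websocket". */
    static boolean equalsWebSocket(String headerValue) {
        return headerValue != null
                && headerValue.trim().equalsIgnoreCase("websocket");
    }
}
```

    A value such as "h2c, websocket" passes the lenient check but fails 
the strict one, which is exactly where implementations would disagree.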

3: 4.2.2.4 (regarding /origin/)

    "For more detail, refer to Section 10"

    Shouldn't this be 10.2 ("Origin Considerations")?

4: 4.2.2.4 (regarding /key/)

   "It is not necessary for the server to base64-decode the 
|Sec-WebSocket-Key| value."

   Although technically correct, this is somewhere between a half-truth 
and misleading: section 4.2.1 states that

   "If the server, while reading the handshake, finds that the client did
    not send a handshake that matches the description below [...]
    the server MUST stop processing the client's handshake and return an
    HTTP response with an appropriate error code"

    and the "description below" of the key field is:

    "A |Sec-WebSocket-Key| header field with a base64-encoded (see
    Section 4 of [RFC4648]) value that, when decoded, is 16 bytes in
    length."

    So, although technically the base64 value does not need to be 
decoded, the server still MUST ensure it's a valid base64 string (to 
match the 'base64-encoded' description), and it does have to check the 
padding characters along with the total encoded string length to make 
sure the decoded value is exactly 16 bytes long (to match the "16 bytes 
in length" description). By this point you're only a few bit shifts away 
from fully decoding the value :-) In all but the most highly optimized 
implementations, these checks would most likely be implemented by just 
base64-decoding the value, rather than by writing a separate 'base64 
character validity, padding and length checker'. Alternatively, 
implementers may miss this requirement because of the misleading "It is 
not necessary [...] to base64-decode" statement (as I did at first) and 
not perform this validation at all, which would violate the MUST in 
4.2.1.

    I think one of three things should happen: the misleading statement 
should be removed; or the description of the header as read by the 
server should be loosened to just require that the header exists, 
without describing its contents (thus allowing the server to indeed 
ignore it being base64 at all); or both should be left as-is, with an 
explicit exception added to this header's description stating that the 
value itself, as described, does not need to be validated at all.
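
    To illustrate how close the two approaches are, here is a minimal 
Java sketch of both checks. The class and method names are hypothetical. 
It relies on the fact that 16 bytes always encode to 22 base64 
characters followed by "==" (24 characters in total); the shape check 
deliberately does not verify that the unused trailing bits of the last 
character are zero.

```java
import java.util.Base64;

final class WebSocketKey {

    /** Validates by actually decoding the value. */
    static boolean isValidByDecoding(String key) {
        if (key == null) {
            return false;
        }
        String k = key.trim();
        // java.util.Base64 tolerates missing padding, so the encoded
        // length is checked explicitly as well.
        if (k.length() != 24) {
            return false;
        }
        try {
            return Base64.getDecoder().decode(k).length == 16;
        } catch (IllegalArgumentException e) {
            return false; // not valid base64
        }
    }

    /** Validates without decoding: 22 alphabet characters, then "==". */
    static boolean isValidByShape(String key) {
        if (key == null) {
            return false;
        }
        String k = key.trim();
        if (k.length() != 24 || !k.endsWith("==")) {
            return false;
        }
        for (int i = 0; i < 22; i++) {
            char c = k.charAt(i);
            boolean inAlphabet = (c >= 'A' && c <= 'Z')
                    || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9')
                    || c == '+' || c == '/';
            if (!inAlphabet) {
                return false;
            }
        }
        return true;
    }
}
```

    Both methods accept the sample key from the RFC, 
"dGhlIHNhbXBsZSBub25jZQ==", and the decoding variant is hardly more 
work than the shape check.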

5: 4.2.2.5.4

   The Sec-WebSocket-Accept ABNF allows base64 of any length. Why not 
limit it to the exact length actually required by this header (the 
length of a base64-encoded SHA-1 digest)? Or at least add an ABNF 
comment about the required length (as is done in various other ABNF 
definitions in the spec)?
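
   For reference, the value is always 28 characters long, since a SHA-1 
digest is 20 bytes and base64 encodes those as 27 characters plus one 
"=". A minimal Java sketch of the computation follows; the class and 
method names are hypothetical, while the GUID and the sample key/accept 
pair are the ones given in the RFC.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

final class WebSocketAccept {

    /** The fixed GUID that RFC 6455 appends to the key. */
    static final String GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    /** Returns base64(SHA-1(key + GUID)). */
    static String compute(String key) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(
                    (key + GUID).getBytes(StandardCharsets.US_ASCII));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required to be present on every Java platform
            throw new IllegalStateException(e);
        }
    }
}
```

   With the RFC's sample key "dGhlIHNhbXBsZSBub25jZQ==" this yields 
"s3pPLMBiTxaQ9kYGzzhZRbK+xOo=", so a client could reject any other 
length before even comparing values.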

6: 5.1

   "a client MUST mask all frames that it sends to the server" and
   "A server MUST NOT mask any frames that it sends to the client"

   However in 5.2, referring to the Mask field definition, it only says:

   "All frames sent from client to server have this bit set to 1."

   I think it should also say "All frames sent from server to client 
have this bit set to 0". It's strange to mention the requirement in one 
direction and not the other here, even though both are MUST (NOT) 
earlier. It would make sense to mention either both directions or none, 
not just one.
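
   In code the two directions are symmetric anyway, as in this minimal 
Java sketch (the class and method names are hypothetical). The mask bit 
is the most significant bit of the second byte of the frame header.

```java
final class MaskCheck {

    /** True if the MASK bit is set in the second header byte. */
    static boolean isMasked(int secondHeaderByte) {
        return (secondHeaderByte & 0x80) != 0;
    }

    /**
     * A server must only receive masked frames, and a client must only
     * receive unmasked frames, so the expected bit value is determined
     * by the role of the receiving endpoint.
     */
    static boolean isAcceptable(int secondHeaderByte, boolean receiverIsServer) {
        return isMasked(secondHeaderByte) == receiverIsServer;
    }
}
```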

7.1.5

   "_The WebSocket Connection Close Code_ is defined as the status code
   (Section 7.4) contained in the first Close control frame received by
   the application implementing this protocol."

   "the first Close control frame received" implies there can be more 
than one (the same wording appears in 7.1.6). In what scenario would 
that happen? Why not explicitly disallow sending more than one? How is 
the receiver supposed to react to the second one?

7: 5.2

   frame-opcode            = frame-opcode-non-control /
                             frame-opcode-control /
                             frame-opcode-cont
   frame-opcode-cont       = %x0 ; frame continuation
   frame-opcode-non-control= %x1 ; text frame
                           / %x2 ; binary frame
                           / %x3-7
                           ; 4 bits in length,
                           ; reserved for further non-control frames
   frame-opcode-control    = %x8 ; connection close
                           / %x9 ; ping
                           / %xA ; pong
                           / %xB-F ; reserved for further control
                                   ; frames
                                   ; 4 bits in length

   These should all be 4 bits in length, but oddly the length is 
mentioned only next to the reserved values. There are three problems. 
First, the indentation and order of comments differ between 
frame-opcode-non-control and frame-opcode-control, which is 
unnecessarily confusing. Second, in frame-opcode-non-control the length 
comment sits between the reserved values and the "reserved for further 
non-control frames" comment, which implies it applies only to those 
values. Third, the length comment is missing entirely for 
frame-opcode-cont. One possible solution is to remove both length 
comments and add a single one to frame-opcode instead. Another is to add 
it to each of the three individual rules, but as a last comment at the 
indentation level of all possible values (not just the reserved ones). 
[ The indentation in this email may get screwed up, so see the original 
RFC to see what I mean. ]
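
   For comparison, this is how the same grammar reads as a minimal Java 
sketch, where the 4-bit width is stated once for all opcodes. The class, 
enum and method names are hypothetical.

```java
final class Opcodes {

    enum Kind { CONTINUATION, NON_CONTROL, CONTROL }

    /** The opcode is the low 4 bits of the first header byte. */
    static int opcode(int firstHeaderByte) {
        return firstHeaderByte & 0x0F;
    }

    /** Classifies a 4-bit opcode according to the ABNF quoted above. */
    static Kind kind(int opcode) {
        if (opcode < 0x0 || opcode > 0xF) {
            throw new IllegalArgumentException("opcode is 4 bits: " + opcode);
        }
        if (opcode == 0x0) {
            return Kind.CONTINUATION;
        }
        return opcode >= 0x8 ? Kind.CONTROL : Kind.NON_CONTROL;
    }

    /** True for %x3-7 and %xB-F, which are reserved for future use. */
    static boolean isReserved(int opcode) {
        return (opcode >= 0x3 && opcode <= 0x7)
                || (opcode >= 0xB && opcode <= 0xF);
    }
}
```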

8: 5.5.1

   "the endpoint typically echos the status code it received"

   Should this be "the status code and reason"?
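
   The difference shows up directly in the Close frame body, which is a 
2-byte big-endian status code optionally followed by a UTF-8 reason. A 
minimal Java sketch of the two possible echoes follows; the class and 
method names are hypothetical, and it does not enforce the 125-byte 
limit on control frame payloads.

```java
import java.nio.charset.StandardCharsets;

final class ClosePayload {

    /** Builds a Close frame body from a status code and optional reason. */
    static byte[] build(int code, String reason) {
        byte[] text = reason == null
                ? new byte[0]
                : reason.getBytes(StandardCharsets.UTF_8);
        byte[] body = new byte[2 + text.length];
        body[0] = (byte) (code >> 8); // high byte of the status code
        body[1] = (byte) code;        // low byte of the status code
        System.arraycopy(text, 0, body, 2, text.length);
        return body;
    }

    /** Echoes only the status code of a received Close frame body. */
    static byte[] echoCodeOnly(byte[] received) {
        if (received.length < 2) {
            return new byte[0]; // no status code was present
        }
        return new byte[] { received[0], received[1] };
    }

    /** Echoes the status code and the reason, i.e. the whole body. */
    static byte[] echoCodeAndReason(byte[] received) {
        return received.clone();
    }
}
```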

9: 6.1.1

   "If at any point the state of the WebSocket connection changes, the 
endpoint MUST abort the following steps."

   It isn't entirely clear what "at any point" means here. In between 
the listed steps? Within a step? More specifically, step 7 potentially 
transmits multiple frames: is that an atomic operation? Or MUST it also 
abort in between frames/fragments? Or MUST it abort in the middle of a 
single (possibly very long) frame? This has significant implications for 
what happens next. If the send is aborted in the middle of a frame, or 
even in between frames of a multi-frame message, the framing protocol is 
broken and no close frame can be sent, etc., whereas if step 7 is 
all-or-nothing with regard to the MUST abort, then the closing can 
continue gracefully and no data is lost. This should really be 
clarified, as this is a MUST that can be interpreted in significantly 
different ways.

10: 7.1.7

   "An endpoint MUST NOT continue to attempt to process data (including 
a responding Close frame)"

   Maybe this should say "to process frames" instead of "to process 
data"? Since data frames are explicitly defined, the use of "data" here 
is confusing. Is a ping frame "data"?

11: 7.2.2

   "Certain algorithms require or recommend that the server _Abort the
   WebSocket Connection_ during the opening handshake.  To do so, the
   server MUST simply _Close the WebSocket Connection_"

   1 - The term _Abort the WebSocket Connection_ that is defined here 
doesn't appear anywhere else in the RFC. Searching for the word "abort" 
on its own, I found "abort these steps", "abort the connection attempt", 
"abort the connection" and "abort the WebSocket handshake". It would be 
useful to use the defined term wherever it was intended to be used (I'm 
not sure which of the above), to disambiguate this MUST from 
coincidental use of the word "abort" in the prose.

   2 - More generally, it's not entirely clear why it would be required 
to close the socket in any case if the handshake fails, whether 
according to this section or other sections. To my understanding the 
handshake is still using the HTTP protocol, so sending an error status 
code should be enough, and the client should be free to reuse the 
connection to try again, or to send unrelated non-websocket requests for 
other HTTP resources on the same TCP connection. Specifically, section 
4.4 describes such a scenario for protocol version negotiations - why 
should it be necessary to close the connection in the middle before the 
second request with the corrected version number is sent? (section 
4.2.2, under /version/, requires aborting the handshake).

   3 - Finally, different headers and conditions use different 
terminology in discussing error conditions, instead of being limited to 
the well-defined terms. I think it would be super helpful to make more 
precise use of the overloaded terms 'close', 'fail', 'abort' throughout 
the spec, use the well-defined terms everywhere they are needed and 
avoid using the generic English words where they don't have the same 
well-defined meaning. I get that there was an intention to do so, such 
as in 7.2.2, but it didn't go all the way in being careful how those 
words are used elsewhere, resulting in confusion in what to actually do 
in each error condition (and there are lots of them in the spec).

12: 7.4.1

   "1011 indicates that a server is terminating the connection because
   it encountered an unexpected condition that prevented it from
   fulfilling the request."

   There is no definition or even concept of 'request' in the websocket 
framing protocol (other than the HTTP handshake request, which would 
return an HTTP status code and not this). The websocket protocol is 
about a bi-directional stream of frames and messages, not a 
request-response model.

   I'm guessing this is meant to be some sort of application-level 
thing, for when an application decides to use a request-response model 
on top of websockets? But that's beyond the scope of this specification. 
In short, I think the terminology and/or intention of when to use this 
status code should be clarified, because as currently phrased it is 
undefined.

13: 9.1

   "If a value is received by either the client or
   the server during negotiation that does not conform to the ABNF
   below, the recipient of such malformed data MUST immediately _Fail
   the WebSocket Connection_."

   This is another specific instance of #11 above: it is strange to 
Fail the websocket connection (which entails closing the socket) for a 
malformed HTTP header. If Sec-WebSocket-Extensions does not conform to 
the ABNF, shouldn't the server just return an HTTP 400 status code and 
move on to the next request? Some headers require disconnecting and some 
don't, which is strange.


I hope this was clear enough, and that somebody in the know can help me 
figure these out and tell me whether, and how, to report them in order 
to improve the spec.

Many thanks!

Amichai