Re: [Pce] Lars Eggert: Discusses and Comments on draft-ietf-pce-pcep

Lars Eggert <lars.eggert@nokia.com> Tue, 03 June 2008 00:36 UTC

From: Lars Eggert <lars.eggert@nokia.com>
To: Adrian Farrel <adrian@olddog.co.uk>
In-Reply-To: <02c501c8be3d$3f7a95e0$9001010a@your029b8cecfe>
References: <02c501c8be3d$3f7a95e0$9001010a@your029b8cecfe>
Message-Id: <BC6CC2A3-B758-477C-B56D-4DF7A2EE2C00@nokia.com>
Mime-Version: 1.0 (Apple Message framework v924)
Date: Mon, 02 Jun 2008 16:03:02 -0400
Cc: draft-ietf-pce-pcep@tools.ietf.org, pce@ietf.org, pce-chairs@tools.ietf.org, iesg@ietf.org
Subject: Re: [Pce] Lars Eggert: Discusses and Comments on draft-ietf-pce-pcep

Hi,

On 2008-5-25, at 3:59, ext Adrian Farrel wrote:
>> Section 6.3., paragraph 0:
>>> 6.3.  Keepalive Message
>>
>> DISCUSS: I have a few suggestions on improving the keepalive
>> mechanism.
>
> You have raised this as a Discuss, but you phrase it as  
> "suggestions." Please clarify whether for you it is imperative that  
> the Keepalive process is updated as you suggest, or whether these  
> may be treated as suggestions.

I should have phrased this better. My discuss is on the currently- 
specified keepalive mechanism, which keeps generating keepalives  
absent any indication that the peer is currently receiving them and in  
a way that is inefficient compared to a traditional dummy request- 
response exchange that triggers when the deadtimer expires. (TCP  
guarantees delivery, so sending keepalives at a rate higher than the  
deadtimer doesn't add value - they can't get lost.)

In order to make this discuss actionable, I've suggested an  
alternative design, but I'd be fine with any other alternative that  
doesn't suffer from these issues. (Another such example would be to  
use TCP-level keepalives, which although not part of the spec are  
implemented by all platforms I know.)
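
To make the TCP-level alternative concrete, here is a rough sketch of what  
enabling it looks like (Python; the function name is mine, and the three  
tuning options shown are the Linux spellings, hence the guards):

```python
import socket

def enable_tcp_keepalive(sock, idle=30, interval=10, count=3):
    """Turn on TCP-level keepalives on an existing socket.

    idle/interval/count map to TCP_KEEPIDLE, TCP_KEEPINTVL and
    TCP_KEEPCNT on Linux; other platforms expose similar knobs
    under different names or via system-wide defaults.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The fine-grained timers are Linux-specific; guard for portability.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
```

The point being that the transport already offers a liveness probe that an  
application gets essentially for free.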

I'm wondering if our disagreement stems from different views on what  
this mechanism is supposed to be good for. In my mind, keepalives are  
meant to let an end-point eventually discover non-responding  
connections. Is PCEP using keepalives for purposes that require more  
timely responses, such as failover? If so, TCP may simply be the wrong  
underlying protocol for PCEP, and SCTP may be a better match, given  
its design goal of supporting fail-over.

>> (1) Both the PCC and PCE are allowed to send keepalives
>> according to their timers. This wastes bandwidth. The reception of a
>> keepalive request from the other end should restart the keepalive
>> timer on the receiving end - the reception is an indication that the
>> peer is alive. This means that keepalives will usually only be sent
>> from one end (the one with the shorter keepalive timer) and responded
>> to by the other end.
>
> Please note that Keepalive messages are not responded to.
> They are sent to the receiver according to the frequency specified  
> by the receiver. Thus the Keepalive may be unidirectional, or may be  
> very unbalanced according to the requirements of the peers.

Thanks for the clarification. I misunderstood this when reading the  
spec. I still believe that this keep-alive design has undesirable  
features. A more traditional empty request-response exchange doesn't  
suffer from these drawbacks.

>> (2) As long as a keepalive has not been responded
>> to, a PCEP speaker MUST NOT send another one. TCP is a reliable
>> protocol and will deliver the outstanding keepalive when it can. It  
>> is
>> not lost, and there is no need to resend it. All that sending more
>> keepalives does when there is no response is fill up the socket send
>> buffer.
>
> The filling up of the socket send buffer (which might happen through  
> repeated sends of a Keepalive message that cannot be delivered)  
> seems a little improbable. The PCEP messages are four bytes long and  
> will carry the normal TCP/IP headers. It is true that an  
> implementation that opts to not receive Keepalives (or one that must  
> send them far more often than it must receive them) runs the risk of  
> not noticing a failed TCP connection and continuing to send  
> Keepalives until TCP itself eventually reports the problem. It seems  
> to us that this is an extreme fringe condition that can be protected  
> against by proper configuration of the protocol where the issue is  
> believed to be real.

This depends on the keepalive frequency and the send buffer in use, as  
well as the duration of a connectivity disruption. I agree that this  
can be engineered away for most rational cases, but why do so when the  
issue can be completely eliminated through a more traditional  
keepalive scheme?
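
For clarity, the sort of scheme I have in mind could be sketched like this  
(illustrative Python; the names and the single-outstanding-probe rule are  
my suggestion, not PCEP as currently specified):

```python
import time

class KeepaliveState:
    """Sketch of a request-response keepalive: a probe is sent only
    when nothing has been heard from the peer for `idle` seconds, and
    never while an earlier probe is still outstanding (TCP guarantees
    its delivery, so resending adds nothing)."""

    def __init__(self, idle=30):
        self.idle = idle
        self.last_heard = time.monotonic()
        self.probe_outstanding = False

    def on_receive(self, now=None):
        # Any traffic from the peer proves liveness: reset the timer
        # and allow a future probe again.
        self.last_heard = now if now is not None else time.monotonic()
        self.probe_outstanding = False

    def should_probe(self, now=None):
        # Returns True (and arms the outstanding flag) only when a
        # probe is actually warranted.
        now = now if now is not None else time.monotonic()
        if self.probe_outstanding:
            return False          # TCP will deliver the first probe
        if now - self.last_heard < self.idle:
            return False          # peer was heard from recently
        self.probe_outstanding = True
        return True
```

With this, keepalives flow only from the quieter side, and the send buffer  
can never accumulate more than one unanswered probe.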

>> Section 7.3., paragraph 10:
>>> Keepalive (8 bits): maximum period of time (in seconds) between two
>>> consecutive PCEP messages sent by the sender of this message.  The
>>> minimum value for the Keepalive is 1 second.  When set to 0, once  
>>> the
>>> session is established, no further Keepalive messages are sent to  
>>> the
>>> remote peer.  A RECOMMENDED value for the keepalive frequency is 30
>>> seconds.
>>
>> DISCUSS: 1 second is extremely short. If there is a requirement that
>> PCEP detect failures on such short timescales and should fail over in
>> some way, TCP is the wrong underlying transport protocol. If that's
>> the motivation, PCEP should use SCTP, which was specifically designed
>> for this case. Otherwise, I'd suggest 30 seconds as a reasonable
>> minimum to use, and something larger as a recommended default.
>
> The timer range is designed to *allow* an operator to handle a  
> special case deployment where a very short timer is needed, but  
> *recommends* a default value of 30 seconds. Further, it allows the  
> operator to set a much larger value.
>
> I don't see this as any issue at all and no reason to change the  
> transport protocol.

I understand that 1 is the minimum and 30 the default. Under what  
conditions would 1 second be a reasonable interval? If failover on  
short timescales is the goal, SCTP provides mechanisms for that.

>> Section 9.1., paragraph 1:
>>>   PCEP uses a well-known TCP port. IANA is requested to assign a  
>>> port
>>>   number from the "System" sub-registry of the "Port Numbers"  
>>> registry.
>>
>> DISCUSS: Why is a system port (0-1023) required, wouldn't a  
>> registered
>> port (1023-49151) suffice?
>
> Is this not a security issue? On most systems, system ports can only  
> be used by system (or root) processes or by programs executed by  
> privileged users. Registered ports can, on most systems, be used by  
> ordinary user processes or programs executed by ordinary users.  
> Thus, the use of a registered port for PCEP helps to stop a rogue  
> program on a PCE or a PCC from impersonating the server/client.

A system port is not a security mechanism - attackers can as easily  
send from low port numbers as from high ones.

> One might as well ask why HTTP uses port 80, or BGP port 179.

It's an historic artifact that had its origins in trying to prevent  
regular users from running a Unix "system" service. Current security  
practice is to sandbox servers and not run them as root.

Note that the IANA port allocation guidelines are under revision, and  
draft-cotton-tsvwg-iana-ports raises the bar on obtaining one (we're  
running out). This is why I'd like to see some stronger argument for  
allocating a low port:

4.2.  Well Known (System) Ports

    The Well Known Ports are assigned by IANA and cover the range 0-1023.
    On many systems, they can only be used by system (or root) processes
    or by programs executed by privileged users.

    Registration requests for a Well Known port number MUST follow the
    "IETF Review" policy of [I-D.narten-iana-considerations-rfc2434bis].
    Registrations for a port number in this range MUST document why a
    port number in the Registered Ports range will not fulfill the
    application needs.  Registrations requesting more than a single port
    number for a single application in this space SHOULD be denied.

>> Section 10.1., paragraph 1:
>>> It is RECOMMENDED to use TCP-MD5 [RFC1321] signature option to
>>> provide for the authenticity and integrity of PCEP messages.  This
>>> will allow protecting against PCE or PCC impersonation and also
>>> against message content falsification.
>>
>> DISCUSS: Given all the issues with the continued use of TCP-MD5, I'm
>> not convinced that we really want to recommend its use for a new
>> protocol. Wouldn't draft-ietf-tcpm-tcp-auth-opt be the preferred
>> alternative? Or TLS, since confidentiality is of key importance
>> according to Section 10.2. (Also, nit, TCP-MD5 protects TCP segments
>> and not PCEP messages.)
>
> See separate email on this.
> Authors will look at TLS and fix the nit.
> draft-ietf-tcpm-tcp-auth-opt is not available to use as a normative  
> part of this protocol.

I think that a way forward would be to normatively refer to TCP-MD5  
and say something to the effect that implementors are cautioned that  
TCP-MD5 is about to be obsoleted by a new IETF recommendation (TCP-AO)  
and that they should be prepared to update their implementations when  
this happens.

The IESG might need to do an exception procedure similar to what we  
did for BGP's use of TCP-MD5; I'm hoping that the SEC ADs will chime in.

>> Section 4.2.1., paragraph 4:
>>> Successive retries are permitted but an implementation should make
>>> use of an exponential back-off session establishment retry  
>>> procedure.
>>
>> s/should make/SHOULD make/
>
> Section 4 provides an architectural overview and not normative  
> protocol definition. The use of RFC2119 language would be  
> inappropriate.

OK, but in that case please describe this somewhere in the normative  
part - I couldn't find it there.

>> Section 4.2.2., paragraph 4:
>>> Once the PCC has selected a PCE, it sends the PCE a path computation
>>> request to the PCE (PCReq message) that contains a variety of  
>>> objects
>>> that specify the set of constraints and attributes for the path to  
>>> be
>>> computed.
>>
>> Can a PCC send a second path computation request over the same TCP
>> connection to a PCE when the answer to an earlier one is still
>> outstanding? Can multiple TCP connections exist between the same PCC
>> and PCE?
>
> Yes, multiple PCEP requests may be outstanding from the same PCC at  
> the same time.
> No, multiple 'parallel' TCP connections must not be used, and a  
> specific error code exists to accompany the rejection of the second  
> connection.

OK. Could you explicitly say so in the document? I gathered that was  
implicitly what the connection handling was supposed to result in;  
spelling it out would be good IMO.

>> Section 4.2.4., paragraph 1:
>>> There are several circumstances in which a PCE may want to notify a
>>> PCC of a specific event.  For example, suppose that the PCE suddenly
>>> gets overloaded, potentially leading to unacceptable response times.
>>
>> Can such notifications occur at any time, i.e., while another message
>> is being sent? If so, how are they framed within the TCP byte stream?
>
> OK, "at any time" would be an exaggeration. It is also impossible to  
> send them in the past.

I meant if they can happen while another message is being sent, and if  
the notification is interleaved into the TCP byte stream (requiring  
some application-level framing) or if the application will queue it  
until an ongoing transmission has ended. I gather it's the latter that  
is supposed to happen. Could you make this explicit in the document,  
i.e., that PCE messages are transmitted over TCP in a sequential order  
without the possibility for interleaving?

>> Section 7.3., paragraph 13:
>>> A sends an Open message to B with Keepalive=10 seconds and
>>> Deadtimer=30 seconds.  This means that A sends Keepalive messages  
>>> (or
>>> ay other PCEP message) to B every 10 seconds and B can declare the
>>> PCEP session with A down if no PCEP message has been received from A
>>> within any 30 second period.
>
> [Editors: Please note s/ay/any/]
>
>> It'd be nice if the example followed the recommended values/formulas
>> above and used Keepalive=30 and DeadTimer=4*Keepalive (or whatever  
>> the
>> defaults will be after addressing my comments above.)
>
> Yes, this is a good point.
> It should be easy to change this to 30 and 120 seconds.

OK

>> Section 7.3., paragraph 14:
>>> SID (PCEP session-ID - 8 bits): unsigned PCEP session number that
>>> identifies the current session.  The SID MUST be incremented each
>>> time a new PCEP session is established and is used for logging and
>>> troubleshooting purposes.  There is one SID number in each  
>>> direction.
>>
>> What's the start value? Is it incremented for each connection to any
>> PCEP peer or only for connections to the same PCEP peer? Does it
>> matter when SID rolls over? The document doesn't discuss what the SID
>> is used for at all.
>
> I asked similar questions during WG last call.
>
> The answers are:
> Start where you like, the value is not important for the protocol.
> The requirement is that the SID is 'sufficiently different' to avoid  
> confusion between instances of sessions to the same peer.
> Thus, "incremented" is more like implementation advice than a strict  
> definition. In particular, incremented by 255 would be fine :-)
> However, the usage (for logging and troubleshooting) might suggest  
> that incrementing by one is a helpful way of looking at things.
> SID roll-over is not particularly a problem.
>
> Implementation could use a single source of SIDs across all peers,  
> or one source for each peer. The former might constrain the  
> implementation to only 255 concurrent sessions. The latter  
> potentially requires more state.

Thanks for the clarification. It might be useful if a bit of this  
explanation was added to the document. Also, please have it say that  
the SID SHALL only be used for logging and troubleshooting, in order  
to avoid having implementors start using it creatively.
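
As an aside, the "incremented, wrapping at 8 bits" advice amounts to  
something as small as this (illustrative sketch, not normative; whether one  
source is shared across peers or kept per peer is the implementation choice  
you describe):

```python
class SessionIdSource:
    """Illustrative 8-bit SID allocator: monotonically incremented
    for each new PCEP session, wrapping modulo 256. The starting
    value is irrelevant to the protocol; incrementing by one simply
    keeps logs easy to read."""

    def __init__(self, start=0):
        self._next = start % 256

    def new_sid(self):
        sid = self._next
        self._next = (self._next + 1) % 256
        return sid
```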

>> Section 7.4.1., paragraph 14:
>>> Request-ID-number (32 bits).  The Request-ID-number value combined
>>> with the source IP address of the PCC and the PCE address uniquely
>>> identify the path computation request context.  The Request-ID- 
>>> number
>>> MUST be incremented each time a new request is sent to the PCE.  The
>>> value 0x0000000 is considered as invalid.  If no path computation
>>> reply is received from the PCE, and the PCC wishes to resend its
>>> request, the same Request-ID-number MUST be used.  Conversely,
>>> different Request-ID-number MUST be used for different requests sent
>>> to a PCE.  The same Request-ID-number MAY be used for path
>>> computation requests sent to different PCEs.  The path computation
>>> reply is unambiguously identified by the IP source address of the
>>> replying PCE.
>>
>> It's redundant to identify requests by source and destination IP
>> address, given that those are constant for requests going over the
>> same TCP connection. Likewise, replies are implicitly identified by
>> the TCP connection they arrive over.
>
> OK. The text could have said that the requests are uniquely  
> identified by the combination of TCP connection and request ID.  
> Since, as you point out, TCP connection is isomorphic to source/dest  
> IP addresses, the text is accurate and not redundant.
>

OK

>> Section 8.3., paragraph 1:
>>> PCEP includes a keepalive mechanism to check the liveliness of a  
>>> PCEP
>>> peer and a notification procedure allowing a PCE to advertise its
>>> congestion state to a PCC.
>>
>> s/congestion/overload/ for consistency, here and throughout the
>> document
>
> Yes. Good catch.

OK

>> Section 9.1., paragraph 1:
>>> PCEP uses a well-known TCP port. IANA is requested to assign a port
>>> number from the "System" sub-registry of the "Port Numbers"  
>>> registry.
>>
>> Does "uses a well-known TCP port" mean that messages from the PCC to
>> the PCE must come from that registered source port, or can they come
>> from any port? (The former implies that only a single PCEP connection
>> can exist between a PCC and a PCE. It also weakens security a bit,
>> because an attacker doesn't need to guess the source port anymore.)
>
> Yes.

"Yes" as in "source port MUST be the PCE port"? (As must the  
destination port, obviously.) If so, you should make this explicit in  
the document, because the default behavior of operating systems is to  
dynamically assign a random high port number as a source port, unless  
an app specifically requests otherwise.
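
To illustrate: fixing the source port is an explicit extra step on most  
stacks (Python sketch; the function name is mine, and for the behavior  
under discussion a PCC would pass the PCEP port as both src_port and  
dest_port):

```python
import socket

def connect_with_source_port(dest_host, dest_port, src_port):
    """Connect with an explicitly chosen *source* port instead of the
    ephemeral one the OS would otherwise assign. A side effect of
    using the same well-known port on both ends is that only one such
    connection can exist per address pair at a time."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", src_port))   # fix the source port before connecting
    sock.connect((dest_host, dest_port))
    return sock
```

Absent a bind like this, the connection will come from a random high port,  
which is why the requirement needs to be spelled out.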

> Only one connection between a PCC and a PCE at any time. No need for  
> more than one has been identified. It might be claimed that two PCC  
> processes might exist on a single host/router, but no usage scenario  
> has been found. Further, the text currently bans a second  
> simultaneous connection between peers.

OK. This came up earlier in this email; I suggest making this explicit.

> Security is weaker and stronger.
> Weaker as you point out.
> Stronger because of the use of a system port as previously described.

Disagree that a system port is a security mechanism, as per above.

> It is debatable whether port number knowledge makes DoS any more  
> significant since the TCP stack may be attacked simply by a  
> connection-attempt storm on any port.

SYN floods can be mitigated through SYN cookies (RFC4987), to a large  
degree at least.

>> Section 10.3.1., paragraph 2:
>>> o  A PCE should avoid promiscuous TCP listens for PCEP TCP  
>>> connection
>>>   establishment.  It should use only listens that are specific to
>>>   authorized PCCs.
>>
>> Authorized by what? TCP has no feature to restrict listens based on
>> credentials.
>
> See below.
>
>> Section 10.3.1., paragraph 4:
>>> o  The use of access-list on the PCE so as to restrict access to
>>>   authorized PCCs.
>>
>> Is redundant with the first bullet (or I don't understand what it
>> means).
>
> Yes. My reaction to your previous comment was to say that the  
> authors mean access lists. In which case this is redundant.
>
> Authors to clarify.

Please do.

>> Appendix A., paragraph 46:
>>> If the system receives an Open message from the PCEP peer before the
>>> expiration of the OpenWait timer, the system first examines all of
>>> its sessions that are in the OpenWait or KeepWait state.  If another
>>> session with the same PCEP peer already exists (same IP address),
>>> then the system performs the following collision resolution
>>> procedure:
>>
>> The goal of this procedure seems to be to guarantee that there is  
>> only
>> a single active PCEP connection between two peers, but it's
>> cumbersome. It'd be much easier to require a peer to not initiate a
>> connection to a peer it already has one established with, and to
>> require it to immediately close a new TCP connection coming from a
>> peer it has an active PCEP connection with. This handles everything  
>> at
>> the TCP layer without needing to involve the PCEP state machine.
>
> I agree that this non-normative appendix seems to include overkill  
> for an unlikely scenario.
>
> As you have previously observed, the use of a well-known port  
> reduces the possibility for parallel connections. Thus it would  
> simply not be possible for the situation to arise.
>
> But, multiple IP addresses can be (often are) assigned to routers  
> giving the possibility for multiple connections that we cannot ask  
> the TCP stack to police. The problem can only be resolved at the  
> PCEP level by identifying the two sessions with the same peer.

Understood, but the text in Appendix A doesn't help in this scenario,  
because it identifies peers by IP address - it won't catch the case  
where two peers talk across two different IP address pairs.

Lars
_______________________________________________
Pce mailing list
Pce@ietf.org
https://www.ietf.org/mailman/listinfo/pce