Re: [conex] Act bits and Positioning (Was Re: Fwd: Review: draft-ietf-conex-destopt-06)

Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch> Tue, 09 September 2014 14:19 UTC

Message-ID: <540F0C78.7050309@tik.ee.ethz.ch>
Date: Tue, 09 Sep 2014 16:19:36 +0200
From: Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.0
MIME-Version: 1.0
To: Bob Briscoe <bob.briscoe@bt.com>, Suresh Krishnan <suresh.krishnan@ericsson.com>
References: <201408121058.09210.mirja.kuehlewind@ikr.uni-stuttgart.de> <53EA6068.6090100@tik.ee.ethz.ch> <201408131906.s7DJ6V2s029587@bagheera.jungle.bt.co.uk> <53ECE6C9.40300@tik.ee.ethz.ch> <53ECE917.6000803@tik.ee.ethz.ch> <201408141915.s7EJFVI8000808@bagheera.jungle.bt.co.uk> <53FB741A.9010500@tik.ee.ethz.ch> <201408261727.s7QHRlxB026767@bagheera.jungle.bt.co.uk> <53FF4E3F.4060502@tik.ee.ethz.ch> <201408282005.s7SK5ke4004064@bagheera.jungle.bt.co.uk> <54073A5B.20207@tik.ee.ethz.ch> <201409082217.s88MHFDj018480@bagheera.jungle.bt.co.uk> <E87B771635882B4BA20096B589152EF62882B883@eusaamb107.ericsson.se> <201409090759.s897xriV019964@bagheera.jungle.bt.co.uk>
In-Reply-To: <201409090759.s897xriV019964@bagheera.jungle.bt.co.uk>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 8bit
Archived-At: http://mailarchive.ietf.org/arch/msg/conex/NYC3ATzxlDqps29nj7fwoAcSCzw
Cc: Carlos Ucendo <ralli@tid.es>, ConEx IETF list <conex@ietf.org>
Subject: Re: [conex] Act bits and Positioning (Was Re: Fwd: Review: draft-ietf-conex-destopt-06)

Hi,

see inline

On 09.09.2014 09:59, Bob Briscoe wrote:
> Suresh,
>
> At 05:36 09/09/2014, Suresh Krishnan wrote:
>> Hi Bob,
>>   Thanks a lot for your comments. I will respond to two specific issues
>> that you brought up
>>
>> act bits being 01: Please note that the Conex option is a *destination*
>> option. Non conex-aware nodes on path will not even process the option.
>> So we do not need to be worried about the packet being dropped by
>> intermediate nodes, and we need to know if the destination does not
>> understand it. Hence I think this should stay as 01.
>
> I was aware that the act bits are only processed by the destination.
>
> My point was that ConEx only requires sender support (ie. you can have
> ConEx on one half-connection but not the other - there is no need to
> negotiate ConEx for a connection). So it would be very bad for a
> destination to drop a packet just before delivering it to the
> destination process just because it doesn't recognise a ConEx header
> that it doesn't need to understand anyway.
>
> Note to Mirja: Given this misunderstanding, perhaps the draft should
> give a reason:
>          "The act bits MUST be 00, because a ConEx packet needs to be
> passed to the destination process even if the destination does not
> understand ConEx."
I agree with Bob, as in our case the receiver does not need to process the 
option at all, and it does not even know whether the option has to be 
there. Suresh?
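To make the act-bit argument concrete, here is a minimal Python sketch of how an IPv6 node treats an option it does not recognise, based on the two highest-order bits of the Option Type (RFC 2460, section 4.2). The option type values mentioned below (0x1E, 0x5E) are illustrative placeholders, not assigned codepoints, and the helper names are hypothetical.

```python
# Action taken by an IPv6 node for an option it does NOT recognise,
# selected by the two highest-order bits of the Option Type (RFC 2460, 4.2).
ACTIONS = {
    0b00: "skip option, continue processing the header",
    0b01: "discard packet",
    0b10: "discard packet, send ICMP Parameter Problem (code 2)",
    0b11: "discard packet, send ICMP Parameter Problem (code 2) "
          "only if destination is not multicast",
}

def act_bits(option_type: int) -> int:
    """Return the two highest-order bits of an 8-bit Option Type."""
    if not 0 <= option_type <= 0xFF:
        raise ValueError("option type must fit in one octet")
    return (option_type >> 6) & 0b11

def action_if_unrecognized(option_type: int) -> str:
    return ACTIONS[act_bits(option_type)]

def delivered_by_non_conex_destination(option_type: int) -> bool:
    """A destination that does not understand the option still passes
    the packet up only when the act bits are 00."""
    return act_bits(option_type) == 0b00
```

With act bits 00 (e.g. a hypothetical type 0x1E) a destination that does not understand ConEx skips the option and still delivers the packet; with act bits 01 (e.g. 0x5E) it would silently discard it, which is the outcome Bob argues against for a sender-only mechanism.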

>
>
>> CDO as the first option: As you rightly note, this is just a performance
>> optimization. I do believe that the performance penalty for conex-aware
>> nodes will be pretty severe if they need to process all destination
>> options before deciding if the CDO is present or not.
>
> Surely a ConEx node can stop when it gets to the CDO option (which will
> usually be first). It doesn't need to continue and process all the other
> options within the destopt header.
>
>> Please note that
>> this needs to be done on *all* packets passing through the conex aware
>> node.
>
> On packets without a destopt that is quick.
>
> The really bad case is for packets from senders that don't support ConEx
> but are using many other destopts. Then the on-path ConEx node would
> walk along every destopt until the end.
>
>> I do think it is OK to change the MUST to a SHOULD but with a
>> severe warning.
>
> OK, thanks.
>
> Would it be OK to say "As an optimization, a ConEx implementation MAY
> limit the depth of its search for CDO to two or three destination options"?
My assumption was that searching for the CDO when multiple options are 
present is not feasible in the fast path.

So if the CDO is not first, there are two options:
1) forward the packet to the slow path
2) ignore the CDO

Actually, option 1) is probably not viable, because it would forward all 
current traffic to the slow path, since none of that traffic carries a 
CDO at all.

Option 2) is not feasible, at least for something like a policer that 
really needs to look at all ConEx-enabled packets.

Limiting the search depth would translate into "the CDO SHOULD be the 
first option and MUST be among the first X options"... is that a solution?
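A bounded search of the kind discussed above could look roughly like the following Python sketch. It walks one Destination Options header (Next Header, Hdr Ext Len in 8-octet units not counting the first 8 octets, then TLV-encoded options, per RFC 2460) and stops either at the CDO or after a fixed number of non-padding options. The function name, the `max_options` limit (the "X" in the proposed wording) and the choice not to count Pad1/PadN towards that limit are assumptions for illustration; the CDO option type is passed in rather than assuming a value.

```python
from typing import Optional

PAD1 = 0x00   # Pad1: a single octet, no length field (RFC 2460, 4.2)
PADN = 0x01   # PadN: type, length, then that many zero octets

def find_cdo(destopt_hdr: bytes, cdo_type: int, max_options: int) -> Optional[int]:
    """Offset of the CDO within one Destination Options header, or None.

    Gives up once `max_options` non-padding options have been inspected
    without finding the CDO, so the work per packet is bounded.
    """
    if max_options < 1:
        raise ValueError("max_options must be at least 1")
    if len(destopt_hdr) < 2:
        raise ValueError("truncated Destination Options header")
    hdr_len = (destopt_hdr[1] + 1) * 8      # Hdr Ext Len is in 8-octet units
    if len(destopt_hdr) < hdr_len:
        raise ValueError("truncated Destination Options header")

    offset, inspected = 2, 0                # options follow the 2-octet prefix
    while offset < hdr_len:
        opt_type = destopt_hdr[offset]
        if opt_type == PAD1:
            offset += 1
            continue
        if offset + 2 > hdr_len:
            raise ValueError("option runs past end of header")
        next_offset = offset + 2 + destopt_hdr[offset + 1]
        if next_offset > hdr_len:
            raise ValueError("option runs past end of header")
        if opt_type == PADN:
            offset = next_offset            # padding is not counted
            continue
        if opt_type == cdo_type:
            return offset                   # stop as soon as the CDO is found
        inspected += 1
        if inspected >= max_options:
            return None                     # bounded: treat as "no CDO"
        offset = next_offset
    return None
```

With `max_options=1` this reduces to the "CDO MUST be first" check; a small value such as 2 or 3 keeps the per-packet work bounded while tolerating one or two preceding options. Packets carrying no Destination Options header at all never reach this function, which is why they stay cheap.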


Bob, see further below...

>
>
> I have also added to my response to Mirja below, with an extra thought
> about ESP tunnels (inline - search for 'ESP tunnel' - it's a long way
> down!)...
>
>
>> Cheers
>> Suresh
>>
>> On 09/08/2014 06:17 PM, Bob Briscoe wrote:
>> > Mirja,
>> >
>> > At 16:57 03/09/2014, Mirja Kühlewind wrote:
>> >> Hi again,
>> >>
>> >>
>> >> On 28.08.2014 22:05, Bob Briscoe wrote:
>> >>>>>>>>>>>> * Suggested deleting example of Not-ConEx-capable packets (see
>> >>>>>>>>>>>> separate thread to conex-tcp-modifications authors about TCP pure
>> >>>>>>>>>>>> ACKs).
>> >>>>>>>>>>> I can remove the example but I'm not sure why you are suggesting
>> >>>>>>>>>>> this. If you actually imply that the X bit should never be zero,
>> >>>>>>>>>>> then we have to discuss whether the X bit is needed at all.
>> >>>>>>>>>> I have never thought the X flag was needed. There's probably some
>> >>>>>>>>>> email on the list somewhere in the past from me that says that.
>> >>>>>>>>>>
>> >>>>>>>>>> As I put in one of the comment bubbles:
>> >>>>>>>>>> "The only need I can see for the X-flag is if
>> >>>>>>>>>> the Reserved field gets used in future for
>> >>>>>>>>>> something in addition to ConEx. Then there
>> >>>>>>>>>> would be a need to identify packets that
>> >>>>>>>>>> are not ConEx-capable but still carry the
>> >>>>>>>>>> CDO option (for the new reason)."
>> >>>>>>>>>>
>> >>>>>>>>>> Can anyone think of a use for the X flag?
>> >>>>>>>>> I thought the X bit unset means: I'm a ConEx-aware sender and I
>> >>>>>>>>> want to follow the rules, but I don't have any feedback for this
>> >>>>>>>>> (control) data, so I'm unable to give you useful ConEx information,
>> >>>>>>>>> and if you use this packet for your estimation of the current
>> >>>>>>>>> congestion level, you might underestimate it.
>> >>>>>>>>>
>> >>>>>>>>> Doesn't that make sense...?
>> >>>>>>> Not to me. What does "feedback for this (control) data" mean? Feedback
>> >>>>>>> is about a path used by a 5-tuple. This control data is about to be
>> >>>>>>> sent over such a path. If the sender has feedback about that path, the
>> >>>>>>> feedback applies to everything sent over the path, at the IP layer,
>> >>>>>>> whatever categorisation the next packet has at L4.
>> >>>>>> If you do not get any feedback on a path, e.g. a receiver only sending
>> >>>>>> ACKs, you will never be able to send any ConEx markings. So what's the
>> >>>>>> point of marking a packet as ConEx-enabled?
>> >>>>> OK, this is a good example for when a ConEx-enabled flag might be
>> >>>>> useful. However,...
>> >>>>>
>> >>>>> ...This doesn't justify marking pure ACKs as not-ConEx-enabled. If a
>> >>>>> sender sends a pure ACK now, all it knows is that it might not have
>> >>>>> enough feedback to be able to set ConEx markings on a whole sequence of
>> >>>>> packets later in the flow,... but only if it keeps sending solely pure
>> >>>>> ACKs from now on. However, a sender can't be sure that it won't have
>> >>>>> enough feedback in future, because usually an app (let alone the
>> >>>>> transport layer) cannot predict whether there will be more data to send
>> >>>>> later, even if it's not sending any now.
>> >>>>>
>> >>>>> Once a sender has had no feedback for at least a round trip, it has 2
>> >>>>> options for subsequent packets:
>> >>>>> a) turn off ConEx-enabled;
>> >>>>> b) keep sending packets with ConEx-enabled set, but conservatively add
>> >>>>> some credit.
>> >>>>>
>> >>>>> Even if it subsequently sends some data, it will still have to do (a) or
>> >>>>> (b) on these data packets, at least for one further round trip, until it
>> >>>>> gets the feedback. So this is nothing to do with whether the packet
>> >>>>> being sent is a pure ACK. It is to do with whether feedback has recently
>> >>>>> been received.
>> >>>> Okay, rewrote the paragraph slightly:
>> >>>>
>> >>>> "If the X bit is zero all other three bits are undefined and thus
>> >>>> should be ignored and forwarded unchanged by network nodes. The X bit
>> >>>> set to zero means that the connection is ConEx-capable but this packet
>> >>>> MUST NOT be accounted when determining ConEx information in an audit
>> >>>> function. This can be the case if no feedback on the congestion status
>> >>>> is (currently) available for e.g. for control packets (not carrying
>> >>>> any user data). As an example a TCP receiver that only sends pure ACKs
>> >>>> will usually send them as ACK are usually not ECN-capable as ACK
>> >>>> usually are not ECN-capable and TCP does not have a mechanism to
>> >>>> announce ACK lost. Thus congestion information about ACKs are not
>> >>>> available."
>> >>>>
>> >>>> Is this okay?
>> >>> The main problem is saying 'not available *for* control packets'. But
>> >>> just changing 'for' to 'from' would still make this too unclear to be
>> >>> understood.
>> >>>
>> >>> Also need to:
>> >>> * Make it clear the example is TCP-specific.
>> >>> * Focus on loss first, then ECN.
>> >>> * 'mechanism to announce ACK loss' is not really understandable.
>> >>> * Avoid 'control packets', which is too general, given this is an
>> >>> example, so it can be specific.
>> >>> * Nit: duplicated word (for e.g. for) and duplicated phrase (as ACK are
>> >>> usually not ECN-capable as ACK usually are not ECN-capable).
>> >>>
>> >>> How about:
>> >>>
>> >>> First 2 sentences unchanged, then...
>> >>> "This can be the case if no congestion feedback is (currently) available
>> >>> e.g. in TCP if one endpoint has been receiving data but sending nothing
>> >>> but pure ACKs (no user data) for some time. This is because pure ACKs do
>> >>> not advance the sequence number, so the TCP endpoint receiving them
>> >>> cannot reliably tell whether any have been lost due to congestion. Pure
>> >>> TCP ACKs cannot be ECN-marked either [RFC3168]."
>> >> Fine for me. Done.
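The X-bit rule in the agreed paragraph can be expressed as a small decision function. This Python sketch assumes the X, L, E and C flags occupy the four most significant bits of the first CDO option-data octet, in that order; that layout, the class and the helper names are assumptions for illustration, not text from the draft.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class CdoFlags:
    x: bool  # ConEx-capable
    l: bool  # loss experienced
    e: bool  # ECN-CE experienced
    c: bool  # credit

def parse_cdo_flags(option_data: bytes) -> CdoFlags:
    """Decode X, L, E, C from the first option-data octet (assumed layout)."""
    if not option_data:
        raise ValueError("empty CDO option data")
    b = option_data[0]
    return CdoFlags(x=bool(b & 0x80), l=bool(b & 0x40),
                    e=bool(b & 0x20), c=bool(b & 0x10))

def flags_for_audit(flags: CdoFlags) -> Optional[CdoFlags]:
    """Flags an audit function may account, or None if it must not.

    With X == 0 the L, E and C bits are undefined: the packet is
    forwarded unchanged but MUST NOT be accounted.
    """
    return flags if flags.x else None
```

A packet whose first data octet is 0x70 (X clear, L, E and C set) is therefore ignored by the audit, whatever the other three bits say.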
>> >>
>> >>
>> >>>>>> Further note, in the TCP mods we only look at the payload because we
>> >>>>>> assume, for simplification, all packets have the same size. Therefore
>> >>>>>> a packet that carries no data would not decrease the CEG/LEG. If ACKs
>> >>>>>> should get marked, we need to rewrite all this stuff in the tcp mods
>> >>>>>> doc...
>> >>>>> I don't think we should avoid changing tcp-mods if it's 'not right'.
>> >>>>>
>> >>>>> I hope you see the problem from my explanation above - whether there is
>> >>>>> enough feedback /now/ to ConEx-mark a packet has nothing to do with
>> >>>>> whether the packet being sent /now/ is capable of generating feedback
>> >>>>> /in the next round/.
>> >>>>>
>> >>>>> If you want to make a simplifying assumption, it is on the safe side for
>> >>>>> a sender to assume that all incoming feedback is about packets of the
>> >>>>> same size. It's not safe for a sender to assume that all packets it is
>> >>>>> sending are the same size. Anyway, it knows what size it is sending, so
>> >>>>> it doesn't need this simplification.
>> >>>> Okay, the assumption is (only) that feedback is based on packets that
>> >>>> are the same size. If we send a packet, we of course decrease the
>> >>>> LEG/CEG by the actual payload bytes. But given this assumption, we
>> >>>> simply do not account for headers at all (neither incoming nor
>> >>>> outgoing), because we can only estimate the header bits anyway, and
>> >>>> we simply assume it will even out. This means that if we send a pure
>> >>>> ACK, we will not decrease the LEG/CEG, because there are no payload
>> >>>> bytes. I believe that this simplification makes things much simpler and
>> >>>> is therefore useful, but it will not allow for marking pure ACKs...
>> >>> I thought the earlier definition said that ConEx accounts for the size
>> >>> of the IP header that contains the CDO and everything within it. Also,
>> >>> there's the TCP header size on a pure ACK.
>> >> Yes, especially when a network node accounts
>> >> ConEx marks. But in the (TCP) sender we just
>> >> don't care about the header bits for
>> >> simplification. We are aware that all bits will
>> >> be accounted but as we assume equal size packets that should be fine.
>> >>
>> >>
>> >>> That's the basis on which I am assuming that pure ACKs are worth
>> >>> counting. A pure ACK will count as at least 86B (and more if there are
>> >>> additional TCP options or IP extensions).
>> >>>
>> >>> IPv6 header: 40B
>> >>> CDO dest opt: 6B
>> >>> TCP header: 40B
>> >>> Total: 86B
>> >>>
>> >>> If there are more IP extensions, I guess it will be hard for TCP to know
>> >>> though.
>> >> Yes, so how should I implement that?
>> > I guess just assume that any IP extensions will
>> > be constant on every packet in a flow, therefore
>> > assuming none will be similar to assuming some.
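The per-packet figure above is simple addition; a tiny helper makes the "constant extensions per flow" assumption explicit. The constants are the sizes listed above (the 40 B TCP figure corresponds to a 20 B base TCP header plus 20 B of options); the function name and parameters are hypothetical.

```python
IPV6_HEADER_BYTES = 40   # fixed IPv6 header
CDO_DESTOPT_BYTES = 6    # CDO destination option, as listed above
TCP_HEADER_BYTES = 40    # 20 B base TCP header + 20 B of options

def pure_ack_bytes(other_ip_ext_bytes: int = 0,
                   tcp_header_bytes: int = TCP_HEADER_BYTES) -> int:
    """Bytes a ConEx-aware node would count for one pure ACK (no payload).

    other_ip_ext_bytes: any further IP extension headers, assumed to be
    identical on every packet of the flow, i.e. a per-flow constant.
    """
    if other_ip_ext_bytes < 0 or tcp_header_bytes < 20:
        raise ValueError("invalid header sizes")
    return (IPV6_HEADER_BYTES + CDO_DESTOPT_BYTES
            + tcp_header_bytes + other_ip_ext_bytes)

print(pure_ack_bytes())  # 86
```

Because the extra extension bytes are treated as a per-flow constant, leaving them at 0 biases every packet of the flow by the same amount, which is the point made in the reply above.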
>> >
>> >>>> You didn't convince me (yet) that this should be changed but this
>> >>>> would need to be changed in the tcp mods doc and not this one anyway.
>> >>> Agreed (that this would affect tcp-mods, not destopt).
>> >>>
>> >>> What is 'this' that you aren't yet convinced by?
>> >> 'this' is the fact that I need to change
>> >> something in the tcp mods document. I might
>> >> remove the statement (if existent) that control
>> >> packets should be not-ConEx capable but I would
>> >> still like to recommend it because I believe it
>> >> makes things overly complicated otherwise. The
>> >> point is I believe that at the location in the
>> >> (Linux) code where you implement the counting,
>> >> you don't even have the information how large
>> >> the pure ACK will be in the end...
>> > See next comment.
>> >
>> >>>>> The simplification I propose (that feedback is all about the same size
>> >>>>> packets, rather than all the sent packets are the same size) is likely
>> >>>>> to be pretty good, given the receiver doesn't get loss or ECN info about
>> >>>>> pure ACKs, so they are automatically removed from the set of packets
>> >>>>> that the sender assumes to be the same size. And if some of the
>> >>>>> feedback is about smaller data packets, at least this simplification
>> >>>>> will always be on the safe side.
>> >>>>>
>> >>>>> If I correctly understand the simplification you propose, a ConEx sender
>> >>>>> will more often under-declare congestion than over-declare it, which is
>> >>>>> not safe.
>> >>>> I don't believe so. Was this just a different understanding of what
>> >>>> we proposed, or can you explain further...?
>> >>> I thought you were proposing that a TCP sender assumes all the packets
>> >>> it sends are full-sized, even if they aren't. But I believe you have
>> >>> said that is not what you proposed.
>> >> No, when reducing the congestion counter(s) we
>> >> use the actual number of payload bytes. We also
>> >> use the real number of acknowledged bytes to
>> >> increase the counter(s). We simply do not care
>> >> about the header bytes at all assuming that on
>> >> average all packets have the same size and
>> >> therefore the number of (marked) header bytes
>> >> (either ECN or ConEx) in total will be about right.
>> > OK. I understand now.
>> >
>> > Ultimately TCP has to put a number in the Data
>> > Offset field, so it has to know the size of its
>> > own header. However, for an initial
>> > (experimental) implementation, if you need your
>> > proposal to assume all TCP options within one
>> > flow are the same size, it would be reasonable
>> > (it's not actually true, e.g. SACK, but there
>> > should at least be no bias, so you will overstate as much as understate).
>> >
>> > You say earlier that it is too complicated to
>> > implement code within TCP that knows the size of
>> > a pure ACK. If TCP code doesn't know the size of
>> > a TCP header, then Linux must be using magic
>> > instead of code. Because, surely, the whole point
>> > of the TCP code is to write a TCP header.
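The two accounting views discussed above can be contrasted with a toy model. In this Python sketch (hypothetical names; a deliberate simplification, not the algorithm in the conex-tcp-modifications draft) a sender ConEx-marks a packet while its gauge is positive and debits the gauge by the bytes the packet accounts for: payload only, as in the current implementation, or payload plus headers, as Bob suggests.

```python
from typing import Tuple

def accounted_bytes(payload_len: int, header_len: int, count_headers: bool) -> int:
    """Bytes a marked packet cancels from the gauge under each view."""
    return payload_len + (header_len if count_headers else 0)

def send_packet(gauge: int, payload_len: int, header_len: int,
                count_headers: bool) -> Tuple[bool, int]:
    """Return (marked, new_gauge) for one outgoing packet.

    Toy rule: mark only while the gauge is positive and only if marking
    actually cancels some bytes; the gauge may overshoot below zero.
    """
    debit = accounted_bytes(payload_len, header_len, count_headers)
    if gauge > 0 and debit > 0:
        return True, gauge - debit
    return False, gauge
```

For a pure ACK (`payload_len=0`, `header_len=86`) the payload-only view returns `(False, gauge)`: marking it would cancel nothing, which is why that scheme cannot usefully mark pure ACKs. Counting headers returns `(True, gauge - 86)`.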
>> >
>> >>>>>>>>>>>> ==Fast-path==
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> * CDO as first destination option: changed from MUST to SHOULD
>> >>>>>>>>>>>> (with an example of when not to).
>> >>>>>>>>>>> I believe this really needs to be a MUST. I know that might
>> >>>>>>>>>>> restrict the use of ConEx with potential other options that might
>> >>>>>>>>>>> have the same requirement (for different reasons). But if you don't
>> >>>>>>>>>>> put a MUST here, you cannot implement the suggested way in the fast
>> >>>>>>>>>>> path.
>> >>>>>>>>>> A SHOULD still means it will be the first option in all current
>> >>>>>>>>>> implementations. However, I suggest a SHOULD, precisely because
>> >>>>>>>>>> performance reasons are not absolute, so they don't require a
>> >>>>>>>>>> MUST. If another dest opt cannot work at all unless it is first,
>> >>>>>>>>>> that would be a valid reason for CDO coming second, because it
>> >>>>>>>>>> still works, it's /just/ slower.
>> >>>>>>>>>>
>> >>>>>>>>>> The IESG will (rightly) be very wary of any draft that says an
>> >>>>>>>>>> option MUST be the first option.
>> >>>>>>>>>>
>> >>>>>>>>>> I suggested the following text after this: "(This is not
>> >>>>>>>>>> stated as a 'MUST', because some future destination option might
>> >>>>>>>>>> need to be placed first for functional rather than just
>> >>>>>>>>>> performance reasons.)"
>> >>>>>>>>> So our fast path implementation must simply assume that there is no
>> >>>>>>>>> CDO in case it cannot find it as the first option. Otherwise all
>> >>>>>>>>> non-ConEx packets would need to go to the slow path to make sure
>> >>>>>>>>> there is no ConEx option. That means to me that this must be a
>> >>>>>>>>> MUST...?
>> >>>>>>> OK, I see the problem, but how much of a performance problem would it
>> >>>>>>> really be for the fast path of a ConEx function to step along dest
>> >>>>>>> opts until it gets to CDO then stops (rather than stop if CDO is not
>> >>>>>>> first)?
>> >>>>>> So that's the difference between looking at one bit at a defined
>> >>>>>> position and having a chain of conditional look-ups whose length is
>> >>>>>> unknown. I believe that is something you would avoid implementing in
>> >>>>>> the fast path, as the processing time is not fixed anymore... that
>> >>>>>> would be my guess, but I'm not an expert in this area.
>> >>>>> AFAICT, fast path implementations generally work along sequences of
>> >>>>> extensions. So I don't think this is a problem. Bear in mind that we are
>> >>>>> not asking general fast path forwarding implementations to do this. Only
>> >>>>> ConEx functions specifically written to find the ConEx header.{Note 1}
>> >>>>>
>> >>>>> {Note 1} OK, we do suggest that general forwarding functions could do
>> >>>>> DoS protection using the ConEx header. But that's stated as optional and
>> >>>>> 'aspirational'. If such an experiment proves useful, you never know,
>> >>>>> there could be demand for ConEx to migrate into the hop-by-hop options
>> >>>>> (according to the v6 spec, hop-by-hop and dest options share the same
>> >>>>> option number space, so this would be a straightforward migration, just
>> >>>>> moving where the CDO is placed, but using the same option number and
>> >>>>> format).
>> >>>> There might be also further use cases for e.g. traffic management or
>> >>>> multipath routing where general forwarding nodes need to access this
>> >>>> information.
>> >>>>
>> >>>> So what's the solution here?
>> >>> I think this will get thrown back by the IESG if we say 'MUST be first'.
>> >>> And I think 'SHOULD be first' is a doable implementation for ConEx-aware
>> >>> nodes. That is sufficient for experimental. Any experiments where
>> >>> general forwarding nodes access ConEx will already be reading a destopt
>> >>> at every hop, which is not what was intended, but it would be doable
>> >>> just for an experiment that wanted to prove ConEx has wider uses.
>> >> I know that this might be a problem with IESG review, but... it's broken...
>> >>
>> >>> Everyone involved in IPv6 knows that the attempt to design extensibility
>> >>> into v6 failed. It won't be news to the IESG that we can't add an
>> >>> extension that can be processed at every hop on the fast path.
>> >>>
>> >>> If a destopt is sufficient to prove ConEx useful, then implementers will
>> >>> want to satisfy this demand. Then
>> >>> * either there is even more pressure on the IETF to address this failing
>> >>> in v6 (and maybe someone will),
>> >>> * or ConEx has to continue with this destopt solution, just like
>> >>> everyone else is finding hacks round this failing in v6.
>> >>>
>> >>> But don't ask me. Ask Suresh.
>> >> Yes! Unfortunately he has not responded so
>> >> far. Maybe he is/was on holiday; I will ping him again.
>> >>
>> >>
>> >>>>>>> Then "CDO SHOULD be first" would give no different performance to "CDO
>> >>>>>>> MUST be first", if CDO actually was first. If CDO had to be placed
>> >>>>>>> second on a certain packet, "CDO SHOULD be first" would take just one
>> >>>>>>> more op than "CDO MUST be first".
>> >>>>>>>
>> >>>>>>> Note: I've just re-read the spec of the IPv6 header. We need to
>> >>>>>>> specify that CDO goes in the "Destination Options (before routing
>> >>>>>>> header)", not the "Destination Options (before upper-layer header)".
>> >>>>>>> Then it won't be encrypted by an ESP header.
>> >>>>>> Thanks. I wasn't fully aware of this. But to my understanding, the
>> >>>>>> difference is whether intermediate nodes listed in the routing header
>> >>>>>> should process this option or not. In our case it is probably not
>> >>>>>> important which one we choose, as it should be processed by none of
>> >>>>>> the receivers.
>> >>>>> You're correct that CDO isn't processed by any of the nodes listed in
>> >>>>> the routing header as destinations. The phrase "before routing header"
>> >>>>> is just how its placement is described. We should clarify that this
>> >>>>> isn't anything to do with the processing of the routing header.
>> >>>>>
>> >>>>>> Where did you read that the later one is not encrypted though?
>> >>>>> ESP encrypts everything after the ESP header, and it comes just before
>> >>>>> the second dest opts. So it would be no good putting CDO after it.
>> >>>>>
>> >>>>> See the ESP spec, on "ESP Header Location":
>> >>>>> <http://tools.ietf.org/html/rfc2406#section-3.1>
>> >>>>> "  The destination options extension header(s) could appear
>> >>>>>     either before or after the ESP header depending on the semantics
>> >>>>>     desired.  However, since ESP protects only fields after the ESP
>> >>>>>     header, it generally may be desirable to place the destination
>> >>>>>     options header(s) after the ESP header.
>> >>>>> "
>> >>>> Thanks. Wasn't able to find this sentence!
>> >>>>
>> >>>>> Also see the IPv6 spec on "Extension Header Order":
>> >>>>> <http://tools.ietf.org/html/rfc2460#section-4.1>
>> >>>>>
>> >>>>> I believe one reason there are two places for the dest opt is because if
>> >>>>> ESP is encrypting everything for the destination, it will normally be
>> >>>>> expected that the dest opts need to be encrypted too. But this wouldn't
>> >>>>> work if you have multiple destinations on the path in the routing header
>> >>>>> (that probably don't hold the relevant key).  Fortunately, this
>> >>>>> exception is also needed for ConEx.
>> >>>>>
>> >>>>>> If so, I can simply add one sentence to the first paragraph of
>> >>>>>> section 4:
>> >>>>>> "The CDO MUST be placed in the destination option before routing
>> >>>>>> header such that it does not get encrypted and can be read by
>> >>>>>> immediate ConEx-aware nodes."
>> >>>>>> And then remove the first paragraph of the IPSec section (and probably
>> >>>>>> move the other paragraph somewhere else so that the section is removed
>> >>>>>> completely)...?
>> >>>>> I've lost track of all the proposed changes to the IPsec section. But I
>> >>>>> think there is value in spelling out exactly how ConEx and IPsec
>> >>>>> interact, so I wouldn't remove the section completely, even if it
>> >>>>> repeats info elsewhere.
>> >>>> Okay, I just realized that we recommend using IPSec for
>> >>>> authentication, but I believe if the ConEx option should not be
>> >>>> encrypted by using the respective header, it will also not be
>> >>>> authenticated...? So you can have only one of the two...? I believe
>> >>>> we still need the IPSec section, but right now I'm not sure what to
>> >>>> write in there...? Any proposal?
>> >>> * How to do ConEx when IPsec is also required (tunnel & transport modes,
>> >>> and what to count). This may all be obvious now, but (IMO) it would still
>> >>> be worth spelling out obvious things.
>> >>> * How to use IPsec to protect the integrity of CDO.
>> >> Okay, this is the text now:
>> >>
>> >> "Compatibility with use of IPsec
>> >>
>> >> In IPv6 there are two possible positions of a
>> >> Destination Option header, either before the
>> >> Routing header or after the Encapsulating Security Payload (ESP) header.
>> >          BETTER?:
>> > In IPv6 a Destination Option header can be placed
>> > in two possible positions in the order of possible
>> > headers, either before the Routing header or
>> > after the Encapsulating Security Payload (ESP) header.
>> >          REASONING:
>> > We are talking about the positions where these
>> > headers /would/ be if they were there - they might not actually be present.
>> >
>> >> If the packet is encrypted using IPSec tunnel
>> >> mode, the CDO MUST be placed in the destination
>> >> option before the Routing header such that it
>> >> does not get encrypted and can be read by immediate ConEx-aware nodes.
>> >          BETTER?:
>> > CDO MUST always be placed in a destination option
>> > header placed before where the routing header
>> > would be. Otherwise, if CDO were placed in the
>> > latter position and an ESP header were used, the CDO would be encrypted.
>> >          REASONING:
>> > (There is no need for it to ever be in the later
>> > position and it's best to always be in the same place.)
>> >
>> >
>> >> Note as the Authentication Header (AH) also only
>> >> protects fields after the AH header, the CDO is not authenticated in this case.
>> > Need to say the encapsulator copies CDO from the
>> > inner IPv6 CDO before encrypting the inner.
>> >
>> > s/read by immediate/read by/
>> >
>> > AH integrity protects the IPv6 header that encapsulates it. ESP does not.
>> >
>> >
>> >> In IPSec transport mode both destination option
>> >> headers can be used, as the CDO is in both cases
>> >> visible to the network. If the transport network
>> >> can not be trusted, the Destination Option
>> >> header after the ESP header SHOULD be used to
>> >> ensure integrity of the ConEx information. If an
>> >> attacker would be able to remove the ConEx
>> >> marks, this could cause an audit device
>> >> to penalize the respective connection, while the
>> >> sender cannot easily detect that ConEx information is missing."
>> >>
>> >> Does this seem to be right now?
>> > Sorry, this is all wrong. One cannot use ESP to
>> > authenticate or protect the integrity of CDO by
>> > putting CDO after ESP, because ESP would then
>> > encrypt CDO so ConEx-aware nodes would not be
>> > able to read it. CDO always has to precede ESP,
>> > which is why I said CDO MUST always be in the first destopt position.
>> >
>> > If the CDO header needs to be authenticated, AH
>> > can be used as in the second example below. AH
>> > protects the integrity of the whole IPv6 datagram
>> > it is encapsulated by (except non-predictable
>> > mutable fields). AH coverage includes the IPv6
>> > header and extension headers before the AH
>> > header, and everything after the AH header too.
Sorry, that's my fault; I thought that (similar to ESP) the Authentication 
Header would only authenticate headers after the AH. (I have now checked 
RFC 4302: I was wrong.)

>> >
>> > I think it would be worth listing the two or
>> > three example header sequences in the draft, as
>> > below. Headers in [] need not be present. Headers in {} are encrypted.
>> >
>> > Transport mode without the integrity of CDO protected:
>> >    IPv6
>> >    [Hop-by-Hop]
>> >    [Routing]
>> >    Destopt(CDO[,...])
>> >    [Fragment]
>> >    ESP{
>> >      [Destopt]
>> >      Upper-Layer
>> >    }
>> >
>> > Transport mode with the integrity of CDO protected:
>> >    IPv6
>> >    [Hop-by-Hop]
>> >    [Routing]
>> >    Destopt(CDO[,...])
>> >    [Fragment]
>> >    AH
>> >    ESP{
>> >      [Destopt]
>> >      Upper-Layer
>> >    }
>> >
>> >
>> > Tunnel mode:
>> >    IPv6
>> >    [Hop-by-Hop]
>> >    [Routing]
>> >    Destopt(CDO-copy[,...])
>> >    [Fragment]
>> >    ESP{
>> >      IPv6
>> >      Destopt(CDO[,...])
>> >      Transport Payload
>> >    }
>> >
>> > For ESP in tunnel mode, as already stated in the
>> > draft, the tunnel ingress MUST copy the CDO from
>> > the destopt in the inner, then write a copy of
>> > the CDO header into a destopt header in the outer.
>> >
>> > I think this updates RFC2406. However, it is
>> > possible that 2406 already requires an ESP
>> > ingress to copy any extension headers, up to and
>> > including Fragmentation, to the outer, because
>> > all these headers are designed to be visible to
>> > nodes on the path. Suresh may know this.
>
> I checked overnight and copying extension headers is contrary to the
> IPSec architecture.
>
> <http://tools.ietf.org/html/rfc2401#section-5.1.2.2> "IPv6 -- Header
> Construction for Tunnel Mode" says:
>          "Extension headers  never copied"
>
> On reflection, I don't think we should update RFC2401 for ConEx. If I
> were on the IESG, I would not approve that. IPsec needs to have simple
> rules without exceptions.
>
> When we chose destopt as the mechanism for ConEx, we knew it wasn't
> going to interact well with tunnels. I think the best approach is to say,
>          "Currently, the IPv6 protocol architecture does not provide a
> mechanism for new extension headers to be copied to the outer. Therefore
> ConEx functions will have to search for the CDO option within inner
> headers, and ConEx will not work at all over the extent of an ESP tunnel".
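The consequence Bob describes can be sketched as a search that a ConEx function would have to perform. This minimal Python sketch uses a hypothetical flat list of header names. It treats every IPv6 header after the first as one more level of cleartext IP-in-IP encapsulation, ignores all other extension headers, and stops at ESP because nothing beyond it is readable on path.

```python
from typing import List, Optional

def find_cdo(chain: List[str]) -> Optional[int]:
    """Return the encapsulation depth at which a CDO is found, or None.

    Depth 0 is the outermost IPv6 header; each further IPv6 header is
    assumed to be an inner (tunnelled) header. The walk stops at ESP,
    since everything after it is encrypted.
    """
    depth = -1
    for header in chain:
        if header == "ESP":
            return None
        if header == "IPv6":
            depth += 1
        elif header == "Destopt(CDO)":
            return depth
    return None


ip_in_ip = ["IPv6", "IPv6", "Destopt(CDO)", "Upper-Layer"]
esp_tunnel = ["IPv6", "ESP", "IPv6", "Destopt(CDO)", "Upper-Layer"]
print(find_cdo(ip_in_ip))    # 1: found by looking inside the tunnel
print(find_cdo(esp_tunnel))  # None: hidden for the extent of the ESP tunnel
```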

So all in all, this simplifies things to basically "CDO MUST be placed in
the destination option header before the AH and/or ESP (if present)."

(plus our text just above on not interacting with tunnel mode)
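A sender-side conformance check for that rule could look roughly like the following minimal Python sketch. It uses a hypothetical list-of-strings header representation and checks only the relative order of the CDO's Destination Options header against AH and ESP.

```python
from typing import List

def check_cdo_placement(chain: List[str]) -> List[str]:
    """Check a header chain against the rule summarised above: the CDO goes
    in a Destination Options header placed before AH and/or ESP, if either
    is present. Returns a list of problems (empty if the chain conforms)."""
    if "Destopt(CDO)" not in chain:
        return ["no CDO present"]
    cdo = chain.index("Destopt(CDO)")
    return [f"CDO placed after {sec}"
            for sec in ("AH", "ESP")
            if sec in chain and chain.index(sec) < cdo]


# Simplified versions of the transport-mode sequences quoted earlier.
without_ah = ["IPv6", "Destopt(CDO)", "ESP", "Upper-Layer"]
with_ah = ["IPv6", "Destopt(CDO)", "AH", "ESP", "Upper-Layer"]
print(check_cdo_placement(without_ah), check_cdo_placement(with_ah))  # [] []
```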

Right?

Mirja

>
>
> Bob
>
>
>> >
>> > To protect the integrity of the outer IPv6
>> > datagram, including protecting the copy of CDO,
>> > an AH header (not shown) could be added before ESP.
>> >
>> > [A worse alternative (no need to mention this):
>> > If the integrity of CDO but not other headers
>> > needed to be protected, ESP with authentication
>> > enabled could be used, which causes
>> > authentication data to be added at the end of the
>> > payload (not shown). Then, before decapsulation,
>> > the tunnel egress would have to record the value
>> > of CDO-copy. Having decrypted the inner, it could
>> > then check that CDO-copy matched the CDO in the
>> > inner.  However, that would require another
>> > update to RFC2406, so using AH would be
>> > preferable, given we don't want to make ConEx
>> > depend on updating both ends of an ESP tunnel - one end is bad enough.]
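For completeness, the egress comparison in this (rejected) alternative can be sketched in Python. The sketch assumes that the option data is available as raw bytes, that the outer CDO copy is recorded before decapsulation, and that the inner CDO is trusted because ESP's integrity service has authenticated it. The function name and verdict strings are hypothetical.

```python
from typing import Optional

def verify_cdo_copy(outer_copy: Optional[bytes], inner_cdo: Optional[bytes]) -> str:
    """Compare the CDO copy recorded from the outer header before
    decapsulation with the CDO found in the decrypted inner header.

    The inner CDO is the reference because ESP has authenticated it;
    the outer copy was exposed to on-path nodes.
    """
    if outer_copy == inner_cdo:
        return "match"
    if outer_copy is None:
        return "copy removed on path"
    if inner_cdo is None:
        return "copy added on path"
    return "copy altered on path"


recorded = b"\x40"  # hypothetical option data seen in the outer header
print(verify_cdo_copy(recorded, b"\x40"))  # match
print(verify_cdo_copy(None, b"\x40"))      # copy removed on path
```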
>> >
>> > HTH
>> > Sorry for taking so long - I wrote most of this
>> > on a plane on Thu, but left some fact checking
>> > for when I got online, and this is the first chance I've had to get
>> > back to it.
>> >
>> > Cheers
>> >
>> >
>> >
>> > Bob
>> >
>> >>>>>>>>> Moreover, isn't this the same case as with tunneling in
>> >>>>>>>>> general? Only if the node that does the encapsulation is
>> >>>>>>>>> ConEx-aware can it copy the CDO; otherwise it will not be
>> >>>>>>>>> visible anymore.
>> >>>>>>>>>
>> >>>>>>>>> So this should either be a SHOULD, or we have to say something
>> >>>>>>>>> like: if the node is ConEx-aware it MUST copy the CDO...?
>> >>>>>>>> And then we can say the same thing for tunneling in general...?
>> >>>>>>> That's surely a circular argument. What would make a tunnel endpoint
>> >>>>>>> into a ConEx-aware tunnel endpoint, so that it would have to copy the
>> >>>>>>> CDO? It would only become ConEx-aware if it had code added to look for
>> >>>>>>> the CDO, and why would it have that code added unless it was going to
>> >>>>>>> do something with CDO? That's why I think my 'MAY copy as a
>> >>>>>>> performance optimisation' formula is the best we can do.
>> >>>>>> What you say above is the point. If the node does not know anything
>> >>>>>> about ConEx, it simply cannot copy the option, which is the case for
>> >>>>>> all currently existing nodes. So we cannot say MUST in general. But if
>> >>>>>> the node does know that ConEx exists for any reason, it really must
>> >>>>>> copy the CDO...? But you're right that this is a little pathological.
>> >>>>>> I'm willing to change it if that helps understanding/is less confusing.
>> >>>>> I think we're talking past each other. Given we cannot copy CDO to the
>> >>>>> outer everywhere, for consistency I don't think that copying CDO to the
>> >>>>> outer at all is a good idea, UNLESS it's done deliberately as part of an
>> >>>>> operator's whole approach to handling ConEx. I.e., tunnel endpoints
>> >>>>> SHOULD NOT copy CDO to the outer by default, but they MAY copy CDO to
>> >>>>> the outer for a specific purpose (e.g. optimisation for ConEx functions
>> >>>>> elsewhere in the same operator's network).
>> >>>> Now understood.
>> >>>>
>> >>>> I've tried to make this point a little more clear, not sure if I
>> >>>> succeeded:
>> >>>> "As with any destination option, an ingress tunnel endpoint will not
>> >>>> natively copy the CDO when adding an encapsulating outer IP header. In
>> >>>> general an ingress tunnel SHOULD not copy the CDO to the outer header
>> >>>> as this would changed the number of bytes that would be accounted.
>> >>>> However, it MAY copy the CDO to the outer in order to facilitate
>> >>>> visibility by subsequent on-path ConEx functions if the tunnel ingree
>> >>>> is aware of these nodes and theses nodes are aware of the tunneling.
>> >>>> This trades off the performance of ConEx functions against that of
>> >>>> tunnel processing. "
>> >>> OK. Rather than implying that equipment has evolved conscious awareness,
>> >>> a better formulation would be something like:
>> >>> "..the configuration of the tunnel ingress and the ConEx nodes is
>> >>> co-ordinated."
>> >>>
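The ingress behaviour being converged on here (do not copy by default, copy only when configuration is co-ordinated) can be sketched in Python as follows. The `copy_cdo` switch and the header names are hypothetical. The sketch models a plain IP-in-IP ingress, not IPsec, and represents the copy as a separate Destination Options header in the outer.

```python
from typing import List

def encapsulate(inner: List[str], copy_cdo: bool = False) -> List[str]:
    """Toy IP-in-IP tunnel ingress: prepend an outer IPv6 header.

    By default the CDO is not copied to the outer header (the SHOULD NOT
    case). copy_cdo=True models an ingress whose configuration has been
    co-ordinated with the operator's ConEx nodes (the MAY case).
    """
    outer = ["IPv6"]
    if copy_cdo and "Destopt(CDO)" in inner:
        outer.append("Destopt(CDO-copy)")
    return outer + inner


inner = ["IPv6", "Destopt(CDO)", "TCP"]
print(encapsulate(inner))
print(encapsulate(inner, copy_cdo=True))
```

Keeping this as an explicit per-tunnel configuration switch, defaulting to off, reflects Bob's rewording: copying depends on co-ordinated configuration, not on the ingress somehow being "aware" of ConEx nodes.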
>> >>> Nits:
>> >>> s/SHOULD not/SHOULD NOT/
>> >>> s/accounted/counted/
>> >>>    (in English, accounted is not a transitive verb; it has to have 'for'
>> >>>    after it)
>> >>> s/ingree/ingress/
>> >>> s/theses/these/
>> >> Done.
>> >>
>> >>
>> >>> We're getting there!
>> >> Yes...!
>> >>
>> >> Mirja
>> >>
>> >>
>> >>> But we really do need Suresh's expert eye on this.
>> >>>
>> >>>
>> >>> Cheers
>> >>>
>> >>>
>> >>> Bob
>> >>>
>> >>>
>> >>>> Mirja
>> >>>>
>> >>>>> HTH
>> >>>>> (Delayed 'cos it was a public holiday in the UK yesterday.)
>> >>>>>
>> >>>>>
>> >>>>> Bob
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>>> Bob
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>> Mirja
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>>>>> ==Security Considerations==
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> * Added lots, all pointers to where security issues are
>> >>>>>>>>>>>> discussed in other places (which is what security directorate
>> >>>>>>>>>>>> reviewers need).
>> >>>>>>>>>>> Okay, I can add that if you think it's necessary (I would say
>> >>>>>>>>>>> it's just redundant, but you might be right that it helps the
>> >>>>>>>>>>> sec dir).
>> >>>>>>>>>> It's not always obvious which aspects relate to security,
>> >>>>>>>>>> especially when the security is structural rather than crypto.
>> >>>>>>>>>> So I think these sentences are useful to sec dir.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>>> ==IANA==
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> * I think the act bits need to be 00 not 10 to avoid ConEx
>> >>>>>>>>>>>> packets being dropped by non-ConEx nodes (including by
>> >>>>>>>>>>>> non-ConEx receivers)? But I'm willing to be corrected.
>> >>>>>>>>>>> I agree; I will ask Suresh why he has put 10, though.
>> >>>>>>>>>> Yes, he's the right guy to check with.
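For reference, the act bits are the two highest-order bits of the IPv6 Option Type (RFC 2460, Section 4.2). They tell a node that does not recognise the option what to do. With 00 it skips the option, so a non-ConEx receiver still accepts the packet. With 10 it discards the packet and returns an ICMP Parameter Problem. The third bit ("chg") also matters for the AH discussion. If it is 1, the option data may change en route, and AH treats it as zeros when computing the ICV. The Python sketch below decodes an Option Type octet. The two example values are arbitrary and are not the CDO's assigned type.

```python
ACTIONS = {
    0b00: "skip over this option and continue processing the header",
    0b01: "discard the packet",
    0b10: "discard the packet and send ICMP Parameter Problem, Code 2, to the source",
    0b11: "discard the packet and send ICMP Parameter Problem, Code 2, "
          "only if the destination was not multicast",
}

def decode_option_type(option_type: int) -> dict:
    """Split an IPv6 Option Type octet into its act bits (top two), chg bit
    (third) and remaining five bits, per RFC 2460 Section 4.2."""
    if not 0 <= option_type <= 0xFF:
        raise ValueError("the Option Type is a single octet")
    act = option_type >> 6
    return {
        "act": act,
        "chg": (option_type >> 5) & 1,
        "rest": option_type & 0x1F,
        "if_unrecognised": ACTIONS[act],
    }


# Arbitrary example types with the same low-order bits but act = 00 and act = 10.
for t in (0x0F, 0x8F):
    d = decode_option_type(t)
    print(f"{t:#04x}: act={d['act']:02b} -> {d['if_unrecognised']}")
```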
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> Bob
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>> Thanks,
>> >>>>>>>>>>> Mirja
>> >>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Regards
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Bob
>> >>>>>>>>>> {Note 1}
>> >>>>>>>>>> For anyone watching on the list, the tentative idea that Mirja
>> >>>>>>>>>> has reminded me of is documented in 11.3.1 of my PhD thesis
>> >>>>>>>>>> entitled "Covert Markings as a Policer Signal".
>> >>>>>>>>>>
>> >>>>>>>>>> The potential problem: A ConEx policer punishes punishment. If a
>> >>>>>>>>>> congestion policer starts dropping packets because the user has
>> >>>>>>>>>> contributed excessively to congestion, in subsequent rounds the
>> >>>>>>>>>> user has to re-echo 'L' markings for the policer drops as well.
>> >>>>>>>>>> This can drive the policer further into 'debit'. This might make
>> >>>>>>>>>> it difficult for the user to get out of trouble once she's started
>> >>>>>>>>>> getting into trouble.
>> >>>>>>>>>>
>> >>>>>>>>>> The basic idea was that when a congestion policer drops packets
>> >>>>>>>>>> (because the user is causing more congestion than her allowance),
>> >>>>>>>>>> it will also remove ConEx markings. Then (if there is some way for
>> >>>>>>>>>> the receiver to feed this back), the sender knows not to send more
>> >>>>>>>>>> ConEx marks because these aren't congestion drops, they are
>> >>>>>>>>>> policer drops.
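The "punishing punishment" concern can be illustrated with a deliberately crude Python model. It is not the policer from the thesis or from any ConEx draft. Its assumptions are: one token per re-echoed 'L' mark, no cap on bucket depth, a fixed number of policer drops in every round where the bucket is in debit, and a sender that cuts its congestion losses to 2 per round (below an allowance of 5) after two heavy rounds. The only difference between the two runs is whether policer drops are re-echoed.

```python
from typing import List

def run_policer(congestion_losses: List[int], allowance: int,
                drop_when_in_debit: int, echo_policer_drops: bool) -> List[int]:
    """Toy congestion policer; returns the bucket level after each round.

    Each round the bucket gains `allowance` tokens and loses one token per
    'L' mark the sender re-echoes for the previous round's losses. While
    the bucket is in debit, the policer drops `drop_when_in_debit` packets.
    If echo_policer_drops is True, those drops are re-echoed next round too
    (punishing punishment); if False, only congestion losses are re-echoed,
    as if the policer's removal of ConEx marks had been fed back.
    """
    bucket, to_echo, history = 0, 0, []
    for losses in congestion_losses:
        bucket += allowance - to_echo
        policer_drops = drop_when_in_debit if bucket < 0 else 0
        to_echo = losses + (policer_drops if echo_policer_drops else 0)
        history.append(bucket)
    return history


losses = [8, 8, 2, 2, 2, 2]
print(run_policer(losses, 5, 3, echo_policer_drops=True))   # [5, 2, -1, -1, -1, -1]
print(run_policer(losses, 5, 3, echo_policer_drops=False))  # [5, 2, -1, 2, 5, 8]
```

With these arbitrary numbers, the user who re-echoes policer drops stays in debit even after backing off, while the other recovers. This only illustrates the mechanism behind the concern; it says nothing about whether it arises in practice.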
>> >>>>>>>>>>
>> >>>>>>>>>> We haven't found that double punishment made it hard to get out
>> >>>>>>>>>> of trouble in any policer experiments so far, so let's not allow
>> >>>>>>>>>> for a possible solution to a problem that we probably don't even
>> >>>>>>>>>> have. The current crop of ConEx drafts are experimental anyway.
>> >>>>>>>>>> If this problem does surface, then we can reconsider.
>> >>>>>>>>>>
>> >>>>>>>>>> ________________________________________________________________
>> >>>>>>>>>> Bob Briscoe,                                                  BT
>> >>>>>>>> --
>> >>>>>>>> ------------------------------------------
>> >>>>>>>> Dipl.-Ing. Mirja Kühlewind
>> >>>>>>>> Communication Systems Group
>> >>>>>>>> Institute TIK, ETH Zürich
>> >>>>>>>> Gloriastrasse 35, 8092 Zürich, Switzerland
>> >>>>>>>>
>> >>>>>>>> Room ETZ G93
>> >>>>>>>> phone: +41 44 63 26932
>> >>>>>>>> email: mirja.kuehlewind@tik.ee.ethz.ch
>> >>>>>>>> ------------------------------------------
>> >
>> >
>
>

-- 
------------------------------------------
Dipl.-Ing. Mirja Kühlewind
Communication Systems Group
Institute TIK, ETH Zürich
Gloriastrasse 35, 8092 Zürich, Switzerland

Room ETZ G93
phone: +41 44 63 26932
email: mirja.kuehlewind@tik.ee.ethz.ch
------------------------------------------