Re: [rohc] TCP/IP EPIC profile

"West, Mark (ITN)" <mark.a.west@roke.co.uk> Wed, 13 March 2002 07:52 UTC

From: "West, Mark (ITN)" <mark.a.west@roke.co.uk>
To: "Hongbin Liao (Intl Staffing)" <i-hbliao@microsoft.com>
Cc: Julije Ozegovic <julije@fesb.hr>, rohc <rohc@ietf.org>
Message-ID: <3C8E88DB.3010102@roke.co.uk>
Date: Tue, 12 Mar 2002 23:01:47 +0000
Subject: Re: [rohc] TCP/IP EPIC profile
References: <F4C77846CEE593418BE5AB7B6A83111E046521AE@bjs-msg-01.fareast.corp.microsoft.com>

Hi Hongbin,

I'm trying to keep the size of the thread (nearly) manageable!  But I 
think I've rambled on even more than usual, so apologies for that...

Having re-read my ramblings, I'm not sure that the examples are as clear 
as they could be.  If this doesn't make much sense then please let me 
know and I'll try and come up with a better example.

Anyway - on with the comments

Cheers,

Mark.


 >
 > Yes, as we agree, TCP is a protocol with complex behavior. On
 > average, a protocol with complex behavior tends to need a complex
 > profile to describe it accurately (or exactly). However, since we
 > need to target the trade-off between complexity and performance, we
 > only need to capture the most important behavior of the protocol
 > correctly (not necessarily accurately or exactly), which does not
 > necessarily require a complex profile.
 >

[Mark]
This is a very good point - you are right to remind me that we need to 
capture the aspects of the behaviour that give us the most obvious 
compression gains.

 >
 > Using the shortest packet for the most frequent case should be the
 > main guideline for the design of header compression packet formats.
 > In fact, EPIC-LITE is also designed on this basis. We noticed that many
 > protocols have some correlations between different fields (if field A
 > is encoded using XXX, then field B can only be encoded using YYY, etc.).
 > For example, in TCP, some TCP options, such as MSS, SACK-Permitted and
 > Window-Scale, can only appear in the SYN packet, and SACK options can
 > only appear if the ACK flag is set; in SCTP, there is a weak correlation
 > between TSN and Seq; in RTP, Sequence Number and Timestamp have some
 > weak correlation too. There is not only correlation between the
 > encoding methods of two different fields; there may also be correlation
 > between the encoding method of one field and the value of another
 > field. At the least, we need a mechanism to describe such dependencies
 > correctly. Otherwise, we may not describe the behavior of some fields
 > correctly. Thus, it may not be reasonable to assume that all the
 > packet fields are independent.
 >

[Mark]
Again, I agree entirely.  Take a look at the RTP profile that we've done 
for another example (draft-surtees-rtp-epic-00.txt, I think).  This 
shows how the correlation (actually quite strong) between SN and TS can 
be captured.
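For anyone who wants to see the SN/TS dependency in concrete terms, here is a minimal Python sketch.  It follows the spirit of the scaled-timestamp idea in RFC 3095 rather than the encoding in the EPIC RTP draft, and the function names and the fixed ts_stride are assumptions for illustration: the timestamp is predicted from the sequence number and is only sent when the prediction fails.

```python
MOD_SN = 2 ** 16  # RTP sequence numbers are 16 bits wide
MOD_TS = 2 ** 32  # RTP timestamps are 32 bits wide


def predict_ts(ts_ref: int, sn_ref: int, sn: int, ts_stride: int) -> int:
    """Hypothetical helper: predict TS from SN, assuming TS advances by a
    constant stride per packet (wrap-around handled modulo field size)."""
    packets_elapsed = (sn - sn_ref) % MOD_SN
    return (ts_ref + packets_elapsed * ts_stride) % MOD_TS


def encode_ts(ts_ref: int, sn_ref: int, sn: int, ts: int, ts_stride: int):
    """Return None when TS is fully implied by SN (nothing to send),
    otherwise the timestamp that has to be sent explicitly."""
    if predict_ts(ts_ref, sn_ref, sn, ts_stride) == ts:
        return None
    return ts


# 20 ms voice frames at an 8 kHz clock give a stride of 160 ticks.
print(encode_ts(1000, 10, 12, 1320, 160))  # None: TS implied by SN
print(encode_ts(1000, 10, 12, 1500, 160))  # 1500: prediction failed, TS sent
```

The stronger the correlation, the more often the first case applies and the fewer TS bits go on the wire.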

There are some other examples of encoding this sort of thing below.

It is perfectly possible to cope with options that only appear in the SYN
packet, for example.  (Since the ACK flag has to be set on every packet
other than the initiating SYN, I suggest that the SACK/ACK dependency is
too much complexity for too little gain...)

If you consider the usage of LIST encoding, then you will see that the
important thing from this perspective is the presence and order
information.  Clearly, in certain packet formats these can be restricted
to a subset of all possible combinations.  That is, a packet format with
the SYN flag set can allow the 'SYN only' options to appear; in other
formats they can be made less likely or disallowed altogether.
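As a rough illustration of why restricting the option list pays off, here is a hypothetical Python sketch (the option labels and functions are made up for this example, and only presence is modelled, not order): a SYN format needs one presence bit per option kind, whereas a non-SYN format that disallows the 'SYN only' options needs only half as many here.

```python
# Hypothetical sketch: which TCP options a packet format's LIST encoding
# may signal depends on whether the SYN flag is set.
# Option names are illustrative labels, not wire values.
SYN_ONLY = {"MSS", "SACK_PERMITTED", "WSCALE"}
ANY_PACKET = {"NOP", "SACK", "TIMESTAMP"}


def allowed_options(syn: bool) -> set[str]:
    """Option kinds a packet format may carry, given the SYN flag."""
    return SYN_ONLY | ANY_PACKET if syn else set(ANY_PACKET)


def presence_bits(syn: bool, options: list[str]) -> str:
    """One presence bit per allowed option kind, in a fixed (sorted)
    order.  Order information would have to be conveyed separately."""
    allowed = sorted(allowed_options(syn))
    unknown = [o for o in options if o not in allowed]
    if unknown:
        raise ValueError(f"options {unknown} not allowed when SYN={syn}")
    return "".join("1" if kind in options else "0" for kind in allowed)


print(presence_bits(True, ["MSS", "SACK_PERMITTED", "WSCALE"]))  # 100101
print(presence_bits(False, ["NOP", "TIMESTAMP"]))                # 101
```

TIMESTAMP is left optional in every format, in line with the caution about negotiating options outside the SYN.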

(In this particular instance, I would urge a degree of caution, since 
there has been some discussion about allowing 'negotiation' of some 
options in packets other than the SYN.  [I don't think that this affects
your example, but it does mean that one should not assume that TCP
timestamps will always appear, for example]...)

 >>
 >>It is quite fair to say that the fundamental starting point of the EPIC
 >>approach is that fields are considered independently.  However, it is
 >>quite wrong to say that it is not possible to account for dependent
 >>field behaviour.
 >>
 >>
 >
 > Well, how do you describe the dependency between different fields,
 > then? Will you increase the complexity of the profile to do that? At
 > least, in the EPIC-LITE draft and the examples in EPIC-LITE and the
 > EPIC TCP/IP profile, I cannot find such a method.

[Mark]
I'll try and give some examples towards the end of the mail -- it is 
trivial to capture dependencies.

It's not entirely clear to me what you mean by 'complexity' -- do you 
just mean 'a bigger profile', or do you mean 'add more structure'?  The 
latter is, I think, inevitable when introducing these dependencies. 
However, from your previous points, we should focus on those 
dependencies that matter.  So, as ever, it's a trade-off.  Overall I 
hope that the profile will not be too complex (and a large profile is 
not necessarily complex, since it is processed by repeated application 
of the same simple rules).

 >
 > Well, the 30-field example was used just to discuss another issue,
 > which is irrelevant to the issue here.
 >

[Mark]
Fair enough -- mentioning this was probably confusing and unhelpful. 
I'll stick to real protocols from now on...

 >
 >
 > How can you identify whether a flow is 'interactive traffic', a 'bulk
 > data flow' or a 'bulk ack flow', then? Even if you can identify it, you
 > still need extra overhead to tell the decompressor which case the
 > packet belongs to. Meanwhile, if we consider other possible
 > dependencies for other fields, should we generate more format sets?
 > That may confuse people.
 >

[Mark]
Again, I'm not sure whether we are confusing what we mean by 
'complexity', which I don't think helps!  However, I'll try and answer 
your question and see what happens...

So, let us think about writing some packet formats (we can even ignore 
EPIC for this bit).

Let's take a really simple classification of TCP flows into only 2 types:
'data' flows and 'ack' flows (forget the 'interactive' one for now).  As you
point out, the difference between these is that in each case one of the 
sequence or ack number will tend to move by a relatively large amount 
and the other will stay quite stable.

First of all, just by accepting this classification I have added 
'complexity' to my compression solution.  Why do I say that?  Because 
there is now a choice at the compressor.  This choice has to be made at 
the compressor (somehow).  Plus, this choice has to be signalled to the 
decompressor.  Every choice we add increases complexity.  Whatever
happens now, your compressor is going to have to make a decision as to
whether the flow 'looks like' a 'data' flow or an 'ack' flow.

There are 2 fundamental ways of dealing with this choice:

1/ I could introduce a flag indicating which bits represent the sequence 
number and which bits represent the ack value (for example).  This would 
be present in every packet.  Since we would not expect this value to 
change over the lifetime of the flow, this seems to be unnecessarily
wasteful.  (In this case -- but see later for another example)

2/ I could have a context value indicating how the bits are used and 
signal this value only when it changes.  This, in essence, is what the 
FORMAT encoding does.  Signalling this change, of course, is not 'extra' 
overhead.  It's actually just information that is only sent when it is 
needed.  On average, for this type of behaviour, it gives the best result.
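A back-of-the-envelope Python sketch of the trade-off between options 1/ and 2/.  The cost model is hypothetical (in particular, the 8-bit cost of signalling a FORMAT change is an assumption): the per-packet flag always costs one bit per packet, while the context value only costs something when the flow type changes.

```python
# Hypothetical cost model comparing the two options above:
#   1/ a 1-bit flag carried in every packet, versus
#   2/ a context FORMAT value, signalled only when it changes.
# CHANGE_COST is an assumed size (in bits) for signalling a FORMAT change.
CHANGE_COST = 8


def flag_bits(flow_types: list[str]) -> int:
    """Option 1/: one flag bit per packet, whatever happens."""
    return len(flow_types)


def format_bits(flow_types: list[str], change_cost: int = CHANGE_COST) -> int:
    """Option 2/: pay change_cost bits whenever the type differs from the
    previous packet's (the first packet establishes the context)."""
    bits, current = 0, None
    for flow_type in flow_types:
        if flow_type != current:
            bits += change_cost
            current = flow_type
    return bits


steady = ["data"] * 1000          # the flow type never changes
flapping = ["data", "ack"] * 500  # the flow type changes on every packet
print(flag_bits(steady), format_bits(steady))      # 1000 8
print(flag_bits(flapping), format_bits(flapping))  # 1000 8000
```

For a steady flow option 2/ is far cheaper; if the type flipped on every packet the per-packet flag would win, which is why option 1/ is not always the wrong answer.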

How the compressor chooses which FORMAT to use is entirely up to the 
local implementation.  (Which is clearly also true for the flags that 
fulfill a similar role in ROHC RTP.  It doesn't matter which format is 
used so long as it transparently and robustly conveys the header 
information.  The better the choice, the more efficient the compression...)
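Purely as an example of such a local decision, here is one hypothetical heuristic in Python; nothing in the profile would mandate it, and the window-based rule is an assumption.  It looks at a window of (seq, ack) pairs and picks whichever number has been moving.

```python
MOD = 2 ** 32  # TCP sequence and ack numbers wrap modulo 2^32


def classify_flow(headers: list[tuple[int, int]]) -> str:
    """Hypothetical local heuristic: given a window of (seq, ack) pairs,
    call the flow 'data' if the sequence number advanced at least as much
    as the ack number over the window, otherwise 'ack'."""
    seq_moved = ack_moved = 0
    for (s0, a0), (s1, a1) in zip(headers, headers[1:]):
        seq_moved += (s1 - s0) % MOD  # modular delta copes with wrap-around
        ack_moved += (a1 - a0) % MOD
    return "data" if seq_moved >= ack_moved else "ack"


print(classify_flow([(1000, 500), (2460, 500), (3920, 500)]))  # data
print(classify_flow([(500, 1000), (500, 2460), (500, 3920)]))  # ack
```

A real compressor would also need to cope with retransmissions and reordering (a backwards step shows up here as a huge modular jump), but a poor guess only costs efficiency, not correctness, since either FORMAT conveys the header transparently.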

 >
 > Yes; however, the profile should not produce the wrong prediction of
 > the most frequently occurring case of the TCP/IP packet format.
 >

This is self-evident.

 >
 >>It is disingenuous to suggest that this is an EPIC-specific issue,
 >>though!  Any set of packet formats that is written out makes an implicit
 >>statement about the relative probabilities of certain occurrences.  (This
 >>is certainly true of RFC 3095...)
 >>
 >>
 >
 > Well, this is debatable. RFC 3095 can support formats that use a tag
 > to implicitly indicate whether a field is present or not (if field A is
 > encoded as XXX, then field B is encoded as YYY). However, if EPIC
 > assumes that packet fields are independent, it is hard to express these
 > correlations correctly.
 >

[Mark]
Ok, see my discussion above for one way to support such a flag.

Also, consider the following fragment of EPIC-speak...

A_and_B = AB_1(50%) | AB_2(50%)

AB_1 = XXXX
        YYYY

AB_2 = PPPP
        QQQQ

which says that if A is encoded as XXXX then B is encoded as YYYY (and 
this happens 50% of the time), but if A is encoded as PPPP then B is 
encoded as QQQQ (and this happens 50% of the time).

This is identical to the tag case you describe.
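To show that the fragment really does capture the dependency, here is a toy Python model of the rule structure.  It is hypothetical, not the EPIC-LITE algorithm: rules are sequences or weighted choices, expanded recursively into packet formats, and -log2(p) is taken as the ideal indicator length, which a Huffman-style allocation of indicator bits would approach.

```python
import math

# Hypothetical, minimal model of the EPIC-style fragment above.  A rule is
# a terminal encoding (a string not defined in the rule table), a
# ("seq", [...]) of sub-rules that all appear together, or a
# ("choice", [(rule, p), ...]) of weighted alternatives.
RULES = {
    "A_and_B": ("choice", [("AB_1", 0.5), ("AB_2", 0.5)]),
    "AB_1": ("seq", ["XXXX", "YYYY"]),
    "AB_2": ("seq", ["PPPP", "QQQQ"]),
}


def expand(rule, rules):
    """Return every packet format (tuple of encodings) with its probability."""
    if isinstance(rule, str):
        if rule in rules:
            return expand(rules[rule], rules)
        return [((rule,), 1.0)]  # terminal encoding
    kind, items = rule
    if kind == "seq":
        formats = [((), 1.0)]
        for item in items:
            formats = [(enc + sub, p * q)
                       for enc, p in formats
                       for sub, q in expand(item, rules)]
        return formats
    if kind == "choice":
        return [(enc, weight * p)
                for item, weight in items
                for enc, p in expand(item, rules)]
    raise ValueError(f"unknown rule kind: {kind}")


for encodings, p in expand("A_and_B", RULES):
    # -log2(p) is the ideal indicator length for a format of probability p.
    print(" + ".join(encodings), p, -math.log2(p))

# The same encodings treated as two independent fields, for comparison.
INDEPENDENT = {
    "A": ("choice", [("XXXX", 0.5), ("PPPP", 0.5)]),
    "B": ("choice", [("YYYY", 0.5), ("QQQQ", 0.5)]),
}
print(len(expand(("seq", ["A", "B"]), INDEPENDENT)))  # 4
```

Written as two independent fields, the same encodings give four formats at 25% each, including combinations such as XXXX with QQQQ that never actually occur; the dependent form needs only two formats and a single indicator bit.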

Either of these methods (this one or FORMAT) may be used.
FORMAT is great for when the encoding doesn't change frequently and the 
percentage choice is not known a priori (like the TCP flow 'type').
The explicit-choice form above works well where fields are linked with a
known dependence (see the RTP profile for this sort of thing...)

I consider neither to be complex.

(The sharp-eyed may have already spotted that the encoding just
described is equivalent to option 1/ in the TCP flow example.  The 'flag' is
simply encoded into the indicator flags that signal the packet type.)

 >
 > For header compression efficiency, the main issue for packet formats
 > is whether the most frequent cases are identified correctly and encoded
 > using the shortest packet. However accurately (or exactly) the protocol
 > behavior is studied, it should at least correctly identify the most
 > frequently occurring protocol behaviors.
 >

Since we appear to have agreed this point at least 5 times in this 
e-mail alone, I think that we can probably take it as read...


-- 
Mark A. West, Consultant Engineer
Roke Manor Research Ltd., Romsey, Hants.  SO51 0ZN
Phone +44 (0)1794 833311   Fax  +44 (0)1794 833433

(Yes, I do know that my disclaimer is in an attachment.  And, no, I
didn't ask for it to be that way)