Re: [rohc] TCP/IP EPIC profile

"West, Mark (ITN)" <mark.a.west@roke.co.uk> Tue, 12 March 2002 07:10 UTC

From: "West, Mark (ITN)" <mark.a.west@roke.co.uk>
To: "Hongbin Liao (Intl Staffing)" <i-hbliao@microsoft.com>
Cc: Julije Ozegovic <julije@fesb.hr>, rohc <rohc@ietf.org>
Message-ID: <3C8D41CE.5080802@roke.co.uk>
Date: Mon, 11 Mar 2002 23:46:22 +0000
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.4) Gecko/20011019 Netscape6/6.2
X-Accept-Language: en-us
MIME-Version: 1.0
Subject: Re: [rohc] TCP/IP EPIC profile
References: <F4C77846CEE593418BE5AB7B6A83111E0465F4BC@bjs-msg-01.fareast.corp.microsoft.com>

Hi Hongbin,

Responses inline...

Cheers,

Mark.


Hongbin Liao (Intl Staffing) wrote:

> Hi, Mark
> 
>     Thanks for your comments. However, several more issues arise.
> 
>     1. Can the TCP/IP EPIC profile correctly describe the behavior of
> TCP? Or, is EPIC powerful enough to describe the complicated behavior
> of TCP/IP?
> 


Ok, there are 2 issues here.  Firstly, TCP is a complex protocol -- no
argument there!  Does the profile adequately capture it?  I don't know!
But there have been arguments to suggest that the protocol is *too*
complicated!  However, the complexity of the profile is a function of
the complexity of the protocol.

EPIC can capture arbitrarily complex behaviour.  But (as has been 
discussed) there is a trade-off between profile complexity and 
usefulness.  I could make the profile significantly more complex 
(honestly!)  But would that actually be useful?!

This, I think, is a general issue.  It may be possible to write a highly 
accurate description of TCP, but this complexity would translate 
directly into the implementation of the header compression scheme 
(regardless of how the scheme is expressed).


>     In EPIC, fields are assumed to be independent of each other. Once
> each field's behavior is described well, the whole protocol's
> behavior is also considered well described. However, in practice, the
> fields in a protocol may not behave completely independently of each
> other. There may be some connections (or causality) among several
> fields. For example, most TCP connections carry only one-way traffic
> (WWW browsing, FTP downloading, etc.), i.e., only SEQ changes on the
> forward path (from server to client) and only ACK changes on the
> backward path (from client to server). The ACK on the forward path and
> the SEQ on the backward path remain constant. However, in the TCP/IP
> EPIC profile, the probabilities for SEQ and ACK are specified
> separately:
> 
> seqno-co = LSB(8,63,5%) | LSB(14, 4096, 80%) | LSB(20,16384,10%) | 
> IRREGULAR(32,5%)
> 
> ackno-co = LSB(8,0,5%) | LSB(14,0,80%) | LSB(20,0,10%) | IRREGULAR(32,5%)
> 
> That means that, according to the EPIC profile, the most frequent
> packet format is SEQ(LSB-14)/ACK(LSB-14), with probability 64%.
> However, in practice the most frequent formats are
> SEQ(LSB-14)/ACK(LSB-8) and SEQ(LSB-8)/ACK(LSB-14). The result is that
> the compressor assigns the shortest Huffman prefix to a not-so-frequent
> case and a longer prefix to the most frequent case. The overall
> performance degrades.
> 
>     SEQ and ACK are not the only fields in TCP/IP with this kind of
> issue; TCP options, WINDOW, etc. have it too. The EPIC TCP/IP profile
> gives a wrong, or at least inappropriate, probability for each
> combination of encodings of all fields. To solve this, we would have
> to give the probability of each combination of encodings of all fields
> instead of the probabilities of each field individually. However, it
> seems that EPIC has no method for doing that. Also, it is impractical
> to write a profile listing each combination of encodings of all fields.
> 


It is quite fair to say that the fundamental starting point of the EPIC 
approach is that fields are considered independently.  However, it is 
quite wrong to say that it is not possible to account for dependent 
field behaviour.
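
To make the independence assumption concrete, here is a minimal sketch
(plain Python, not EPIC notation) that multiplies the per-field
probabilities from the quoted seqno-co and ackno-co lines.  Only the
percentages come from the profile fragment; the dictionary and function
names are illustrative assumptions.

```python
from itertools import product

# Per-field probabilities copied from the quoted seqno-co / ackno-co lines.
# The LSB offsets are omitted because they do not affect the probabilities.
SEQNO_CO = {"LSB-8": 0.05, "LSB-14": 0.80, "LSB-20": 0.10, "IRREGULAR-32": 0.05}
ACKNO_CO = {"LSB-8": 0.05, "LSB-14": 0.80, "LSB-20": 0.10, "IRREGULAR-32": 0.05}


def joint_if_independent(seq, ack):
    """Probability of each (SEQ, ACK) encoding pair if the fields are independent."""
    return {(s, a): seq[s] * ack[a] for s, a in product(seq, ack)}


joint = joint_if_independent(SEQNO_CO, ACKNO_CO)
most_likely = max(joint, key=joint.get)

print(most_likely, round(joint[most_likely], 4))
print(round(joint[("LSB-14", "LSB-8")], 4))
```

Under independence the LSB-14/LSB-14 pair gets 64% and the
LSB-14/LSB-8 pair only 4%, which is the mismatch the quoted example
describes.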

(I certainly describe the fields as independent for your '30 fields with 
2 choices each' example -- you have given me no information to suggest 
otherwise.  If you want to describe a relationship between a subset of 
the fields, I'm happy to modify my description...)

For example, though, in the complex TCP profile, there are 3 different
format sets, one for each of the cases of 'interactive traffic', 'bulk
data flow' and 'bulk ack flow'.  (I'm quite happy to accept that these do not
yet correctly reflect the TCP behaviour that we want, but it is clearly 
the case that they capture the existence of a dependency between 
sequence and acknowledgement number...)
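
As a rough illustration of how per-class format sets can carry a
dependency, the sketch below mixes three classes in which the fields are
independent only within each class.  This is an assumption-laden toy
model, not the draft profile: the class weights and per-class
probabilities are hypothetical, and the data layout is not EPIC syntax.

```python
from itertools import product

# HYPOTHETICAL numbers, for illustration only: (weight, SEQ probs, ACK probs)
# per traffic class. They are not taken from the draft profile.
FORMAT_SETS = {
    "bulk data flow": (0.45, {"LSB-14": 0.9, "LSB-8": 0.1}, {"LSB-8": 0.9, "LSB-14": 0.1}),
    "bulk ack flow": (0.45, {"LSB-8": 0.9, "LSB-14": 0.1}, {"LSB-14": 0.9, "LSB-8": 0.1}),
    "interactive traffic": (0.10, {"LSB-8": 0.5, "LSB-14": 0.5}, {"LSB-8": 0.5, "LSB-14": 0.5}),
}


def joint_from_format_sets(format_sets):
    """Mix the per-class distributions; fields are independent only within a class."""
    joint = {}
    for weight, seq, ack in format_sets.values():
        for s, a in product(seq, ack):
            joint[(s, a)] = joint.get((s, a), 0.0) + weight * seq[s] * ack[a]
    return joint


joint = joint_from_format_sets(FORMAT_SETS)
for pair, p in sorted(joint.items(), key=lambda kv: kv[1], reverse=True):
    print(pair, round(p, 4))
```

With these made-up weights the two asymmetric pairs (LSB-14/LSB-8 and
LSB-8/LSB-14) come out at 39.4% each and the symmetric pairs at 10.6%
each, even though each field on its own is LSB-14 half of the time.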

You don't (necessarily) have to give probabilities to all combinations 
(which would obviously be impractical), but it is true that you should 
account for the gross dependencies.

(In the case that you discuss above, for example, it is clear that
the profile description and the probabilities in the text do not match.
It is trivial, however, to make the profile match the text.)

It is disingenuous to suggest that this is an EPIC-specific issue,
though!  Any set of packet formats that is written out makes an implicit
statement about the relative probabilities of certain occurrences.  (This
is certainly true of RFC 3095...)

Fundamentally, for header compression, it is clear that it is not 
practical to have an exact match for the behaviour of a complex 
protocol.  However (and this is something that we have touched on in 
previous discussions), there is a point beyond which increasingly 
accurate descriptions achieve relatively little in terms of compression 
efficiency.


>     2. MAX_FORMATS
> 
>     To alleviate the memory requirement on the compressor and
> decompressor, EPIC uses MAX_FORMATS to restrict the packet formats
> generated from profiles. However, if a list of encoding methods doesn't
> fall within the supported formats, how does EPIC encode it? How does it
> convey accurate information about each encoding method in such a list?
> Or is the worst-case encoding method used for such lists? Whichever is
> used, there will be extra overhead for the lists of encoding methods
> that do not fall within MAX_FORMATS. The issue is how efficiently EPIC
> handles this situation, which may depend on how MAX_FORMATS is set. Is
> there a MAX_FORMATS value that both alleviates the memory requirements
> and reduces the extra overhead? How would it be determined? Do you have
> a rough number for it?
> 


This, again, is an interesting point -- but one that is the subject of a 
trade-off in *any* header compression scheme.  If a 'CO' format cannot
be found to encode a particular packet, then an IR-DYN packet is used.
This, of course, is identical to RFC 3095.  You cope with the most 
likely compressed packets efficiently and use a dynamic packet for the 
remainder.  This provides a less efficient encoding but (1) increases 
robustness, since the IR-DYN refreshes the dynamic context as a 
side-effect and (2) avoids trying to cope with extraordinarily unlikely 
change combinations.

I don't know how the figures work out, but I hope that with a moderate 
MAX_FORMATS value, there is an extremely high likelihood of matching a 
compressed packet format.  If one in a thousand packets uses IR-DYN 
because of an unlikely change-pattern, does that really affect efficiency?!
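
The trade-off can be put into rough numbers with a small sketch.  It
assumes (hypothetically) that the MAX_FORMATS likeliest formats are the
ones kept and that everything else falls back to IR-DYN; it looks only
at the two fields from the quoted profile fragment, and the header
sizes are made up for illustration.  The function names are mine, not
EPIC's.

```python
from itertools import product

# Per-field probabilities from the quoted seqno-co / ackno-co lines.
FIELD = {"LSB-8": 0.05, "LSB-14": 0.80, "LSB-20": 0.10, "IRREGULAR-32": 0.05}


def fallback_probability(joint, max_formats):
    """Probability mass left over after keeping the max_formats likeliest formats."""
    kept = sorted(joint.values(), reverse=True)[:max_formats]
    return max(0.0, 1.0 - sum(kept))


def expected_header_bytes(p_fallback, co_bytes, ir_dyn_bytes):
    """Average header size when a fraction p_fallback of packets uses IR-DYN."""
    return (1.0 - p_fallback) * co_bytes + p_fallback * ir_dyn_bytes


joint = {(s, a): FIELD[s] * FIELD[a] for s, a in product(FIELD, FIELD)}
for n in (1, 8, 12, 16):
    print(n, round(fallback_probability(joint, n), 4))

# HYPOTHETICAL sizes: 4-byte CO header, 40-byte IR-DYN header.
print(round(expected_header_bytes(0.001, 4, 40), 3))
```

For these two fields alone, keeping 8 of the 16 combinations leaves
about 3% of packets for the fallback, and keeping 12 leaves about 1%.
With the hypothetical sizes, one IR-DYN packet in a thousand moves the
average header from 4 to about 4.036 bytes, i.e. under 1%.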

Critically, we need to preserve transparency.

And, anyway, RFC 3095, as I say, contains much the same trade-offs.


-- 
Mark A. West, Consultant Engineer
Roke Manor Research Ltd., Romsey, Hants.  SO51 0ZN
Phone +44 (0)1794 833311   Fax  +44 (0)1794 833433

(Yes, I do know that my disclaimer is in an attachment.  And, no, I 
didn't ask for it to be that way)