RE: [rohc] TCP/IP EPIC profile

"Hongbin Liao (Intl Staffing)" <i-hbliao@microsoft.com> Tue, 12 March 2002 13:46 UTC

Subject: RE: [rohc] TCP/IP EPIC profile
Date: Tue, 12 Mar 2002 21:40:45 +0800
Message-ID: <F4C77846CEE593418BE5AB7B6A83111E046521AE@bjs-msg-01.fareast.corp.microsoft.com>
Thread-Topic: [rohc] TCP/IP EPIC profile
Thread-Index: AcHJlDuTk+fvdo+8ReeCr0mKuRHpYAABnkKw
From: "Hongbin Liao (Intl Staffing)" <i-hbliao@microsoft.com>
To: "West, Mark (ITN)" <mark.a.west@roke.co.uk>
Cc: "Julije Ozegovic" <julije@fesb.hr>, "rohc" <rohc@ietf.org>

Hi, Mark

	See my comments inline.

Thanks

Hongbin L.
03/12/2002


> -----Original Message-----
> From: West, Mark (ITN) [mailto:mark.a.west@roke.co.uk]
> Sent: Tuesday, March 12, 2002 7:46 AM
> To: Hongbin Liao (Intl Staffing)
> Cc: Julije Ozegovic; rohc
> Subject: Re: [rohc] TCP/IP EPIC profile
> 
> 
> 
> Hi Hongbin,
> 
> Responses inline...
> 
> Cheers,
> 
> Mark.
> 
> 
> Hongbin Liao (Intl Staffing) wrote:
> 
> > Hi, Mark
> > 
> >     Thanks for your comments. However, there comes several more
> > issues.
> > 
> >     1. whether can TCP/IP EPIC profile correctly describe the behavior
> > of TCP? Or, whether EPIC is powerful enough to describe the complicated
> > behavior of TCP/IP?
> > 
> > 
> 
> 
> Ok, there are 2 issues here.  Firstly, TCP is a complex protocol -- no
> argument there!  Does the profile adequately capture it?  I don't know!
> But there have been arguments to suggest that the protocol is *too*
> complicated!  However, the complexity of the profile is a function of
> the complexity of the protocol.
> 

Yes, we agree that TCP is a protocol with complex behavior. On average, a protocol with complex behavior tends to need a complex profile to describe it accurately (or exactly). However, since we are targeting a tradeoff between complexity and performance, we only need to capture the most important behavior of the protocol correctly (not necessarily accurately or exactly), and that does not necessarily require a complex profile.


> EPIC can capture arbitrarily complex behaviour.  But (as has been 
> discussed) there is a trade-off between profile complexity and 
> usefulness.  I could make the profile significantly more complex 
> (honestly!)  But would that actually be useful?!
> 

As I mentioned above, what matters most is that we need a profile that describes the behavior correctly (not accurately or exactly). Capturing the behavior correctly does not mean we have to use a complex profile. A simple profile may be good enough to capture all the frequent cases. In terms of average performance, it should be enough if a simple profile correctly captures the most frequent protocol behaviors.

Using the shortest packet for the most frequent case should be the main guideline in designing header compression packet formats. In fact, EPIC-LITE is also designed on that basis. We noticed that many protocols have correlations between different fields (if field A can be encoded using XXX, then field B can only be encoded using YYY, etc.). For example, in TCP, some options, such as MSS, SACK-Permitted, and Window-Scale, can only appear in SYN packets, and SACK options can only appear if the ACK flag is set; in SCTP, there is a weak correlation between TSN and Seq; in RTP, the Sequence Number and Timestamp are weakly correlated too. There are correlations not only between the encoding methods of two different fields, but possibly also between the encoding method of one field and the value of another field. At the least, we need a mechanism to describe such dependencies correctly. Otherwise, we may not correctly describe the behavior of some fields. Thus, it may not be reasonable to assume that all the packet fields are independent.
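
To make the TCP example concrete, here is a minimal sketch in Python of the kind of dependency rule I mean. The table and function names are hypothetical, invented only for illustration; they are not taken from the EPIC drafts or from RFC 3095. The sketch only covers the two dependencies mentioned above and leaves every other option unconstrained.

```python
# Hypothetical dependency table: option kinds whose presence depends
# on the value of a TCP flag, as described in the paragraph above.
SYN_ONLY_OPTIONS = {"MSS", "SACK-Permitted", "Window-Scale"}
ACK_ONLY_OPTIONS = {"SACK"}


def consistent(syn, ack, options):
    """Return True if the option list is compatible with the flag values.

    Options that are not in either table are treated as unconstrained.
    """
    for opt in options:
        if opt in SYN_ONLY_OPTIONS and not syn:
            return False  # e.g. MSS outside a SYN packet
        if opt in ACK_ONLY_OPTIONS and not ack:
            return False  # e.g. SACK without the ACK flag
    return True


# A SYN may carry MSS; a plain data/ACK packet may not.
print(consistent(syn=True, ack=False, options=["MSS", "Window-Scale"]))
print(consistent(syn=False, ack=True, options=["MSS"]))
```

A profile that knows such rules never has to spend format-indicator bits on combinations like "MSS present, SYN clear".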


> This, I think, is a general issue.  It may be possible to write a highly
> accurate description of TCP, but this complexity would translate
> directly into the implementation of the header compression scheme
> (regardless of how the scheme is expressed).
> 

Again, in terms of performance, we do not need a highly accurate description of TCP. However, we do need to describe the most frequently occurring cases correctly and simply.

> 
> >     In EPIC, fields are assumed to be independent from each other. Once
> > each field's behaviors are described well, the whole protocol's
> > behaviors are also well-studied. However, in practice, fields in a
> > protocol may not behave independently completely from each other. There
> > may be some connections (or causality) among several fields. For
> > example, most TCP traffics only contain one-way traffic (WWW browsing,
> > FTP downloading, etc.), i.e., only SEQ changes on the forward path (from
> > server to client) and only ACK changes on the backward path (from client
> > to server). The ACK on the forward path and SEQ on the backward path
> > remain constant. However, in TCP/IP EPIC profile, the probabilities for
> > SEQ and ACK are specified separately:
> > 
> > seqno-co = LSB(8,63,5%) | LSB(14, 4096, 80%) | LSB(20,16384,10%) |
> > IRREGULAR(32,5%)
> > 
> > ackno-co = LSB(8,0,5%) | LSB(14,0,80%) | LSB(20,0,10%) |
> > IRREGULAR(32,5%)
> > 
> > That means, the most frequent packet format is SEQ(LSB-14)/ACK(LSB-14)
> > with probability 64% according to the EPIC profile. However, the most
> > frequent formats are SEQ(LSB-14)/ACK(LSB-8) and SEQ(LSB-8)/ACK(LSB-14)
> > instead. The result is that the compressor uses the shortest Huffman
> > prefix with a not so frequent case and a longer prefix with the most
> > frequent case. The overall performance downgrades.
> > 
> >     Not only SEQ and ACK have this kind of issue in TCP/IP, TCP options,
> > WINDOW and etc. also have the such an issue. EPIC TCP/IP profile give a
> > wrong, at least not appropriate, probability for each combination of
> > encodings of all fields. To solve it, we have to give the probabilities
> > of each combination of encodings of all fields instead of the
> > probabilities of each field individually. However, it seems that EPIC has
> > no such a method to do that. Also, it's impractical to write a profile
> > listing each combination of encoding of all fields.
> > 
> 
> 
> It is quite fair to say that the fundamental starting point of the EPIC
> approach is that fields are considered independently.  However, it is
> quite wrong to say that it is not possible to account for dependent
> field behaviour.
> 

Well, how do you describe the dependency between different fields then? Will you increase the complexity of the profile for that? At least from the EPIC-LITE draft, and from the examples in EPIC-LITE and the EPIC TCP/IP profile, I cannot find such a method.

> (I certainly describe the fields as independent for your '30 fields with
> 2 choices each' example -- you have given me no information to suggest
> otherwise.  If you want to describe a relationship between a subset of
> the fields, I'm happy to modify my description...)
> 

Well, the 30-fields example was used only to discuss another issue, which is irrelevant to the issue here.

> For example, though, in the complex TCP profile, there are 3 different
> format sets for each of the cases of 'interactive traffic', 'bulk data
> flow' and 'bulk ack flow'.  (I'm quite happy to accept that these do not
> yet correctly reflect the TCP behaviour that we want, but it is clearly
> the case that they capture the existence of a dependency between
> sequence and acknowledgement number...)
> 

How can you identify whether a flow is 'interactive traffic', 'bulk data flow' or 'bulk ack flow' then? Even if you can identify it, you still need extra overhead to tell the decompressor which case each packet belongs to. Meanwhile, if we consider other possible dependencies among other fields, should we generate even more format sets? That may confuse people.

> You don't (necessarily) have to give probabilities to all combinations
> (which would obviously be impractical), but it is true that you should
> account for the gross dependencies.
> 
> (In the example that you discuss above, for example, it is clear that
> the profile description and the probabilities in the text do not match.
> It is trivial, however, to make the profile match the text).
> 

Yes; however, the profile should not produce a wrong prediction of the most frequently occurring TCP/IP packet format.
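
For reference, the 64% figure follows directly from multiplying the per-field probabilities in the quoted profile. A small Python sketch of that arithmetic is below; the dictionary keys (LSB-8, LSB-14, and so on) are just my labels for the four encodings of each field, and the independence assumption is exactly the one under discussion.

```python
from itertools import product

# Per-field probabilities copied from the seqno-co / ackno-co lines
# quoted earlier in this thread. The key names are illustrative labels.
SEQ = {"LSB-8": 0.05, "LSB-14": 0.80, "LSB-20": 0.10, "IRREGULAR-32": 0.05}
ACK = {"LSB-8": 0.05, "LSB-14": 0.80, "LSB-20": 0.10, "IRREGULAR-32": 0.05}


def joint_if_independent(seq, ack):
    """Joint probability of every (SEQ, ACK) encoding pair, assuming independence."""
    return {(s, a): ps * pa
            for (s, ps), (a, pa) in product(seq.items(), ack.items())}


joint = joint_if_independent(SEQ, ACK)
ranked = sorted(joint.items(), key=lambda kv: kv[1], reverse=True)

# Show the combinations the profile considers most likely.
for (s, a), p in ranked[:3]:
    print(f"SEQ({s})/ACK({a}): {p:.2%}")

# The one-way formats get only 0.80 * 0.05 = 4% each under this assumption.
print(f"SEQ(LSB-14)/ACK(LSB-8): {joint[('LSB-14', 'LSB-8')]:.2%}")
```

So under the independence assumption the profile ranks SEQ(LSB-14)/ACK(LSB-14) first at 64%, while each one-way format is given only 4%.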

> It is disingenuous to suggest that this is an EPIC specific issue,
> though!  Any set of packet formats that are written out make an implicit
> statement about the relative probabilities of certain occurrences.  (This
> is certainly true of RFC 3095...)
> 

Well, this is debatable. RFC 3095 can support formats which use a tag to implicitly indicate whether a field exists or not (if field A is encoded as XXX, then field B is encoded as YYY). However, if EPIC assumes that packet fields are independent, it is hard to express these correlations correctly.


> Fundamentally, for header compression, it is clear that it is not
> practical to have an exact match for the behaviour of a complex
> protocol.  However (and this is something that we have touched on in
> previous discussions), there is a point beyond which increasingly
> accurate descriptions achieve relatively little in terms of compression
> efficiency.

For header compression efficiency, the main issue for the packet format is whether the most frequent cases are identified correctly and encoded using the shortest packet. However accurately (or exactly) the protocol behavior is studied, the profile should at least correctly identify the most frequently occurring protocol behaviors.
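
As a rough sketch of why this matters, the Python below builds Huffman prefix lengths from an assumed distribution and measures the average prefix cost against a different actual one. The "assumed" numbers are the products of the per-field probabilities in the profile (64%, 4%, 4%, and the remaining 28% lumped together as "other"); the "actual" numbers are purely hypothetical values I made up to represent mostly one-way traffic, not measurements. This is a generic Huffman construction, not the EPIC algorithm itself.

```python
import heapq
from itertools import count


def huffman_lengths(probs):
    """Return {symbol: prefix length in bits} for a Huffman code over probs."""
    if len(probs) == 1:
        return {next(iter(probs)): 1}
    tie = count()  # tie-breaker so the heap never compares dicts
    heap = [(p, next(tie), {sym: 0}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        # Every symbol under the merged node gets one more prefix bit.
        merged = {s: n + 1 for s, n in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]


def expected_bits(lengths, actual):
    """Average prefix length when packets follow the 'actual' distribution."""
    return sum(actual[s] * lengths[s] for s in actual)


# Assumed: what the independent per-field probabilities imply.
assumed = {"seq14/ack14": 0.64, "seq14/ack8": 0.04,
           "seq8/ack14": 0.04, "other": 0.28}
# Actual: HYPOTHETICAL numbers for mostly one-way traffic.
actual = {"seq14/ack14": 0.10, "seq14/ack8": 0.40,
          "seq8/ack14": 0.40, "other": 0.10}

mismatched = expected_bits(huffman_lengths(assumed), actual)
matched = expected_bits(huffman_lengths(actual), actual)
print(f"prefix built from assumed probabilities: {mismatched:.2f} bits/packet")
print(f"prefix built from actual probabilities:  {matched:.2f} bits/packet")
```

With these made-up numbers, the prefix built from the assumed probabilities costs 2.7 bits per packet on average, against 1.8 bits for a prefix built from the actual ones.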


_______________________________________________
Rohc mailing list
Rohc@ietf.org
https://www1.ietf.org/mailman/listinfo/rohc