compression and encryption
Steve Bellovin <smb@research.att.com> Thu, 07 November 1996 13:08 UTC
Received: from cnri by ietf.org id aa20574; 7 Nov 96 8:08 EST
Received: from portal.ex.tis.com by CNRI.Reston.VA.US id aa08985; 7 Nov 96 8:08 EST
Received: (from majordom@localhost) by portal.ex.tis.com (8.8.2/8.8.2) id HAA07741 for ipsec-outgoing; Thu, 7 Nov 1996 07:53:53 -0500 (EST)
Message-Id: <199611070328.WAA00202@smb.research.att.com.research.att.com>
X-Authentication-Warning: smb.research.att.com.research.att.com: smb owned process doing -bs
From: Steve Bellovin <smb@research.att.com>
To: ipsec@tis.com
Subject: compression and encryption
Date: Wed, 06 Nov 1996 20:54:39 -0500
Sender: owner-ipsec@ex.tis.com
Precedence: list
I've been thinking a lot about doing compression over a lossy medium such as IP. The problem is that the receiver can lose compression synchronization. While there may be other ways to handle this, one that has been suggested is the ``resync every n packets'' approach. That is, the sender would include a sequence number in each packet, so the receiver could detect missing packets; every so often, a flag bit would be set or the sequence number would be set to 0, and a new compression state would be initialized.

This has the effect of tying together a sequence of packets -- if a packet in the sequence is lost, all subsequent packets until the next resync point are also lost, because they cannot be decompressed. This implies that compression over a lossy medium, using this resync algorithm, acts as a packet loss multiplier -- the loss of one packet will often cause the loss of other packets as well.

Call the probability of loss of an individual packet 'p'. (We will assume that the loss probabilities for separate packets are independent. Whether or not this is true may depend on the implementation strategies of various routers, and on the hardware error characteristics of the underlying media.) Every 'n' packets, the compression state is reset. We wish to calculate P, the fraction of packets lost out of a burst of n.

The first packet is dropped with probability p, and hence arrives with probability 1-p. The second packet arrives successfully if and only if (a) it isn't dropped, and (b) the packet preceding it isn't dropped. In other words, the probability of safe arrival of the second packet in the burst is (1-p)^2. Similarly, we see that packet 'i', 1<=i<=n, arrives safely with probability (1-p)^i. We can calculate the average number of safely arriving packets in a burst by adding the expected number of copies of each individual packet that arrive -- (1-p)^i -- and dividing by n.
Subtracting that from 1 gives the loss probability:

	P = 1 - (1/n) * sum(i=1 to n) (1-p)^i

(Btw, I've confirmed this equation via a simulation.)

Here are the results of the equation for packet loss probabilities p ranging from .01 to .12, and burst sizes n ranging from 1 to 16:

      0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12
    ----------------------------------------------------------------
  1   0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12
  2   0.01 0.03 0.04 0.06 0.07 0.09 0.10 0.12 0.13 0.15 0.16 0.17
  3   0.02 0.04 0.06 0.08 0.10 0.12 0.13 0.15 0.17 0.19 0.20 0.22
  4   0.02 0.05 0.07 0.10 0.12 0.14 0.16 0.18 0.21 0.23 0.25 0.27
  5   0.03 0.06 0.09 0.11 0.14 0.17 0.19 0.22 0.24 0.26 0.29 0.31
  6   0.03 0.07 0.10 0.13 0.16 0.19 0.22 0.25 0.27 0.30 0.32 0.35
  7   0.04 0.08 0.11 0.15 0.18 0.21 0.24 0.27 0.30 0.33 0.36 0.38
  8   0.04 0.09 0.13 0.16 0.20 0.24 0.27 0.30 0.33 0.36 0.39 0.41
  9   0.05 0.09 0.14 0.18 0.22 0.26 0.29 0.33 0.36 0.39 0.42 0.44
 10   0.05 0.10 0.15 0.20 0.24 0.28 0.31 0.35 0.38 0.41 0.44 0.47
 11   0.06 0.11 0.16 0.21 0.26 0.30 0.34 0.37 0.41 0.44 0.47 0.50
 12   0.06 0.12 0.18 0.23 0.27 0.32 0.36 0.39 0.43 0.46 0.49 0.52
 13   0.07 0.13 0.19 0.24 0.29 0.33 0.38 0.41 0.45 0.48 0.51 0.54
 14   0.07 0.14 0.20 0.25 0.30 0.35 0.39 0.43 0.47 0.50 0.54 0.56
 15   0.08 0.15 0.21 0.27 0.32 0.37 0.41 0.45 0.49 0.52 0.55 0.58
 16   0.08 0.15 0.22 0.28 0.34 0.38 0.43 0.47 0.51 0.54 0.57 0.60

The first row is the loss probability p; the first column is the burst size n. We know empirically that a loss probability of .05 is not atypical within the U.S.; the transoceanic trunks can have loss probabilities of .15 or thereabouts. (The packet loss figure reported by 'ping' is a measure of the round-trip packet loss, which is too pessimistic. If g is the loss figure reported by ping, the one-way loss can be approximated as 1-sqrt(1-g), since the packet and the response have more or less independent loss probabilities.)
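The equation, the simulation check mentioned above, and the ping approximation can all be sketched in a few lines (the function names are mine; the spot-check values are the ones from the text and chart):

```python
import random

def loss_fraction(n, p):
    """Closed form: fraction of packets effectively lost out of a resync
    burst of n, with independent per-packet loss probability p."""
    return 1 - sum((1 - p) ** i for i in range(1, n + 1)) / n

def simulate(n, p, bursts=200_000, seed=42):
    """Monte Carlo check: a packet is usable only if it and every packet
    since the last resync point arrived."""
    rng = random.Random(seed)
    usable = 0
    for _ in range(bursts):
        ok = True
        for _ in range(n):
            if rng.random() < p:
                ok = False  # this and all later packets in the burst are useless
            if ok:
                usable += 1
    return 1 - usable / (bursts * n)

def one_way_loss(g):
    """Approximate one-way loss from ping's round-trip loss figure g."""
    return 1 - (1 - g) ** 0.5

print(round(loss_fraction(4, 0.05), 2))  # 0.12, matching the chart entry
print(round(simulate(4, 0.05), 2))       # agrees with the closed form
print(round(one_way_loss(0.15), 3))      # one-way loss behind a 15% ping figure
```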
We also know that when packet losses get above 10%, the TCP congestion control algorithms will cut throughput drastically. (N.B. The 10% figure is based on my experience; I haven't analyzed it or even checked the literature...) Our goal, then, must be to keep P below .1, or the resulting TCP reactions will more than cancel the gains from compression. We can see from the chart that for p=.05, we must keep n<=4. That is, the compression state must be reset at least every fourth packet. For lossy links, the burst size should be 1.

One more problem with larger burst sizes should be noted. Even independent of the congestion control algorithms, TCP does only a finite number of retransmissions. Depending on the stack, this number may be surprisingly low. If too many packets in a burst are lost -- or if the resync packet following a group of useless packets is lost -- it is possible that the connection will be aborted.

Given all this, it isn't clear how to implement compression under IPSEC. Picking a standard now is clearly premature; we need to investigate various choices. For example, 'n' might be negotiated during the key management exchange. But it isn't clear how a host would know when to pick a large 'n' or a small 'n'. It might also be desirable to let systems adapt dynamically. For example, a receiver might note the loss rate on received packets, as determined by the sequence numbers, and periodically send a control packet; the sender could adjust its 'n' accordingly. (Receivers don't need to know 'n'; the change can be made unilaterally by the sender.) Of course, the rate of such messages -- which are reminiscent of the link quality messages in PPP -- might need to be negotiable too. But the prospect of any such messages, with the consequent implications for packet formats, is yet another reason to hold off on a compression standard.

We should also analyze just how much compression will help.
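The sender's side of that adaptation could be sketched as follows, under my own assumptions: a hard 0.1 target, and the receiver's reported loss rate standing in for p. Note that a strict 0.1 cutoff actually yields n=3 at p=.05, since the chart's value at n=4 is 0.12; the threshold is the tunable part.

```python
def loss_fraction(n, p):
    """Fraction of a resync burst of n packets effectively lost, given
    independent per-packet loss probability p (the equation above)."""
    return 1 - sum((1 - p) ** i for i in range(1, n + 1)) / n

def choose_burst_size(p, target=0.10, n_max=16):
    """Largest resync interval n that keeps the effective loss fraction
    at or below the target (0.1, where TCP congestion control is
    assumed to start biting)."""
    n = 1
    while n < n_max and loss_fraction(n + 1, p) <= target:
        n += 1
    return n

# A sender could re-run this whenever the receiver's control packet
# reports a new observed loss rate; receivers need not know n.
for p in (0.01, 0.05, 0.12):
    print(p, choose_burst_size(p))
```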
On a typical dial-up link -- the market for this technology -- I suspect that not much text is sent. GIF and JPEG files are already compressed, and are not further compressible. There is a potentially large gain from VJ-like header compression, which we also lose with IPSEC. It might be possible to use our cryptographic state to implement some form of this. If we use per-connection keying, much of the TCP and IP headers can be reconstructed by caching the connection ID information in the security association table and rebuilding the headers at the far end. I'll leave it to someone else to do the detailed design, but my gut feeling is that this is a bigger win than, say, LZW data compression.

A final choice, for some people, might be to let their ISP do the encryption at the POP. (Please, return to your seats. I'm well aware of the implications of this...) Suppose that someone made IPSEC-capable terminal servers, with secure cryptographic hardware. The user's travel key could be transmitted to this box via a cryptographically protected exchange. The traffic is thus protected from the POP to the user's host (or firewall). It might even be desirable to do AH end-to-end (sacrificing VJ header compression but retaining modem or PPP compression), resulting in a final packet format of IP-ESP-AH-data. (I don't remember if this was in Steve Kent's list.) For most users and most data, the threat model is such that POP-to-firewall secrecy is good enough, provided that the ISP can't impersonate them. (I would note that most people trust phone lines far more than they trust the Internet. While I suspect that the Internet isn't quite as dangerous -- nor phone lines quite as safe -- as is generally believed, I do think that on the whole the perception is correct.)

Based on all this, I make the following recommendations:

a) We proceed with ESP as it is now. No changes should be made, or hooks left, for compression.
b) The key management exchanges should permit the negotiation of an arbitrary set of compression parameters, for when something is defined.

c) We investigate IPSEC header compression.

d) The IPSEC architecture must allow for the header formats resulting from POP-based encryption.

Comments?
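The header compression idea raised above -- caching connection ID information in the security association table and rebuilding headers at the far end -- could be sketched roughly like this. It is a toy illustration, not a design: all names are hypothetical, a single shared table stands in for the sender's and receiver's synchronized SA databases, and only the connection ID (not the many other header fields a real design must handle) is cached.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConnId:
    """The static connection identity that per-connection keying implies:
    addresses and ports that need not be resent in every packet."""
    src: str
    dst: str
    sport: int
    dport: int

# SPI -> cached connection identity. In reality this lives in each
# endpoint's SA table; a shared dict stands in for both here.
sa_table = {}

def compress(spi, conn, seq, payload):
    """Sender side: register the connection under its SPI once, then
    transmit only the per-packet fields (sequence number, data)."""
    sa_table.setdefault(spi, conn)
    return spi.to_bytes(4, "big") + seq.to_bytes(4, "big") + payload

def decompress(packet):
    """Receiver side: look the SPI up and rebuild the full headers
    from the cached connection identity."""
    spi = int.from_bytes(packet[:4], "big")
    seq = int.from_bytes(packet[4:8], "big")
    return sa_table[spi], seq, packet[8:]

conn = ConnId("10.0.0.1", "10.0.0.2", 1234, 80)
pkt = compress(7, conn, 1, b"hello")
print(len(pkt))            # 8 bytes of overhead instead of full TCP/IP headers
print(decompress(pkt))
```

The point of the sketch is only that the SPI already names the connection, so the bulk of the TCP and IP headers is redundant once both ends hold the SA.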