Concerns about exported numbers

Frederick Baker <fred@cisco.com> Wed, 20 December 1995 21:40 UTC

Sender: ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Frederick Baker <fred@cisco.com>
Date: Wed, 20 Dec 1995 12:14:16 -0800
Message-Id: <199512202014.MAA00484@fred-ss20.cisco.com>
To: int-serv@isi.edu
Subject: Concerns about exported numbers
Cc: ip-atm@hplb.hpl.hp.com

I have had the following discussion privately with Scott, and Yakov
asked me to make the comment publicly. I suspect that the comment or
comments like it fed into the working group decision at the recent IETF
to consider exported measurements in the ADSPEC optional. It is my
understanding that this concern lies at least in part at the root of
the relationship between IS Routers and the IP/ATM work. My comment
applies to:

	draft-ietf-intserv-control-del-svc-02.txt
	draft-ietf-intserv-predictive-svc-01.txt
	draft-ietf-intserv-guaranteed-svc-02.txt

The Predictive and Controlled Delay services require of each router en
route from a sender to a receiver that it export in the ADSPEC the
following information:

   Exported Information

   Each controlled delay service module exports at least the following
   information. All of the parameters described below are
   characterization parameters.

   For each level of service, the network element exports three
   measurements of delay (thus making nine quantities in total).  Each
   of these characterization parameters is based on the maximal packet
   transit delay experienced over some set of previous time intervals of
   length T; these delays do not include discarded packets.  The three
   time intervals T are 1 second, 60 seconds, and 3600 seconds.  The
   exported parameters are averages over some set of these previous time
   intervals.

	...

   The delays are measured in units of one microsecond.  An individual
   element can advertise a delay value between 1 and 2**28 (somewhat
   over two minutes) and the total delay added across all elements can
   range as high as 2**32-1.  Should the sum of the different elements
   delay exceed 2**32-1, the end-to-end advertised delay should be
   2**32-1.

	...

   The controlled delay service is service_name 1.

   The delay characterization parameters receive parameter_number's one
   through nine, in the order given above. That is,

      parameter_name          definition

      1                       Service Level = 1, T = 1
      2                       Service Level = 1, T = 60
      3                       Service Level = 1, T = 3600
      4                       Service Level = 2, T = 1
      5                       Service Level = 2, T = 60
      6                       Service Level = 2, T = 3600
      7                       Service Level = 3, T = 1
      8                       Service Level = 3, T = 60
      9                       Service Level = 3, T = 3600

   The end-to-end composed results are assigned parameter_names N+10,
   where N is the value of the per-hop name given above.

In a nutshell, each PATH message is expected to contain 18 numbers,
which tell us the previous hop contribution to mean path queuing delay
and the accumulated queuing delay en route from the sender to the
receiver, in microseconds. Now, as I figure this, it seems to me that
what's required to do this is the following:

  - every message placed in the service class queue is timestamped on
    enqueue and dequeue, and the maximum store-and-forward delay within
    a second is accumulated in associated storage.

  - every second, a process captures this largest-in-this-second
    value in a 3600 element array. This value will be reported as
    object 1, 4, or 7, and accumulated values will be reported as
    objects 11, 14, and 17. It will also maintain averages for 60 and
    3600 seconds, using the following algorithm:

    hour_counter = (1 + hour_counter) % 3600;
    /* index of the sample now falling out of the 60-second window */
    minute_counter = (hour_counter + 3600 - 60) % 3600;
    for (level = 0; level < 3; level++) {
	ifp->delay3600[level] -= ifp->delays[level][hour_counter]/3600;
	ifp->delay60[level] -= ifp->delays[level][minute_counter]/60;
	ifp->delays[level][hour_counter] = ifp->accumulated_max_delay[level];
	ifp->delay3600[level] += ifp->accumulated_max_delay[level]/3600;
	ifp->delay60[level] += ifp->accumulated_max_delay[level]/60;
	ifp->accumulated_max_delay[level] = 0;
    }

    It will report these and their accumulations.
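
The first step above - timestamping on enqueue and dequeue and keeping the
per-second maximum - might look something like the following sketch. All
names here are illustrative, not from any draft; the caller supplies the
timestamps, since clock sources are platform-specific:

```c
#include <stdint.h>

/* Illustrative sketch only: stamp each packet on enqueue, and on
 * dequeue fold its queuing delay into the running per-second maximum
 * for its service level. */

typedef struct packet {
    uint32_t enqueue_usec;   /* timestamp taken on enqueue */
    int      level;          /* service level 0..2 */
} packet;

typedef struct if_stats {
    uint32_t accumulated_max_delay[3];  /* worst delay seen this second */
} if_stats;

void on_enqueue(packet *p, uint32_t now_usec)
{
    p->enqueue_usec = now_usec;
}

void on_dequeue(if_stats *s, packet *p, uint32_t now_usec)
{
    uint32_t delay = now_usec - p->enqueue_usec;
    if (delay > s->accumulated_max_delay[p->level])
        s->accumulated_max_delay[p->level] = delay;
}
```

Note that this puts per-packet work on the forwarding path, which is
exactly the cost concern raised below.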

I have several concerns here. Quickest to describe are the following cost
factors:

  - Few routers of my acquaintance have clocks that can be read with
    microsecond granularity. Millisecond granularity is achievable, and
    ten millisecond granularity (the assumption used in SNMP) is
    common. Yes, the specification makes clear that guesstimates are
    acceptable; but if guesstimates are acceptable, who are we kidding
    when we report them with microsecond precision?

  - Those of us with hardware switching engines are going to have to
    change our switching engines to get these numbers. This has not
    heretofore been a requirement.

  - Four bytes times 3600 samples times three service levels is 43.2
    Kbytes per interface. On a device with an arbitrary number of
    interfaces (we have devices with as many as 120 interfaces on a
    single card, never mind the system the card is in) this is a
    significant amount of memory.

More subtly, it is not at all clear to me what the receiving host is
supposed to do with the information. What has the number from the
preceding hop got to do with the price of eggs?  If I squared the
reported values, what would the host do differently as a result that it
would not do based on the accumulated delays?

Limiting consideration to the accumulated delays, if I see a certain
value for the preceding second, is that any indication of what the
next second is likely to do?  How do I know that a new flow won't fire
up within the next five seconds that completely obsoletes the number?
If I am receiving these every 30 seconds, what were the values for the
29 seconds in between this and the last update? What was the standard
deviation of those samples? Is service stable or changing? Similarly
the hour value - what does it mean that the maximum delay from this
sender at service level 1 has averaged 28292 microseconds over the last
hour? If a new flow started 600 seconds ago and traffic is now
experiencing twice that, does the number tell me anything I can
reasonably plan on?

If I am going to report anything at all of this type, and I still
wonder about silicon routers, I think a far better choice would be to
maintain an exponentially weighted moving average of the per-second
maxima, and use it to report four accumulations:

	the mean maximum delay traffic is experiencing on this path
		(which, due to merging, is probably at different service
		levels in different parts of the path)
	the delay it would experience if it were level 1 end to end
	the delay it would experience if it were level 2 end to end
	the delay it would experience if it were level 3 end to end

and I could see providing the mean variation between maxima along the
same lines.
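
The moving-average alternative, with an arbitrary illustrative gain of
1/16 and names of my own choosing, could be as small as this:

```c
#include <stdint.h>

/* Sketch of the alternative suggested above: one exponentially
 * weighted moving average of the per-second delay maxima per service
 * level, plus a smoothed mean variation between maxima, in place of a
 * 3600-entry history.  Storage drops from 43.2 Kbytes per interface
 * to a few dozen bytes. */

#define EWMA_SHIFT 4    /* gain = 1/16; a tuning choice, not from any draft */

typedef struct ewma {
    int32_t avg;        /* smoothed per-second maximum, microseconds */
    int32_t var;        /* smoothed mean variation between maxima */
} ewma;

void ewma_update(ewma *e, int32_t sample)
{
    int32_t err = sample - e->avg;
    e->avg += err >> EWMA_SHIFT;         /* avg += (sample - avg)/16 */
    if (err < 0)
        err = -err;
    e->var += (err - e->var) >> EWMA_SHIFT;
}
```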

The host's capabilities are now clear - if a lower service level will
give it adequate delay, it can reduce its service requirements, and if
a lower delay number is needed, it should be obvious if the number is
not acheivable, and if it is acheivable, the necessary service level
request should be apparent.

-- deep breath --

The Guaranteed Service requires that each router advertise two
numbers.

   Exported Information

   Each guaranteed service module must export at least the following
   information.  All of the parameters described below are
   characterization parameters.

   A network element's implementation of guaranteed service is
   characterized by two error terms, C and D, which represent how the
   element's implementation of the guaranteed service deviates from the
   fluid model.  These two parameters have an additive composition rule.

   If the composition function is applied along the entire path to
   compute the end-to-end sums of C and D (Ctot and Dtot) and the
   resulting values are then provided to the end nodes (by presumably
   the setup protocol), the end nodes can compute the maximal packet
   delays.  Moreover, if the partial sums (Csum and Dsum) from the most
   recent reshaping point (reshaping points are defined below)
   downstream towards receivers are handed to each network element then
   these network elements can compute the buffer allocations necessary
   to achieve no packet loss.  The proper use and provision of this
   service requires that the quantities Ctot and Dtot, and the
   quantities Csum and Dsum be computed.  Therefore, we assume that
   usage of guaranteed service will be primarily in contexts where these
   quantities are made available to end nodes and network elements.

   The error term C is measured in units of bytes.  An individual
   element can advertise a C value between 1 and 2**28 (a bit over 250
   megabytes) and the total added over all elements can range as high as
   (2**32)-1.  Should the sum of the different elements delay exceed
   (2**32)-1, the end-to-end error term should be (2**32)-1.

   The error term D is measured in units of one microsecond.  An
   individual element can advertise a delay value between 1 and 2**28
   (somewhat over two minutes) and the total delay added across all
   elements
   can range as high as (2**32)-1.  Should the sum of the different
   elements delay exceed (2**32)-1, the end-to-end delay should be
   (2**32)-1.

   The guaranteed service is service_name 2.

   Error characterization parameter C is numbered 1 and parameter D is
   numbered 2.

   The end-to-end composed value (Ctot) for C is numbered 3 and the
   end-to-end composed value for D (Dtot) is numbered 4.

   The since-last-reshaping point composed value (Csum) for C is
   numbered 5 and the since-last-reshaping point composed value for D
   (Dsum) is numbered 6.


   No other exported data is required by this specification.
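
The additive composition rule quoted above amounts to saturating 32-bit
addition: each element adds its C (or D) term to the running total, and
the end-to-end value pins at (2**32)-1 rather than wrapping. A sketch;
the function name is mine, not the draft's:

```c
#include <stdint.h>

#define COMPOSED_MAX 0xFFFFFFFFu    /* (2**32)-1, the advertised ceiling */

/* Saturating composition of one element's error term (C or D) onto
 * the running end-to-end total. */
uint32_t compose_term(uint32_t total_so_far, uint32_t element_term)
{
    uint32_t sum = total_so_far + element_term;
    if (sum < total_so_far)         /* unsigned wraparound means overflow */
        return COMPOSED_MAX;
    return sum;
}
```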

No doubt, if I go read the references

   [3] L. Zhang, "Virtual Clock: A New Traffic Control Algorithm for
   Packet Switching Networks," in Proc. ACM SIGCOMM '90, pp. 19-29.

   [4] D. Verma, H. Zhang, and D. Ferrari, "Guaranteeing Delay Jitter
   Bounds in Packet Switching Networks," in Proc. Tricomm '91.

   [5] L. Georgiadis, R. Guerin, V. Peris, and K. N. Sivaraja,
   "Efficient Network QoS Provisioning Based on per Node Traffic
   Shaping," IBM Research Report No. RC-20064.

   [6] P. Goyal, S.S. Lam and H.M. Vin, "Determining End-to-End Delay
   Bounds in Heterogeneous Networks," in Proc. 5th Intl. Workshop on
   Network and Operating System Support for Digital Audio and Video,
   April 1995.

one of them will inform me of the formal meaning of the variables C and
D; however, benighted person that I am, I don't have ready access to
IBM Research Reports or the proceedings of conferences I don't attend,
and ACM SIGCOMM periodically stops sending me things - SIGCOMM '90
being among the issues they've missed mailing me. (I have had
continuous registration for a long time, but periodically have to
write and ask them to send recent issues, as they don't mail them to
me. I recently had to ask for the entire year of 1994!)

So it would be really, really nice if some kind soul would put a
paragraph into the specification that says "C means this" and "D means
that". My English teacher in fifth grade taught me to do that...

After you tell me what C and D are, I'll comment on whether they're
implementable. Based on what I know, it's not at all obvious that they
are.

And I still don't really know what to do with the information. Yes, if
I have a system that allocates memory to interfaces at flow negotiation
time rather than on message receipt, I can plug the numbers into the
formula.  But would the folks who do that mind raising their hands? At
the IETF meeting, when we were discussing other values that might be
useful for doing so, no-one (including those asking for the numbers)
said that they were definitely going to do that. And if no-one is
plugging the numbers into *that* formula, what has the number got to do
with the price of eggs?

-- btw, you might want to think about this one... --

All of the above comments are in relation to the standard Integrated
Services model, which assumes that all interconnects are done via point to
point lines and the system making the guarantees is in fact in control of
the statistical behavior of the queues and media between itself and its
next hop. There is some additional concern being raised about places where
there might be a non-RSVP cloud between IS hops, but the assumption is
being made that these are detectable using IP TTL - that those hops are
routers.

What the above reports display are queuing delays. What about propagation
delays? Propagation delay arises from such media characteristics as:

  - shared media such as broadcast LANs
  - satellite hops (272 milliseconds delay per hop)
  - ISDN circuit switch drop-and-add multiplexing points
  - telco repeater delays
  - layer 2 switching fabrics such as ATM or Frame Relay, which have
    internal queuing, forward error correction, and speed of light delays

If the accumulated delays are supposed to externalize the actual mean end
to end delay, all of the above are non-trivial components and are neither
measured nor reported, and it's not clear how they might be measured or
reported. I suppose we could dedicate 43 K of RAM per VCI in an ATM switch
to keep track of maximal end to end delays...

-- the End to End Principle --

Someone told me once about a principle that applies to services in general.
It goes something like this:

	"If something must be done at layer N for some applications but
	not for others, posit the mechanism in that layer rather than
	some lower layer."

Now, clearly, not all applications or transports need a measurement of end
to end delay. TCP, for example, plugs along quite happily without it. So
the lowest layer at which one might expect to find a non-universal service
that measures end to end delay is at the transport layer. I observe that
RTP, frequently used by multimedia protocols with real time requirements,
contains a time stamp that is expressly designed to use the network time
protocol as a source of timing information. Father Time, in my hearing, has
boasted that the quality of timing information that results from NTP is
such that he can calibrate the thermostats in machine rooms around the
world by the drift in their machine's reference clocks.
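
With NTP-disciplined clocks at both ends, the receiver can compute
one-way transit delay directly from a sender timestamp carried in each
packet (as RTP carries), with no help from the routers in between. A
sketch, with illustrative microsecond units:

```c
#include <stdint.h>

/* One-way delay from synchronized sender and receiver clocks.
 * Unsigned arithmetic tolerates counter wraparound. */
uint32_t one_way_delay_usec(uint32_t sent_usec, uint32_t received_usec)
{
    return received_usec - sent_usec;
}
```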

Given that I have a way to determine what the end to end delay *is*,
what it would be were I to ask for something better can be found by
experimentation - if you don't like the service you're getting, ask
for a higher level and see if it is better; if you are getting
significantly better service than you need, try asking for a lower
level and see what the effect is. Less deterministic, yes, but
implementable with no support from the network at all, and (unlike
this model) it reliably delivers as good a set of numbers as you care
to calculate.
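
That trial-and-error strategy can be sketched as a small control loop.
The thresholds, and the assumption that lower-numbered levels mean
tighter delay (as in the controlled-delay table earlier), are my own:

```c
#include <stdint.h>

/* Hypothetical sketch: each measurement interval, compare the
 * observed end-to-end delay with the application's target and step
 * the requested service level up or down.  The factor-of-two
 * headroom test is an arbitrary illustrative choice. */

typedef struct adapt_state {
    int      level;        /* currently requested service level, 1..3 */
    uint32_t target_usec;  /* largest delay the application can live with */
} adapt_state;

int next_level(adapt_state *a, uint32_t observed_usec)
{
    if (observed_usec > a->target_usec && a->level > 1)
        a->level--;                      /* too slow: ask for better */
    else if (observed_usec * 2 < a->target_usec && a->level < 3)
        a->level++;                      /* ample headroom: try cheaper */
    return a->level;
}
```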


-- My Recommendation --

There is no non-RSVP Integrated Services implementation (ST-II has a
different flow specification and set of things it does with it), and the
RSVP implementations I am aware of (ISI and derivatives, Bay, Cisco) do not
implement these exports at all. They are significantly problematic, and at
the IETF meeting a few weeks ago no-one proposed that they be mandatory to
implement.

Can we please rely on the end to end services to make their own
measurements, and forget these?