Re: [Ieprep] proposed charter

Comments, thoughts, recommendations on the charter.

I've probably been reading too many thesis drafts ... but:

>The IEPREP WG will address proactive measures to congestion and recovery
>from various outages using three perspectives:

The sentence is missing a verb (what comes after 'to').  And this opens
the door to a lot of foggy thinking.  But the term 'recovery' provides a
good way into analyzing the problem.

We should properly be addressing qos and QoS but we need a good fix on
the problems first.  My experience tells me that addressing high
availability engineering issues first -- the things one does before a
disaster strikes -- will throw a lot of the QoS problems into the
no-nevermind locker.  And my experience with chaotic situations like
this tells me that security (especially authenticity) trumps a lot of
other requirements.  

And the qos problem is much more than just servicing some priority
customers.  There are some underlying choices of technology that are
pretty critical.  And these critical issues lie primarily at layer 2; an
area that IETF astutely stays out of.  

Let's see if I can synthesize some things at the bottom.  First some
analysis:

Consider something like Katrina; it's a good example.  The first thing
to understand about any disaster like that is that you always have
chaos.  You have a large influx of people into the disaster footprint
wanting to do good.  And you quite likely have groups of refugees
wanting to get out.  Regardless of the specifics, chaos reigns.  
     The prerequisite to doing anything else is to settle down the
chaos.  And a foundational key to that is to get a serviceable
communications system running.  Serviceable and interoperable are
near-synonyms here; interoperable and internetworkable are even
nearer-synonyms.  So the basic desire in IEPREP is good -- how can we
properly harness internet technology to meet this basic need?

Synthesis, first dab.  We really need three parts of the plumbing (and
the SAFENET group in federal government got the parts right, although I
don't care much for their namecalling):
  - a reach from the disaster footprint out to the undamaged internet
  - a backbone within the disaster footprint (routers at the edge, no
end systems on this segment)
  - a fanout to reach to end systems -- traditionally the domain of LANs
(and most of the cellphone technologies).  
     For reference, SAFENET calls these the Enterprise, Jurisdiction and
Incident area networks respectively.  

You also need some critical overhead parts:
  - it's all gotta be routable network or we can't hold the rest of the
discussion. 
  - you need some network operations support (i.e. a NOC).  (outside the
disaster footprint)
  - you need some training, 1) particularly in the reachback and
backbone equipment deployment and configuration and 2) the remote
management.  
  - you need a sackful of gateways to meet the sackful of technologies
that folks will show up with in the fanout network category.  

I'd contend that without this much context, you can't have much
productive discussion about qos and QoS.  

But now we've got a sketch.  Where does the congestion occur and what
tools do we need to handle it?

We don't have congestion in the intact portion of the internet.  And we
don't have it at the fanout portion (at least not in the sense that we
can't split/bridge and play the well-worn overprovisioning tricks).  
     Where we do have congestion is in the reachback and backbone
networks.  In most cases, we're confined to radio-WAN technologies of
some sort (unless we string emergency fiber ... do it if you can).
Which means that in these segments you'll have about four orders of
magnitude less capacity than in the rest of the 'net.  
     This manifests itself in two problems:

1.  These network segments must be stable under overload because we can
expect overload to be a ubiquitous condition.  This should drive choices
of technology: contention access networks will stall under overload;
non-contention access (scheduling) is stable.  (Scheduling access
methods also afford bandwidth efficiency at exactly the place we need it
and allow for QoS controls to be implemented -- you can do neither of
these with contention access).

2.  You need some sort of queue control at the routers.  This is the
part of the problem that has been well-worn in IEPREP in the past.
Admission control and prioritization are needed, but the usual warning
applies: oversteering is likely to make the cure worse than the disease.
     Similar discussions pertain to applications (like the SIP ones
we've had here).  But I'm not going there today....
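
To put some shape on point 1, here is a minimal sketch in Python.  It
is an illustration under stated assumptions, not a model of any
particular radio: textbook slotted ALOHA (throughput S = G * exp(-G),
where G is the offered load in frames per slot, retransmissions
included) stands in for contention access in general, and an idealized
slot scheduler with no overhead stands in for scheduled access.  The
function names are made up for this example.

```python
import math


def contention_throughput(offered_load: float) -> float:
    """Slotted-ALOHA throughput in successful frames per slot.

    Assumes Poisson offered load G (frames per slot, including
    retransmissions); the textbook result is S = G * exp(-G).
    """
    return offered_load * math.exp(-offered_load)


def scheduled_throughput(offered_load: float) -> float:
    """Idealized scheduled access: one frame per slot, no collisions.

    Assumes zero scheduling overhead, so throughput is the offered
    load capped at channel capacity (1 frame per slot).
    """
    return min(offered_load, 1.0)


if __name__ == "__main__":
    # Walk from light load into heavy overload and print both curves.
    print("load  contention  scheduled")
    for load in (0.5, 1.0, 2.0, 5.0, 10.0):
        print(f"{load:4.1f}  {contention_throughput(load):10.3f}"
              f"  {scheduled_throughput(load):9.3f}")
```

In this model the contention curve peaks at G = 1 and then falls toward
zero as the offered load keeps growing, while the scheduled curve
saturates at channel capacity and stays there.  Real contention MACs
with carrier sense and backoff degrade differently, so take the shape
of the curves from this, not the numbers.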

But the scope -- looking only at congestion and QoS issues -- is far too
limiting.

IEPREP needs to address some things that have hitherto been considered
outside scope if it wishes to produce anything relevant.  High
availability engineering is at the top of this list.
     The more robust the internetwork is prior to a disaster, the less
there is to patch and restore when the disaster strikes (the World Trade
Center anecdotes and the minimal impacts on the Internet are relevant
here).  The textbook on high availability engineering lists three
principles:
  - elimination of single points of failure
  - reliable crossover
  - prompt notification of failures
Of these, the second is properly addressed by the basic design of IP.
This should drive an underlying requirement of routable networks ... and
hook them together with routers, not bridges.  The third principle is
the reasoning behind the NOC and training recommendations above.  
     The first principle is where the unknowns lie -- you don't know
where a disaster will strike so you will have some damaged footprint ...
otherwise we wouldn't be doing this.  Pre-disaster, do the high Ao
homework.  Post-disaster, have the tools to patch/fill the disaster
footprint quickly.  And the most important parts of this patch/fill
operation in terms of infrastructure are the backbone and reachback
segments of the net -- provide a bunch of POPs (aka routers) at the
border of the backbone network and the problem at the federal level is
in hand.  
     And we've enabled the local jurisdictions to solve their problems
by finding something to plug into the POPs.  If you're not familiar with
that waterfront, there are a wide variety of wannabe solutions, only a
few of which are natively routable networks.  The rest require some kind
of gateway (e.g. P25 land mobile radio).  The QoS problem shows up here,
but you can't address it before you address the gateway issue itself.
Some fireman with a land mobile radio needs to talk to somebody who's
several segments away and only one of those segments is LMR; the rest
are routable networks.  I'm expecting Darwin to clean up a lot of this
mess in the next few years, but it hasn't happened yet.  
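
The pre-disaster "high Ao homework" on the first principle (single
points of failure) can be sketched in a few lines of Python.  This is a
rough illustration only: the topology, the node names (internet,
reach1, reach2, pop1, pop2) and the function names are hypothetical,
and the check is the brute-force one -- pull each node in turn and see
whether what's left is still one connected network.

```python
from collections import deque


def is_connected(nodes, links):
    """Return True if every node can reach every other node."""
    nodes = set(nodes)
    if not nodes:
        return True
    # Build adjacency only from links whose two ends both survive.
    adjacent = {n: set() for n in nodes}
    for a, b in links:
        if a in nodes and b in nodes:
            adjacent[a].add(b)
            adjacent[b].add(a)
    # Breadth-first search from an arbitrary surviving node.
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacent[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == nodes


def single_points_of_failure(links):
    """List nodes whose loss splits the remaining network."""
    nodes = {n for link in links for n in link}
    return sorted(n for n in nodes
                  if not is_connected(nodes - {n}, links))


# Hypothetical laydown: one reachback link feeding two POPs.
ONE_REACHBACK = [("internet", "reach1"), ("reach1", "pop1"),
                 ("reach1", "pop2"), ("pop1", "pop2")]
# Same laydown with a second, independent reachback added.
TWO_REACHBACKS = ONE_REACHBACK + [("internet", "reach2"),
                                  ("reach2", "pop2")]

if __name__ == "__main__":
    print(single_points_of_failure(ONE_REACHBACK))
    print(single_points_of_failure(TWO_REACHBACKS))
```

With the single reachback, reach1 is the one node whose loss cuts the
footprint off; adding the second reachback leaves no such node.  This
only looks at routers as nodes; shared power, shared towers and shared
conduit are single points of failure too, and a graph of routers won't
show them.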

I'll try and synthesize in a bit, but a couple of other kvetchings
first:
  - "voice was the driving application for IEPREP in the past, ...all
applications essential to emergency communications"  This comment (focus
on the word 'all') is correct.  In spades.  When a colleague of mine put
down a laydown in Thailand (post-tsunami), the fanout was 802.11 WiFi.
When the guys got things working, they just turned it on -- no
advertising.  Within a couple of hours, they had ~50 users.  This was
right in the middle of the refugee camp area and morgue -- a great deal
of the traffic was 'I'm ok' e-mail.  While we've no statistics, we don't
think any of that was VoIP.  
  - "considerations for treatment and security of emergency"  Right
phrase; wrong target, IMHO.  While stories of scavengers reeling up wire
to salvage the copper are legion, the concerns we've seen have been for
safety of the people involved.  So the first thing you want to do is
minimize the number of people within-footprint (why the NOC is outside).
On the Gulf Coast, one genre of victim that tended not to evacuate was
the drug addicts.  After a few days without a fix -- their support system
went away too -- these folks floating around among your emergency
services personnel are not necessarily a Good Thing.  
  - "IEPREP will pursue subject matter experts (e.g., security)"  In an
emergency situation (remember, chaos?) there is lots of misinformation.
Just one example was the report of the levees in New Orleans giving way
-- the authentic message was lost among the swirl of misinformation.
Authentication is the single most important security feature needed --
and it's authentication of the content that is where the focus needs to
be (digital signature of e-mail body parts is a good example).  This
won't solve all of the chaos problems, but it provides a tool that is
targeted at the right place.  This requirement falls above QoS control
in my book.  Emergency services operations also have occasional needs
for confidentiality (medical data is an example).  These security
measures belong in layer 7 applications; otherwise the network plumbing
is not usable by the folks who need it most.
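
To make "authentication of the content" concrete, here is a minimal
sketch in Python.  Loud caveat: it is NOT a digital signature.  The
standard library has no public-key signing, so this uses an HMAC over
the message body with a shared key as a stand-in; a real deployment
would more likely sign body parts with S/MIME or OpenPGP.  The function
names and the key are hypothetical.  What it does show is the layer 7
placement: the check travels with the content, and the plumbing
underneath is untouched.

```python
import hashlib
import hmac


def sign_body(body: str, key: bytes) -> str:
    """Return a hex authentication tag computed over the message body.

    Stand-in for a real signature: HMAC-SHA256 with a shared key.
    """
    return hmac.new(key, body.encode("utf-8"),
                    hashlib.sha256).hexdigest()


def body_is_authentic(body: str, tag: str, key: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = sign_body(body, key)
    return hmac.compare_digest(expected.encode("ascii"),
                               tag.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical key, assumed to be distributed out of band.
    key = b"hypothetical key shared before the disaster"
    report = "Levee status report from the county OES."
    tag = sign_body(report, key)

    print(body_is_authentic(report, tag, key))
    print(body_is_authentic(report + " (edited in transit)", tag, key))
```

The first check passes and the second fails, because any change to the
body changes the tag.  A shared key only tells you the sender was one
of the key holders; pinning a report to one specific originator is what
a real public-key signature adds.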

Summary of analysis.  My gripe about the existing IEPREP charter (and
the proposed recharter, although it's better) is that we're not holistic
enough.  1) Focusing on the micro issue of priority doesn't address
enough of the QoS problem and 2) focusing on QoS doesn't get arms 'round
the high Ao and security issues that are at least as important.  
      (How'd we get into this?  My take is that 'IETF does protocols'
and only layer-3-and-higher to boot.  That clamps on the blinders pretty
tightly because the one high Ao problem that's within that scope is the
one we've already solved).

Synthesis.  With all of that, what _should_ the charter be saying?
IEPREP has always been a non-traditional WG within the ethic of IETF.
So let's accept that and get arms around the whole problem.

I'm here:

> The IEPREP WG will address

- the high availability engineering principles as applied to internet
infrastructure, both pre- and post-disaster.  Answer the question for
both a federal and a local level jurisdiction of 'what should I do to
prepare?'

- recommend protocols and procedures to provide for authenticating and,
as required, confidentiality-protecting traffic within, into and out of
a disaster area.

- recommend protocols, technology choices and provisioning practices to
address the dearth of communications capacity and the large amount of
high-priority traffic that always attend a disaster.  

and here:

> IEPREP WG will address proactive measures to congestion and recovery
> from various outages using three perspectives:

This part is pretty close except the primary focus should be on
'recovery', both pre- and post-.  But in 2. there's an unrecognized
divide here between what federal level government should be doing and
what local jurisdictions should be planning.  Ask the question 'what
should I do?' from the position of county OES coordinator, and from the
position of a federal agency.  (In general, the feds should be, IMHO,
looking at the backbone, reachback and NOC issues while the locals
should be worrying about fanout, possibly the backbone, and the
install/configure training issues.)  Remember how many local
jurisdictions we're dealing with: it's dozens at the federal level
alone ... and thousands at the local level -- and these cats are looking
for a cat-herder with a can-opener and some catfood.  

Leveraging commercial (#3) is on target; leave as is.  So is #1.  

Help?

-- 
Rex Buddenberg
Naval Postgraduate School
Code IS/Bu
Monterey, Ca 93943

831/656-3576

_______________________________________________
Ieprep mailing list
Ieprep@ietf.org
https://www1.ietf.org/mailman/listinfo/ieprep