Re: [6tsch] priority and priority

Tom Phinney <tom.phinney@cox.net> Sun, 10 March 2013 06:38 UTC

Message-ID: <513C2A79.3020807@cox.net>
Date: Sat, 09 Mar 2013 23:38:49 -0700
From: Tom Phinney <tom.phinney@cox.net>
To: "6tsch@ietf.org" <6tsch@ietf.org>
References: <F5C7FB9548FA6A4B8538AFEF6199B0ED151D0155@xmb-aln-x10.cisco.com>
In-Reply-To: <F5C7FB9548FA6A4B8538AFEF6199B0ED151D0155@xmb-aln-x10.cisco.com>
Reply-To: Tom Phinney <tom.phinney@cox.net>

Shitanshu et al,

I sense that I may be viewing related applications through very different lenses than the rest of you. The disconnect is undoubtedly mine, since I am not grounded in IETF terminology the way that you all are.

https://tools.ietf.org/id/draft-phinney-roll-rpl-industrial-applicability-02.txt provides much of the background for the following.

Automation of continuous processes often involves large physical plants with kilometers of piping, hundreds to thousands of valves and thousands to tens of thousands of sensors. Much of the high-value and critical equipment typically is concentrated in areas of about 1 hectare (i.e., 100 m on a side), typically extending 10 m or more in the air. Other parts are spread over a large area, with most equipment within 2-3 m of the ground. The largest plant I ever visited was 10k ha (100 square km); the area in which outdoor communication is required in these plants can be enormous.

At the other extreme, a pharmaceutical plant may contain many dozen chemical reactors, each with associated piping and valving, each organized as a column of equipment on a skid that can be moved and replaced as necessary. Such a structure is suitable for managing chemical or biological reactions that only work at small scale, as well as for providing fine-grained traceability of final product, which latter is required by the health agencies of many governments.

From an automation perspective these plants are more similar than dissimilar: each has a set of closed-loop control and monitoring functions that have specific timeliness requirements, where the individual flows that represent purpose-specific communications relationships can be ranked in terms of both criticality and their acceptable distribution of delivery delay.

In general, criticality varies inversely with
1) the communication class (see 2.1.1 in the above document), and
2) the reporting period. (Note that loop rate is the inverse of reporting period.)

A third contributing factor is the role of the specific loop in the plant; within a class at a given loop rate those components whose failure has a greater consequence tend to be higher on the scale.

This 3-component ranking does not lead immediately to a linear priority structure, because control communications is essentially deadline scheduled rather than prioritized. As noted in the earlier e-mail, delivery early or late in a given reporting period makes no difference; it is only missed delivery deadlines that are consequential. Even then, the control strategies that have evolved during the last 50 years tolerate a few successive losses (generally <= 3) after which they mode-shift to a backup regime that emphasizes plant safety over product.
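The loss-tolerance behavior described above can be sketched as a per-loop watchdog. This is only an illustration under stated assumptions: the class name `LoopWatchdog`, the mode names, and the latching of the backup mode are hypothetical, and the threshold of 3 is taken from the "generally <= 3" figure in the text.

```python
from dataclasses import dataclass

# Assumption: threshold taken from the "generally <= 3" figure in the text.
MAX_SUCCESSIVE_LOSSES = 3


@dataclass
class LoopWatchdog:
    """Tracks missed reporting deadlines for one control loop (hypothetical helper)."""
    max_losses: int = MAX_SUCCESSIVE_LOSSES
    successive_losses: int = 0
    mode: str = "normal"

    def end_of_period(self, report_arrived: bool) -> str:
        """Call once per reporting period.

        Only whether the report arrived within the period matters;
        early or late arrival inside the period is not distinguished.
        """
        if report_arrived:
            self.successive_losses = 0
        else:
            self.successive_losses += 1
            if self.successive_losses > self.max_losses:
                # Safety-first regime. The mode latches here; how a real
                # system returns to normal operation is out of scope.
                self.mode = "backup"
        return self.mode
```

Three successive misses are tolerated, a delivered report resets the count, and the fourth successive miss triggers the mode shift.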

An example may help. Modern toothpaste is mixed in big batch reactors. It includes various corrosive and toxic components, including fluoride. A breakdown in control of the mixing process, or of history collection due to lost reports, may lead to tonnes of toxic waste (because governments require uninterrupted history collection to monitor the contents and safety of such "medical" products). Although the industry has developed mechanisms to span and recover from brief to moderate communication outages in history collection, a failure of the process historian's database will trigger such a toxic waste disposal problem for the plant manager.

Getting back to the prior discussion, publish-subscribe communications may be assigned to flows. When communications redundancy is employed, as it often is for the highest-ranked monitoring and control loops, the resulting (usually two) assigned flows must be disjoint throughout the network, never converging to any point that would provide a common failure mode until they reach their final destination, which today is usually a fault-tolerant server farm.
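The disjointness requirement can be checked mechanically once the two paths are known. The sketch below assumes each path is a list of node names from the same source to the same final destination; the function name and the node names in the usage note are hypothetical.

```python
def shared_failure_points(path_a, path_b):
    """Return (shared_nodes, shared_links) for two redundant paths.

    Assumption: both paths start at the same source and end at the same
    final destination, so the endpoints are excluded from the node check.
    Two empty sets mean the paths are disjoint in the sense used above.
    """
    # Intermediate nodes only: everything between source and destination.
    inner_a = set(path_a[1:-1])
    inner_b = set(path_b[1:-1])
    shared_nodes = inner_a & inner_b

    # Directed links as (from, to) pairs along each path.
    links_a = set(zip(path_a, path_a[1:]))
    links_b = set(zip(path_b, path_b[1:]))
    shared_links = links_a & links_b

    return shared_nodes, shared_links
```

For example, `["mote", "r1", "r2", "server"]` and `["mote", "r3", "r4", "server"]` share nothing, while replacing `r4` with `r2` in the second path reports `r2` and the link `("r2", "server")` as common failure points.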

Source-sink communications, which is used for stateless device events and stateful process alarms -- the latter go into alarm, then later return to normal -- has much reduced timeliness constraints, particularly when there is lots of other alarm traffic. There is usually a contractual commitment that the first process alarm that occurs in a period of relatively quiescent plant operation will be reported within 5 s, often to 4 sigma to 6 sigma confidence. On the other hand, the 70th alarm in a ten minute period has no reporting timeliness requirements; it simply should not be lost. As with historizing and control loops, the industry has developed mechanisms to retrieve process alarm status even when the reporting messages are lost, so per-message loss is not a major issue. Application-layer-triggered retries push those initial alarms persistently; but once the system is flooded with alarms the delivery requirements change. It is for that reason that WirelessHART mandated that each 802.15.4 router have only a single alarm forwarding buffer, causing alarm backlogs to build up at the originators (where intelligent alarm reordering and aggregation can occur) rather than in intermediary router queues.
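The effect of a single alarm forwarding buffer per router can be sketched as follows. This is not WirelessHART code; the `Router` and `Originator` classes and their methods are hypothetical and only show where the backlog accumulates.

```python
from collections import deque


class Router:
    """Intermediary with exactly one alarm forwarding buffer (illustrative)."""

    def __init__(self):
        self.slot = None  # the single buffer; None means empty

    def offer(self, alarm) -> bool:
        """Accept the alarm only if the buffer is free."""
        if self.slot is not None:
            return False
        self.slot = alarm
        return True

    def forward(self):
        """Hand the buffered alarm to the next hop and free the buffer."""
        alarm, self.slot = self.slot, None
        return alarm


class Originator:
    """Alarm source; the backlog builds up here, not in the router."""

    def __init__(self, router: Router):
        self.router = router
        self.backlog = deque()  # could be reordered or aggregated before sending

    def raise_alarm(self, alarm) -> None:
        self.backlog.append(alarm)

    def try_send(self) -> bool:
        """Send the oldest backlogged alarm if the router can take it."""
        if self.backlog and self.router.offer(self.backlog[0]):
            self.backlog.popleft()
            return True
        return False
```

Because the router refuses a second alarm until the first is forwarded, pending alarms stay at the originator, where reordering and aggregation remain possible.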

Client-server communications is used primarily to respond to human or programmatic requests for detailed status of remote devices. Periodic publish-subscribe communications typically sends only a single float32 and a coded status byte; when that status byte indicates problems in the device or with the process, a secondary retrieval of more detailed information from the device is usually required. That communication typically requires timeliness suitable to sustain human interaction; if it takes too long the operator will attempt to accelerate the process by demanding even more data.
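A periodic report of one float32 plus one status byte occupies five bytes. The sketch below packs such a payload; the little-endian byte order, the field order, and the `STATUS_DETAIL_AVAILABLE` bit are assumptions made for illustration, not any standard's wire format.

```python
import struct

# Assumption: little-endian float32 followed by one unsigned status byte.
REPORT_FORMAT = "<fB"

# Hypothetical status bit meaning "more detail should be fetched from the device".
STATUS_DETAIL_AVAILABLE = 0x80


def encode_report(value: float, status: int) -> bytes:
    """Pack a process value and its coded status byte into 5 bytes."""
    return struct.pack(REPORT_FORMAT, value, status)


def decode_report(payload: bytes):
    """Unpack a 5-byte report into (value, status)."""
    value, status = struct.unpack(REPORT_FORMAT, payload)
    return value, status


def needs_followup(status: int) -> bool:
    """True when the status byte signals that a client-server retrieval is warranted."""
    return bool(status & STATUS_DETAIL_AVAILABLE)
```

A receiver would call `needs_followup` on each decoded status and issue the secondary client-server request only when it returns true.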

Because the bandwidth available for such state retrieval is typically quite low, the centralized side of the automation system typically caches the data from each such report, thereby being able to present rapid and relatively reliable information to centralized automation programs that work with such data. For that same reason, much of the data that is reported by such client-server communications is reported by exception (rather than as a bulky record) when that is permitted.

If the above gives the impression that these systems are complex and their communications needs not easily ordered into a linear set, then I have succeeded in providing some background. The most interesting aspect of these systems is that the plant owners have substantial incentives to improve their processes and increase anticipatory maintenance, both of which lead to greater revenue. The scale of the problems is large, but so are the potential profits. It is not an accident that many of the world's most valuable corporations have a significant portion of their profits generated by such plants. Thus they are more willing than most to try new technology, and to invest in it heavily if it proves reliable and improves the bottom line.

Cheers,
-Tom
=====

On 2013.03.09 22:04, Shitanshu Shah (svshah) wrote:

Hi Tom, All,

In general I agree with your assessment of the need for packet priority.

If the forwarding service required is the same for all packets of a given class, packet priority is all that is required for such services, as with the traffic classes in your example: deterministic class, alerts, server-client responses, etc. Treating such classes of traffic at the aggregate level per packet priority (e.g., based on certain L3 or L2 code points) is what is needed for the forwarding (queuing) service.

I also do not understand the definition of flow. Is it the micro-flow that was alluded to before?

For micro-flows belonging to the same service class (e.g., the alert class), what kind of differentiation is needed between different flows within that class? I would imagine that queues are an expensive and scarce resource in these systems, just as they are in traditional Layer 2 switches (at least in hardware). It is thus probably very expensive to provide as many queues as there are sets of flows with different priorities.

However, differentiation at the flow level can be imagined for other parameters (but not queuing behavior). For example, the rate of a flow fed into the queuing system could be limited, with excess traffic dropped or re-marked to a lower-priority code point. Or, during congestion, traffic from certain flows could be dropped in preference to others.
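Per-flow rate limiting with re-marking is commonly done with a token bucket. The sketch below is a minimal version under stated assumptions: the class name `FlowPolicer`, the integer priority scale, and the `LOW_PRIORITY` value are hypothetical, and time is passed in explicitly so the behavior is deterministic.

```python
# Assumption: smaller number means higher priority; 7 is a hypothetical
# "lowest priority" code point used for re-marked traffic.
LOW_PRIORITY = 7


class FlowPolicer:
    """Token-bucket policer for one flow (illustrative sketch)."""

    def __init__(self, rate_pps: float, burst: int):
        self.rate = rate_pps        # tokens (packets) added per second
        self.burst = burst          # bucket capacity
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # time of the previous call, in seconds

    def police(self, now: float, priority: int) -> int:
        """Return the priority the packet should carry into the queuing system."""
        # Refill according to elapsed time, capped at the bucket size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return priority      # in profile: keep the original marking
        return LOW_PRIORITY      # out of profile: re-mark (or drop instead)
```

Queuing then acts on the returned priority alone, so the flow classification constrains rate without requiring a queue per flow.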

I understand that RSVP provides a way to enable integrated queuing service. At the same time, RSVP also has a way to enable reservation based on Diffserv. It is the latter that I am trying to highlight, where packet priority is used for the forwarding decision and flow classification may be used to constrain other parameters such as rate.

Taking deterministic class in particular,

As I am not very familiar with industrial automation, what is not clear to me is whether, for a given PAN, there could be different sets of flows with different deterministic parameters. If there are, then I can see some impact on the queuing discipline, and packet priority by itself may not be sufficient. But I am not sure why there would be an application (PAN) with devices having different sets of deterministic flows. I can also see them causing conflicts when it comes to channel reservation.

Regards,

Shitanshu

From: Tom Phinney <tom.phinney@cox.net>
Reply-To: Tom Phinney <tom.phinney@cox.net>
Date: Saturday, March 9, 2013 11:27 AM
To: "6tsch@ietf.org" <6tsch@ietf.org>
Subject: Re: [6tsch] priority and priority

Kris (et al),

Thank you for your reply and general concurrence.

In my post below I was objecting to the conclusion in the originally quoted interchange, recolored in red at the end below, that flow priority suffices. My example flow 2) is one in which packet priority makes sense WITHIN the flow. Note also my text in blue below, highlighting a similar conclusion for use of flows 2) and/or 3) as backup channels for flow 1) when flow 1) becomes too unreliable.

My post below was an attempt to point out that queuing priority AND flow priority need not be disjoint approaches; in my opinion they can and should both exist concurrently in systems that deal with real-world problems, at least in the industrial and process automation markets. They each provide some functionality that the other lacks, giving an overall system that is superior in its performance of critical functions to one that restricts itself to flow priority, or to queuing priority, but without supporting both. My post below was an attempt to sketch a common scenario in that market where the two priority mechanisms interplay to provide a superior system.

Perhaps my concept of flow differs from those of the original two correspondents. If so, I would appreciate instruction/correction. My scenario below was one where resources assigned to secondary flows would be used when the primary flow 1) fails. If others' notions of flow are not coupled to contracts and resource allocation in forwarding DL and NL routers, potentially including slot prioritization among concurrent slots, then what use is it?

-Tom
=====
On 2013.03.09 10:36, Kris Pister wrote:
Tom - I agree with everything that you wrote in the email below except the first sentence. Perhaps I'm missing something, but it seems to me that your enumeration of various flows with different requirements fits quite well with the discussion fragment quoted at the end below. Can you help me understand the differences?

My bias is that we should have
{flow priority (equating to slotframe priority) OR link priority (TX, RX, uni-, multi-)} AND {packet priority}.

ksjp

On 3/8/2013 11:36 PM, Tom Phinney wrote:

I don't know what real-world use this technology is being designed to support, but the discussion fragments quoted at the end below seem to me to be moving away from reality.

In the industrial automation world where I spent most of my career, it makes sense to have different inbound flows for differing purposes. Most sensor devices would have the following flows 1), 2) and 3):
1) a dedicated inbound flow per field device (mote) for periodic deterministic control traffic, where the number of expected hops has to be small when the loop rate is high;
2) a shared inbound flow for usually-infrequent alerts (i.e., device events and process alarms), probably with at least 2-level packet queuing priority, perhaps using CSMA/CA to provide some slight statistical bias in channel access priority based on packet priority;
3) a shared inbound flow for server-to-client responses, typically in response to a program-initiated or operator-initiated action at an engineering workstation in a control room.

Every process actuator (i.e., mote that can directly manipulate the process) will also have another flow 4), analogous to 1) but in the outbound direction, from the control room to the field to transfer the computed actuator setting to the device. Such devices also have inbound flow 1) to transfer actuator status at the same periodic rate.

If communication problems are experienced on flow 1), it is desirable to be able to use one or more of the other flows as a backup mechanism for inbound control traffic, whether full determinism can be maintained or not. Packet queuing priority definitely makes sense here, so that such traffic can leapfrog non-deterministic traffic in the output queue of the originating mote and of any intermediary relaying motes. As mentioned in 2), it may also make sense to use CSMA/CA to provide some slight statistical bias in channel access priority based on packet priority, at least for flows where the transaction template has a non-null CSMA/CA sensing interval before the scheduled Data PDU transmission of the transaction.
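The leapfrogging described above amounts to a priority-ordered output queue that stays first-in-first-out within each priority level. A minimal sketch, assuming a hypothetical `OutputQueue` class in which a smaller number means higher priority:

```python
import heapq
import itertools


class OutputQueue:
    """Mote output queue ordered by packet priority, FIFO within a priority."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def enqueue(self, packet, priority: int) -> None:
        # Assumption: smaller number = higher priority.
        heapq.heappush(self._heap, (priority, next(self._seq), packet))

    def dequeue(self):
        """Remove and return the highest-priority, oldest packet."""
        _priority, _seq, packet = heapq.heappop(self._heap)
        return packet

    def __len__(self) -> int:
        return len(self._heap)
```

Rerouted control traffic enqueued at priority 0 is dequeued ahead of earlier packets at priority 1, at the originating mote and at each relaying mote that uses the same discipline.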

The reason for priority on the alerts is that process alarms from class 1 - 2 devices (or class 0 - 2 if class 0 uses wireless) would usually be configured to be higher priority than those for classes 3 - 5. (Some class 3 devices might also be assigned that higher alarm priority.) Where safety alarms are concerned, such as the "man down" alarm that Herman Storey mentioned on Thursday's call, those alarms would tend to use another flow 5), similar to 2), with a very low expected rate but with dedicated slots to provide a clear virtual channel (to the extent possible). The time constants for such alarms are long enough (seconds) compared to much of the flow 1) traffic that the incremental cost of flow 5), which is so similar to flow 2), probably would not be very much, perhaps 2% of the total network capacity. It is tempting to just have a 3-way CSMA/CA interval for flow 2), but the reality of the CSMA/CA priority assessment mechanism is that it is very weak, given physical modem delays, inter-mote timing skews and the old hidden-node problem. Thus CSMA/CA prioritization does not seem reliable enough to trust for safety purposes.

Of course a system will have other flows, many allocated on a demand basis in response to some occasional need, such as firmware download or captured waveform upload.

The basic timing of continuous control loops is that the total delay from sensing the process to driving the actuator generally has to be less than twice the period of the loop. In such cases the needed control strategy is both relatively simple and relatively stable. However, if the total input-to-output delay exceeds twice the loop period, the control becomes much more difficult and less able to respond to fast transients. Jitter is a problem only when a process value message is delivered in the wrong cycle; it does not matter whether the delivery is at the 10% or 90% point in the proper cycle. Authenticated timestamping of the message, as occurs via the UDP transport nonce in ISA100.11a, provides a means of discarding process value messages that are delivered too late, thus converting those situations to ones of message loss. That is the desired way of handling late delivery, because the control loop is stable under message loss but not when the values are deliberately selectively delayed into the wrong delivery cycle. (Honeywell demonstrated years ago that it was possible to destabilize most control loops with such deliberately jittered delivery.)
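Converting late delivery into loss can be sketched as a receive-side filter keyed on the message timestamp. The sketch assumes that the "proper cycle" is the reporting period in which the sample was timestamped and that times are integer milliseconds on a shared clock; the function names are hypothetical, and the ISA100.11a nonce mechanism itself is not modeled.

```python
def cycle_index(time_ms: int, period_ms: int) -> int:
    """Index of the reporting period that contains the given time."""
    return time_ms // period_ms


def accept_process_value(sample_ms: int, arrival_ms: int, period_ms: int) -> bool:
    """Accept a process value only if it arrives in the cycle it was sampled in.

    Position within the cycle (10% or 90%) makes no difference; a message
    arriving in any other cycle is discarded, i.e. treated as lost.
    """
    return cycle_index(sample_ms, period_ms) == cycle_index(arrival_ms, period_ms)


def delay_within_budget(total_delay_ms: int, period_ms: int) -> bool:
    """Check the sense-to-actuate rule of thumb: delay under twice the loop period."""
    return total_delay_ms < 2 * period_ms
```

With a 1000 ms period, a value sampled at 2000 ms is accepted whether it arrives at 2100 ms or 2900 ms, and discarded if it arrives at 3050 ms.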

-Tom
=====
On 2013.03.08 23:33, Thomas Watteyne wrote:
Michael,

I agree that the term flow is almost as overloaded as link or path. This is not a final term and we will have to come up with a better term. Suggestions?

If I get your point correctly, you are suggesting that there should be no packet priorities, only flow priorities. In a TSCH context, this would translate into a slotframe per flow, where flow also incorporates the notion of priority. Correct? This is a very important discussion we started over the phone earlier today, but which we will continue Wednesday in Orlando. I hope you can be there.

Thomas

On Fri, Mar 8, 2013 at 6:55 PM, Michael Richardson <mcr+ietf@sandelman.ca> wrote:

How can a flow have high priority packets and low priority packets?

Why are these two types of packets using the same DODAG?

_______________________________________________ 6tsch mailing list 6tsch@ietf.org https://www.ietf.org/mailman/listinfo/6tsch
