Re: [Dime] Conclusion for Sequence Numbers

Steve,

You are describing the case where one of the agents in the realm has a very skewed view of the state of the realm, for whatever reason.
I think this is a much bigger issue that leads to bigger problems: for example, all clients that are connected to this agent and do not send any requests via other agents of the realm will always have a skewed view of the realm.
So that problem should be solved first (of course, not here and not by us).

In general, for the realm report to be usable, we need to assume that agents are synchronized (except for a very small window of time) and have a consistent view (if not always exactly the same view) of the same realm.
Otherwise, the realm report will itself create issues among different clients and hence should be avoided completely.

Regards,
Nirav.

From: Steve Donovan [mailto:srdonovan@usdonovans.com]
Sent: Monday, December 16, 2013 9:39 PM
To: Nirav Salot (nsalot); Jouni Korhonen
Cc: Ben Campbell; dime@ietf.org list
Subject: Re: [Dime] Conclusion for Sequence Numbers - was Re: OVLI: comments to 4.3

Nirav,

Assume that there are multiple agents reporting on realm overload.  The method these agents use to determine realm overload is an implementation decision, but will require some form of exchange of state between the agents to keep them in sync.  Note that we can't assume that all agents are getting all host overload reports.

Under normal circumstances, the difference between the overload reports from these agents should be small, as you assert.

But consider the case where the replication mechanism between the agents is either congested or the network used by the agents to exchange state is no longer available.  It is easy to construct scenarios in this situation where agents could be reporting widely different realm overload report values.

This is clearly an unlikely event, but it is one that can easily be prevented by including the id of the node that sends a report as part of the report.

Regards,

Steve
On 12/16/13 9:42 AM, Nirav Salot (nsalot) wrote:
Steve,

> The issue with considering only the last report is that it introduces the potential for thrashing between different values.  This could make the overload event even worse.
I am not sure about this. The two agents are not synchronized only for a very small window of time. But even then their reports should be reflecting the status of the realm, approximately if not accurately.
So I don't think using one of the reports will make the situation worse.

I did not understand your argument regarding the security.
The client does not know the agent's identity anyway. So adding the agent's identity to the overload report is not going to change the security considerations.

Regards,
Nirav.

From: Steve Donovan [mailto:srdonovan@usdonovans.com]
Sent: Monday, December 16, 2013 5:47 PM
To: Nirav Salot (nsalot); Jouni Korhonen
Cc: Ben Campbell; dime@ietf.org list
Subject: Re: [Dime] Conclusion for Sequence Numbers - was Re: OVLI: comments to 4.3

Nirav,

The issue with considering only the last report is that it introduces the potential for thrashing between different values.  This could make the overload event even worse.

I agree that this should be a rare case.  I still think, however, that we need to have defined behavior.

One other argument for including the sender of the overload report in the report itself is security.  The ability of a bad actor to insert a malicious overload report can be a very effective DoS attack.  I know we have said we aren't addressing security yet, but this seems pretty short-sighted.  Being able to establish the identity of the sender of an overload report will be an important part of the security solution.  We should take this step in that direction.

Steve
On 12/12/13 10:26 AM, Nirav Salot (nsalot) wrote:
Steve,

So, as I understand it, it is not a common case for different agents to provide different views of the same realm, and this may happen during a small window when synchronization has not yet taken place between the geographically distributed agents.
Right?

If so, I can understand the following part of your proposal.
One proposal for how we deal with the fact that different reports can have different values is to have the reacting node treat the first reporting node as the authority for reporting realm overload state for that overload instance.

i.e. I can understand defining some behavior for the reacting node to handle the case (which is a rare case anyway) when two agents provide different realm reports for the same realm.
The behavior could simply be to consider only the last report when two agents have sent two different reports for the same realm. (And this will also work when the same agent has purposefully sent two different realm reports, e.g. due to a change in the realm overload.)
But this still does not require adding the agent's identity to the overload report.

Regards,
Nirav.

From: Steve Donovan [mailto:srdonovan@usdonovans.com]
Sent: Thursday, December 12, 2013 7:50 PM
To: Nirav Salot (nsalot); Jouni Korhonen
Cc: Ben Campbell; dime@ietf.org list
Subject: Re: [Dime] Conclusion for Sequence Numbers - was Re: OVLI: comments to 4.3

Nirav,

See inline.

Steve
On 12/12/13 6:40 AM, Nirav Salot (nsalot) wrote:

All,

I do not understand this discussion regarding different agents of the same realm having different views of the realm and providing different overload reports.
We can make the statement that all senders of realm reports should send the same report.  This does not guarantee that it will always happen.  If agents are sending the report, they are generally distributed elements.  In very large networks, this distribution can span continents.  There will be a lag in the "synchronization" of the realm overload information.

My concern is that we have well defined behavior for when a reactor receives conflicting realm reports.  We need to avoid thrashing between different reduction levels, which could make the overload situation worse.

Additionally, I also do not understand the proposal of adding identity of the agent generating "realm report" into the report.
Adding the endpoint identity is needed to allow the reacting node to know that it is receiving two different views of Realm overload from two different reporting end-points.

What is the use of this identity at the reacting node when the report is realm report? Why should the reacting node care who generated the realm report?
One proposal for how we deal with the fact that different reports can have different values is to have the reacting node treat the first reporting node as the authority for reporting realm overload state for that overload instance.  In this case, the reacting node would ignore reports received from other reporting nodes. Ignoring reports from non-authoritative endpoints requires the reacting node to know which endpoints sent which reports.

Regards,

Nirav.

-----Original Message-----
From: DiME [mailto:dime-bounces@ietf.org] On Behalf Of Jouni Korhonen
Sent: Thursday, December 12, 2013 5:06 PM
To: Steve Donovan
Cc: Ben Campbell; dime@ietf.org list
Subject: Re: [Dime] Conclusion for Sequence Numbers - was Re: OVLI: comments to 4.3

Steve,

On Dec 11, 2013, at 3:13 PM, Steve Donovan <srdonovan@usdonovans.com> wrote:

Jouni,

We need the sequence number to be strictly increasing.  I don't see the need for it to increase in uniform amounts.  Using time does fit these requirements.  I'm ok with using time as long as we don't call the AVP timestamp.

Ulrich does bring up an interesting use case, where a client is receiving realm reports for the same realm from different agents.  We need to define the client's behavior in this case.

Any suggestions? I mean agents may have hugely different view of the realm if they are acting on their own.

Presumably the client needs to be able to determine who generated the realm report.  This cannot be determined based on the content of the message or the connection on which the message arrived.  It seems like we might need a "Report Generator Diameter ID" in the overload report specifically for Realm reports.

Once the client is able to differentiate between realm reports sent by different agents (or servers) we need logic defining how the client deals with a new overload report.

I need to check one of the basic assumptions of DOIC now so that we have the same understanding. I went back to the endpoint text in Section 5.1. There, for example in Figures 4 and 5, the DOIC association and the endpoint assumption do not work IMHO because we have no endpoint identity in the OLR. In order for the endpoint assumption to work (as I drew it on the whiteboard in Porto), it would require as many Diameter level sessions as there are DOIC associations.

So.. have the assumptions shifted in the meantime and I have just not paid attention?

I see a couple of options (others will probably see options I am missing):

- Use the last received realm report - This introduces the possibility of thrashing between two different reduction values and different durations.  Note that this approach does not require the source of the report to be included in the report.

- Only listen to one source of realm overload - The approach would be to remember who sent the first overload report from the realm and ignore realm overload reports from other sources.  This behavior would likely be constrained to a single occurrence of realm overload.  Meaning that the "memory" of the report source would only last as long as that overload event persists.  Once the overload event goes away, the report source would be forgotten and a new source could be used for the next occurrence.

On the surface, the second approach looks better to me.
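
The second option could be sketched roughly as follows. This is only a minimal illustration in C, assuming the report carries the originator's Diameter identity (which the OLR discussed here does not). The names realm_overload_state and realm_report_apply, the MAX_ID_LEN bound, and the convention that a zero reduction ends the overload event are all hypothetical; report duration and expiry are left out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_ID_LEN 256  /* assumed bound on a Diameter identity string */

/* Hypothetical per-realm state kept by the reacting node. */
struct realm_overload_state {
    bool     active;              /* an overload event is in progress */
    char     source[MAX_ID_LEN];  /* first reporter for this event */
    uint32_t reduction_pct;       /* requested traffic reduction */
};

/* Apply a realm report. Returns true if the report was accepted.
 * The first reporter becomes the authority for the current event and
 * reports from any other source are ignored until the event ends. */
static bool realm_report_apply(struct realm_overload_state *st,
                               const char *reporter_id,
                               uint32_t reduction_pct)
{
    if (strlen(reporter_id) >= MAX_ID_LEN)
        return false;                     /* identity too long to track */

    if (st->active && strcmp(st->source, reporter_id) != 0)
        return false;                     /* not the authoritative source */

    if (reduction_pct == 0) {
        /* Assumed convention: a zero reduction ends the overload event,
         * so the remembered source is forgotten. */
        st->active = false;
        st->source[0] = '\0';
        st->reduction_pct = 0;
        return true;
    }

    if (!st->active) {
        st->active = true;
        strcpy(st->source, reporter_id);  /* length checked above */
    }
    st->reduction_pct = reduction_pct;
    return true;
}
```

With this state machine, a report from a second agent during an ongoing event is dropped rather than overwriting the current reduction, which is what avoids the thrashing described in the first option.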

Or add the identity of the OLR originator explicitly if it cannot be determined implicitly (i.e. from the Diameter message's Origin-Host/Realm).

Or assume the endpoint really is the endpoint in DOIC and Diameter session sense.

- Jouni

Steve

On 12/11/13 2:15 AM, Jouni wrote:

Ulrich,

I might be slow but.. Section 4.4 says

   control endpoints.  The sequence number is only required to be unique

   between two overload control endpoints and does not need to be

Unique between two endpoints..

Section 5.1 talks about endpoints:

   of an arbitrary Diameter network.  The overload control information

   is exchanged over on a "DOIC association" between two communication

   endpoints.  The endpoints, namely the "reacting node" and the

   "reporting node" do not need to be adjacent Diameter peer nodes,

nor

So if your agents inject realm reports, they need to be endpoints to the client. Similar to Figure 5. Therefore the sequence number spaces between C-A1 and C-A2 are separate.
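
To illustrate what separate sequence number spaces would mean on the client side, here is a minimal sketch in C. It assumes the reacting node can somehow identify the reporting endpoint (A1 or A2), which is exactly the open question in this thread. The table layout, the names doic_table and assoc_check_seq, and the fixed sizes are hypothetical, and overflow handling is ignored for brevity.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_ASSOC  8    /* assumed small table size for the sketch */
#define MAX_ID_LEN 256  /* assumed bound on an endpoint identity */

/* One entry per DOIC association, keyed by the reporting endpoint. */
struct doic_assoc {
    bool     in_use;
    char     endpoint[MAX_ID_LEN];  /* reporting node identity, e.g. "A1" */
    uint64_t last_seq;              /* last sequence number from it */
};

struct doic_table {
    struct doic_assoc slots[MAX_ASSOC];
};

/* Returns true if a report with this sequence number should be processed.
 * Each endpoint has its own sequence space, so a low number from A2 is
 * never compared against a high number previously seen from A1. */
static bool assoc_check_seq(struct doic_table *t, const char *endpoint,
                            uint64_t seq)
{
    struct doic_assoc *free_slot = NULL;

    if (strlen(endpoint) >= MAX_ID_LEN)
        return false;

    for (size_t i = 0; i < MAX_ASSOC; i++) {
        struct doic_assoc *a = &t->slots[i];
        if (a->in_use && strcmp(a->endpoint, endpoint) == 0) {
            if (seq < a->last_seq)
                return false;  /* older than what this endpoint sent before */
            a->last_seq = seq;
            return true;
        }
        if (!a->in_use && free_slot == NULL)
            free_slot = a;
    }

    if (free_slot == NULL)
        return false;          /* table full; the sketch simply refuses */

    free_slot->in_use = true;
    strcpy(free_slot->endpoint, endpoint);  /* length checked above */
    free_slot->last_seq = seq;
    return true;
}
```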

Now it is not clear to me whether, in your reasoning, the C would see the server identity (as the endpoint) when there is an active "DEP agent" on the path. That would clearly not work and would not be aligned with the endpoint assumption.

Note that at some point in time we had (at least on a discussion level in the f2f meeting) the report originator identity in the OLR. That would make endpoint identification trivial. Now a "DEP agent" needs to act as a "server" for its clients in order to appear as an endpoint.

- Jouni

ps: still think the use of Time is simpler..

On Dec 11, 2013, at 9:43 AM, Wiehe, Ulrich (NSN - DE/Munich) wrote:

That's not predictable. It may be the same server in some cases, and different servers in other cases.

-----Original Message-----

From: ext Jouni [mailto:jouni.nospam@gmail.com]
Sent: Wednesday, December 11, 2013 8:38 AM
To: Wiehe, Ulrich (NSN - DE/Munich)
Cc: Ben Campbell; dime@ietf.org list; Steve Donovan
Subject: Re: [Dime] Conclusion for Sequence Numbers - was Re: OVLI: comments to 4.3

Ulrich,

On Dec 11, 2013, at 9:21 AM, Wiehe, Ulrich (NSN - DE/Munich) wrote:

Jouni,

ad 1. "monotonically" does not express your intention. What we are looking for may be "stepwise with fixed step".

Ad 2. It is not necessarily a mistake that could result in out-of-sequence sequence numbers. When a client C sends a realm-type request towards any server in the realm, an agent A1 that selects the server would send back the realm-type OLR with sequence number s1. The next realm-type request sent by C (that survived the throttling) may take a path that does not include A1 but A2. A2 then selects the server and sends back a sequence number s2. Nothing ensures that s1 and s2 are in sequence.

Would the server in both cases (via A1 and A2) be the same?

- Jouni

Ulrich

-----Original Message-----

From: ext Jouni Korhonen [mailto:jouni.nospam@gmail.com]
Sent: Tuesday, December 10, 2013 10:31 PM
To: Wiehe, Ulrich (NSN - DE/Munich)
Cc: Ben Campbell; dime@ietf.org list; Steve Donovan
Subject: Re: [Dime] Conclusion for Sequence Numbers - was Re: OVLI: comments to 4.3

Ulrich,

On Dec 10, 2013, at 4:31 PM, "Wiehe, Ulrich (NSN - DE/Munich)" <ulrich.wiehe@nsn.com> wrote:

Jouni,

1. I find the texts

a) "The sequence number ... does not need to be monotonically increasing"

and

Means the delta from old-seqno to new-seqno can be any non-negative

integer (within the given limits) not something fixed step/delta

(like +1). As long as "new-seqno >= old-seqno" holds we are fine.

b) "...the new sequence number MUST be greater or equal than the old sequence number..."

contradicting.

Can you please clarify.

See above. (mind the overflow case)

2. The expected behaviour when receiving an out-of-sequence sequence number within OC-OLR is described in 4.3:

"The receiver SHOULD discard an OC-OLR AVP with a sequence number that is less than previously received one."

I don't find this very robust. Once a higher sequence number (received erroneously by mistake) is accepted you cannot (easily) recover.

I find it more robust in a sense that I should not care about stale old information.

However, since we are piggybacking (by popular demand) we have

little room for seqno re-sync negotiation.

What is the mistake you refer here? A misbehaving implementation?

In that case, it deserves to get a manual intervention once figured

out by admins checking alarms and logs. If the mistake is due to other

things, like endpoints being out of sync, we currently have no written down mechanism to survive that.

3. The expected behaviour when receiving an out-of-sequence sequence number within the OC-Supported-Features AVP is not described. What is the intention here?

No intention. Just a sloppy specification. You are right that

something needs to be done & clarified here. (again the semantics

of Time would be nice..)

I'll propose something. Others should too ;)

- Jouni

Ulrich

-----Original Message-----

From: DiME [mailto:dime-bounces@ietf.org] On Behalf Of ext Jouni Korhonen
Sent: Tuesday, December 10, 2013 8:28 AM
To: Ben Campbell; dime@ietf.org list; Steve Donovan
Subject: Re: [Dime] Conclusion for Sequence Numbers - was Re: OVLI: comments to 4.3

Fine.. lets define then the sequence number semantics. Basic

unsigned integer math. The text proposal is the following:

4.4.  OC-Sequence-Number AVP

The OC-Sequence-Number AVP (AVP code TBD3) is type of Unsigned64.

Its usage in the context of the overload control is described in

Sections 4.1 and 4.3.

From the functionality point of view, the OC-Sequence-Number AVP

MUST be used as a non-volatile increasing counter between two

overload control endpoints.  The sequence number is only required

to be unique between two overload control endpoints and does not

need to be monotonically increasing.

When comparing two sequence numbers, the new sequence number MUST

be greater or equal than the old sequence number within a window

that is half of the size of the maximum sequence number. This

allows a simple handling of the sequence number overflow using

unsigned integer arithmetic:

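  #include <stdbool.h>
  #include <stdint.h>

  /* Half of the 64-bit sequence number space. */
  #define WINDOW 0x8000000000000000ULL

  bool verify_seqnum( uint64_t newsn, uint64_t oldsn ) {
      /* Unsigned subtraction wraps modulo 2^64, so overflow is handled. */
      if (newsn - oldsn <= WINDOW) {
          // newsn >= oldsn
          return true;
      } else {
          // outside window or newsn < oldsn
          return false;
      }
  }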

The above should even work if someone shovels NTP times into sequence numbers with a blind typecasting.
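
To make the overflow handling concrete, here is a small self-contained sketch in C. It repeats the comparison from the proposal and adds a helper, seq_accept, that is purely illustrative and not part of the draft text; it shows how a receiver might combine the check with remembering the last accepted value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Half of the 64-bit sequence number space, as in the text proposal. */
#define WINDOW 0x8000000000000000ULL

/* Same comparison as in the proposal: unsigned subtraction wraps
 * modulo 2^64, so a counter that has just overflowed still looks
 * "newer" than a value near UINT64_MAX. */
static bool verify_seqnum(uint64_t newsn, uint64_t oldsn)
{
    return (newsn - oldsn) <= WINDOW;
}

/* Hypothetical helper (not in the draft): accept a report only if its
 * sequence number passes the window check, and remember it if so. */
static bool seq_accept(uint64_t *last_seen, uint64_t newsn)
{
    if (!verify_seqnum(newsn, *last_seen))
        return false;      /* stale or outside the window: discard */
    *last_seen = newsn;
    return true;
}
```

For example, verify_seqnum(2, UINT64_MAX - 1) is true because the difference wraps around to 4, while verify_seqnum(5, 6) is false because the difference wraps around to UINT64_MAX, which is outside the window.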

- Jouni

On Dec 10, 2013, at 12:34 AM, Ben Campbell <ben@nostrum.com> wrote:

On Dec 9, 2013, at 10:00 AM, Steve Donovan <srdonovan@usdonovans.com> wrote:

Jouni,

I propose that we keep the name OC-Sequence-Number but that we use the Time type for OC-Sequence-Number.  It is misleading and potentially confusing to call it OC-Time-Stamp.

I could live with that, although I would rather just define the expected properties of the sequence number, and leave the implementation up to the implementor. I assume your reasoning for not calling it a timestamp is that you do not want people to try to use it as a time base reference. If so, then we don't require any connection to a clock. We just need it to be monotonically increasing.

We might consider expanding on the format of the AVP to make it something like Session-ID, where it is a concatenation of the Diameter-ID of the generating node and a timestamp.  This might help the reacting node keep track of which sequence number it has received.

Do we need a uniqueness across multiple nodes property? If so, why?

Steve

On 12/9/13 5:37 AM, Jouni Korhonen wrote:

Folks

Could we conclude on the sequence number vs. time stamp vs. something else?

We got more important places to spend our energy than this ;)

My proposal is the following (based on the original pre-00 design):

o We change the OC-Sequence-Number to OC-Time-Stamp in all occurrences

in the -01.

o We use RFC6733 Time type for the OC-Time-Stamp. RFC6733 gives us

already exact definition how to handle the AVP.

o Define that the OC-Time-Stamp is the time of the creation of the

"original" AVP within whose context the time stamp is present.

o The OC-Time-Stamp AVP uniqueness is still considered to be in scope

of the communicating endpoints.

o The time stamp can be used to quickly determine if the content of

the encapsulating AVP context has changed (among other properties).

This would be useful specifically in the future when the encapsulating

grouped AVPs  grow in size and functionality.

- Jouni

_______________________________________________

DiME mailing list

DiME@ietf.org

https://www.ietf.org/mailman/listinfo/dime
