Re: [IPFIX] Semantic and structured data

Atsushi Kobayashi <akoba@nttv6.net> Wed, 17 March 2010 21:45 UTC

Date: Thu, 18 Mar 2010 06:40:40 +0900
From: Atsushi Kobayashi <akoba@nttv6.net>
To: Benoit Claise <bclaise@cisco.com>
In-Reply-To: <4BA0D4CB.6000308@cisco.com>
References: <4BA0D177.1070506@net.in.tum.de> <4BA0D4CB.6000308@cisco.com>
Cc: ipfix@ietf.org
Subject: Re: [IPFIX] Semantic and structured data

Hi Benoit, and all,

I agree with solution 3 rather than a new semantic IE.
As Benoit mentioned, a semantic IE seems to make the data harder to
interpret.

As Brian mentioned, I think we should avoid unnecessary
complexity. However, I am not sure which types, e.g., NONE, OR, AND, NOR,
RANGE, and ORDERED, are needed. I would like to avoid one data
structure being representable in multiple ways. To examine this, we need
more practical examples in addition to Selector Report
Interpretation and interface lists on aggregated Flow Records.

How about BGP AS Path, Community? Anything else?

When I have the following BGP AS Path mixing as-sequence and as-set, how
should it be represented?

10 20 30 40 {50,60}

(basicList, ORDERED, (basicList, ORDERED, AS10,AS20,AS30,AS40),
(basicList, OR, AS50, AS60))

Is it correct?
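
To make the question concrete, here is a minimal Python sketch of the
nesting above. The BasicList class, the semantic strings, and the helper
names are hypothetical and only for illustration; they are not taken from
the draft. It assumes an as-sequence maps to ORDERED and an as-set maps to OR.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class BasicList:
    """Hypothetical in-memory model of a basicList with a semantic field."""
    semantic: str                       # e.g. "ORDERED", "OR" (assumed names)
    items: List[Union["BasicList", int]] = field(default_factory=list)


def as_path_to_basic_list(segments: List[Tuple[str, List[int]]]) -> BasicList:
    """Map AS Path segments to nested basicLists.

    Each segment is ("sequence", [asn, ...]) or ("set", [asn, ...]).
    Assumption: as-sequence -> ORDERED, as-set -> OR, outer list -> ORDERED.
    """
    inner = []
    for kind, asns in segments:
        if kind not in ("sequence", "set"):
            raise ValueError("unknown segment type: %r" % kind)
        semantic = "ORDERED" if kind == "sequence" else "OR"
        inner.append(BasicList(semantic, list(asns)))
    return BasicList("ORDERED", inner)


def render(node: Union[BasicList, int]) -> str:
    """Render the structure in the parenthesised notation used in this thread."""
    if isinstance(node, BasicList):
        parts = ", ".join(render(item) for item in node.items)
        return "(basicList, %s, %s)" % (node.semantic, parts)
    return "AS%d" % node


# The AS Path "10 20 30 40 {50,60}" from the example above.
path = as_path_to_basic_list([("sequence", [10, 20, 30, 40]),
                              ("set", [50, 60])])
print(render(path))
```

With this mapping there is exactly one representation per AS Path, which
is the property I would like to keep.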

Regards,
Atsushi

On Wed, 17 Mar 2010 14:10:35 +0100
Benoit Claise <bclaise@cisco.com> wrote:

> Thanks Gerhard for your feedback.
> See inline.
> >
> > Hi all,
> >
> > Some general thoughts from my side:
> >
> > - I appreciate that you want to add a basic notion of semantic to the
> >   structured data.
> Great.
> >
> > - Up to now, semantic was not in the protocol but in the info model.
> >
> > +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> > |0|               Field ID    |       Element Length            |
> > +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> > | Semantic  |             BasicList Content ...                 |
> > +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> > |                           ...                                 |
> > +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >
> >   Can't we encode Semantic in an IE?
> >   Then, we could do without a new IANA registry.
> >   And: This IE could be used for other purposes (e.g., in Templates)
> We thought about this one. However, one of the IPFIX principles is that 
> the semantic of one IE can't depend on the value or position of some 
> other IEs.
> One example of this is the (MPLS label position, MPLS label). Way before 
> structured data, our initial implementation contained: MPLS label 
> position, MPLS label.
> So the collector had to look first at the MPLS label position value in 
> order to correctly understand the following  MPLS label.
> Somehow, with these conventions, we were having an information model on 
> the top of the IE. This was wrong.
> Also it was not obvious for a collector, i.e. without hardcoding the 
> information.
> Finally, what if the IE order changed within the flow record?
> Conclusion: we had to change our implementation back to mplslabelposition1, 
> mplslabelposition2, etc... Like we did in IPFIX ;-)
> 
> 
> >
> > - I'm not sure whether the bitvector encoding has an advantage over an
> >   enumeration type.
> >
> > Unfortunately, I do not have the time to participate in a deep 
> > discussion - I'm busy with IPFIX-CONFIG and other stuff.
> Thanks and regards, Benoit.
> >
> > Regards,
> > Gerhard
> >
> > Benoit Claise wrote:
> >> Hi Brian,
> >>> Hi, Benoit,
> >>>
> >>> Replies inline...
> >>>
> >>> On Mar 16, 2010, at 2:40 AM, Benoit Claise wrote:
> >>>
> >>>> Hi Brian,
> >>>>
> >>>> Thanks for your feedback.
> >>>>> hi, Benoit, all...
> >>>>>
> >>>>> Agreed that we should do something about this (i.e., that solution 
> >>>>> 1 is no solution.); that said, a few comments in no particular order:
> >>>>>
> >>>>> 1. In considering adding explicit semantics to structured data, we 
> >>>>> as a WG are taking on the task of defining semantics for IPFIX as 
> >>>>> a whole. Semantics, as I understand them, are in IPFIX largely 
> >>>>> contextual and template dependent, but in almost all cases I can 
> >>>>> think of it seems like these are implicitly "AND" semantics (this 
> >>>>> flow has source IP A _AND_ destination IP B _AND_...).
> >>>>>
> >>>> Agreed.
> >>>>> We will need to make an explicit statement on this.
> >>>> Not sure why.
> >>>>> We will need to determine whether these implicit semantics are a 
> >>>>> property of the protocol (in which case Structured Data is really 
> >>>>> a protocol-level extension), or a property of each information 
> >>>>> element (in which case all 5103 IEs have implicit AND semantics; 
> >>>>> question 2: will we need to add semantics to the IANA registry in 
> >>>>> this case?). We will need to be quite careful about this. It's not 
> >>>>> as simple as defining semantics within structured data then 
> >>>>> calling it done.
> >>>>>
> >>>>>
> >>>> I'm not sure why each individual IE should have a semantic ... in 
> >>>> the IANA registry.
> >>> I'm not either. On reflection it seems like overkill. I'm just 
> >>> saying that if we, as a WG, are moving from stating that semantics 
> >>> are explicitly out of scope to defining them as in-scope, we need to 
> >>> have a consistent approach, and answers to all the questions that 
> >>> arise when we consider moving the protocol from a simple framing 
> >>> mechanism to a framing mechanism with some logic behind it, so we 
> >>> answer them once, so that all future efforts having to do with 
> >>> semantics are consistent. This, I acknowledge, is an argument in 
> >>> favor of solution 3...)
> >>>
> >>> One very simple new question that arises here, to illustrate my 
> >>> point: Is it legal to export a record that has sourceIPAddress X AND 
> >>> NOT sourceIPAddress X?
> >>  From a protocol point of view, yes
> >>  From a semantic point of view, I don't see a use case for that.
> >> Now, this question is no different from: with RFC5101, is it legal 
> >> to export a record that has two instances of sourceIPAddress?
> >>>> As you wrote, "in almost all cases I can think of it seems like 
> >>>> these are implicitly "AND" semantics", so the case we try to solve 
> >>>> is when there are multiple instances of a single IE. We know that 
> >>>> RFC 5101 foresaw that case " The Collector MUST support the use of 
> >>>> Templates containing multiple occurrences of the similar 
> >>>> Information Elements", but the idea is not to change RFC5101.  If 
> >>>> we put some semantic in the IPFIX structured data, that would solve 
> >>>> the vast majority of our cases. Also, we could say: if you want 
> >>>> some semantic when exporting multiple IEs, then you SHOULD use 
> >>>> IPFIX structured data.
> >>>>
> >>>> Also, the default value for the semantic field in the IPFIX 
> >>>> structured data SHOULD be "NONE", to express that flow record 
> >>>> doesn't include any semantic.... like in RFC5101. You might draw 
> >>>> your own conclusion, maybe because you know your network, maybe 
> >>>> because you have configured the exporter, but then it's your decision.
> >>>> The way I see the proposed solution is: in IPFIX structured data, 
> >>>> you MAY use the semantic field as a way to express the relationship 
> >>>> between IEs within the structure.
> >>>>> 2. I don't consider draft-sommer-ipfix-mediator-ext-01 a valid 
> >>>>> argument against Solution 2, as it's trying to solve a somewhat 
> >>>>> different and more limited problem than structured data. Solution 
> >>>>> 2 _might_ cause a problem for this draft, but certainly not the 
> >>>>> other way around (unless we as a WG want to subsume that work into 
> >>>>> this draft, which would probably require rechartering...). Also, I 
> >>>>> don't think we need semantics for all of the list types, just 
> >>>>> basicList (Illustrative question: How do I meaningfully interpret 
> >>>>> two "or" subTemplateMultiLists with disjoint IE sets? Two "and" 
> >>>>> subTemplateMultiLists? Nestings thereof?)... So, we don't really 
> >>>>> have an explosion to deal with if we do Solution 2 correctly: 
> >>>>> andBasicList, orBasicList, xorBasicList, notBasicList; Four IEs, 
> >>>>> nestable, done. What can't we do with those four IEs? In this 
> >>>>> case, we could even step back and say that semantics outside these 
> >>>>> four are within the protocol explicitly _undefined_, and to be 
> >>>>> interpreted within the
> >>>>> context of each Template.
> >>>>>
> >>>>>
> >>>> If I translate some more what I wrote "While it's solved the router 
> >>>> and most mediation function needs today", this would be "Me, 
> >>>> myself, and I don't need more than OR and AND now ;-)".
> >>>> However, who am I to tell that others don't need it now... and that 
> >>>> the logical solution is to use the IPFIX structured data
> >>>> Furthermore, if we think a little bit longer term, the next big 
> >>>> step in IPFIX is the mediation function. In my company, every 
> >>>> feature wants to export its own data with NetFlow/IPFIX... up to 
> >>>> the point where a CPE would not have enough bandwidth across the 
> >>>> WAN to export all the "management" information. So we'll need more 
> >>>> and more of aggregated flow records (both in time and space) even 
> >>>> in the router. Again, the logical solution will be to use the IPFIX 
> >>>> structured data. At this point, we will most probably need 
> >>>> something other than OR and AND, i.e. RANGE, ORDERED, etc...
> >>>>
> >>>> An example of subTemplateMultiLists with disjoint IE sets, let's 
> >>>> imagine that you have to export an aggregated observation point, 
> >>>> composed of multiple template records
> >>>>      template 1: exporterIPaddress
> >>>>      template 2: exporterIPaddress, basicList of interfaces
> >>>>      template 3: exporterIPaddress, LC
> >>> and then you'd want to OR these... Okay, makes sense...
> >>>
> >>>>> 3. If we really _do_ want ranges and so on (which, again, we'd 
> >>>>> need to get WG consensus on; this is explicitly out of scope in my 
> >>>>> reading of the present charter), then we could do them in the 
> >>>>> scope of Solution 3.
> >>>> Btw, acknowledging that one day we will have to solve this is good 
> >>>> enough for solution 3. I mean, we don't have to populate the 
> >>>> semantic IANA now... even if that would be more efficient.
> >>>>> However, this seems a little not-quite-fleshed-out-enough for me 
> >>>>> to say whether I like it or not. Could you present an example of 
> >>>>> how you would use the proposed Semantic field to model your 
> >>>>> example? "(eth1 OR eth2) AND (NOT (eth3 OR eth4)) OR linecard2"
> >>>>>
> >>>>> (FWIW, I would do this with my proposal to Solution 2 as follows:)
> >>>>>
> >>>>> (orBasicList (andBasicList (orBasicList ingressInterface eth1 
> >>>>> eth2) (notBasicList (orBasicList ingressInterface eth3 eth4))) 
> >>>>> (andBasicList lineCardID 2))
> >>>>>
> >>>>>
> >>>> (basicList, OR, (basicList, AND, (basicList, OR, eth1, eth2), 
> >>>> (basicList, NOR, eth3, eth4)), (basicList, NONE, linecard2))
> >>> Hm. Okay. This makes sense. A couple of questions then, about 
> >>> solution 3:
> >>>
> >>> 1. Would we want to try and bitfield this, to define the semantics 
> >>> we _know_ we have, then leave the rest of it reserved? Something like:
> >>>
> >>> MSb                      LSb
> >>> +---+------+---------------+
> >>> | ! | multi| reserved      |
> >>> +---+------+---------------+
> >>>
> >>> ! = negate sense of semantics if 1 (this is the NOT flag)
> >>> multi (multiplicity) = 00: undefined, 01: or/oneOrMore, 10: 
> >>> xor/exactlyOne, 11: and/exactlyAll
> >>> reserved = place for adding other bells and whistles like RANGE and 
> >>> so on in the future.
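
A minimal Python sketch of how such an octet could be packed and
unpacked. It assumes a 1-bit NOT flag in the most significant bit, a
2-bit multiplicity, and 5 reserved bits; these widths are one reading of
the figure and of the "32 possible extensions" remark, not something the
draft specifies. All names are hypothetical.

```python
# Assumed layout of the one-octet semantic field (MSb first):
#   bit 7    : NOT flag ("!")
#   bits 6-5 : multiplicity
#   bits 4-0 : reserved
NOT_FLAG = 0x80
MULTI_SHIFT = 5
MULTI_MASK = 0x03
RESERVED_MASK = 0x1F

MULTI_NAMES = {
    0b00: "undefined",
    0b01: "or/oneOrMore",
    0b10: "xor/exactlyOne",
    0b11: "and/exactlyAll",
}


def pack_semantic(negate: bool, multi: int, reserved: int = 0) -> int:
    """Build the semantic octet from its three parts."""
    if not 0 <= multi <= MULTI_MASK:
        raise ValueError("multi must fit in 2 bits")
    if not 0 <= reserved <= RESERVED_MASK:
        raise ValueError("reserved must fit in 5 bits")
    return (NOT_FLAG if negate else 0) | (multi << MULTI_SHIFT) | reserved


def unpack_semantic(octet: int):
    """Split the semantic octet into (negate, multi, reserved)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("semantic field is a single octet")
    negate = bool(octet & NOT_FLAG)
    multi = (octet >> MULTI_SHIFT) & MULTI_MASK
    reserved = octet & RESERVED_MASK
    return negate, multi, reserved


# NOR as "NOT" + "or/oneOrMore", e.g. for (NOT (eth3 OR eth4)).
nor_octet = pack_semantic(negate=True, multi=0b01)
negate, multi, reserved = unpack_semantic(nor_octet)
print(hex(nor_octet), negate, MULTI_NAMES[multi], reserved)
```

Under this layout NOR does not need its own code point; it is the NOT
flag combined with the "or" multiplicity.
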
> >> Maybe we want to start by asking the question: which semantic do we 
> >> need now?
> >> There is a need for NONE, OR, AND, ORDERED for now. Anything else?
> >>> 2. Are we sure we want to do this in one byte? If we do bitfielding 
> >>> as above this gives us 32 possible extensions, which seems like 
> >>> _way_ more than enough, but does stick another odd offset in there, 
> >>> which slows things down on machines that need aligned access. 
> >>> Probably one byte is okay and we let the implementation use 
> >>> paddingOctets and set padding to fix this...
> >> With structured data, is the alignment still important? When I look 
> >> at the examples throughout the draft... As you wrote, paddingOctets 
> >> might be the solution in this case.
> >>> 3. Would it make sense maybe to have two sets of structured data 
> >>> elements, one with the semantics byte, and one without (which is 
> >>> then explicitly undefined)? Then exporters who don't need it don't 
> >>> have to bother sticking an extra odd-aligned zero in the stream for 
> >>> every list.
> >>>
> >>> I'm sure I'll have more questions, but none come to mind now... But 
> >>> it seems like we're converging on the least-unnecessarily-complex 
> >>> solution here, which is good. :)
> >> Happy about that. ;-)
> >> Note: I was envisioning something even simpler: one byte, 
> >> containing all the semantic possibilities, administered by IANA.  So 
> >> no reserved, no !
> >>
> >> Regards, Benoit.
> >
> 

--- 
Atsushi KOBAYASHI  <akoba@nttv6.net>
NTT Information Sharing Platform Lab.
tel:+81-(0)422-59-3978 fax:+81-(0)422-59-5637