Re: [IPFIX] Semantic and structured data

Atsushi Kobayashi <akoba@nttv6.net> Sun, 21 March 2010 05:25 UTC

Date: Sun, 21 Mar 2010 14:25:12 +0900
From: Atsushi Kobayashi <akoba@nttv6.net>
To: Benoit Claise <bclaise@cisco.com>
In-Reply-To: <4BA1E9C4.9030204@cisco.com>
References: <20100318051549.AFC9.17391CF2@nttv6.net> <4BA1E9C4.9030204@cisco.com>
Message-Id: <20100321142504.9220.17391CF2@nttv6.net>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Cc: ipfix@ietf.org
Subject: Re: [IPFIX] Semantic and structured data

Hi Benoit,

On Thu, 18 Mar 2010 09:52:20 +0100
Benoit Claise <bclaise@cisco.com> wrote:

> Kobayashi-san,
> > Hi Benoit, and all,
> >
> > I prefer solution 3 to a new semantic IE.
> > As Benoit mentioned, a semantic IE seems to make interpretation more
> > difficult.
> >
> > As Brian mentioned, I think we should avoid unnecessary
> > complexity. But I am not sure which types, e.g., NONE, OR, AND, NOR,
> > RANGE, and ORDERED, are needed. I would like to avoid having one data
> > structure represented in multiple ways.
> So do I.
>  From this, I conclude that a simple enumeration type is better compared 
> to the bitvector encoding.
> If we include a bitvector with one bit for NOT, we end up in this situation.
> 
>     NOT OR = NOR
>     NOT AND = NAND
>     NOT ORDERED = UNORDERED (*)
>     NOT UNORDERED = ORDERED (*)
>     NOT NONE => doesn't make sense  (**)
>     NOT RANGE => doesn't make sense  (**)
>     NOT NOT => doesn't make sense  (**)
> 
> 
> (*) implies that we can represent the same semantic in different ways. Now, 
> we can argue that we don't need UNORDERED

Instead of "UNORDERED", we can use "AND" or "OR".
So I conclude that we can remove it.
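
To illustrate the redundancy discussed in the quoted text, here is a rough
Python sketch comparing a plain enumeration with a "base semantic plus one NOT
bit" encoding. The code point values and the helper names (Semantic,
bitvector_meanings, redundant_and_meaningless) are my own assumptions for
illustration; nothing in this thread assigns values.

```python
from enum import IntEnum


class Semantic(IntEnum):
    # Hypothetical code points; the thread does not assign any values.
    NONE = 0
    OR = 1
    AND = 2
    NOR = 3
    NAND = 4
    ORDERED = 5
    RANGE = 6


# Bitvector alternative: a base semantic combined with a single NOT bit.
BASES = ["NONE", "OR", "AND", "ORDERED", "UNORDERED", "RANGE"]
NEGATION = {
    "OR": "NOR",
    "AND": "NAND",
    "ORDERED": "UNORDERED",
    "UNORDERED": "ORDERED",
}


def bitvector_meanings():
    """Map each (not_bit, base) pair to its meaning, or None if meaningless."""
    table = {}
    for base in BASES:
        table[(0, base)] = base
        table[(1, base)] = NEGATION.get(base)  # None for NONE and RANGE
    return table


def redundant_and_meaningless(table):
    """Find meanings with several encodings, and encodings with no meaning."""
    seen = {}
    for key, meaning in table.items():
        seen.setdefault(meaning, []).append(key)
    meaningless = seen.pop(None, [])
    redundant = {m: keys for m, keys in seen.items() if len(keys) > 1}
    return redundant, meaningless


redundant, meaningless = redundant_and_meaningless(bitvector_meanings())
print("encoded twice:", redundant)
print("no meaning:", meaningless)
```

With an enumeration, each semantic has exactly one code point, so neither
problem can arise.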

> (**) adds impossible and confusing semantics
> > To examine it, we need more
> > practical examples in addition to: Selector Report
> > Interpretation and interface lists on aggregated Flow Records.
> >

That makes sense.

> > How about BGP AS Path, Community? Anything else?
> >
> > When I have the following BGP AS Path mixing as-sequence and as-set, how
> > should it be represented?
> >
> > 10 20 30 40 {50,60}
> >
> > (basicList, ORDERED, (basicList, ORDERED, AS10,AS20,AS30,AS40),
> > (basicList, OR, AS50, AS60))
> >
> > Is it correct?
> >    
> Exactly, which implies that the ORDERED semantic is required for this use 
> case.
> 

We need more examples.
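
For reference, the AS Path example quoted above can be sketched in Python as
follows. This is only a sketch under my own assumptions: the BasicList class
and the as_path_to_list helper are hypothetical names, and the mapping
(as-sequence to ORDERED, as-set to OR) simply follows the notation used in
this thread rather than any agreed encoding.

```python
import re
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class BasicList:
    """Hypothetical in-memory model of a basicList with a semantic."""
    semantic: str
    items: List[Union[int, "BasicList"]] = field(default_factory=list)

    def render(self) -> str:
        # Render in the "(basicList, SEMANTIC, item, ...)" notation of the thread.
        parts = [
            item.render() if isinstance(item, BasicList) else f"AS{item}"
            for item in self.items
        ]
        return "(basicList, " + ", ".join([self.semantic] + parts) + ")"


def as_path_to_list(path: str) -> BasicList:
    """Convert an AS Path such as '10 20 30 40 {50,60}' to nested lists."""
    outer = BasicList("ORDERED")
    sequence = None  # the as-sequence currently being collected, if any
    for token in re.findall(r"\{[^}]*\}|\d+", path):
        if token.startswith("{"):
            # as-set: members are alternatives, so use OR
            members = [int(n) for n in re.findall(r"\d+", token)]
            outer.items.append(BasicList("OR", members))
            sequence = None
        else:
            # as-sequence: consecutive AS numbers share one ORDERED list
            if sequence is None:
                sequence = BasicList("ORDERED")
                outer.items.append(sequence)
            sequence.items.append(int(token))
    return outer


print(as_path_to_list("10 20 30 40 {50,60}").render())
```

The outer list has to be ORDERED because the position of the as-set relative
to the as-sequence matters.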

Regards,
Atsushi


> Regards, Benoit.
> > Regards,
> > Atsushi
> >
> > On Wed, 17 Mar 2010 14:10:35 +0100
> > Benoit Claise<bclaise@cisco.com>  wrote:
> >
> >    
> >> Thanks Gerhard for your feedback.
> >> See inline.
> >>      
> >>> Hi all,
> >>>
> >>> Some general thoughts from my side:
> >>>
> >>> - I appreciate that you want to add a basic notion of semantic to the
> >>>    structured data.
> >>>        
> >> Great.
> >>      
> >>> - Up to now, semantic was not in the protocol but in the info model.
> >>>
> >>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >>> |0|               Field ID    |       Element Length            |
> >>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >>> | Semantic  |             BasicList Content ...                 |
> >>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >>> |                           ...                                 |
> >>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >>>
> >>>    Can't we encode Semantic in an IE?
> >>>    Then, we could do without a new IANA registry.
> >>>    And: This IE could be used for other purposes (e.g., in Templates)
> >>>        
> >> We thought about this one. However, one of the IPFIX principles is that
> >> the semantic of one IE can't depend on the value or position of some
> >> other IEs.
> >> One example of this is the (MPLS label position, MPLS label). Way before
> >> structured data, our initial implementation contained: MPLS label
> >> position, MPLS label.
> >> So the collector had to look first at the MPLS label position value in
> >> order to correctly understand the following  MPLS label.
> >> Somehow, with these conventions, we were having an information model on
> >> the top of the IE. This was wrong.
> >> Also it was not obvious for a collector, i.e. without hardcoding the
> >> information.
> >> Finally, what if the IE order changed within the flow record?
> >> Conclusion: we had to change back our implementation mplslabelposition1,
> >> mplslabelposition2, etc... Like we did in IPFIX ;-)
> >>
> >>
> >>      
> >>> - I'm not sure whether the bitvector encoding has an advantage over an
> >>>    enumeration type.
> >>>
> >>> Unfortunately, I do not have the time to participate in a deep
> >>> discussion - I'm busy with IPFIX-CONFIG and other stuff.
> >>>        
> >> Thanks and regards, Benoit.
> >>      
> >>> Regards,
> >>> Gerhard
> >>>
> >>> Benoit Claise wrote:
> >>>        
> >>>> Hi Brian,
> >>>>          
> >>>>> Hi, Benoit,
> >>>>>
> >>>>> Replies inline...
> >>>>>
> >>>>> On Mar 16, 2010, at 2:40 AM, Benoit Claise wrote:
> >>>>>
> >>>>>            
> >>>>>> Hi Brian,
> >>>>>>
> >>>>>> Thanks for your feedback.
> >>>>>>              
> >>>>>>> hi, Benoit, all...
> >>>>>>>
> >>>>>>> Agreed that we should do something about this (i.e., that solution
> >>>>>>> 1 is no solution.); that said, a few comments in no particular order:
> >>>>>>>
> >>>>>>> 1. In considering adding explicit semantics to structured data, we
> >>>>>>> as a WG are taking on the task of defining semantics for IPFIX as
> >>>>>>> a whole. Semantics, as I understand them, are in IPFIX largely
> >>>>>>> contextual and template dependent, but in almost all cases I can
> >>>>>>> think of it seems like these are implicitly "AND" semantics (this
> >>>>>>> flow has source IP A _AND_ destination IP B _AND_...).
> >>>>>>>
> >>>>>>>                
> >>>>>> Agreed.
> >>>>>>              
> >>>>>>> We will need to make an explicit statement on this.
> >>>>>>>                
> >>>>>> Not sure why.
> >>>>>>              
> >>>>>>> We will need to determine whether these implicit semantics are a
> >>>>>>> property of the protocol (in which case Structured Data is really
> >>>>>>> a protocol-level extension), or a property of each information
> >>>>>>> element (in which case all 5103 IEs have implicit AND semantics;
> >>>>>>> question 2: will we need to add semantics to the IANA registry in
> >>>>>>> this case?). We will need to be quite careful about this. It's not
> >>>>>>> as simple as defining semantics within structured data then
> >>>>>>> calling it done.
> >>>>>>>
> >>>>>>>
> >>>>>>>                
> >>>>>> I'm not sure why each individual IE should have a semantic ... in
> >>>>>> the IANA registry.
> >>>>>>              
> >>>>> I'm not either. On reflection it seems like overkill. I'm just
> >>>>> saying that if we, as a WG, are moving from stating that semantics
> >>>>> are explicitly out of scope to defining them as in-scope, we need to
> >>>>> have a consistent approach, and answers to all the questions that
> >>>>> arise when we consider moving the protocol from a simple framing
> >>>>> mechanism to a framing mechanism with some logic behind it, so we
> >>>>> answer them once, so that all future efforts having to do with
> >>>>> semantics are consistent. This, I acknowledge, is an argument in
> >>>>> favor of solution 3...)
> >>>>>
> >>>>> One very simple new question that arises here, to illustrate my
> >>>>> point: Is it legal to export a record that has sourceIPAddress X AND
> >>>>> NOT sourceIPAddress X?
> >>>>>            
> >>>>    From a protocol point of view, yes
> >>>>   From a semantic point of view, I don't see a use case for that.
> >>>> Now, this question is not different than: with RFC5101,  is it legal
> >>>> to export a record that has two instances of sourceIPAddress?
> >>>>          
> >>>>>> As you wrote, "in almost all cases I can think of it seems like
> >>>>>> these are implicitly "AND" semantics", so the case we try to solve
> >>>>>> is when there are multiple instances of a single IE. We know that
> >>>>>> RFC 5101 foresaw that case " The Collector MUST support the use of
> >>>>>> Templates containing multiple occurrences of the similar
> >>>>>> Information Elements", but the idea is not to change RFC5101.  If
> >>>>>> we put some semantic in the IPFIX structured data, that would solve
> >>>>>> the vast majority of our cases. Also, we could say: if you want
> >>>>>> some semantic when exporting multiple IEs, then you SHOULD use
> >>>>>> IPFIX structured data.
> >>>>>>
> >>>>>> Also, the default value for the semantic field in the IPFIX
> >>>>>> structured data SHOULD be "NONE", to express that flow record
> >>>>>> doesn't include any semantic.... like in RFC5101. You might draw
> >>>>>> your own conclusion, maybe because you know your network, maybe
> >>>>>> because you have configured the exporter, but then it's your decision.
> >>>>>> The way I see the proposed solution is: in IPFIX structured data,
> >>>>>> you MAY use the semantic field as a way to express the relationship
> >>>>>> between IEs within the structure.
> >>>>>>              
> >>>>>>> 2. I don't consider draft-sommer-ipfix-mediator-ext-01 a valid
> >>>>>>> argument against Solution 2, as it's trying to solve a somewhat
> >>>>>>> different and more limited problem than structured data. Solution
> >>>>>>> 2 _might_ cause a problem for this draft, but certainly not the
> >>>>>>> other way around (unless we as a WG want to subsume that work into
> >>>>>>> this draft, which would probably require rechartering...). Also, I
> >>>>>>> don't think we need semantics for all of the list types, just
> >>>>>>> basicList (Illustrative question: How do I meaningfully interpret
> >>>>>>> two "or" subTemplateMultiLists with disjoint IE sets? Two "and"
> >>>>>>> subTemplateMultiLists? Nestings thereof?)... So, we don't really
> >>>>>>> have an explosion to deal with if we do Solution 2 correctly:
> >>>>>>> andBasicList, orBasicList, xorBasicList, notBasicList; Four IEs,
> >>>>>>> nestable, done. What can't we do with those four IEs? In this
> >>>>>>> case, we could even step back and say that semantics outside these
> >>>>>>> four are within the protocol explicitly _undefined_, and to be
> >>>>>>> interpreted within the context of each Template.
> >>>>>>>
> >>>>>>>
> >>>>>>>                
> >>>>>> If I translate some more what I wrote "While it's solved the router
> >>>>>> and most mediation function needs today", this would be "Me,
> >>>>>> myself, and I don't need more than OR and AND now ;-)".
> >>>>>> However, who am I to tell that others don't need it now... and that
> >>>>>> the logical solution is to use the IPFIX structured data.
> >>>>>> Furthermore, if we think a little bit longer term, the next big
> >>>>>> step in IPFIX is the mediation function. In my company, every
> >>>>>> feature wants to export its own data with NetFlow/IPFIX... up to
> >>>>>> the point where a CPE would not have enough bandwidth across the
> >>>>>> WAN to export all the "management" information. So we'll need more
> >>>>>> and more aggregated flow records (both in time and space) even
> >>>>>> in the router. Again, the logical solution will be to use the IPFIX
> >>>>>> structured data. At this point, we will most probably need
> >>>>>> something other than OR and AND, i.e. RANGE, ORDERED, etc...
> >>>>>>
> >>>>>> An example of subTemplateMultiLists with disjoint IE sets, let's
> >>>>>> imagine that you have to export an aggregated observation point,
> >>>>>> composed of multiple template records
> >>>>>>       template 1: exporterIPaddress
> >>>>>>       template 2: exporterIPaddress, basicList of interfaces
> >>>>>>       template 3: exporterIPaddress, LC
> >>>>>>              
> >>>>> and then you'd want to OR these... Okay, makes sense...
> >>>>>
> >>>>>            
> >>>>>>> 3. If we really _do_ want ranges and so on (which, again, we'd
> >>>>>>> need to get WG consensus on; this is explicitly out of scope in my
> >>>>>>> reading of the present charter), then we could do them in the
> >>>>>>> scope of Solution 3.
> >>>>>>>                
> >>>>>> Btw, acknowledging that one day we will have to solve this is good
> >>>>>> enough for solution 3. I mean, we don't have to populate the
> >>>>>> semantic IANA now... even if that would be more efficient.
> >>>>>>              
> >>>>>>> However, this seems a little not-quite-fleshed-out-enough for me
> >>>>>>> to say whether I like it or not. Could you present an example of
> >>>>>>> how you would use the proposed Semantic field to model your
> >>>>>>> example? "(eth1 OR eth2) AND (NOT (eth3 OR eth4)) OR linecard2"
> >>>>>>>
> >>>>>>> (FWIW, I would do this with my proposal to Solution 2 as follows:)
> >>>>>>>
> >>>>>>> (orBasicList (andBasicList (orBasicList ingressInterface eth1
> >>>>>>> eth2) (notBasicList (orBasicList ingressInterface eth3 eth4)))
> >>>>>>> (andBasicList lineCardID 2))
> >>>>>>>
> >>>>>>>
> >>>>>>>                
> >>>>>> (basicList, OR, (basicList, AND, (basicList, OR, eth1, eth2),
> >>>>>> (basicList, NOR, eth3, eth4)), (basicList, NONE, linecard2))
> >>>>>>              
> >>>>> Hm. Okay. This makes sense. A couple of questions then, about
> >>>>> solution 3:
> >>>>>
> >>>>> 1. Would we want to try and bitfield this, to define the semantics
> >>>>> we _know_ we have, then leave the rest of it reserved? Something like:
> >>>>>
> >>>>> MSb                      LSb
> >>>>> +---+------+---------------+
> >>>>> | ! | multi| reserved      |
> >>>>> +---+------+---------------+
> >>>>>
> >>>>> ! = negate sense of semantics if 1 (this is the NOT flag)
> >>>>> multi (multiplicity) = 00: undefined, 01: or/oneOrMore, 10:
> >>>>> xor/exactlyOne, 11: and/exactlyAll
> >>>>> reserved = place for adding other bells and whistles like RANGE and
> >>>>> so on in the future.
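
The one-byte bitfield layout quoted above could be packed and unpacked roughly
as in the Python sketch below. The field widths (1 bit for "!", 2 bits for
multi, 5 reserved bits) are an assumption inferred from the two-bit multi
values and the mention of 32 possible extensions; the function names
encode_semantic and decode_semantic are hypothetical.

```python
# Assumed widths: 1 bit "!" (NOT flag), 2 bits multi, 5 bits reserved.
MULTI_NAMES = {
    0b00: "undefined",
    0b01: "or/oneOrMore",
    0b10: "xor/exactlyOne",
    0b11: "and/exactlyAll",
}


def encode_semantic(negate: bool, multi: int, reserved: int = 0) -> int:
    """Pack the three fields into one octet, most significant bit first."""
    if not 0 <= multi <= 0b11:
        raise ValueError("multi must fit in 2 bits")
    if not 0 <= reserved <= 0b11111:
        raise ValueError("reserved must fit in 5 bits")
    return (int(negate) << 7) | (multi << 5) | reserved


def decode_semantic(octet: int):
    """Unpack one octet into (negate, multiplicity name, reserved bits)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("semantic field is a single octet")
    negate = bool(octet >> 7)
    multi = (octet >> 5) & 0b11
    reserved = octet & 0b11111
    return negate, MULTI_NAMES[multi], reserved


# NOR under this layout: NOT flag set, multiplicity "or".
nor = encode_semantic(True, 0b01)
print(f"{nor:#010b}", decode_semantic(nor))
```

Note that under this layout a set NOT flag combined with multi = 00
(undefined) is a valid bit pattern with no obvious meaning.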
> >>>>>            
> >>>> Maybe we want to start by asking the question: which semantic do we
> >>>> need now?
> >>>> Is a need for NONE, OR, AND, ORDERED for now. Anything else?
> >>>>          
> >>>>> 2. Are we sure we want to do this in one byte? If we do bitfielding
> >>>>> as above this gives us 32 possible extensions, which seems like
> >>>>> _way_ more than enough, but does stick another odd offset in there,
> >>>>> which slows things down on machines that need aligned access.
> >>>>> Probably one byte is okay and we let the implementation use
> >>>>> paddingOctets and set padding to fix this...
> >>>>>            
> >>>> With structured data, is the alignment still important? When I look
> >>>> at the examples throughout the draft... As you wrote, paddingOctets
> >>>> might be the solution in this case.
> >>>>          
> >>>>> 3. Would it make sense maybe to have two sets of structured data
> >>>>> elements, one with the semantics byte, and one without (which is
> >>>>> then explicitly undefined)? Then exporters who don't need it don't
> >>>>> have to bother sticking an extra odd-aligned zero in the stream for
> >>>>> every list.
> >>>>>
> >>>>> I'm sure I'll have more questions, but none come to mind now... But
> >>>>> it seems like we're converging on the least-unnecessarily-complex
> >>>>> solution here, which is good. :)
> >>>>>            
> >>>> Happy about that. ;-)
> >>>> Note: I was envisioning something even simpler: one byte,
> >>>> containing all the semantic possibilities, administered by IANA.  So
> >>>> no reserved, no !
> >>>>
> >>>> Regards, Benoit.
> >>>>          
> >>>        
> >>
> >>      
> > ---
> > Atsushi KOBAYASHI<akoba@nttv6.net>
> > NTT Information Sharing Platform Lab.
> > tel:+81-(0)422-59-3978 fax:+81-(0)422-59-5637
> >
> >    
> 

-- 
Atsushi Kobayashi <akoba@nttv6.net>