Re: [IPFIX] Semantic and structured data

Benoit Claise <bclaise@cisco.com> Thu, 18 March 2010 08:56 UTC


Kobayashi-san,
> Hi Benoit, and all,
>
> I prefer solution 3 to a new semantic IE.
> As Benoit mentioned, a semantic IE seems to make them harder to
> interpret.
>
> As Brian mentioned, I think we should avoid unnecessary
> complexity. But I am not sure which types, e.g., NONE, OR, AND, NOR,
> RANGE, and ORDERED, are needed. I would like to avoid one data
> structure being representable in multiple ways.
So do I.
From this, I conclude that a simple enumeration type is better than
the bitvector encoding.
If we include a bitvector with one bit for NOT, we end up in this situation:

    NOT OR = NOR
    NOT AND = NAND
    NOT ORDERED = UNORDERED (*)
    NOT UNORDERED = ORDERED (*)
    NOT NONE => doesn't make sense  (**)
    NOT RANGE => doesn't make sense  (**)
    NOT NOT => doesn't make sense  (**)


(*) implies that we can represent the same semantic in different ways. Now,
we could argue that we don't need UNORDERED.
(**) adds impossible and confusing semantics.
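
To make the redundancy concrete, here is a minimal Python sketch that enumerates every (NOT bit, base semantic) pair and sorts the results into ambiguous and meaningless encodings. It is only an illustration: the two-field layout, the BASE list, and the function name are assumptions, not anything defined in the draft. The "NOT NOT" row is not modelled, because NOT is a flag here rather than a base value.

```python
from itertools import product

# Hypothetical base semantics; only the names come from this thread.
BASE = ["NONE", "OR", "AND", "ORDERED", "UNORDERED", "RANGE"]

# Meaning of each base semantic once the NOT bit is set, following the
# table above. None marks a combination that does not make sense.
NEGATED = {
    "OR": "NOR",
    "AND": "NAND",
    "ORDERED": "UNORDERED",
    "UNORDERED": "ORDERED",
    "NONE": None,
    "RANGE": None,
}


def classify_bitvector():
    """Return (meaning -> list of encodings, list of meaningless encodings)."""
    meanings = {}
    meaningless = []
    for not_bit, base in product((0, 1), BASE):
        meaning = NEGATED[base] if not_bit else base
        if meaning is None:
            meaningless.append((not_bit, base))
        else:
            meanings.setdefault(meaning, []).append((not_bit, base))
    return meanings, meaningless


meanings, meaningless = classify_bitvector()
# Meanings reachable through more than one encoding: the (*) cases.
ambiguous = {m: enc for m, enc in meanings.items() if len(enc) > 1}
print("ambiguous:", ambiguous)
print("meaningless:", meaningless)
```

With a flat enumeration, each meaning would get exactly one code point, so neither list could be non-empty.
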
> To examine it, we need more
> practical examples in addition to the Selector Report
> Interpretation and interface lists on aggregated Flow Records.
>
> How about BGP AS Path, Community? Anything else?
>
> When I have the following BGP AS Path mixing as-sequence and as-set, how
> should it be represented?
>
> 10 20 30 40 {50,60}
>
> (basicList, ORDERED, (basicList, ORDERED, AS10,AS20,AS30,AS40),
> (basicList, OR, AS50, AS60))
>
> Is it correct?
>    
Exactly, which implies that the ORDERED semantic is required for this use
case.
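
As a rough illustration of why the ordering matters, the nested structure above can be modelled in a few lines of Python. This is a sketch under assumptions: the BasicList class and render_as_path function are hypothetical names, the semantic is a plain string rather than an IANA code point, and mapping ORDERED to an as-sequence and OR to an as-set is just the reading used in this example.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class BasicList:
    """Hypothetical in-memory model of a basicList with a semantic."""
    semantic: str                          # e.g. "ORDERED" or "OR"
    items: List[Union[int, "BasicList"]]   # AS numbers or nested lists


def render_as_path(node: BasicList) -> str:
    """Render ORDERED as an as-sequence and OR as an as-set."""
    parts = []
    for item in node.items:
        if isinstance(item, BasicList):
            parts.append(render_as_path(item))
        else:
            parts.append(str(item))
    if node.semantic == "ORDERED":
        return " ".join(parts)
    if node.semantic == "OR":
        return "{" + ",".join(parts) + "}"
    raise ValueError(f"unsupported semantic: {node.semantic}")


# (basicList, ORDERED, (basicList, ORDERED, AS10,AS20,AS30,AS40),
#  (basicList, OR, AS50, AS60))
as_path = BasicList("ORDERED", [
    BasicList("ORDERED", [10, 20, 30, 40]),
    BasicList("OR", [50, 60]),
])
print(render_as_path(as_path))  # prints: 10 20 30 40 {50,60}
```

Without the ORDERED semantic on the outer and first inner list, a collector could not tell that the sequence 10 20 30 40 must be kept in that order.
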

Regards, Benoit.
> Regards,
> Atsushi
>
> On Wed, 17 Mar 2010 14:10:35 +0100
> Benoit Claise<bclaise@cisco.com>  wrote:
>
>    
>> Thanks Gerhard for your feedback.
>> See inline.
>>      
>>> Hi all,
>>>
>>> Some general thoughts from my side:
>>>
>>> - I appreciate that you want to add a basic notion of semantic to the
>>>    structured data.
>>>        
>> Great.
>>      
>>> - Up to now, semantic was not in the protocol but in the info model.
>>>
>>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>> |0|               Field ID    |       Element Length            |
>>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>> | Semantic  |             BasicList Content ...                 |
>>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>> |                           ...                                 |
>>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>>
>>>    Can't we encode Semantic in an IE?
>>>    Then, we could do without a new IANA registry.
>>>    And: This IE could be used for other purposes (e.g., in Templates)
>>>        
>> We thought about this one. However, one of the IPFIX principles is that
>> the semantic of one IE can't depend on the value or position of some
>> other IE.
>> One example of this is the pair (MPLS label position, MPLS label). Way before
>> structured data, our initial implementation contained: MPLS label
>> position, MPLS label.
>> So the collector had to look first at the MPLS label position value in
>> order to correctly understand the following MPLS label.
>> Somehow, with these conventions, we had an information model on
>> top of the IEs. This was wrong.
>> Also, it was not obvious to a collector, i.e. not without hardcoding the
>> information.
>> Finally, what if the IE order changed within the flow record?
>> Conclusion: we had to change our implementation back to mplslabelposition1,
>> mplslabelposition2, etc... Like we did in IPFIX ;-)
>>
>>
>>      
>>> - I'm not sure whether the bitvector encoding has an advantage over an
>>>    enumeration type.
>>>
>>> Unfortunately, I do not have the time to participate in a deep
>>> discussion - I'm busy with IPFIX-CONFIG and other stuff.
>>>        
>> Thanks and regards, Benoit.
>>      
>>> Regards,
>>> Gerhard
>>>
>>> Benoit Claise wrote:
>>>        
>>>> Hi Brian,
>>>>          
>>>>> Hi, Benoit,
>>>>>
>>>>> Replies inline...
>>>>>
>>>>> On Mar 16, 2010, at 2:40 AM, Benoit Claise wrote:
>>>>>
>>>>>            
>>>>>> Hi Brian,
>>>>>>
>>>>>> Thanks for your feedback.
>>>>>>              
>>>>>>> hi, Benoit, all...
>>>>>>>
>>>>>>> Agreed that we should do something about this (i.e., that solution
>>>>>>> 1 is no solution.); that said, a few comments in no particular order:
>>>>>>>
>>>>>>> 1. In considering adding explicit semantics to structured data, we
>>>>>>> as a WG are taking on the task of defining semantics for IPFIX as
>>>>>>> a whole. Semantics, as I understand them, are in IPFIX largely
>>>>>>> contextual and template dependent, but in almost all cases I can
>>>>>>> think of it seems like these are implicitly "AND" semantics (this
>>>>>>> flow has source IP A _AND_ destination IP B _AND_...).
>>>>>>>
>>>>>>>                
>>>>>> Agreed.
>>>>>>              
>>>>>>> We will need to make an explicit statement on this.
>>>>>>>                
>>>>>> Not sure why.
>>>>>>              
>>>>>>> We will need to determine whether these implicit semantics are a
>>>>>>> property of the protocol (in which case Structured Data is really
>>>>>>> a protocol-level extension), or a property of each information
>>>>>>> element (in which case all 5103 IEs have implicit AND semantics;
>>>>>>> question 2: will we need to add semantics to the IANA registry in
>>>>>>> this case?). We will need to be quite careful about this. It's not
>>>>>>> as simple as defining semantics within structured data then
>>>>>>> calling it done.
>>>>>>>
>>>>>>>
>>>>>>>                
>>>>>> I'm not sure why each individual IE should have a semantic ... in
>>>>>> the IANA registry.
>>>>>>              
>>>>> I'm not either. On reflection it seems like overkill. I'm just
>>>>> saying that if we, as a WG, are moving from stating that semantics
>>>>> are explicitly out of scope to defining them as in-scope, we need to
>>>>> have a consistent approach, and answers to all the questions that
>>>>> arise when we consider moving the protocol from a simple framing
>>>>> mechanism to a framing mechanism with some logic behind it, so we
>>>>> answer them once, so that all future efforts having to do with
>>>>> semantics are consistent. This, I acknowledge, is an argument in
>>>>> favor of solution 3...)
>>>>>
>>>>> One very simple new question that arises here, to illustrate my
>>>>> point: Is it legal to export a record that has sourceIPAddress X AND
>>>>> NOT sourceIPAddress X?
>>>>>            
>>>> From a protocol point of view, yes.
>>>> From a semantic point of view, I don't see a use case for that.
>>>> Now, this question is no different from: with RFC5101, is it legal
>>>> to export a record that has two instances of sourceIPAddress?
>>>>          
>>>>>> As you wrote, "in almost all cases I can think of it seems like
>>>>>> these are implicitly "AND" semantics", so the case we try to solve
>>>>>> is when there are multiple instances of a single IE. We know that
>>>>>> RFC 5101 foresaw that case " The Collector MUST support the use of
>>>>>> Templates containing multiple occurrences of the similar
>>>>>> Information Elements", but the idea is not to change RFC5101.  If
>>>>>> we put some semantic in the IPFIX structured data, that would solve
>>>>>> the vast majority of our cases. Also, we could say: if you want
>>>>>> some semantic when exporting multiple IEs, then you SHOULD use
>>>>>> IPFIX structured data.
>>>>>>
>>>>>> Also, the default value for the semantic field in the IPFIX
>>>>>> structured data SHOULD be "NONE", to express that the flow record
>>>>>> doesn't include any semantic.... like in RFC5101. You might draw
>>>>>> your own conclusion, maybe because you know your network, maybe
>>>>>> because you have configured the exporter, but then it's your decision.
>>>>>> The way I see the proposed solution is: in IPFIX structured data,
>>>>>> you MAY use the semantic field as a way to express the relationship
>>>>>> between IEs within the structure.
>>>>>>              
>>>>>>> 2. I don't consider draft-sommer-ipfix-mediator-ext-01 a valid
>>>>>>> argument against Solution 2, as it's trying to solve a somewhat
>>>>>>> different and more limited problem than structured data. Solution
>>>>>>> 2 _might_ cause a problem for this draft, but certainly not the
>>>>>>> other way around (unless we as a WG want to subsume that work into
>>>>>>> this draft, which would probably require rechartering...). Also, I
>>>>>>> don't think we need semantics for all of the list types, just
>>>>>>> basicList (Illustrative question: How do I meaningfully interpret
>>>>>>> two "or" subTemplateMultiLists with disjoint IE sets? Two "and"
>>>>>>> subTemplateMultiLists? Nestings thereof?)... So, we don't really
>>>>>>> have an explosion to deal with if we do Solution 2 correctly:
>>>>>>> andBasicList, orBasicList, xorBasicList, notBasicList; Four IEs,
>>>>>>> nestable, done. What can't we do with those four IEs? In this
>>>>>>> case, we could even step back and say that semantics outside these
>>>>>>> four are within the protocol explicitly _undefined_, and to be
>>>>>>> interpreted within the context of each Template.
>>>>>>>
>>>>>>>
>>>>>>>                
>>>>>> If I translate what I wrote a bit further, "While it's solved the router
>>>>>> and most mediation function needs today" would become "Me,
>>>>>> myself, and I don't need more than OR and AND now ;-)".
>>>>>> However, who am I to say that others don't need it now... and that
>>>>>> the logical solution is to use the IPFIX structured data.
>>>>>> Furthermore, if we think a little bit longer term, the next big
>>>>>> step in IPFIX is the mediation function. In my company, every
>>>>>> feature wants to export its own data with NetFlow/IPFIX... up to
>>>>>> the point where a CPE would not have enough bandwidth across the
>>>>>> WAN to export all the "management" information. So we'll need more
>>>>>> and more aggregated flow records (both in time and space) even
>>>>>> in the router. Again, the logical solution will be to use the IPFIX
>>>>>> structured data. At this point, we will most probably need
>>>>>> something other than OR and AND, i.e. RANGE, ORDERED, etc...
>>>>>>
>>>>>> As an example of subTemplateMultiLists with disjoint IE sets, let's
>>>>>> imagine that you have to export an aggregated observation point,
>>>>>> composed of multiple template records:
>>>>>>       template 1: exporterIPaddress
>>>>>>       template 2: exporterIPaddress, basicList of interfaces
>>>>>>       template 3: exporterIPaddress, LC
>>>>>>              
>>>>> and then you'd want to OR these... Okay, makes sense...
>>>>>
>>>>>            
>>>>>>> 3. If we really _do_ want ranges and so on (which, again, we'd
>>>>>>> need to get WG consensus on; this is explicitly out of scope in my
>>>>>>> reading of the present charter), then we could do them in the
>>>>>>> scope of Solution 3.
>>>>>>>                
>>>>>> Btw, acknowledging that one day we will have to solve this is good
>>>>>> enough for solution 3. I mean, we don't have to populate the
>>>>>> semantic IANA now... even if that would be more efficient.
>>>>>>              
>>>>>>> However, this seems a little not-quite-fleshed-out-enough for me
>>>>>>> to say whether I like it or not. Could you present an example of
>>>>>>> how you would use the proposed Semantic field to model your
>>>>>>> example? "(eth1 OR eth2) AND (NOT (eth3 OR eth4)) OR linecard2"
>>>>>>>
>>>>>>> (FWIW, I would do this with my proposal to Solution 2 as follows:)
>>>>>>>
>>>>>>> (orBasicList (andBasicList (orBasicList ingressInterface eth1
>>>>>>> eth2) (notBasicList (orBasicList ingressInterface eth3 eth4)))
>>>>>>> (andBasicList lineCardID 2))
>>>>>>>
>>>>>>>
>>>>>>>                
>>>>>> (basicList, OR, (basicList, AND, (basicList, OR, eth1, eth2),
>>>>>> (basicList, NOR, eth3, eth4)), (basicList, NONE, linecard2))
>>>>>>              
>>>>> Hm. Okay. This makes sense. A couple of questions then, about
>>>>> solution 3:
>>>>>
>>>>> 1. Would we want to try and bitfield this, to define the semantics
>>>>> we _know_ we have, then leave the rest of it reserved? Something like:
>>>>>
>>>>> MSb                      LSb
>>>>> +---+------+---------------+
>>>>> | ! | multi| reserved      |
>>>>> +---+------+---------------+
>>>>>
>>>>> ! = negate sense of semantics if 1 (this is the NOT flag)
>>>>> multi (multiplicity) = 00: undefined, 01: or/oneOrMore, 10:
>>>>> xor/exactlyOne, 11: and/exactlyAll
>>>>> reserved = place for adding other bells and whistles like RANGE and
>>>>> so on in the future.
>>>>>            
>>>> Maybe we want to start by asking the question: which semantics do we
>>>> need now?
>>>> There is a need for NONE, OR, AND, ORDERED for now. Anything else?
>>>>          
>>>>> 2. Are we sure we want to do this in one byte? If we do bitfielding
>>>>> as above this gives us 32 possible extensions, which seems like
>>>>> _way_ more than enough, but does stick another odd offset in there,
>>>>> which slows things down on machines that need aligned access.
>>>>> Probably one byte is okay and we let the implementation use
>>>>> paddingOctets and set padding to fix this...
>>>>>            
>>>> With structured data, is the alignment still important? When I look
>>>> at the examples throughout the draft... As you wrote, paddingOctets
>>>> might be the solution in this case.
>>>>          
>>>>> 3. Would it make sense maybe to have two sets of structured data
>>>>> elements, one with the semantics byte, and one without (which is
>>>>> then explicitly undefined)? Then exporters who don't need it don't
>>>>> have to bother sticking an extra odd-aligned zero in the stream for
>>>>> every list.
>>>>>
>>>>> I'm sure I'll have more questions, but none come to mind now... But
>>>>> it seems like we're converging on the least-unnecessarily-complex
>>>>> solution here, which is good. :)
>>>>>            
>>>> Happy about that. ;-)
>>>> Note: I was envisioning something even simpler: one byte,
>>>> containing all the semantic possibilities, administered by IANA.  So
>>>> no reserved, no !
>>>>
>>>> Regards, Benoit.
>>>>          
>>>        
> ---
> Atsushi KOBAYASHI<akoba@nttv6.net>
> NTT Information Sharing Platform Lab.
> tel:+81-(0)422-59-3978 fax:+81-(0)422-59-5637
>
>