Re: [IPFIX] Semantic and structured data

Benoit Claise <bclaise@cisco.com> Wed, 17 March 2010 13:14 UTC

Message-ID: <4BA0D4CB.6000308@cisco.com>
Date: Wed, 17 Mar 2010 14:10:35 +0100
From: Benoit Claise <bclaise@cisco.com>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3
MIME-Version: 1.0
To: Gerhard Muenz <muenz@net.in.tum.de>
References: <4AF73525.8050009@net.in.tum.de> <4AF8F999.3000207@cisco.com> <F60CA342-F488-4179-8AB8-079D32D26BCD@tik.ee.ethz.ch> <4B9E33BE.7070907@cisco.com> <7A1F2B11-B407-4BAF-8481-F867CCDEF5FC@tik.ee.ethz.ch> <4B9F520E.8060507@cisco.com> <DB6E59D9-B373-4919-BB58-00EB26014564@tik.ee.ethz.ch> <4BA0C26D.9070901@cisco.com> <4BA0D177.1070506@net.in.tum.de>
In-Reply-To: <4BA0D177.1070506@net.in.tum.de>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
Cc: ipfix@ietf.org
Subject: Re: [IPFIX] Semantic and structured data

Thanks Gerhard for your feedback.
See inline.
>
> Hi all,
>
> Some general thoughts from my side:
>
> - I appreciate that you want to add a basic notion of semantic to the
>   structured data.
Great.
>
> - Up to now, semantic was not in the protocol but in the info model.
>
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |0|               Field ID    |       Element Length            |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> | Semantic  |             BasicList Content ...                 |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |                           ...                                 |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>
>   Can't we encode Semantic in an IE?
>   Then, we could do without a new IANA registry.
>   And: This IE could be used for other purposes (e.g., in Templates)
We thought about this one. However, one of the IPFIX principles is that 
the semantic of one IE can't depend on the value or position of some 
other IE.
One example of this is the pair (MPLS label position, MPLS label). Well 
before structured data, our initial implementation exported exactly 
that pair: MPLS label position, MPLS label.
So the collector first had to look at the MPLS label position value in 
order to correctly interpret the MPLS label that followed.
In effect, with these conventions, we were layering an information 
model on top of the IEs. This was wrong.
Also, it was not obvious to a collector how to decode this, i.e. not 
without hardcoding the information.
Finally, what if the IE order changed within the flow record?
Conclusion: we had to change our implementation back to 
mplslabelposition1, mplslabelposition2, etc... Like we did in IPFIX ;-)
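
To make the ordering problem concrete, here is a minimal Python sketch. 
The record representation (ordered lists of name/value pairs), the 
function names and the sample values are assumptions made for 
illustration only; this is not collector code and these are not 
registered IE names.

```python
# Hypothetical flow records as ordered lists of (IE name, value) pairs.

def decode_positional(record):
    """Pair each 'mplsLabel' with the most recent 'mplsLabelPosition'.

    The meaning of a label depends on another IE that precedes it,
    so the result changes if the exporter reorders the fields.
    """
    labels = {}
    position = None
    for name, value in record:
        if name == "mplsLabelPosition":
            position = value
        elif name == "mplsLabel":
            labels[position] = value
    return labels


def decode_per_position(record):
    """Each IE name carries its own position; field order is irrelevant."""
    prefix = "mplslabelposition"
    return {
        int(name[len(prefix):]): value
        for name, value in record
        if name.startswith(prefix) and name[len(prefix):].isdigit()
    }


in_order = [
    ("mplsLabelPosition", 1), ("mplsLabel", 100),
    ("mplsLabelPosition", 2), ("mplsLabel", 200),
]
# Same four fields, but each label now precedes its position.
reordered = [
    ("mplsLabel", 100), ("mplsLabelPosition", 1),
    ("mplsLabel", 200), ("mplsLabelPosition", 2),
]
per_position = [("mplslabelposition2", 200), ("mplslabelposition1", 100)]

print(decode_positional(in_order))        # labels land on positions 1 and 2
print(decode_positional(reordered))       # labels are attached to the wrong position
print(decode_per_position(per_position))  # order of fields does not matter
```

With one IE per position, the collector needs no convention beyond the 
IE definitions themselves.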


>
> - I'm not sure whether the bitvector encoding has an advantage over an
>   enumeration type.
>
> Unfortunately, I do not have the time to participate in a deep 
> discussion - I'm busy with IPFIX-CONFIG and other stuff.
Thanks and regards, Benoit.
>
> Regards,
> Gerhard
>
> Benoit Claise wrote:
>> Hi Brian,
>>> Hi, Benoit,
>>>
>>> Replies inline...
>>>
>>> On Mar 16, 2010, at 2:40 AM, Benoit Claise wrote:
>>>
>>>> Hi Brian,
>>>>
>>>> Thanks for your feedback.
>>>>> hi, Benoit, all...
>>>>>
>>>>> Agreed that we should do something about this (i.e., that solution 
>>>>> 1 is no solution.); that said, a few comments in no particular order:
>>>>>
>>>>> 1. In considering adding explicit semantics to structured data, we 
>>>>> as a WG are taking on the task of defining semantics for IPFIX as 
>>>>> a whole. Semantics, as I understand them, are in IPFIX largely 
>>>>> contextual and template dependent, but in almost all cases I can 
>>>>> think of it seems like these are implicitly "AND" semantics (this 
>>>>> flow has source IP A _AND_ destination IP B _AND_...).
>>>>>
>>>> Agreed.
>>>>> We will need to make an explicit statement on this.
>>>> Not sure why.
>>>>> We will need to determine whether these implicit semantics are a 
>>>>> property of the protocol (in which case Structured Data is really 
>>>>> a protocol-level extension), or a property of each information 
>>>>> element (in which case all 5103 IEs have implicit AND semantics; 
>>>>> question 2: will we need to add semantics to the IANA registry in 
>>>>> this case?). We will need to be quite careful about this. It's not 
>>>>> as simple as defining semantics within structured data then 
>>>>> calling it done.
>>>>>
>>>>>
>>>> I'm not sure why each individual IE should have a semantic ... in 
>>>> the IANA registry.
>>> I'm not either. On reflection it seems like overkill. I'm just 
>>> saying that if we, as a WG, are moving from stating that semantics 
>>> are explicitly out of scope to defining them as in-scope, we need to 
>>> have a consistent approach, and answers to all the questions that 
>>> arise when we consider moving the protocol from a simple framing 
>>> mechanism to a framing mechanism with some logic behind it, so we 
>>> answer them once, so that all future efforts having to do with 
>>> semantics are consistent. This, I acknowledge, is an argument in 
>>> favor of solution 3...)
>>>
>>> One very simple new question that arises here, to illustrate my 
>>> point: Is it legal to export a record that has sourceIPAddress X AND 
>>> NOT sourceIPAddress X?
>>  From a protocol point of view, yes
>>  From a semantic point of view, I don't see a use case for that.
>> Now, this question is no different from: with RFC 5101, is it legal 
>> to export a record that has two instances of sourceIPAddress?
>>>> As you wrote, "in almost all cases I can think of it seems like 
>>>> these are implicitly "AND" semantics", so the case we try to solve 
>>>> is when there are multiple instances of a single IE. We know that 
>>>> RFC 5101 foresaw that case: "The Collector MUST support the use of 
>>>> Templates containing multiple occurrences of the similar 
>>>> Information Elements", but the idea is not to change RFC 5101.  If 
>>>> we put some semantic in the IPFIX structured data, that would solve 
>>>> the vast majority of our cases. Also, we could say: if you want 
>>>> some semantic when exporting multiple IEs, then you SHOULD use 
>>>> IPFIX structured data.
>>>>
>>>> Also, the default value for the semantic field in the IPFIX 
>>>> structured data SHOULD be "NONE", to express that the flow record 
>>>> doesn't include any semantic.... like in RFC 5101. You might draw 
>>>> your own conclusion, maybe because you know your network, maybe 
>>>> because you have configured the exporter, but then it's your decision.
>>>> The way I see the proposed solution is: in IPFIX structured data, 
>>>> you MAY use the semantic field as a way to express the relationship 
>>>> between IEs within the structure.
>>>>> 2. I don't consider draft-sommer-ipfix-mediator-ext-01 a valid 
>>>>> argument against Solution 2, as it's trying to solve a somewhat 
>>>>> different and more limited problem than structured data. Solution 
>>>>> 2 _might_ cause a problem for this draft, but certainly not the 
>>>>> other way around (unless we as a WG want to subsume that work into 
>>>>> this draft, which would probably require rechartering...). Also, I 
>>>>> don't think we need semantics for all of the list types, just 
>>>>> basicList (Illustrative question: How do I meaningfully interpret 
>>>>> two "or" subTemplateMultiLists with disjoint IE sets? Two "and" 
>>>>> subTemplateMultiLists? Nestings thereof?)... So, we don't really 
>>>>> have an explosion to deal with if we do Solution 2 correctly: 
>>>>> andBasicList, orBasicList, xorBasicList, notBasicList; Four IEs, 
>>>>> nestable, done. What can't we do with those four IEs? In this 
>>>>> case, we could even step back and say that semantics outside these 
>>>>> four are within the protocol explicitly _undefined_, and to be 
>>>>> interpreted within the context of each Template.
>>>>>
>>>>>
>>>> To translate what I wrote a bit further, "While it's solved the 
>>>> router and most mediation function needs today" would be "Me, 
>>>> myself, and I don't need more than OR and AND now ;-)".
>>>> However, who am I to say that others don't need it now... and that 
>>>> the logical solution is to use the IPFIX structured data.
>>>> Furthermore, if we think a little bit longer term, the next big 
>>>> step in IPFIX is the mediation function. In my company, every 
>>>> feature wants to export its own data with NetFlow/IPFIX... up to 
>>>> the point where a CPE would not have enough bandwidth across the 
>>>> WAN to export all the "management" information. So we'll need more 
>>>> and more aggregated flow records (both in time and space) even 
>>>> in the router. Again, the logical solution will be to use the IPFIX 
>>>> structured data. At this point, we will most probably need 
>>>> something other than OR and AND, i.e. RANGE, ORDERED, etc...
>>>>
>>>> As an example of subTemplateMultiLists with disjoint IE sets, 
>>>> imagine that you have to export an aggregated observation point, 
>>>> composed of multiple template records:
>>>>      template 1: exporterIPaddress
>>>>      template 2: exporterIPaddress, basicList of interfaces
>>>>      template 3: exporterIPaddress, LC
>>> and then you'd want to OR these... Okay, makes sense...
>>>
>>>>> 3. If we really _do_ want ranges and so on (which, again, we'd 
>>>>> need to get WG consensus on; this is explicitly out of scope in my 
>>>>> reading of the present charter), then we could do them in the 
>>>>> scope of Solution 3.
>>>> Btw, acknowledging that one day we will have to solve this is good 
>>>> enough for solution 3. I mean, we don't have to populate the 
>>>> semantic IANA registry now... even if that would be more efficient.
>>>>> However, this seems a little not-quite-fleshed-out-enough for me 
>>>>> to say whether I like it or not. Could you present an example of 
>>>>> how you would use the proposed Semantic field to model your 
>>>>> example? "(eth1 OR eth2) AND (NOT (eth3 OR eth4)) OR linecard2"
>>>>>
>>>>> (FWIW, I would do this with my proposal to Solution 2 as follows:)
>>>>>
>>>>> (orBasicList (andBasicList (orBasicList ingressInterface eth1 
>>>>> eth2) (notBasicList (orBasicList ingressInterface eth3 eth4))) 
>>>>> (andBasicList lineCardID 2))
>>>>>
>>>>>
>>>> (basicList, OR, (basicList, AND, (basicList, OR, eth1, eth2), 
>>>> (basicList, NOR, eth3, eth4)), (basicList, NONE, linecard2))
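
The nested (basicList, semantic, ...) notation above can be read as a 
small expression tree. The following Python sketch shows one way a 
collector-side tool might evaluate it; the class, the function, the 
string labels for the semantics and the treatment of NONE are all 
assumptions made for this illustration, not anything defined by a 
draft.

```python
from dataclasses import dataclass, field
from typing import List, Set, Union


@dataclass
class BasicList:
    """A list with a semantic label and leaf values or nested lists."""
    semantic: str  # "OR", "AND", "NOR" or "NONE" (labels assumed here)
    items: List[Union[str, "BasicList"]] = field(default_factory=list)


def matches(node, observed: Set[str]) -> bool:
    """Return True if the set of observed names satisfies the expression."""
    if isinstance(node, str):
        return node in observed
    results = [matches(item, observed) for item in node.items]
    if node.semantic == "OR":
        return any(results)
    if node.semantic == "AND":
        return all(results)
    if node.semantic == "NOR":
        return not any(results)
    if node.semantic == "NONE" and len(results) == 1:
        # Assumption: NONE states no relationship, so a single-element
        # list is simply passed through.
        return results[0]
    raise ValueError("cannot evaluate semantic %r here" % node.semantic)


# (eth1 OR eth2) AND (NOT (eth3 OR eth4)) OR linecard2
expr = BasicList("OR", [
    BasicList("AND", [
        BasicList("OR", ["eth1", "eth2"]),
        BasicList("NOR", ["eth3", "eth4"]),
    ]),
    BasicList("NONE", ["linecard2"]),
])

print(matches(expr, {"eth1"}))
print(matches(expr, {"eth1", "eth3"}))
print(matches(expr, {"eth3", "linecard2"}))
```

A NONE list with several elements is deliberately rejected in this 
sketch, since no relationship between its elements is stated.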
>>> Hm. Okay. This makes sense. A couple of questions then, about 
>>> solution 3:
>>>
>>> 1. Would we want to try and bitfield this, to define the semantics 
>>> we _know_ we have, then leave the rest of it reserved? Something like:
>>>
>>> MSb                      LSb
>>> +---+------+---------------+
>>> | ! | multi| reserved      |
>>> +---+------+---------------+
>>>
>>> ! = negate sense of semantics if 1 (this is the NOT flag)
>>> multi (multiplicity) = 00: undefined, 01: or/oneOrMore, 10: 
>>> xor/exactlyOne, 11: and/exactlyAll
>>> reserved = place for adding other bells and whistles like RANGE and 
>>> so on in the future.
>> Maybe we want to start by asking the question: which semantic do we 
>> need now?
>> There is a need for NONE, OR, AND, ORDERED for now. Anything else?
>>> 2. Are we sure we want to do this in one byte? If we do bitfielding 
>>> as above this gives us 32 possible extensions, which seems like 
>>> _way_ more than enough, but does stick another odd offset in there, 
>>> which slows things down on machines that need aligned access. 
>>> Probably one byte is okay and we let the implementation use 
>>> paddingOctets and set padding to fix this...
>> With structured data, is the alignment still important? When I look 
>> at the examples throughout the draft... As you wrote, paddingOctets 
>> might be the solution in this case.
>>> 3. Would it make sense maybe to have two sets of structured data 
>>> elements, one with the semantics byte, and one without (which is 
>>> then explicitly undefined)? Then exporters who don't need it don't 
>>> have to bother sticking an extra odd-aligned zero in the stream for 
>>> every list.
>>>
>>> I'm sure I'll have more questions, but none come to mind now... But 
>>> it seems like we're converging on the least-unnecessarily-complex 
>>> solution here, which is good. :)
>> Happy about that. ;-)
>> Note: I was envisioning something even simpler: one byte, 
>> containing all the semantic possibilities, administered by IANA.  So 
>> no reserved, no !
>>
>> Regards, Benoit.
>
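
For comparison, the two candidate encodings of the one-byte semantic 
discussed above (the bitfield with a NOT flag, a two-bit multiplicity 
and reserved bits, versus a flat enumeration administered by IANA) can 
be sketched in Python as follows. The function and class names are 
invented for this sketch, and the numeric code points in the flat 
enumeration are purely hypothetical: no values have been assigned in 
this thread.

```python
from enum import IntEnum


class Multiplicity(IntEnum):
    """Two-bit multiplicity field from the bitfield proposal."""
    UNDEFINED = 0b00
    ONE_OR_MORE = 0b01   # or
    EXACTLY_ONE = 0b10   # xor
    EXACTLY_ALL = 0b11   # and


def pack_bitfield(negate, multi, reserved=0):
    """Build the octet: 1 NOT bit (MSb), 2 multiplicity bits, 5 reserved bits."""
    if not 0 <= reserved < 32:
        raise ValueError("reserved must fit in 5 bits")
    return (int(bool(negate)) << 7) | (int(multi) << 5) | reserved


def unpack_bitfield(octet):
    """Split the octet back into (negate, multiplicity, reserved)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("semantic must be a single octet")
    return bool(octet >> 7), Multiplicity((octet >> 5) & 0b11), octet & 0b11111


class Semantic(IntEnum):
    """Flat enumeration alternative; these code points are hypothetical."""
    NONE = 0
    OR = 1
    AND = 2
    ORDERED = 3


# NOT applied to or/oneOrMore, with no reserved bits set.
print(hex(pack_bitfield(True, Multiplicity.ONE_OR_MORE)))
print(unpack_bitfield(0x60))
print(Semantic(2).name)
```

The bitfield needs a decode step and a rule for unknown reserved bits; 
the flat enumeration is a plain table lookup, at the cost of one 
registry entry per semantic.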