Re: [IPFIX] Semantic and structured data

Gerhard Muenz <muenz@net.in.tum.de> Wed, 17 March 2010 12:55 UTC

Message-ID: <4BA0D177.1070506@net.in.tum.de>
Date: Wed, 17 Mar 2010 13:56:23 +0100
From: Gerhard Muenz <muenz@net.in.tum.de>
User-Agent: Thunderbird 2.0.0.23 (Windows/20090812)
MIME-Version: 1.0
To: Benoit Claise <bclaise@cisco.com>
References: <4AF73525.8050009@net.in.tum.de> <4AF8F999.3000207@cisco.com> <F60CA342-F488-4179-8AB8-079D32D26BCD@tik.ee.ethz.ch> <4B9E33BE.7070907@cisco.com> <7A1F2B11-B407-4BAF-8481-F867CCDEF5FC@tik.ee.ethz.ch> <4B9F520E.8060507@cisco.com> <DB6E59D9-B373-4919-BB58-00EB26014564@tik.ee.ethz.ch> <4BA0C26D.9070901@cisco.com>
In-Reply-To: <4BA0C26D.9070901@cisco.com>
Cc: ipfix@ietf.org
Subject: Re: [IPFIX] Semantic and structured data

Hi all,

Some general thoughts from my side:

- I appreciate that you want to add a basic notion of semantics to the
   structured data.

- Up to now, semantics were not in the protocol but in the info model.

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0|          Field ID           |        Element Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Semantic    |             BasicList Content ...             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           ...                                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Can't we encode Semantic in an IE?
   Then we could do without a new IANA registry.
   Also, this IE could be used for other purposes (e.g., in Templates).

- I'm not sure whether the bitvector encoding has an advantage over an
   enumeration type.
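
For illustration, here is a minimal Python sketch of the header layout drawn above, using an enumeration for the Semantic field. It assumes the Semantic field is a single octet (as discussed in the quoted thread) and that all fields are in network byte order. The enumeration values and the function names are hypothetical; nothing in this thread assigns code points.

```python
import struct
from enum import IntEnum


class Semantic(IntEnum):
    # Hypothetical code points, chosen only for this sketch.
    NONE = 0
    OR = 1
    AND = 2
    ORDERED = 3


def pack_basic_list(field_id, element_length, semantic, content=b""):
    """Pack: 0 bit + 15-bit Field ID, 16-bit Element Length, 1 Semantic octet."""
    if not 0 <= field_id < 0x8000:
        raise ValueError("Field ID must fit in 15 bits (top bit is 0)")
    if not 0 <= element_length <= 0xFFFF:
        raise ValueError("Element Length must fit in 16 bits")
    header = struct.pack("!HHB", field_id, element_length, int(semantic))
    return header + content


def unpack_basic_list(data):
    """Inverse of pack_basic_list; returns (field_id, length, semantic, content)."""
    field_id, element_length, semantic = struct.unpack_from("!HHB", data, 0)
    return field_id & 0x7FFF, element_length, Semantic(semantic), data[5:]
```

With an enumeration, a new semantic is simply a new value, whereas a bit vector would reserve fixed bit positions up front.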

Unfortunately, I do not have the time to participate in a deep 
discussion - I'm busy with IPFIX-CONFIG and other stuff.

Regards,
Gerhard

Benoit Claise wrote:
> Hi Brian,
>> Hi, Benoit,
>>
>> Replies inline...
>>
>> On Mar 16, 2010, at 2:40 AM, Benoit Claise wrote:
>>
>>    
>>> Hi Brian,
>>>
>>> Thanks for your feedback.
>>>      
>>>> hi, Benoit, all...
>>>>
>>>> Agreed that we should do something about this (i.e., that solution 1 is no solution.); that said, a few comments in no particular order:
>>>>
>>>> 1. In considering adding explicit semantics to structured data, we as a WG are taking on the task of defining semantics for IPFIX as a whole. Semantics, as I understand them, are in IPFIX largely contextual and template dependent, but in almost all cases I can think of it seems like these are implicitly "AND" semantics (this flow has source IP A _AND_ destination IP B _AND_...).
>>>>
>>>>        
>>> Agreed.
>>>      
>>>> We will need to make an explicit statement on this.
>>>>        
>>> Not sure why.
>>>      
>>>> We will need to determine whether these implicit semantics are a property of the protocol (in which case Structured Data is really a protocol-level extension), or a property of each information element (in which case all 5103 IEs have implicit AND semantics; question 2: will we need to add semantics to the IANA registry in this case?). We will need to be quite careful about this. It's not as simple as defining semantics within structured data then calling it done.
>>>>
>>>>
>>>>        
>>> I'm not sure why each individual IE should have a semantic ... in the IANA registry.
>>>      
>> I'm not either. On reflection it seems like overkill. I'm just saying that if we, as a WG, are moving from stating that semantics are explicitly out of scope to defining them as in-scope, we need to have a consistent approach, and answers to all the questions that arise when we consider moving the protocol from a simple framing mechanism to a framing mechanism with some logic behind it, so we answer them once, so that all future efforts having to do with semantics are consistent. This, I acknowledge, is an argument in favor of solution 3...)
>>
>> One very simple new question that arises here, to illustrate my point: Is it legal to export a record that has sourceIPAddress X AND NOT sourceIPAddress X?
>>    
>  From a protocol point of view, yes
>  From a semantic point of view, I don't see a use case for that.
> Now, this question is no different from: with RFC5101, is it legal to 
> export a record that has two instances of sourceIPAddress?
>>    
>>> As you wrote, "in almost all cases I can think of it seems like these are implicitly "AND" semantics", so the case we try to solve is when there are multiple instances of a single IE. We know that RFC 5101 foresaw that case ("The Collector MUST support the use of Templates containing multiple occurrences of the similar Information Elements"), but the idea is not to change RFC5101. If we put some semantics in the IPFIX structured data, that would solve the vast majority of our cases. Also, we could say: if you want some semantics when exporting multiple IEs, then you SHOULD use IPFIX structured data.
>>>
>>> Also, the default value for the semantic field in the IPFIX structured data SHOULD be "NONE", to express that the flow record doesn't include any semantics... like in RFC5101. You might draw your own conclusion, maybe because you know your network, maybe because you have configured the exporter, but then it's your decision.
>>> The way I see the proposed solution is: in IPFIX structured data, you MAY use the semantic field as a way to express the relationship between IEs within the structure.
>>>      
>>>> 2. I don't consider draft-sommer-ipfix-mediator-ext-01 a valid argument against Solution 2, as it's trying to solve a somewhat different and more limited problem than structured data. Solution 2 _might_ cause a problem for this draft, but certainly not the other way around (unless we as a WG want to subsume that work into this draft, which would probably require rechartering...). Also, I don't think we need semantics for all of the list types, just basicList (Illustrative question: How do I meaningfully interpret two "or" subTemplateMultiLists with disjoint IE sets? Two "and" subTemplateMultiLists? Nestings thereof?)... So, we don't really have an explosion to deal with if we do Solution 2 correctly: andBasicList, orBasicList, xorBasicList, notBasicList; Four IEs, nestable, done. What can't we do with those four IEs? In this case, we could even step back and say that semantics outside these four are within the protocol explicitly _undefined_, and to be interpreted within the context of each Template.
>>>>
>>>>
>>>>        
>>> If I translate some more what I wrote "While it's solved the router and most mediation function needs today", this would be "Me, myself, and I don't need more than OR and AND now ;-)".
>>> However, who am I to tell that others don't need it now... and that the logical solution is to use the IPFIX structured data.
>>> Furthermore, if we think a little bit longer term, the next big step in IPFIX is the mediation function. In my company, every feature wants to export its own data with NetFlow/IPFIX... up to the point where a CPE would not have enough bandwidth across the WAN to export all the "management" information. So we'll need more and more aggregated flow records (both in time and space) even in the router. Again, the logical solution will be to use the IPFIX structured data. At this point, we will most probably need something other than OR and AND, i.e. RANGE, ORDERED, etc...
>>>
>>> As an example of subTemplateMultiLists with disjoint IE sets, let's imagine that you have to export an aggregated observation point, composed of multiple template records:
>>>      template 1: exporterIPaddress
>>>      template 2: exporterIPaddress, basicList of interfaces
>>>      template 3: exporterIPaddress, LC
>>>      
>> and then you'd want to OR these... Okay, makes sense...
>>
>>    
>>>      
>>>> 3. If we really _do_ want ranges and so on (which, again, we'd need to get WG consensus on; this is explicitly out of scope in my reading of the present charter), then we could do them in the scope of Solution 3.
>>>>        
>>> Btw, acknowledging that one day we will have to solve this is good enough for solution 3. I mean, we don't have to populate the semantic IANA now... even if that would be more efficient.
>>>      
>>>> However, this seems a little not-quite-fleshed-out-enough for me to say whether I like it or not. Could you present an example of how you would use the proposed Semantic field to model your example? "(eth1 OR eth2) AND (NOT (eth3 OR eth4)) OR linecard2"
>>>>
>>>> (FWIW, I would do this with my proposal to Solution 2 as follows:)
>>>>
>>>> (orBasicList (andBasicList (orBasicList ingressInterface eth1 eth2) (notBasicList (orBasicList ingressInterface eth3 eth4))) (andBasicList lineCardID 2))
>>>>
>>>>
>>>>        
>>> (basicList, OR, (basicList, AND, (basicList, OR, eth1, eth2), (basicList, NOR, eth3, eth4)), (basicList, NONE, linecard2))
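
To make the nesting concrete, the following Python sketch writes the (basicList, semantic, members...) expression above as nested tuples and evaluates it against a set of observed identifiers. Reading the structure as a boolean predicate, the function name `matches`, and the treatment of NONE (accepted only for single-member lists, since no relationship is stated) are assumptions made for this illustration, not something the thread defines.

```python
def matches(node, observed):
    """Evaluate a nested (kind, semantic, *members) tuple against a set of names."""
    if isinstance(node, str):
        return node in observed
    _kind, semantic, *members = node
    results = [matches(member, observed) for member in members]
    if semantic == "OR":
        return any(results)
    if semantic == "AND":
        return all(results)
    if semantic == "NOR":
        return not any(results)
    if semantic == "NONE":
        # Assumption: NONE states no relationship, so only a single member
        # can be interpreted without guessing.
        if len(results) == 1:
            return results[0]
        raise ValueError("NONE with several members has no defined meaning")
    raise ValueError(f"unknown semantic: {semantic}")


# "(eth1 OR eth2) AND (NOT (eth3 OR eth4)) OR linecard2"
expr = (
    "basicList", "OR",
    ("basicList", "AND",
     ("basicList", "OR", "eth1", "eth2"),
     ("basicList", "NOR", "eth3", "eth4")),
    ("basicList", "NONE", "linecard2"),
)
```

For example, `matches(expr, {"eth1"})` is true, while `matches(expr, {"eth1", "eth3"})` is false because the NOR branch fails and linecard2 is absent.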
>>>      
>> Hm. Okay. This makes sense. A couple of questions then, about solution 3:
>>
>> 1. Would we want to try and bitfield this, to define the semantics we _know_ we have, then leave the rest of it reserved? Something like:
>>
>> MSb                      LSb
>> +---+------+---------------+
>> | ! | multi| reserved      |
>> +---+------+---------------+
>>
>> ! = negate sense of semantics if 1 (this is the NOT flag)
>> multi (multiplicity) = 00: undefined, 01: or/oneOrMore, 10: xor/exactlyOne, 11: and/exactlyAll
>> reserved = place for adding other bells and whistles like RANGE and so on in the future.
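
As a rough sketch of this bitfield, the Python below packs and unpacks one octet with the NOT flag in the most significant bit, two multiplicity bits next, and five reserved bits. The exact bit positions are an assumption inferred from the drawing and from the "one byte" discussion that follows; the function and constant names are hypothetical.

```python
# Multiplicity codes as listed above.
MULTI_UNDEFINED = 0b00
MULTI_OR = 0b01
MULTI_XOR = 0b10
MULTI_AND = 0b11


def encode_semantic(negate, multi, reserved=0):
    """Build the octet: 1 NOT bit (MSb), 2 multiplicity bits, 5 reserved bits."""
    if not 0 <= multi <= 0b11:
        raise ValueError("multi must fit in 2 bits")
    if not 0 <= reserved <= 0b11111:
        raise ValueError("reserved must fit in 5 bits")
    return (int(bool(negate)) << 7) | (multi << 5) | reserved


def decode_semantic(octet):
    """Split the octet back into (negate, multi, reserved)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("semantic must be a single octet")
    return bool((octet >> 7) & 1), (octet >> 5) & 0b11, octet & 0b11111
```

Under this layout, a NOR list would be `encode_semantic(True, MULTI_OR)`, i.e. 0xA0, and the five reserved bits leave 32 values for later extensions.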
>>    
> Maybe we want to start by asking the question: which semantic do we need 
> now?
> There is a need for NONE, OR, AND, ORDERED for now. Anything else?
>> 2. Are we sure we want to do this in one byte? If we do bitfielding as above this gives us 32 possible extensions, which seems like _way_ more than enough, but does stick another odd offset in there, which slows things down on machines that need aligned access. Probably one byte is okay and we let the implementation use paddingOctets and set padding to fix this...
>>    
> With structured data, is the alignment still important? When I look at 
> the examples throughout the draft... As you wrote, paddingOctets might 
> be the solution in this case.
>> 3. Would it make sense maybe to have two sets of structured data elements, one with the semantics byte, and one without (which is then explicitly undefined)? Then exporters who don't need it don't have to bother sticking an extra odd-aligned zero in the stream for every list.
>>
>> I'm sure I'll have more questions, but none come to mind now... But it seems like we're converging on the least-unnecessarily-complex solution here, which is good. :)
>>    
> Happy about that. ;-)
> Note: I was envisioning something even simpler: one byte, 
> containing all the semantic possibilities, administered by IANA.  So no 
> reserved bits, no ! flag.
> 
> Regards, Benoit.