Re: [IPFIX] Semantic and structured data

Benoit Claise <bclaise@cisco.com> Wed, 17 March 2010 12:29 UTC

Message-ID: <4BA0C26D.9070901@cisco.com>
Date: Wed, 17 Mar 2010 12:52:13 +0100
From: Benoit Claise <bclaise@cisco.com>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3
MIME-Version: 1.0
To: Brian Trammell <trammell@tik.ee.ethz.ch>
References: <4AF73525.8050009@net.in.tum.de> <4AF8F999.3000207@cisco.com> <F60CA342-F488-4179-8AB8-079D32D26BCD@tik.ee.ethz.ch> <4B9E33BE.7070907@cisco.com> <7A1F2B11-B407-4BAF-8481-F867CCDEF5FC@tik.ee.ethz.ch> <4B9F520E.8060507@cisco.com> <DB6E59D9-B373-4919-BB58-00EB26014564@tik.ee.ethz.ch>
In-Reply-To: <DB6E59D9-B373-4919-BB58-00EB26014564@tik.ee.ethz.ch>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: quoted-printable
Cc: ipfix@ietf.org
Subject: Re: [IPFIX] Semantic and structured data

Hi Brian,
> Hi, Benoit,
>
> Replies inline...
>
> On Mar 16, 2010, at 2:40 AM, Benoit Claise wrote:
>
>    
>> Hi Brian,
>>
>> Thanks for your feedback.
>>      
>>> hi, Benoit, all...
>>>
>>> Agreed that we should do something about this (i.e., that solution 1 is no solution.); that said, a few comments in no particular order:
>>>
>>> 1. In considering adding explicit semantics to structured data, we as a WG are taking on the task of defining semantics for IPFIX as a whole. Semantics, as I understand them, are in IPFIX largely contextual and template dependent, but in almost all cases I can think of it seems like these are implicitly "AND" semantics (this flow has source IP A _AND_ destination IP B _AND_...).
>>>
>>>        
>> Agreed.
>>      
>>> We will need to make an explicit statement on this.
>>>        
>> Not sure why.
>>      
>>> We will need to determine whether these implicit semantics are a property of the protocol (in which case Structured Data is really a protocol-level extension), or a property of each information element (in which case all 5103 IEs have implicit AND semantics; question 2: will we need to add semantics to the IANA registry in this case?). We will need to be quite careful about this. It's not as simple as defining semantics within structured data then calling it done.
>>>
>>>
>>>        
>> I'm not sure why each individual IE should have a semantic ... in the IANA registry.
>>      
> I'm not either. On reflection it seems like overkill. I'm just saying that if we, as a WG, are moving from stating that semantics are explicitly out of scope to defining them as in-scope, we need to have a consistent approach, and answers to all the questions that arise when we consider moving the protocol from a simple framing mechanism to a framing mechanism with some logic behind it, so we answer them once, so that all future efforts having to do with semantics are consistent. This, I acknowledge, is an argument in favor of solution 3...)
>
> One very simple new question that arises here, to illustrate my point: Is it legal to export a record that has sourceIPAddress X AND NOT sourceIPAddress X?
>    
From a protocol point of view, yes.
From a semantic point of view, I don't see a use case for that.
Now, this question is no different from: with RFC 5101, is it legal to 
export a record that has two instances of sourceIPAddress?
>    
>> As you wrote, "in almost all cases I can think of it seems like these are implicitly "AND" semantics", so the case we are trying to solve is when there are multiple instances of a single IE. We know that RFC 5101 foresaw that case ("The Collector MUST support the use of Templates containing multiple occurrences of the similar Information Elements"), but the idea is not to change RFC 5101. If we put some semantic in the IPFIX structured data, that would solve the vast majority of our cases. Also, we could say: if you want some semantic when exporting multiple IEs, then you SHOULD use IPFIX structured data.
>>
>> Also, the default value for the semantic field in the IPFIX structured data SHOULD be "NONE", to express that flow record doesn't include any semantic.... like in RFC5101. You might draw your own conclusion, maybe because you know your network, maybe because you have configured the exporter, but then it's your decision.
>> The way I see the proposed solution is: in IPFIX structured data, you MAY use the semantic field as a way to express the relationship between IEs within the structure.
>>      
>>> 2. I don't consider draft-sommer-ipfix-mediator-ext-01 a valid argument against Solution 2, as it's trying to solve a somewhat different and more limited problem than structured data. Solution 2 _might_ cause a problem for this draft, but certainly not the other way around (unless we as a WG want to subsume that work into this draft, which would probably require rechartering...). Also, I don't think we need semantics for all of the list types, just basicList (Illustrative question: How do I meaningfully interpret two "or" subTemplateMultiLists with disjoint IE sets? Two "and" subTemplateMultiLists? Nestings thereof?)... So, we don't really have an explosion to deal with if we do Solution 2 correctly: andBasicList, orBasicList, xorBasicList, notBasicList; Four IEs, nestable, done. What can't we do with those four IEs? In this case, we could even step back and say that semantics outside these four are within the protocol explicitly _undefined_, and to be interpreted within the context of each Template.
>>>
>>>
>>>        
>> If I translate a bit more what I wrote ("While it's solved the router and most mediation function needs today"), this would be "Me, myself, and I don't need more than OR and AND now ;-)".
>> However, who am I to tell that others don't need it now... and that the logical solution is to use the IPFIX structured data.
>> Furthermore, if we think a little bit longer term, the next big step in IPFIX is the mediation function. In my company, every feature wants to export its own data with NetFlow/IPFIX... up to the point where a CPE would not have enough bandwidth across the WAN to export all the "management" information. So we'll need more and more aggregated flow records (both in time and space), even in the router. Again, the logical solution will be to use the IPFIX structured data. At that point, we will most probably need something other than OR and AND, e.g. RANGE, ORDERED, etc...
>>
>> As an example of subTemplateMultiLists with disjoint IE sets, imagine that you have to export an aggregated observation point composed of multiple template records:
>>      template 1: exporterIPaddress
>>      template 2: exporterIPaddress, basicList of interfaces
>>      template 3: exporterIPaddress, LC
>>      
> and then you'd want to OR these... Okay, makes sense...
>
>    
>>      
>>> 3. If we really _do_ want ranges and so on (which, again, we'd need to get WG consensus on; this is explicitly out of scope in my reading of the present charter), then we could do them in the scope of Solution 3.
>>>        
>> Btw, acknowledging that one day we will have to solve this is good enough for solution 3. I mean, we don't have to populate the semantic IANA now... even if that would be more efficient.
>>      
>>> However, this seems a little not-quite-fleshed-out-enough for me to say whether I like it or not. Could you present an example of how you would use the proposed Semantic field to model your example? "(eth1 OR eth2) AND (NOT (eth3 OR eth4)) OR linecard2"
>>>
>>> (FWIW, I would do this with my proposal to Solution 2 as follows:)
>>>
>>> (orBasicList (andBasicList (orBasicList ingressInterface eth1 eth2) (notBasicList (orBasicList ingressInterface eth3 eth4))) (andBasicList lineCardID 2))
>>>
>>>
>>>        
>> (basicList, OR, (basicList, AND, (basicList, OR, eth1, eth2), (basicList, NOR, eth3, eth4)), (basicList, NONE, linecard2))
>>      
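As a way of reading the nested notation above, here is a minimal Python sketch that treats each list as a (semantic, members) pair and evaluates it against a set of observation-point names. The SemList class, the matches() function, and the choice to test against a set of names are illustrative assumptions, not anything defined in the draft. NOR is read as NOT (a OR b), and a NONE list is only accepted here when it has a single member.

```python
from dataclasses import dataclass
from typing import List, Set, Union


@dataclass
class SemList:
    """Hypothetical model of a basicList carrying a semantic label."""
    semantic: str                          # "OR", "AND", "NOR" or "NONE"
    items: List[Union["SemList", str]]     # nested lists or leaf names


def matches(node: Union[SemList, str], observed: Set[str]) -> bool:
    """Return True if the set of observed names satisfies the expression."""
    if isinstance(node, str):
        return node in observed
    results = [matches(item, observed) for item in node.items]
    if node.semantic == "OR":
        return any(results)
    if node.semantic == "AND":
        return all(results)
    if node.semantic == "NOR":
        return not any(results)
    if node.semantic == "NONE":
        # No relationship is stated, so this sketch refuses to guess
        # for anything but a single-member list.
        if len(results) != 1:
            raise ValueError("NONE list with several members is ambiguous")
        return results[0]
    raise ValueError(f"unknown semantic: {node.semantic}")


# (eth1 OR eth2) AND (NOT (eth3 OR eth4)) OR linecard2
expr = SemList("OR", [
    SemList("AND", [
        SemList("OR", ["eth1", "eth2"]),
        SemList("NOR", ["eth3", "eth4"]),
    ]),
    SemList("NONE", ["linecard2"]),
])
```

For instance, matches(expr, {"eth1"}) holds, while matches(expr, {"eth1", "eth3"}) does not, because the NOR branch fails and linecard2 is absent.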
> Hm. Okay. This makes sense. A couple of questions then, about solution 3:
>
> 1. Would we want to try and bitfield this, to define the semantics we _know_ we have, then leave the rest of it reserved? Something like:
>
> MSb                      LSb
> +---+------+---------------+
> | ! | multi| reserved      |
> +---+------+---------------+
>
> ! = negate sense of semantics if 1 (this is the NOT flag)
> multi (multiplicity) = 00: undefined, 01: or/oneOrMore, 10: xor/exactlyOne, 11: and/exactlyAll
> reserved = place for adding other bells and whistles like RANGE and so on in the future.
>    
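To make the bitfield idea above concrete, here is a small Python sketch of how such a byte could be packed and unpacked. It assumes a layout of 1 negate bit (MSb), 2 multiplicity bits, and 5 reserved bits, which is one reading of the diagram; the function and table names are hypothetical.

```python
# Multiplicity codes as listed in the proposal above.
MULTI = {"undefined": 0b00, "or": 0b01, "xor": 0b10, "and": 0b11}
MULTI_NAMES = {code: name for name, code in MULTI.items()}


def encode_semantic(multi: str, negate: bool = False, reserved: int = 0) -> int:
    """Pack the negate flag, multiplicity and reserved bits into one byte."""
    if not 0 <= reserved < 32:
        raise ValueError("reserved must fit in 5 bits")
    return (int(negate) << 7) | (MULTI[multi] << 5) | reserved


def decode_semantic(byte: int):
    """Unpack one byte into (negate, multiplicity name, reserved bits)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte")
    return bool(byte >> 7), MULTI_NAMES[(byte >> 5) & 0b11], byte & 0b11111
```

Under these assumptions, a plain "or" list would carry 0x20 and a negated "or" (that is, NOR) would carry 0xA0.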
Maybe we want to start by asking the question: which semantics do we need 
now?
There is a need for NONE, OR, AND, and ORDERED for now. Anything else?
> 2. Are we sure we want to do this in one byte? If we do bitfielding as above this gives us 32 possible extensions, which seems like _way_ more than enough, but does stick another odd offset in there, which slows things down on machines that need aligned access. Probably one byte is okay and we let the implementation use paddingOctets and set padding to fix this...
>    
With structured data, is the alignment still important? When I look at 
the examples throughout the draft... As you wrote, paddingOctets might 
be the solution in this case.
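As a rough illustration of the padding arithmetic, the following Python sketch computes how many padding octets would bring a given byte offset back to an aligned boundary. The function name and the default 4-byte alignment are assumptions made for the example only.

```python
def padding_octets(offset: int, alignment: int = 4) -> int:
    """Number of padding octets needed to align `offset` to `alignment`."""
    if offset < 0 or alignment <= 0:
        raise ValueError("offset must be >= 0 and alignment must be > 0")
    return (-offset) % alignment
```

For example, a list whose fixed part is 5 octets long (2 for the field ID, 2 for the element length, 1 for the semantic) would need padding_octets(5) = 3 octets to get back to a 4-byte boundary.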
> 3. Would it make sense maybe to have two sets of structured data elements, one with the semantics byte, and one without (which is then explicitly undefined)? Then exporters who don't need it don't have to bother sticking an extra odd-aligned zero in the stream for every list.
>
> I'm sure I'll have more questions, but none come to mind now... But it seems like we're converging on the least-unnecessarily-complex solution here, which is good. :)
>    
Happy about that. ;-)
Note: I was envisioning something even simpler: one byte, 
containing all the semantic possibilities, administered by IANA. So no 
reserved bits and no ! flag.

Regards, Benoit.
> Best regards,
>
> Brian
>
>    
>>> On Mar 15, 2010, at 6:18 AM, Benoit Claise wrote:
>>>
>>>
>>>
>>>        
>>>> Dear all,
>>>>
>>>> We've been thinking about this one, and we see 3 solutions. We believe that the solution 3 is the way to go, as we show below.
>>>>
>>>> Solution 1.
>>>> Consider that the semantic is out of scope for this document.
>>>> This is the easiest solution. However, we understand it's not right: we would not do a complete job with respect to IPFIX structured data.
>>>> As Gerhard was expressing, the collector has no clue how to treat the example of a BasicList of egress interfaces in a Flow Record.
>>>>      - Has every counted packet been sent on every egress interface?
>>>>        =>  multicast case, AND semantic
>>>>      - Has every counted packet been sent on any one of the egress interfaces?
>>>>       =>  load balancing case, OR semantic
>>>>
>>>> Solution 2.
>>>> We could focus on the logical AND and OR semantics only, by defining semantic lists such as andBasicList, andSubTemplateList, andSubTemplateMultiList, orBasicList, orSubTemplateList, and orSubTemplateMultiList.
>>>> So 6 IEs in total, describing AND and OR semantics.
>>>> While it's solved the router and most mediation function needs today, we understand that this is not a complete solution.
>>>> Gerhard, in one of his old drafts, draft-sommer-ipfix-mediator-ext-01, proposed ADTs for orderedList, orderedPair, and portRanges, which allow the definition of new IEs for port ranges etc.
>>>> Even if this solution 2 is extensible, in the long term it will lead to an explosion of IEs: 3 list types * the number of semantic types, where the semantic can be (and, or, orderedList, orderedPair, portRanges, random, etc...)
>>>>
>>>> Solution 3.
>>>> We propose to add a semantic field in the 3 list types.
>>>> Something such as:
>>>>
>>>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>>> |0|          Field ID           |        Element Length         |
>>>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>>> |   Semantic    |             BasicList Content ...             |
>>>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>>> |                              ...                              |
>>>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>>>
>>>> The values of this semantic field would live in a new IANA registry, which could be populated initially with NONE, OR, AND, ORDERED, etc... Open to discussion.
>>>> The advantage of this solution is that it's extensible, and it doesn't need new IEs for each semantic.
>>>> If ever required (mainly in a mediation function), we can model very complex semantics, e.g., "(eth1 OR eth2) AND (NOT (eth3 OR eth4)) OR linecard2", which could be the new observation point of an aggregated Flow Record.
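For illustration only, the layout sketched in the diagram above could be serialized as in the following Python fragment: a 16-bit field ID with the top bit clear, a 16-bit element length, a one-byte semantic, then the list content. The Semantic code points below are placeholders invented for this example, not IANA assignments, and pack_basic_list() is a hypothetical helper. The example uses egressInterface (IE 14, 4 octets) as the list element.

```python
import struct
from enum import IntEnum


class Semantic(IntEnum):
    # Placeholder code points for this sketch only, NOT IANA-assigned values.
    NONE = 0
    OR = 1
    AND = 2
    ORDERED = 3


def pack_basic_list(field_id: int, element_length: int,
                    semantic: Semantic, values) -> bytes:
    """Serialize field ID, element length, semantic byte, then the elements."""
    if not 0 <= field_id < 0x8000:
        raise ValueError("field ID must fit in 15 bits (top bit is 0)")
    header = struct.pack("!HHB", field_id, element_length, semantic)
    body = b"".join(v.to_bytes(element_length, "big") for v in values)
    return header + body


# Load-balancing case: each packet left on interface 1 OR interface 2.
encoded = pack_basic_list(14, 4, Semantic.OR, [1, 2])
```

The result is 13 octets long: 5 for the fixed part and 4 for each of the two interface indexes.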
>>>>
>>>>
>>>>
>>>> Conclusion:
>>>> After much debate internally, we believe that we should make the effort to include this semantic now, and not postpone the problem, which would lead to an explosion of IEs in the future.
>>>> Personally, I was initially against the solution 3 ... mostly due to the effort required to modify the complete specs.
>>>> We're now ready to modify the specifications in the IPFIX structured data, but we would like to get your feedback and agreement in advance as this is not a small piece of work.
>>>>
>>>> Please comment.
>>>>
>>>> Regards, Paul, Stan, Gowri, and Benoit.
>>>>
>>>>
>>>>          
>>>>> hi Benoit, Gerhard,
>>>>>
>>>>> In this case I'm strongly in favor of leaving semantics out of structured data. Structured data defines containers. The semantics of the containers as a whole and the elements change based upon the information elements within the container and within the record containing it. Wedging semantics into the structured data elements 1. risks further proliferation of (potentially non-interoperable) ways to represent the same thing, 2. gives us more ways to represent nonsensical things (an OR basicList of MPLS stack entries...means...what?), and 3. risks defining an inadequate semantic representation mechanism (what about semantics for records not using structured data? what about ordered versus unordered sets? what about OR basicList vs AND basicList vs nested AND and OR basicLists vs just three identical IEs?). If we really want an unambiguous semantic framework for IPFIX (and here I'm not convinced either way) that's best done on its own, addressing things at the information element and record level. Doing it here confuses the issue.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Brian
>>>>>
>>>>> On Nov 10, 2009, at 6:26 AM, Benoit Claise wrote:
>>>>>
>>>>>
>>>>>
>>>>>            
>>>>>> Gerhard,
>>>>>>
>>>>>> Thanks for your email.
>>>>>> I have no strong feelings about the two solutions you proposed.
>>>>>>
>>>>>> From a pure router point of view, I don't see any use cases for logical OR in exporting flow records.
>>>>>> However, from an IPFIX Mediator point of view, I see some use cases.
>>>>>> I mean that it requires an Intermediate Aggregation Process or Intermediate Correlation Process to express: I've seen this flow record OR that flow record.
>>>>>> Now, it's true that even routers will have mediation functions...
>>>>>>
>>>>>> I'm inclined to add orBasicList, orSubTemplateList, orSubTemplateMultiList to the draft (this is a small addition after all) and to express that, by default, a logical AND is assumed... specifically if the structured data is used in the IPFIX Mediation Protocol.
>>>>>>
>>>>>> I'm not convinced by the NOT in the context of structured data, as we don't even have a concept of NOT for a single information element!
>>>>>>
>>>>>> I would like to get some more feedback from others.
>>>>>>
>>>>>> Regards, Benoit.
>>>>>>
>>>>>>
>>>>>>
>>>>>>              
>>>>>>> Dear all,
>>>>>>>
>>>>>>> Regarding draft-ietf-ipfix-structured-data, I see the risk that the
>>>>>>> semantic of the exported structured data is not clear.
>>>>>>>
>>>>>>> How do you interpret the manifold occurrence of the same Information
>>>>>>> Element (basicList) or the same group of Information Elements
>>>>>>> (subTemplateList) in one record?
>>>>>>>
>>>>>>> What does it mean if basicList, subTemplateList, or subTemplateMultiList
>>>>>>> is used for a Flow Key field? Or non-key field?
>>>>>>>
>>>>>>> Some Examples:
>>>>>>>
>>>>>>> - BasicList of egress interfaces in a Flow Record
>>>>>>>   How should a Flow Record be interpreted which contains a list of
>>>>>>>   egress interfaces and a packet counter?
>>>>>>>   Has every counted packet been sent on every egress interface?
>>>>>>>     =>  multicast case, AND semantic (see example in section 8.1)
>>>>>>>   Has every counted packet been sent on any one of the egress
>>>>>>>   interfaces?
>>>>>>>     =>  load balancing case, OR semantic
>>>>>>>   Can it be used as a Flow Key or not?
>>>>>>>
>>>>>>> - BasicList of destination ports in a Flow Record
>>>>>>>   As every packet has only one destination port, the only reasonable
>>>>>>>   interpretation is that the Flow contains packets having one of
>>>>>>>   the reported port numbers.
>>>>>>>     =>  OR semantic
>>>>>>>   This would be a non-key field.
>>>>>>>
>>>>>>>
>>>>>>> I think there are two solutions:
>>>>>>>
>>>>>>> 1. We decide that the semantic of list content is out of scope of
>>>>>>>    draft-ietf-ipfix-structured-data. We add a note to the draft that
>>>>>>>    the semantic must be clear from the context or the definition of the
>>>>>>>    Information Elements used within the lists.
>>>>>>>
>>>>>>> 2. We define semantic lists, such as
>>>>>>>    - andBasicList, andSubTemplateList, andSubTemplateMultiList
>>>>>>>    - orBasicList, orSubTemplateList, orSubTemplateMultiList
>>>>>>>    describing AND and OR semantic of the contained IEs/Templates,
>>>>>>>    respectively.
>>>>>>>
>>>>>>>
>>>>>>> As I wrote in an earlier mail, I see a good use case for orBasicList. It
>>>>>>> could be used in the Selector Report Interpretation of Property Match
>>>>>>> Filtering to report a filter like "port 80 or port 443".
>>>>>>>
>>>>>>> http://www.ietf.org/mail-archive/web/ipfix/current/msg04856.html
>>>>>>>
>>>>>>> At the moment, the Selector Report Interpretation is limited to AND.
>>>>>>> However, if we also want to express a NOT, we still need another solution...
>>>>>>>
>>>>>>> Regards,
>>>>>>> Gerhard
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> IPFIX mailing list
>>>>>>>
>>>>>>> IPFIX@ietf.org
>>>>>>>
>>>>>>>
>>>>>>> https://www.ietf.org/mailman/listinfo/ipfix
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>                
>>>>>>              
>>>>
>>>>          
>>>
>>>        
>>      
>