Re: [IPFIX] review of draft-ietf-ipfix-a9n-04

Brian Trammell <trammell@tik.ee.ethz.ch> Wed, 22 August 2012 14:45 UTC

To: Paul Aitken <paitken@cisco.com>
Cc: draft-ietf-ipfix-a9n@tools.ietf.org, IETF IPFIX Working Group <ipfix@ietf.org>

Hi, Paul,

Thanks for the review! Coming back to this now that IE-DOCTORS is waiting on resolution... Deleted comments have been fixed in the document without discussion; other comments inline.

I've incorporated edits into a -06 revision, to be submitted shortly.

On Jul 13, 2012, at 5:08 PM, Paul Aitken wrote:

> Dear Authors,
> 
> Here with an extensive and detailed review of draft-ietf-ipfix-a9n-04.
> 
> P.
> 

<snip>

>> 
>> 2.  Terminology
>> 
>>    Terms used in this document that are defined in the Terminology
>>    section of the IPFIX Protocol [I-D.ietf-ipfix-protocol-rfc5101bis]
>>    document are to be interpreted as defined there.
>> 
>>    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
>>    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
>>    document are to be interpreted as described in [RFC2119].
>> 
>>    In addition, this document defines the following terms
>> 
>>    Aggregated Flow:   A Flow, as defined by
>>       [I-D.ietf-ipfix-protocol-rfc5101bis], derived from a set of zero
>>       or more original Flows within a defined Aggregation Interval.  The
>> 
> 
> Zero things can't be aggregated. You can only say, "none were seen".

This is equivalent to what's written in the terminology, no? Export of an aggregation of an empty set is the same thing as an assertion that nothing was observed corresponding to the given key in the reported interval.
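
As a minimal sketch of that equivalence (the function name, the regular numbered intervals, and the sample flows are all assumptions made for illustration, not anything from the draft): an aggregator that emits one record per key per interval produces an all-zero record for a key with no Original Flows, which says exactly "none were seen".

```python
def aggregate_octets(flows, keys, t0, interval_len, n_intervals):
    """Sum octets per (key, interval index).

    Every (key, interval) pair is initialised to zero, so a key with an
    empty set of Original Flows still yields a record with a zero counter.
    Flows are assumed to start inside the covered time range.
    """
    totals = {(k, i): 0 for k in keys for i in range(n_intervals)}
    for key, start_time, octets in flows:
        i = (start_time - t0) // interval_len
        totals[(key, i)] += octets
    return totals


# Hypothetical input: two flows for one key, none for the other.
flows = [("192.0.2.2", 10, 500), ("192.0.2.2", 70, 200)]
result = aggregate_octets(flows, ["192.0.2.2", "192.0.2.3"], 0, 60, 2)
```

Here the records for 192.0.2.3 carry a zero octet count in both intervals.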

>>       primary difference between a Flow and an Aggregated Flow in the
>>       general case is that the time interval (i.e., the two-tuple of
>>       start and end times) of a Flow is derived from information about
>>       the timing of the packets comprising the Flow, while the time
>>       interval of an Aggregated Flow is often externally imposed.  Note
>>       that an Aggregated Flow is defined in the context of an
>>       Intermediate Aggregation Process only.  Once an Aggregated Flow is
>>       exported, it is essentially a Flow as in
>>       [I-D.ietf-ipfix-protocol-rfc5101bis] and can be treated as such.  

<snip>

>>    Original Exporter:   When the Intermediate Aggregation Process is
>>       hosted in an IPFIX Mediator, the Original Exporter is the Exporter
>>       from which the Original Flows are received.
>> 
> 
> Does the meaning of "Original Exporter" change when the Intermediate Aggregation Process is _not_  hosted in an IPFIX Mediator?

No, it simply has no meaning.

<snip>

>> 3.  Use Cases for IPFIX Aggregation

<snip>

>> 
>>    While much of the discussion in this document, and all of the
>>    examples, apply to the common case that the Original Flows to be
>>    aggregated are all of the same underlying type (i.e., are represented
>>    with identical or compatible Templates), and that each packet
>> 
> 
> What are "compatible" Templates ?

Templates which represent the same essential complex data type (RFC 6313 notwithstanding): that is, templates containing substantially the same IEs, with allowances for free convertibility of timestamps and of information which is multiply representable (e.g. TOS and DSCP, RFC 5103 biflows and initiator*/responder*, etc.), and for the dropping of IEs in which the IAP is not interested or which it cannot decode.

I'll describe this more succinctly in the document.

> How does this work when there are zero things to be aggregated? ie, are zero things all of the same type?

I'd presume you generally only have zero things in the context of some complex type -- i.e., you know what the template of the records you didn't see would be, because you got it from the OE. The degenerate case, in which you receive nothing at all, not even a template, would be handled by not exporting anything, and maybe by leaving a helpful note in the log to remind whoever's responsible for the whole infrastructure that they should turn stuff on.

Otherwise, I suppose this is an interesting philosophical question ("if a flow doesn't pass through a router in the woods, does anyone hear it?"), but I don't get the point...

<snip>

>> 5.  IP Flow Aggregation Operations
>> 
>>    As stated in Section 2, an Aggregated Flow is simply an IPFIX Flow
>>    generated from Original Flows by an Intermediate Aggregation Process.
>>    Here, we detail the operations by which this is achieved within an
>>    Intermediate Aggregation Process.
>> 
>> 5.1.  Temporal Aggregation through Interval Distribution
>> 
>>    Interval distribution imposes a time interval on the resulting
>>    Aggregated Flows.  The selection of an interval is specific to the
>>    given aggregation application.  Intervals may be derived from the
>>    Original Flows themselves (e.g., an interval may be selected to cover
>>    the entire time containing the set of all Flows sharing a given Key,
>>    as in Time Composition described in Section 5.1.2) or externally
>>    imposed; in the latter case the externally imposed interval may be
>>    regular (e.g., every five minutes) or irregular (e.g., to allow for
>>    different time resolutions at different times of day, under different
>>    network conditions, or indeed for different sets of Original Flows).
>> 
>>    The length of the imposed interval itself has tradeoffs.  Shorter
>>    intervals allow higher-resolution aggregated data and, in streaming
>> 
>> 
>> 
>> Trammell, et al.        Expires December 29, 2012              [Page 15]
>> Internet-Draft              IPFIX Aggregation                  June 2012
>> 
>> 
>>    applications, faster reaction time.  Longer intervals generally lead
>>    to greater data reduction and simplified counter distribution.
>>    Specifically, counter distribution is greatly simplified by the
>>    choice of an interval longer than the duration of longest Original
>>    Flow, itself generally determined by the Original Flow's Metering
>>    Process active timeout; in this case an Original Flow can contribute
>>    to at most two Aggregated Flows, and the more complex value
>>    distribution methods become inapplicable.
>> 
>>    |                |                |                |
>>    | |<--Flow A-->| |                |                |
>>    |        |<--Flow B-->|           |                |
>>    |          |<-------------Flow C-------------->|   |
>>    |                |                |                |
>>    |   interval 0   |   interval 1   |   interval 2   |
>> 
>>               Figure 5: Illustration of interval distribution
>> 
>>    In Figure 5, we illustrate three common possibilities for interval
>>    distribution as applies with regular intervals to a set of three
>>    Original Flows.  For Flow A, the start and end times lie within the
>>    boundaries of a single interval 0; therefore, Flow A contributes to
>>    only one Aggregated Flow.  Flow B, by contrast, has the same duration
>>    but crosses the boundary between intervals 0 and 1; therefore, it
>>    will contribute to two Aggregated Flows, and its counters must be
>>    distributed among these Flows, though in the two-interval case this
>>    can be simplified somewhat simply by picking one of the two
>>    intervals, or proportionally distributing between them.  Only Flows
>>    like Flow A and Flow B will be produced when the interval is chosen
>>    to be longer than the duration of longest Original Flow, as above.
>>    More complicated is the case of Flow C, which contributes to more
>>    than two Aggregated Flows, and must have its counters distributed
>>    according to some policy as in Section 5.1.1.
>> 
> 
> This discussion applies equally to many other fields. eg, Figure 5 could show source ports, with the "intervals" grouping well-known ports and ephemeral ports. Flow A uses only a few source ports whereas Flow C uses many.

Hm, not really; what you describe is the reverse operation. It's like saying that an Original Flow had N source ports, and we're going to split that into N Partially Aggregated Flows, one with each of the original ports.

<snip>

>> 
>> 5.1.1.  Distributing Values Across Intervals

<snip>

>> 
>>    End Interval:   The counters for an Original Flow are added to the
>>       counters of the appropriate Aggregated Flow containing the end
>>       time of the Original Flow.
>> 
>>    Start Interval:   The counters for an Original Flow are added to the
>>       counters of the appropriate Aggregated Flow containing the start
>>       time of the Original Flow.
>> 
>>    Mid Interval:   The counters for an Original Flow are added to the
>>       counters of a single appropriate Aggregated Flow containing some
>>       timestamp between start and end time of the Original Flow.
>> 
>>    Simple Uniform Distribution:   Each counter for an Original Flow is
>>       divided by the number of time intervals the Original Flow covers
>>       (i.e., of appropriate Aggregated Flows sharing the same Flow
>>       Keys), and this number is added to each corresponding counter in
>>       each Aggregated Flow.
>> 
>>    Proportional Uniform Distribution:   This is like simple uniform
>>       distribution, but accounts for the fractional portions of a time
>>       interval covered by an Original Flow in the first and last time
>>       interval.  Each counter for an Original Flow is divided by the
>>       number of time _units_ the Original Flow covers, to derive a mean
>>       count rate.  This rate is then multiplied by the number of time
>>       units in the intersection of the duration of the Original Flow and
>>       the time interval of each Aggregated Flow.
>> 
>>    Simulated Process:   Each counter of the Original Flow is distributed
>>       among the intervals of the Aggregated Flows according to some
>>       function the Aggregation Process uses based upon properties of
>>       Flows presumed to be like the Original Flow.  For example, Flow
>>       Records representing bulk transfer might follow a more or less
>>       proportional uniform distribution, while interactive processes are
>>       far more bursty.
>> 
> 
> BTW, the case of a collector or mediator aggregating received flows presents another possibility: the flow could be received late (eg, delayed export), so rather than re-opening an old start / mid / end aggregation, the flow is simply included in the "current" aggregation. ie, real time aggregation, regardless of the flow timestamps.

A good point; however, this is already covered in section 6.2: 

   In certain circumstances, additional delay at the original Exporter
   may cause an IAP to close an interval before the last Original
   Flow(s) accountable to the interval arrives; in this case the IAP
   SHOULD drop the late Original Flow(s).  Accounting of flows lost at
   an Intermediate Process due to such issues is covered in
   [I-D.ietf-ipfix-mediation-protocol].

Would you suggest loosening the language to allow an IAP to fake timestamps to avoid drops in delay-intolerant situations?
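
To make the two options concrete, here is a minimal sketch, assuming regular intervals numbered from zero and integer timestamps. The class, its methods, and the `drop_late` switch are hypothetical names for illustration only. With `drop_late=True` it follows the SHOULD in section 6.2; with `drop_late=False` it re-stamps a late flow into the currently open interval, as Paul describes.

```python
class IntervalAggregator:
    """Toy IAP that sums octets per interval and handles late flows."""

    def __init__(self, interval_len, drop_late=True):
        self.interval_len = interval_len
        self.drop_late = drop_late
        self.current = 0      # index of the oldest interval still open
        self.octets = {}      # interval index -> summed octets
        self.dropped = 0      # late Original Flows discarded

    def close_through(self, now):
        """Close every interval that ends at or before `now`."""
        self.current = max(self.current, now // self.interval_len)

    def add(self, end_time, octets):
        """Account a flow to the interval containing its end time."""
        idx = end_time // self.interval_len
        if idx < self.current:        # that interval is already closed
            if self.drop_late:
                self.dropped += 1
                return
            idx = self.current        # re-stamp into the open interval
        self.octets[idx] = self.octets.get(idx, 0) + octets


strict = IntervalAggregator(60)
lenient = IntervalAggregator(60, drop_late=False)
for agg in (strict, lenient):
    agg.add(30, 100)        # on time, interval 0
    agg.close_through(60)   # interval 0 is now closed
    agg.add(45, 50)         # late: belongs to interval 0
    agg.add(70, 10)         # on time, interval 1
```

The strict instance ends with one dropped flow; the lenient one moves the late 50 octets into interval 1, at the cost of misreporting when they were observed.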

>>    Direct:   The Aggregation Process has access to the original packet
>>       timings from the packets making up the Original Flow, and uses
>>       these to distribute or recalculate the counters.
>> 
>>    A method for exporting the distribution of counters across multiple
>>    Aggregated Flows is detailed in Section 7.4.  In any case, counters
>>    MUST be distributed across the multiple Aggregated Flows in such a
>>    way that the total count is preserved, within the limits of accuracy
>>    of the implementation.  This property allows data to be aggregated
>>    and re-aggregated with negligible loss of original count information.
>>    To avoid confusion in interpretation of the aggregated data, all the
>>    counters for a set of given Original Flows SHOULD be distributed via
>>    the same method.
>> 
> 
> Consider changing that to a "MUST".
> 
> As long as the method is consistent, is it necessary to know what method was used?

Depending on what you intend to do with the data, it can be helpful to know if everything's in one bin or if there was some distribution across bins; however, exactly how it was done may not be so important. This list is meant only to be a covering set of possible distribution policies (given the assumption that you're not simply faking timestamps, as above)...
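
For readers who want to see how some of these policies differ, here is a minimal sketch of four of them, assuming regular fixed-length intervals numbered from zero and integer timestamps. The function names and method strings are hypothetical. Fractions are used so that the total is preserved exactly; a real implementation exporting unsigned64 counters would need a rounding rule that still preserves the sum.

```python
from fractions import Fraction


def touched_intervals(start, end, interval_len):
    """Indices of the intervals containing the start and end times.

    A flow ending exactly on a boundary is treated as touching the
    interval that begins there.
    """
    return list(range(start // interval_len, end // interval_len + 1))


def distribute(counter, start, end, interval_len, method):
    """Split one counter of an Original Flow across interval indices."""
    idxs = touched_intervals(start, end, interval_len)
    if method == "start":
        return {idxs[0]: Fraction(counter)}
    if method == "end":
        return {idxs[-1]: Fraction(counter)}
    if method == "simple_uniform":
        return {i: Fraction(counter, len(idxs)) for i in idxs}
    if method == "proportional_uniform":
        if end == start:                      # zero-duration flow
            return {idxs[0]: Fraction(counter)}
        rate = Fraction(counter, end - start)  # mean count per time unit
        shares = {}
        for i in idxs:
            lo = max(start, i * interval_len)
            hi = min(end, (i + 1) * interval_len)
            shares[i] = rate * (hi - lo)
        return shares
    raise ValueError(f"unknown method: {method}")


# A flow like Flow C: 1200 octets from t=40 to t=160, 60-unit intervals.
simple = distribute(1200, 40, 160, 60, "simple_uniform")
proportional = distribute(1200, 40, 160, 60, "proportional_uniform")
```

Both results sum to the original 1200, but the simple method puts 400 in each of the three intervals while the proportional one gives 200, 600 and 400.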

<snip>

>> 
>> 5.4.  Aggregation Combination
>> 
>>    Interval distribution and key aggregation together may generate
>>    multiple Partially Aggregated Flows covering the same time interval
>>    with the same set of Flow Key values.  The process of combining these
>>    Partially Aggregated Flows into a single Aggregated Flow is called
>>    aggregation combination.  In general, non-Key values from multiple
>>    Contributing Flows are combined using the same operation by which
>>    values are combined from packets to form Flows for each Information
>>    Element.  Counters are summed, averages are averaged, flags are
>>    unioned, and so on.
>> 
> 
> Disagree. Delta counters can be summed; total counters need the latest (newest) value.

For aggregating across multiple contributing flows, you take the latest total counter from each, then sum them.

> However, averages cannot be averaged.

Yeah, that's just a silly mistake; removed.
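
A minimal sketch of both points, with hypothetical function names and made-up numbers: total counters are combined by keeping the newest value per contributing flow and summing those, and a mean can only be combined correctly if the sample counts are carried along. The weighted mean here is an illustration of why plain averaging fails, not something the draft prescribes.

```python
def combine_total_counter(observations):
    """observations: iterable of (flow_id, timestamp, total_counter).

    Keep the latest total per contributing flow, then sum across flows.
    """
    latest = {}
    for flow_id, ts, value in observations:
        if flow_id not in latest or ts > latest[flow_id][0]:
            latest[flow_id] = (ts, value)
    return sum(value for _, value in latest.values())


def combine_means(parts):
    """parts: list of (mean, sample_count); returns the weighted mean."""
    total = sum(mean * n for mean, n in parts)
    count = sum(n for _, n in parts)
    return total / count


# Flow "a" reports a running total twice; only its newest value counts.
combined_total = combine_total_counter(
    [("a", 1, 100), ("a", 2, 150), ("b", 1, 40)]
)

# One sample with mean 10 and three samples with mean 20.
parts = [(10.0, 1), (20.0, 3)]
naive_mean = sum(mean for mean, _ in parts) / len(parts)
weighted_mean = combine_means(parts)
```

The naive average of the two means is 15.0, while the mean over all four underlying samples is 17.5.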

<snip>

>> 7.2.  Flow Count Export
>> 
>>    The following four Information Elements are defined to count Original
>>    Flows as discussed in Section 5.2.1.
>> 
>> 7.2.1.  originalFlowsPresent
>> 
>>    Description:   The non-conservative count of Original Flows
>>       contributing to this Aggregated Flow.  Non-conservative counts
>>       need not sum to the original count on re-aggregation.
>> 
>>    Abstract Data Type:   unsigned64
>> 
> 
> Semantics? Either totalCount or deltaCount. Same for the fields below too.

deltaCount for these, total for the distinct counters. Thanks for catching this...

>>  
>> 
>> 
>>    ElementId:   TBD1
>> 
>>    Status:   Current
>> 
> 
> I'm sure it's not necessary to say "Status: Current", since it's very unlikely that we'd be introducing new but deprecated fields.
> 
> What does IE police have to say on this matter?

It's not in the IE-DOCTORS template; this is copied from an (older) document without a specified template (I think anon), wherein Status: was a field.

>> 
>> 7.2.2.  originalFlowsInitiated
>> 
>>    Description:   The conservative count of Original Flows whose first
>>       packet is represented within this Aggregated Flow.  Conservative
>>       counts must sum to the original count on re-aggregation.
>> 
>>    Abstract Data Type:   unsigned64
>> 
>>    ElementId:   TBD2
>> 
>>    Status:   Current
>> 
>> 7.2.3.  originalFlowsCompleted
>> 
>>    Description:   The conservative count of Original Flows whose last
>>       packet is represented within this Aggregated Flow.  Conservative
>>       counts must sum to the original count on re-aggregation.
>> 
>>    Abstract Data Type:   unsigned64
>> 
>>    ElementId:   TBD3
>> 
>>    Status:   Current
>> 
>> 7.2.4.  deltaFlowCount
>> 
>>    Description:   The conservative count of Original Flows contributing
>>       to this Aggregated Flow; may be distributed via any of the methods
>>       described in Section 5.1.1.  This Information Element is
>>       compatible with Information Element 3 as used in NetFlow version
>>       9.
>> 
> 
> This is not a good description for inclusion in the IANA registry (which currently doesn't define element 3):
> 
> - "may be distributed via any of the methods described in Section 5.1.1." only makes sense within the context of this document.
> - "This Information Element is compatible with Information Element 3 as used in NetFlow version 9." is suitable as a Note, but not as part of the description.

Good points both. Will fix.

> 
>>    Abstract Data Type:   unsigned64
>> 
>>    ElementId:   3
>> 
>>    Status:   Current
>> 
>> 7.3.  Distinct Host Export
>> 

<snip>

> It's a pity that six new IEs were required here, rather than a single "countOf" field + property. eg, "countOf" + "SourceIPv6Address".

See my messages of 9 July http://www.ietf.org/mail-archive/web/ipfix/current/msg06436.html and 9 April  http://www.ietf.org/mail-archive/web/ipfix/current/msg06347.html which modify mibvar to be the first half of a general solution to this problem. I suspect at this rate you can expect a message from me on 9 October on the subject, as well. :)

<snip>

>> 7.4.2.  valueDistributionMethod Information Element
>> 
>>    Description:   A description of the method used to distribute the
>>       counters from Contributing Flows into the Aggregated Flow records
>>       described by an associated scope, generally a Template.  The
>>       method is deemed to apply to all the non-key Information Elements
>>       in the referenced scope for which value distribution is a valid
>>       operation; if the originalFlowsInitiated and/or
>>       originalFlowsCompleted Information Elements appear in the
>>       Template, they are not subject to this distribution method, as
>>       they each infer their own distribution method.  This is intended
>>       to be a complete set of possible value distribution methods; it is
>> 
> 
> Do you mean to say it's non-extensible?

It's an explanation of why there's not a definition of an IANA sub-registry here.

<snip>

>> 8.  Examples

(many, many thanks for going over these again; I know how tedious they are...)

>>    In these examples, the same data, described by the same template,
>>    will be aggregated multiple different ways; this illustrates the
>>    various different functions which could be implemented by
>>    Intermediate Aggregation Processes.  Templates are shown in IESpec
>>    format as introduced in [I-D.ietf-ipfix-ie-doctors].  The source data
>>    format is a simplified flow: timestamps, traditional 5-tuple, and
>>    octet count.  The template is shown in Figure 8.
>> 
>>    flowStartMilliseconds(152)[8]
>>    flowEndMilliseconds(153)[8]
>>    sourceIPv4Address(8)[4]
>>    destinationIPv4Address(12)[4]
>>    sourceTransportPort(7)[2]
>>    destinationTransportPort(11)[2]
>>    protocolIdentifier(4)[1]
>>    octetDeltaCount(1)[8]
>> 
>>                    Figure 8: Input template for examples
>> 
> 
> This isn't really a figure.

It's not really a table either. Additionally, XML2RFC is apparently not capable of labelling things as tables unless they appear in a texttable, and texttables are too limited to handle all the various things I need to do in the examples area.

I'm inclined to leave this as it is (umpteen repetitions below omitted) as I don't see a good solution that doesn't involve custom tooling.
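
Whatever it ends up being labelled, the listing is regular enough to be read by machine. As a minimal sketch, assuming only the name(id)[length] form that appears in Figure 8 (the regular expression and function name are hypothetical, and any other IESpec fields are not handled):

```python
import re

# Matches lines such as "octetDeltaCount(1)[8]".
IESPEC = re.compile(r"^(?P<name>\w+)\((?P<id>\d+)\)\[(?P<len>\d+)\]$")


def parse_iespec(line):
    """Return (name, element_id, length_in_octets) for one IESpec line."""
    m = IESPEC.match(line.strip())
    if m is None:
        raise ValueError(f"not a name(id)[length] IESpec: {line!r}")
    return m["name"], int(m["id"]), int(m["len"])


TEMPLATE = """
flowStartMilliseconds(152)[8]
flowEndMilliseconds(153)[8]
sourceIPv4Address(8)[4]
destinationIPv4Address(12)[4]
sourceTransportPort(7)[2]
destinationTransportPort(11)[2]
protocolIdentifier(4)[1]
octetDeltaCount(1)[8]
"""

fields = [parse_iespec(l) for l in TEMPLATE.splitlines() if l.strip()]
record_length = sum(length for _, _, length in fields)
```

Summing the lengths gives 37 octets per record for the input template.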

<snip, now in section 8.3>

>>    dest ip4       port  dist src
>>    192.0.2.131    53           3
>>    198.51.100.2   80           1
>>    198.51.100.2   443          3
>>    198.51.100.67  80           2
>>    198.51.100.68  80           2
>>    198.51.100.133 80           2
>>    198.51.100.3   80           3
>>    198.51.100.4   80           2
>>    198.51.100.17  80           1
>>    198.51.100.69  443          1
>> 
> 
> This table seems to be completely wrong. Even the total count is wrong - ie, add up the "dist src" = 20. Yet Figure 9 has 24 entries.

These are *distinct* sources per destination endpoint.

For 192.0.2.131:53, there are 5 flows, but the sources are 192.0.2.2, 192.0.2.3, 192.0.2.131.
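
A minimal sketch of the difference between counting flows and counting distinct sources. The function name is hypothetical, and the way the five flows are split among the three sources below is an assumption made for illustration; only the totals (five flows, three sources) come from the example data.

```python
from collections import defaultdict


def distinct_sources(flows):
    """flows: iterable of (src_ip, dst_ip, dst_port).

    Returns a dict mapping each destination endpoint to the number of
    distinct source addresses seen, not the number of flows.
    """
    sources = defaultdict(set)
    for src, dst, port in flows:
        sources[(dst, port)].add(src)
    return {endpoint: len(srcs) for endpoint, srcs in sources.items()}


# Five flows to 192.0.2.131:53 from three sources (split is assumed).
flows = [
    ("192.0.2.2", "192.0.2.131", 53),
    ("192.0.2.2", "192.0.2.131", 53),
    ("192.0.2.3", "192.0.2.131", 53),
    ("192.0.2.3", "192.0.2.131", 53),
    ("192.0.2.131", "192.0.2.131", 53),
]
counts = distinct_sources(flows)
```

The result for 192.0.2.131:53 is 3, matching the "dist src" column, even though five flows contribute; this is why the column need not sum to the number of entries in Figure 9.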

> I think it should be:
> 
> 198.51.100.17    80    4
> 198.51.100.2     80    1
> 198.51.100.2    443    4
> 198.51.100.3     80    3
> 198.51.100.4     80    2
> 198.51.100.68    80    4
> 198.51.100.69   443    1

I don't get where this comes from at all. 

<snip>

>> 
>>    Following metadata export, the aggregation steps follow as before.
>>    However, two long flows are distributed across multiple intervals in
>>    the interval imposition step, as indicated with "*" in Figure 27.
>> 
> 
> It might help to use *1 and *2 to distinguish the two flows.

Would do so but I'm at the nits width limit with this figure/table.

<snip>

>>    Benoit Claise
>>    Cisco Systems, Inc.
>>    De Kleetlaan 6a b1
>>    1831 Diagem
>> 
> 
> Once again, in existing RFCs - and in Cisco's internal directory - this is "Diegem 1831".

Per the Universal Postal Union, the most authoritative reference source I could find in English with a couple of minutes of googling, the proper format for addressing in Belgium places the postcode before the municipality name.

See: http://www.upu.int/fileadmin/documentsFiles/activities/addressingUnit/belEn.pdf

>>    Belgium

Cheers,

Brian
