[Gen-art] Gen-art telechat review of draft-ietf-i2rs-traceability-09

Elwyn Davies <elwynd@dial.pipex.com> Thu, 05 May 2016 13:41 UTC

To: General area reviewing team <gen-art@ietf.org>
From: Elwyn Davies <elwynd@dial.pipex.com>
Message-ID: <572B4D51.10003@dial.pipex.com>
Date: Thu, 05 May 2016 14:40:33 +0100
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.7.2
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Transfer-Encoding: 7bit
Archived-At: <http://mailarchive.ietf.org/arch/msg/gen-art/YfwTUiDmV4MXQRQmTrD2_DgLzwI>
Cc: draft-ietf-i2rs-traceability.all@ietf.org
Subject: [Gen-art] Gen-art telechat review of draft-ietf-i2rs-traceability-09
Precedence: list

I am the assigned Gen-ART reviewer for this draft. The General Area
Review Team (Gen-ART) reviews all IETF documents being processed
by the IESG for the IETF Chair. Please wait for direction from your
document shepherd or AD before posting a new version of the draft.

For more information, please see the FAQ at

<http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.

Document: draft-ietf-i2rs-traceability-09.txt
Reviewer: Elwyn Davies
Review Date: 2016/05/05
IETF LC End Date: 2016/04/29
IESG Telechat date: 2016/05/05

Summary:
I have concerns about the trace model used as explained below.  It may 
be that there is good reason and WG consensus for the model adopted, but 
it would be good to see some explanation of the rather curious hybrid 
model used.  There are also significant issues with the description of 
timestamps and a number of other nits/editorial matters to address.

Apologies for the last minute delivery.

Possibly Major issues:
Trace model:  The tracing model seems to be a curious hybrid of state 
recording and event logging.  The introduction seems to imply that the 
tracing model records events.  Indeed it does but state entry events do 
not appear to get recorded until the sequence transitions out of the 
state.  I can see that the COMPLETED entries record the total processing 
period, but this loses the detail of when actual processing of the event 
starts (as opposed to becoming PENDING).  I was somewhat surprised that 
a simple chained transition event model was not used (especially since 
the tracing entries are actually chained together already).

In particular if some sort of disaster occurs, it seems possible in this 
model that events in the PENDING queue might never appear in the trace 
log at all if the request hasn't started being processed. It also 
doesn't record any preprocessing time before the request becomes 
PENDING.  If there is a processing bottleneck this could be significant 
information.
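
To illustrate what I mean by a simple chained transition event model, here is a minimal sketch. It is not taken from the draft: the class names, the request identifiers, and the RECEIVED and PROCESSING state names are assumptions made up for the example (only PENDING and COMPLETED are discussed above). An entry is written as soon as a request enters a state, and each entry points at the previous entry for the same request.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TransitionEntry:
    entry_id: int
    request_id: str
    state: str                        # state being entered (names are hypothetical)
    timestamp: float                  # time the state was entered
    previous_entry_id: Optional[int]  # chain link to the prior entry for this request


class TransitionLog:
    """Hypothetical log that records every state entry when it happens."""

    def __init__(self):
        self.entries = []
        self._last_entry_for_request = {}
        self._ids = itertools.count(1)

    def record(self, request_id, state, timestamp):
        entry = TransitionEntry(
            entry_id=next(self._ids),
            request_id=request_id,
            state=state,
            timestamp=timestamp,
            previous_entry_id=self._last_entry_for_request.get(request_id),
        )
        self.entries.append(entry)
        self._last_entry_for_request[request_id] = entry.entry_id
        return entry

    def time_in_state(self, request_id, state):
        """Time spent in `state`, or None if the request has not left it yet."""
        chain = [e for e in self.entries if e.request_id == request_id]
        for current, following in zip(chain, chain[1:]):
            if current.state == state:
                return following.timestamp - current.timestamp
        return None


log = TransitionLog()
log.record("req-1", "RECEIVED", 10.0)    # preprocessing starts
log.record("req-1", "PENDING", 10.5)     # queued
log.record("req-2", "PENDING", 11.0)     # queued, never processed
log.record("req-1", "PROCESSING", 12.0)  # actual processing starts
log.record("req-1", "COMPLETED", 15.0)
```

With this shape, req-2 is visible in the log even though it never left the queue, and the preprocessing, queueing and processing periods of req-1 can each be recovered from adjacent entries in its chain.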

I was also wondering whether this model traces the arrival and departure 
of clients (and whether authentication/authorisation worked or not).   
This may be covered by operation types in the architecture which I 
haven't had time to read in detail.

Minor issues:

Nits/editorial comments:
s1:  The Intro should also contain a description of the intention of the 
document - basically a slight reworking of the abstract.  It should also 
outline the association of the framework with the interface (i2rs 
client<->agent) to which the traceability applies.

s3:
> The
>     ability to automate and abstract even complex policy-based controls
>     highlights the need for an equally scalable traceability function to
>     provide event-level granularity of the routing system compliant with
>     the requirements of I2RS (Section 5 of
>     [I-D.ietf-i2rs-problem-statement]).
The 'routing system' doesn't have an event-level granularity.  Maybe
OLD:
provide event-level granularity of the routing system
NEW:
provide recording at event-level granularity of the evolution of the 
routing system
END

s4:  The section ends with this list of 'use cases':
>     As I2RS becomes increasingly pervasive in routing environments, a
>     traceability model offers significant advantages and facilitates the
>     following use cases:
>
>     1  Automated event correlation, trend analysis, and anomaly
>        detection;
>
>     2  Trace log storage for offline (manual or tools) analysis;
>
>     3  Improved accounting of routing system operations;
>
>     4  Standardized structured data format for writing common tools;
>
>     5  Common reference for automated testing and incident reporting;
>
>     6  Real-time monitoring and troubleshooting;
>
>     7  Enhanced network audit, management and forensic analysis
>        capabilities.
I have added numbers to facilitate these comments:
IMO #2 and #4 are either not use cases or are not phrased as use cases.  
The automated testing is not really a use case as such. Having these 
characteristics supports the implementation of the actual use cases.  
Related to the data retention comment above, storing some or all of the 
trace log - and knowing which bits might be critical to control data 
retention - is a use case but the basic storage is just a necessary 
prerequisite of doing other things.  I also might suggest a reordering 
indicating importance perhaps.

Thus I would suggest replacing this with something like:

    As I2RS becomes increasingly pervasive in routing environments, a
    traceability model that supports controllable trace log retention
    using a standardized structured data format offers significant
    advantages, such as the ability to create common tools and support
    automated testing, and facilitates the following use cases:

    o  Real-time monitoring and troubleshooting of router events;

    o  Automated event correlation, trend analysis, and anomaly
       detection;

    o  Offline (manual or tools-based) analysis of router state evolution
       from the retained trace logs;

    o  Enhanced network audit, management and forensic analysis
       capabilities;

    o  Improved accounting of routing system operations; and

    o  Providing a standardized format for incident reporting and test
       logging.

s5: The section body is empty; empty sections are not desirable.  A brief overview of 
the following sub-sections should be added (or alternatively promote 
s5.1 which actually describes the framework).

s5.1, para 1:
> Some notable elements of the architecture are in
>     this section.
I don't understand this sentence.  If it implies that elements of the 
architecture are defined in this section then one has to ask 'Why aren't 
they defined in the architecture document?'  Since s5.1 contains the 
whole framework, what other elements than the 'some notable' ones are there?

s5.1, para 2: The term 'northbound' is not defined (and isn't used in 
the architecture).

s5.2: The title is 'I2RS Trace Log Mandatory Fields' - nothing that 
isn't mandatory is discussed.  Should there be some words about optional 
extra fields?

s5.2, timestamps:  The RFC 3339 format doesn't tie up with 32 bit 
resolution - there are hours and minutes etc and decimal representation 
is used.  Things like the origin for timestamps need to be defined if they 
are to be truly useful for comparison outside an individual enterprise 
(as might be implied by the incident reporting use case).  If RFC 3339 
format is really used, then the timestamps need to include the date as 
well since logs will certainly run over more than one day.  I note that 
the example in s6 shows full RFC 3339 date/time format examples.
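
As a small illustration of why the date and a defined origin matter, the following sketch (my own, not from the draft; the function name rfc3339 is an assumption) renders timestamps as full RFC 3339 date/time strings normalised to UTC, and compares them with time-only stamps for two events that straddle midnight.

```python
from datetime import datetime, timedelta, timezone


def rfc3339(moment):
    """Render an aware datetime as an RFC 3339 date-time string in UTC."""
    if moment.tzinfo is None:
        # Without an offset the origin of the timestamp is undefined.
        raise ValueError("timestamp needs an explicit UTC offset")
    text = moment.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    return text.replace("+00:00", "Z")


# Two events one second apart, either side of midnight UTC.
late = datetime(2016, 5, 5, 23, 59, 59, 500000, tzinfo=timezone.utc)
early_next_day = late + timedelta(seconds=1)

full_stamps = [rfc3339(late), rfc3339(early_next_day)]
time_only = [late.strftime("%H:%M:%S"), early_next_day.strftime("%H:%M:%S")]

# Full date/time strings in UTC keep their order; time-only stamps do not.
full_in_order = full_stamps == sorted(full_stamps)
time_only_in_order = time_only == sorted(time_only)
```

Here full_in_order is true while time_only_in_order is false, which is the problem a log running over more than one day would hit if the date were omitted.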

s5.2, Applied Operation Data:  Does the Operation Data Present flag 
apply to this field?  Can this be present even if there is no Requested 
Operation Data?

s5.2, Result Code: Need to expand the acronym RIB.

s7.2:  One key point about timestamping (motivated by bitter experience) 
is that timestamps need to be recorded at the point when the event 
actually happens and not when the event is (potentially significantly 
later) entered into the log.  Logging is (as indicated) often allocated 
a low priority and event log writing may end up being postponed for a 
considerable time.
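
A minimal sketch of the behaviour I have in mind follows. It assumes a hypothetical DeferredTraceLog class with an injectable clock (none of these names come from the draft): the timestamp is captured when the event occurs, and the low-priority writer carries that value through unchanged when it eventually writes the entry.

```python
from collections import deque


class DeferredTraceLog:
    """Hypothetical log whose writer runs later than the events it records."""

    def __init__(self, clock):
        self._clock = clock      # callable returning the current time
        self._pending = deque()  # events waiting for the low-priority writer
        self.written = []        # entries as they appear in the log

    def event(self, description):
        # Capture the time here, at the point the event actually happens.
        self._pending.append((self._clock(), description))

    def flush(self):
        # Runs whenever the writer is scheduled, possibly much later.
        write_time = self._clock()
        while self._pending:
            event_time, description = self._pending.popleft()
            self.written.append({
                "timestamp": event_time,  # NOT write_time
                "description": description,
                "write_delay": write_time - event_time,
            })


# Deterministic clock for the example: two events, then a late flush.
ticks = iter([100.0, 101.0, 160.0])
trace = DeferredTraceLog(clock=lambda: next(ticks))
trace.event("request became PENDING")
trace.event("request COMPLETED")
trace.flush()
```

The two entries keep timestamps 100.0 and 101.0 even though the write happens at 160.0; stamping at write time would have made both events appear simultaneous and a minute late.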

s11: I would consider I-D.ietf-i2rs-problem-statement and 
I-D.ietf-i2rs-pub-sub-requirements to be Informative; and
I-D.ietf-i2rs-rib-info-model, RFC 3339 and possibly RFC 5424 to be 
normative.
