Re: [Gen-art] Gen-art telechat review of draft-ietf-i2rs-traceability-09

Joe Clarke <jclarke@cisco.com> Fri, 06 May 2016 20:27 UTC

To: Elwyn Davies <elwynd@dial.pipex.com>, General area reviewing team <gen-art@ietf.org>
References: <572B4D51.10003@dial.pipex.com>
From: Joe Clarke <jclarke@cisco.com>
Organization: Cisco Systems, Inc.
Message-ID: <e8a0bf59-f1ea-3710-9b3b-2820ea1ef64b@cisco.com>
Date: Fri, 06 May 2016 16:27:27 -0400
In-Reply-To: <572B4D51.10003@dial.pipex.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/gen-art/Ag8H04GN1_EMiNIEVwAfsjEEY5U>
Cc: draft-ietf-i2rs-traceability.all@ietf.org
Subject: Re: [Gen-art] Gen-art telechat review of draft-ietf-i2rs-traceability-09

Thank you for your review, Elwyn.  You raise some excellent points.

I'm top-posting here as I think we've addressed all of your concerns. 
We're still reviewing the new text among ourselves, but I wanted to show 
you the side-by-side (SxS) diffs to get your take.  Some of the changes 
incorporate other AD/IESG comments as well.

http://www.marcuscom.com/draft-ietf-i2rs-traceability.txt-from-09-10.diff.html

Joe

On 5/5/16 09:40, Elwyn Davies wrote:
> Possibly Major issues:
> Trace model:  The tracing model seems to be a curious hybrid of state
> recording and event logging.  The introduction seems to imply that the
> tracing model records events.  Indeed it does but state entry events do
> not appear to get recorded until the sequence transitions out of the
> state.  I can see that the COMPLETED entries record the total processing
> period, but this loses the detail of when actual processing of the event
> starts (as opposed to becoming PENDING).  I was somewhat surprised that
> a simple chained transition event model was not used (especially since
> the tracing entries are actually chained together already).
>
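For comparison, a chained transition event model of the kind mentioned above could write one entry at the moment a request enters each state, with each entry pointing back at the previous entry for the same request. This is a minimal illustrative sketch, not the draft's design: the class and field names are hypothetical, only PENDING and COMPLETED are taken from the review, and PROCESSING is an assumed stand-in for whatever intermediate state an implementation uses.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass(frozen=True)
class TransitionEvent:
    """One trace entry per state transition (hypothetical layout)."""
    entry_id: int
    request_id: int
    state: str                        # e.g. "PENDING", "PROCESSING", "COMPLETED"
    entered_at: datetime              # when the request ENTERED this state
    previous_entry_id: Optional[int]  # link to this request's prior entry


class TransitionLog:
    """Append-only log that records each transition as it happens."""

    def __init__(self) -> None:
        self.entries: List[TransitionEvent] = []
        self._latest: Dict[int, int] = {}  # request_id -> newest entry_id

    def record(self, request_id: int, state: str,
               when: Optional[datetime] = None) -> TransitionEvent:
        event = TransitionEvent(
            entry_id=len(self.entries),
            request_id=request_id,
            state=state,
            entered_at=when if when is not None else datetime.now(timezone.utc),
            previous_entry_id=self._latest.get(request_id),
        )
        self.entries.append(event)
        self._latest[request_id] = event.entry_id
        return event

    def history(self, request_id: int) -> List[str]:
        """Follow the chain back from the newest entry; return oldest state first."""
        states: List[str] = []
        current = self._latest.get(request_id)
        while current is not None:
            event = self.entries[current]
            states.append(event.state)
            current = event.previous_entry_id
        return states[::-1]


log = TransitionLog()
log.record(7, "PENDING")
log.record(8, "PENDING")       # queued, but never started
log.record(7, "PROCESSING")
log.record(7, "COMPLETED")
print(log.history(7))  # ['PENDING', 'PROCESSING', 'COMPLETED']
print(log.history(8))  # ['PENDING']
```

Because request 8 gets an entry as soon as it is queued, it stays visible in the log even if the system fails before processing starts.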
> In particular if some sort of disaster occurs, it seems possible in this
> model that events in the PENDING queue might never appear in the trace
> log at all if the request hasn't started being processed. It also
> doesn't record any preprocessing time before the request becomes
> PENDING.  If there is a processing bottleneck this could be significant
> information.
>
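If each state entry is recorded with its own timestamp, preprocessing time and queueing delay fall out of simple subtraction, which is the bottleneck information the paragraph above says could be lost. A minimal sketch, assuming one request's transitions are available as ordered (state, entered_at) pairs; RECEIVED is a hypothetical pre-queue state that the draft does not define.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def time_in_state(transitions: List[Tuple[str, datetime]]) -> Dict[str, timedelta]:
    """Return how long one request spent in each state.

    `transitions` holds (state, entered_at) pairs in order.  Time in a state is
    the gap until the next entry, so the final state is not included.
    """
    durations: Dict[str, timedelta] = {}
    for (state, entered), (_, left) in zip(transitions, transitions[1:]):
        durations[state] = durations.get(state, timedelta(0)) + (left - entered)
    return durations


t0 = datetime(2016, 5, 6, 20, 0, 0)
trace = [
    ("RECEIVED", t0),                                          # hypothetical pre-queue state
    ("PENDING", t0 + timedelta(milliseconds=40)),
    ("PROCESSING", t0 + timedelta(seconds=3)),
    ("COMPLETED", t0 + timedelta(seconds=3, milliseconds=5)),
]
for state, spent in time_in_state(trace).items():
    print(state, spent.total_seconds())
# RECEIVED 0.04
# PENDING 2.96
# PROCESSING 0.005
```

In this made-up trace the 2.96 s spent PENDING points at the queue rather than the processing itself; an exit-only record would fold all three figures into one total.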
> I was also wondering whether this model traces the arrival and departure
> of clients (and whether authentication/authorisation worked or not).
> This may be covered by operation types in the architecture which I
> haven't had time to read in detail.
>
> Minor issues:
>
> Nits/editorial comments:
> s1:  The Intro should also contain a description of the intention of the
> document - basically a slight reworking of the abstract.  It should also
> outline the association of the framework with the interface (i2rs
> client<->agent) to which the traceability applies.
>
> s3:
>> The
>>     ability to automate and abstract even complex policy-based controls
>>     highlights the need for an equally scalable traceability function to
>>     provide event-level granularity of the routing system compliant with
>>     the requirements of I2RS (Section 5 of
>>     [I-D.ietf-i2rs-problem-statement]).
> The 'routing system' doesn't have an event-level granularity.  Maybe
> OLD:
> provide event-level granularity of the routing system
> NEW:
> provide recording at event-level granularity of the evolution of the
> routing system
> END
>
> s4:  The section ends with this list of 'use cases':
>>     As I2RS becomes increasingly pervasive in routing environments, a
>>     traceability model offers significant advantages and facilitates the
>>     following use cases:
>>
>>     1  Automated event correlation, trend analysis, and anomaly
>>        detection;
>>
>>     2  Trace log storage for offline (manual or tools) analysis;
>>
>>     3  Improved accounting of routing system operations;
>>
>>     4  Standardized structured data format for writing common tools;
>>
>>     5  Common reference for automated testing and incident reporting;
>>
>>     6  Real-time monitoring and troubleshooting;
>>
>>     7  Enhanced network audit, management and forensic analysis
>>        capabilities.
> I have added numbers to facilitate these comments:
> IMO #2 and #4 are either not use cases or are not phrased as use cases.
> The automated testing is not really a use case as such. Having these
> characteristics supports the implementation of the actual use cases.
> Related to the data retention comment above, storing some or all of the
> trace log - and knowing which bits might be critical to control data
> retention - is a use case but the basic storage is just a necessary
> prerequisite of doing other things.  I also might suggest a reordering
> indicating importance perhaps.
>
> Thus I would suggest replacing this with something like:
>
>    As I2RS becomes increasingly pervasive in routing environments, a
>    traceability model that supports controllable trace log retention
>    using a standardized structured data format offers significant
>    advantages, such as the ability to create common tools and support
>    automated testing, and facilitates the following use cases:
>
>    o  Real-time monitoring and troubleshooting of router events;
>
>    o  Automated event correlation, trend analysis, and anomaly
>       detection;
>
>    o  Offline (manual or tools-based) analysis of router state evolution
>       from the retained trace logs;
>
>    o  Enhanced network audit, management and forensic analysis
>       capabilities;
>
>    o  Improved accounting of routing system operations; and
>
>    o  Providing a standardized format for incident reporting and test
>       logging.
>
> s5: The section is empty, and empty sections are not desirable.  A brief
> overview of the following sub-sections should be added (or alternatively
> promote s5.1, which actually describes the framework).
>
> s5.1, para 1:
>> Some notable elements of the architecture are in
>>     this section.
> I don't understand this sentence.  If it implies that elements of the
> architecture are defined in this section then one has to ask 'Why aren't
> they defined in the architecture document?'  Since s5.1 contains the
> whole framework, what other elements than the 'some notable' ones are
> there?
>
> s5.1, para 2: The term 'northbound' is not defined (and isn't used in
> the architecture document).
>
> s5.2: The title is 'I2RS Trace Log Mandatory Fields', but nothing that
> isn't mandatory is discussed.  Should there be some words about optional
> extra fields?
>
> s5.2, timestamps:  The RFC 3339 format doesn't tie up with 32-bit
> resolution: it has hours, minutes, etc., and uses a decimal
> representation.  Things like the origin for timestamps need to be defined
> if they are to be truly useful for comparison outside an individual
> enterprise (as might be implied by the incident reporting use case).  If
> RFC 3339 format is really used, then the timestamps need to include the
> date as well, since logs will certainly run over more than one day.  I
> note that the example in s6 uses the full RFC 3339 date-time format.
>
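To illustrate the points about dates and origins, here is a minimal sketch that renders a timestamp as a full RFC 3339 date-time normalized to UTC, so entries from different hosts or enterprises compare directly. Normalizing to UTC, rejecting offset-less values, and using microsecond precision are assumptions made for this example, not requirements taken from the draft; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def rfc3339_utc(moment: datetime) -> str:
    """Render an aware datetime as an RFC 3339 date-time in UTC.

    The full date is always included and the offset is fixed to UTC ("Z").
    """
    if moment.tzinfo is None:
        # A naive value has no defined origin/offset, so refuse to guess.
        raise ValueError("timestamp must carry a UTC offset")
    utc = moment.astimezone(timezone.utc)
    # isoformat() writes "+00:00" for UTC; RFC 3339 also permits "Z".
    return utc.isoformat(timespec="microseconds").replace("+00:00", "Z")


# 16:27:27 at UTC-04:00 on 6 May 2016
sent = datetime(2016, 5, 6, 16, 27, 27, tzinfo=timezone(timedelta(hours=-4)))
print(rfc3339_utc(sent))  # 2016-05-06T20:27:27.000000Z
```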
> s5.2, Applied Operation Data:  Does the Operation Data Present flag
> apply to this field?  Can this be present even if there is no Requested
> Operation Data?
>
> s5.2, Result Code: Need to expand the acronym RIB.
>
> s7.2:  One key point about timestamping (motivated by bitter experience)
> is that timestamps need to be recorded at the point when the event
> actually happens and not when the event is (potentially significantly
> later) entered into the log.  Logging is (as indicated) often allocated
> a low priority and event log writing may end up being postponed for a
> considerable time.
>
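A minimal sketch of the timestamping point above, assuming a hypothetical TraceRecord type and a deliberately slow writer thread that stands in for low-priority log writing. The timestamp is captured when the record is created at the event, so the delay before the write does not distort it.

```python
import queue
import threading
import time
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass
class TraceRecord:
    description: str
    # Captured when the record is built, i.e. at the event itself,
    # not when a (possibly much later) writer gets round to logging it.
    event_time: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def slow_writer(pending: "queue.Queue[Optional[TraceRecord]]",
                sink: List[Tuple[datetime, datetime, str]]) -> None:
    """Drain the queue, simulating low-priority log writing."""
    while True:
        record = pending.get()
        if record is None:        # sentinel: no more records
            return
        time.sleep(0.05)          # stands in for the writer being deprioritized
        write_time = datetime.now(timezone.utc)
        sink.append((record.event_time, write_time, record.description))


pending: "queue.Queue[Optional[TraceRecord]]" = queue.Queue()
written: List[Tuple[datetime, datetime, str]] = []
writer = threading.Thread(target=slow_writer, args=(pending, written))
writer.start()
pending.put(TraceRecord("route add 192.0.2.0/24"))  # hypothetical event
pending.put(None)
writer.join()

event_time, write_time, _ = written[0]
print(f"written {(write_time - event_time).total_seconds():.2f}s after the event")
```

The log entry keeps the real event time even though it was written later; stamping inside the writer would instead record the write time.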
> s11: I would consider I-D.ietf-i2rs-problem-statement and
> I-D.ietf-i2rs-pub-sub-requirements to be informative; and
> I-D.ietf-i2rs-rib-info-model, RFC 3339 and possibly RFC 5424 to be
> normative.