[Pearg] Comments on draft-rao-pitfol-01

Joseph Salowey <joe@salowey.net> Tue, 26 May 2020 05:04 UTC

From: Joseph Salowey <joe@salowey.net>
Date: Mon, 25 May 2020 22:04:07 -0700
Message-ID: <CAOgPGoB8KzQFs3x8wnaE4_Qa_wvmsCJz7=2j5vfstYDBw4vXbA@mail.gmail.com>
To: pearg@irtf.org
Content-Type: multipart/alternative; boundary="00000000000012253405a6860737"
Archived-At: <https://mailarchive.ietf.org/arch/msg/pearg/Mh0J62uK41-rVjSjeKMWSFRCvP0>
Subject: [Pearg] Comments on draft-rao-pitfol-01

I had a chance to review draft-rao-pitfol-01.  Sensitive information in
logs is a constant problem for many security and privacy organizations so I
am very happy to see this draft.  Below are some thoughts on the draft.

1.  While PII is super important, the problem extends to all
types of sensitive information in logs.  One common example is service
credentials written to logs or data that is confidential to an organization
and not an individual.  I don't see anything that would limit this approach
to just PII, but it may extend the use cases a bit.

2.  Log data is incredibly useful for troubleshooting, analytics and other
purposes.  To meet these use cases the log data may be propagated,
extracted, transformed and stored in multiple places.  This creates some
challenges:

A.  The data may be transformed into other data models and formats that may
not preserve any privacy marking.
B.  If the privacy marking changes on a field it's unclear what should be
done to historical data.

3.  Often when a sensitive value is written to a log it has gone undetected
for a period of time.  The values have now propagated to multiple systems
and the task of the team responding to the incident is to:

   - stop the logging or mis-identification of the data
   - determine where the sensitive data has propagated to
   - purge (or take other action on) the sensitive data where it now lives
   - perform remedial actions and notifications (for example to rotate
   credentials)
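
The second step above, finding where a value has spread, could start from
something as simple as the following.  This is only a rough Python sketch:
the `stores` layout (a name mapped to a list of record dicts) and the
function name are assumptions made for illustration, not anything taken
from the draft.

```python
def find_propagation(stores, leaked_value):
    """Map each store name to the indexes of records containing the value.

    `stores` is a hypothetical in-memory stand-in for the log aggregators,
    databases, and ETL outputs that may hold copies of the data.
    """
    hits = {}
    for name, records in stores.items():
        found = [
            index
            for index, record in enumerate(records)
            # Substring match on every field value, since a transformed
            # record may embed the value inside a longer string.
            if any(leaked_value in str(value) for value in record.values())
        ]
        if found:
            hits[name] = found
    return hits
```

A real system would have to query each store through its own interface,
which is part of why a common request format would help.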

I would like a system that helps to automate this.  I think the draft
provides part of the solution, but it seems there should be more to the
solution:

   1. Information and data models for sensitivity/privacy tagging of data
   2. Ways of attaching that sensitivity/privacy tagging to the data itself
   (similar to current draft)
   3. Interface to systems to control
      - actions based on tagging
      - changes to tagging/classification
      - communication of tagging/classification "out-of-band"

I'm not really up on the state of the art here.  I know there exist
proprietary protocols that do something similar, but I'm not aware of any
standards work in this area.

As an illustration, a JSON message could contain a field descriptor to
match on a field and an action to take when the field matches.

{
    "dataFieldDescriptor": {
        "LogStream": "AuthenticationService",
        "eventType": "Login",
        "jsonField": "sessionID",
        "startDate": "None",
        "endDate": "Now"
    },
    "action": {
        "SensitivityLevelTag": "4",
        "NotificationType": "SummaryReport",
        "Action": "FullAnonymization"
    }
}
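
To make the matching idea concrete, here is a minimal Python sketch of how
a receiving system might apply such a message to one log record.  It rests
on several assumptions: the record is a flat dict that carries "LogStream"
and "eventType" keys, "FullAnonymization" is taken to mean replacing the
value with a fixed placeholder, and the date range is ignored.  The
function names are invented for the example.

```python
# The illustrative message from above, as a Python dict.
request = {
    "dataFieldDescriptor": {
        "LogStream": "AuthenticationService",
        "eventType": "Login",
        "jsonField": "sessionID",
        "startDate": "None",
        "endDate": "Now",
    },
    "action": {
        "SensitivityLevelTag": "4",
        "NotificationType": "SummaryReport",
        "Action": "FullAnonymization",
    },
}


def matches(record, descriptor):
    """True if the record is from the described stream and event type
    and actually contains the named field (dates are not checked here)."""
    return (
        record.get("LogStream") == descriptor["LogStream"]
        and record.get("eventType") == descriptor["eventType"]
        and descriptor["jsonField"] in record
    )


def apply_request(record, request):
    """Return (record, changed).  Only FullAnonymization is sketched."""
    descriptor = request["dataFieldDescriptor"]
    if not matches(record, descriptor):
        return record, False
    if request["action"]["Action"] != "FullAnonymization":
        return record, False
    cleaned = dict(record)  # leave the caller's record untouched
    cleaned[descriptor["jsonField"]] = "REDACTED"  # assumed placeholder
    return cleaned, True
```

The `changed` flag is there so that the caller can count affected records
for whatever notification type the message asks for.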

This message would be sent to one or more systems.  It may result in the
inline tagging described in the draft, but the message could be propagated
in other ways.   ETL systems that transform the data could transform the
request for the systems that consume their data.  The actions can be
optional and derived from the default for the sensitivity level.
Notification and reporting is an essential piece of the puzzle.  The
interface could be provided by log aggregators, endpoints, and ETL
systems.
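
As a rough sketch of two of these points, the Python below shows an ETL
step rewriting a request for the system that consumes its output, and an
omitted action being filled in from a per-level default.  The field
mapping, the default table, and the downstream names ("auth_events",
"session_id") are all hypothetical.

```python
import copy

# Hypothetical mapping kept by an ETL job: the upstream (LogStream,
# jsonField) pair and the names the same value has downstream.
FIELD_MAP = {
    ("AuthenticationService", "sessionID"): ("auth_events", "session_id"),
}

# Hypothetical defaults, keyed by sensitivity level.
DEFAULT_ACTIONS = {"4": "FullAnonymization"}


def translate_request(request, field_map=FIELD_MAP):
    """Rewrite the descriptor for a downstream consumer.

    Returns None when this ETL job does not carry the field onward.
    """
    descriptor = request["dataFieldDescriptor"]
    key = (descriptor["LogStream"], descriptor["jsonField"])
    if key not in field_map:
        return None
    stream, field = field_map[key]
    downstream = copy.deepcopy(request)  # never mutate the incoming request
    downstream["dataFieldDescriptor"]["LogStream"] = stream
    downstream["dataFieldDescriptor"]["jsonField"] = field
    return downstream


def resolve_action(action, defaults=DEFAULT_ACTIONS):
    """Fill in a missing "Action" from the sensitivity level's default.

    Raises KeyError if the level has no default configured.
    """
    resolved = dict(action)
    if "Action" not in resolved:
        resolved["Action"] = defaults[resolved["SensitivityLevelTag"]]
    return resolved
```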

This provides a standard interface to different places that store data
derived from logs to remediate problems in their existing data.  It could
potentially be used to service deletion requests and maybe information
requests.  This obviously needs refinement.  A higher level abstraction
could decouple the request from the logging format and allow the request to
go through stages.  For example, the request says "delete user x@y.com" and
then intermediate systems fill out how to fulfil the request.
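
A minimal Python sketch of that staging, under stated assumptions: the
high-level request shape, the catalogue of user-identifying fields, the
"matchValue" key, and the "Delete" action name are all invented for
illustration and would need to be defined by the actual interface.

```python
# Hypothetical catalogue an intermediate system might keep: the fields,
# per log stream and event type, that can hold a user identifier.
USER_FIELDS = [
    {"LogStream": "AuthenticationService", "eventType": "Login",
     "jsonField": "username"},
    {"LogStream": "MailService", "eventType": "Delivery",
     "jsonField": "recipient"},
]


def expand(high_level, catalogue=USER_FIELDS):
    """Turn one format-independent request into field-level requests."""
    if high_level["request"] != "delete":
        raise ValueError("only delete requests are sketched here")
    return [
        {
            # "matchValue" is an assumed extension that restricts the
            # descriptor to records holding this particular value.
            "dataFieldDescriptor": {**field, "matchValue": high_level["subject"]},
            "action": {"Action": "Delete"},
        }
        for field in catalogue
    ]
```

Each system along the path would run its own expansion against its own
catalogue, so the original requester never needs to know the log formats.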

Cheers,

Joe