Re: [Pearg] Comments on draft-rao-pitfol-01

Hi Joe,

Apologies for the delayed response.

Thanks for your review and comments; I appreciate your time on this.  Please
find my replies inline.

> I had a chance to review draft-rao-pitfol-01.  Sensitive information in
> logs is a constant problem for many security and privacy organizations so I
> am very happy to see this draft.  Below are some thoughts on the draft.
>
> 1.  While PII information is super important, the problem extends to all
> types of sensitive information in logs.  One common example is service
> credentials written to logs or data that is confidential to an organization
> and not an individual.  I don't see anything that would limit this approach
> to just PII, but it may extend the use cases a bit.
>

[sandeep]:  Agree that the mechanism can be generalized to annotate
anything that is deemed sensitive in logs. While the draft specifically
discusses and illustrates tagging of PII data, we will
think about extending this to mark other sensitive data.

> 2.  Log data is incredibly useful for troubleshooting,  analytics and
> other purposes.   To meet these use cases the log data may be propagated,
> extracted, transformed and stored in multiple places.  This creates some
> challenges:
>
> A. The data may be transformed into other data models and formats that may
> not preserve any privacy marking
> B.  If the privacy marking changes on a field it's unclear what should be
> done to historical data.
>
>
[sandeep]:

Great points!

(A) Preserving privacy markings across log/data transformations is
critical as data flows through the log pipeline into various systems.  While
privacy marking and privacy preservation are distinct concerns, we can
emphasize that any privacy markings need to be carried forward across
format transformations, without being prescriptive about how.
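
For instance, a transformation stage might carry markings along when it
renames fields.  This is only a sketch: the side-by-side fields/markings
layout, the rename_fields helper and the marking value "PII" are assumptions
made for illustration, not the encoding used in the draft.

```python
def rename_fields(event, mapping):
    """Rename fields per `mapping` and move each field's privacy marking with it.

    `event` is a hypothetical layout: {"fields": {...}, "markings": {...}},
    where `markings` maps a field name to its privacy marking.
    """
    fields = {mapping.get(k, k): v for k, v in event["fields"].items()}
    markings = {mapping.get(k, k): m for k, m in event["markings"].items()}
    return {"fields": fields, "markings": markings}


# Hypothetical source event with two marked fields and one unmarked field.
event = {
    "fields": {"src_ip": "192.0.2.10", "user": "alice", "status": "ok"},
    "markings": {"src_ip": "PII", "user": "PII"},
}

# Transform to a different (also hypothetical) naming scheme.
out = rename_fields(event, {"src_ip": "client.address", "user": "user.name"})
print(out["markings"])
```

The point is only that the marking follows the field through the rename
rather than being dropped by the stage that changes the data model.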

(B) Agreed, I think retroactive changes can be challenging.  We will think
this through and provide some guidance in the draft.

> 3.  Often when a sensitive value is written to a log it has gone
> undetected for a period of time.  The values have now propagated to
> multiple systems and the task of the team responding to the incident is to:
>
>    - stop the logging or mis-identification of the data
>    - determine where the sensitive data has propagated to
>    - purge (or other action) the sensitive data from where it now lives
>    - perform remedial actions and notifications (for example to rotate
>    credentials)
>
> I would like a system that helps to automate this.  I think the draft
> provides part of the solution, but it seems there should be more to the
> solution:
>
>    1. Information and data models for sensitivity/privacy tagging of data
>    2. Ways of attaching that sensitivity/privacy tagging to the data
>    itself (similar to current draft)
>    3. Interface to systems to control
>       - actions based on tagging
>       - changes to tagging/classification
>       - communication of tagging/classification "out-of-band"
>
> I'm not really up on the state of the art here.  I know there exists
> proprietary protocols that do similar, but I'm not aware of any standards
> work in this area.
>

[sandeep]: Carrying privacy annotations as you illustrate below can
potentially help automate certain incident remediation actions.
This could be a good requirement / use case to think through, along the
lines of a "Course of Action" for sensitive-data detection incident
response.  We will consider addressing this as an addendum or in a separate
write-up, and we can discuss it further.

> As an illustration a JSON message could contain a field descriptor to
> match on a field and an action to take when you match on the field.
>
> {
>     "dataFieldDescriptor": {
>         "LogStream": "AuthenticationService",
>         "eventType": "Login",
>         "jsonField": "sessionID",
>         "startDate": "None",
>         "endDate": "Now"
>     },
>     "action": {
>         "SensitivityLevelTag": "4",
>         "NotificationType": "SummaryReport",
>         "Action": "FullAnonymization"
>     }
> }
>
> This message would be sent to one or more systems.  It may result in the
> inline tagging described in the draft, but the message could be propagated
> in other ways.   ETL systems that transform the data could transform the
> request for the systems that consume their data.  The actions can be
> optional and derived from the default for the sensitivity level.
>  Notification and reporting is an essential piece of the puzzle.  The
> interface could be provided by log aggregators, endpoints, and ETL
> systems.
>
> This provides a standard interface to different places that store data
> derived from logs to remediate problems in their existing data.  It could
> potentially be used to service deletion requests and maybe information
> requests.  This obviously needs refinement.  A higher level abstraction
> could decouple the request from the logging format and allow the request to go
> through stages.  For example the request says "delete user x@y.com" and
> then intermediate systems fill out how to fulfil the request.
>

[sandeep]:  We are experimenting with various annotation formats, and the
above is a good illustration.  We will share an update on this soon.
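
As a rough sketch only (not something from the draft), a consumer of such a
message might match it against a log record and apply the action as follows.
The record layout, the matches and apply_action helper names, and the
reading of "FullAnonymization" as replacing the value with a fixed
placeholder are all assumptions made for illustration.

```python
import copy

# Control message, copied from the illustration above.
message = {
    "dataFieldDescriptor": {
        "LogStream": "AuthenticationService",
        "eventType": "Login",
        "jsonField": "sessionID",
        "startDate": "None",
        "endDate": "Now",
    },
    "action": {
        "SensitivityLevelTag": "4",
        "NotificationType": "SummaryReport",
        "Action": "FullAnonymization",
    },
}


def matches(record, descriptor):
    """True if the record is from the described stream/event and has the field.

    The startDate/endDate window is ignored in this sketch.
    """
    return (
        record.get("LogStream") == descriptor["LogStream"]
        and record.get("eventType") == descriptor["eventType"]
        and descriptor["jsonField"] in record
    )


def apply_action(record, msg):
    """Return a copy with the action applied, or the record itself if no match."""
    descriptor, action = msg["dataFieldDescriptor"], msg["action"]
    if not matches(record, descriptor):
        return record
    out = copy.deepcopy(record)
    if action["Action"] == "FullAnonymization":
        # Assumed meaning: overwrite the value with a fixed placeholder.
        out[descriptor["jsonField"]] = "<anonymized>"
    return out


# Hypothetical log record; the key names mirror the descriptor for simplicity.
record = {
    "LogStream": "AuthenticationService",
    "eventType": "Login",
    "user": "alice",
    "sessionID": "abc123",
}
print(apply_action(record, message))
```

Records from other streams or event types pass through unchanged, so the
same message could be sent to several systems without side effects on
unrelated data.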

Thanks so much for the great input; we will follow up on this soon.

Regards,
-Sandeep

> Cheers,
>
> Joe