Re: [domainrep] Reputation algorithms

"Murray S. Kucherawy" <msk@cloudmark.com> Fri, 25 May 2012 19:31 UTC

From: "Murray S. Kucherawy" <msk@cloudmark.com>
To: "domainrep@ietf.org" <domainrep@ietf.org>
Thread-Topic: Reputation algorithms
Thread-Index: AQHNOA2SXjK/yrPS3Uyv+UL89j0UGJba5X1A
Date: Fri, 25 May 2012 19:31:24 +0000
Message-ID: <9452079D1A51524AA5749AD23E003928130126@exch-mbx901.corp.cloudmark.com>
References: <mailman.26.1334689205.19668.domainrep@ietf.org> <CBE0EFC4.8C41%tmacaulay@2keys.ca>
In-Reply-To: <CBE0EFC4.8C41%tmacaulay@2keys.ca>
Accept-Language: en-US
Content-Language: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [domainrep] Reputation algorithms
Precedence: list

> -----Original Message-----
> From: domainrep-bounces@ietf.org [mailto:domainrep-bounces@ietf.org] On Behalf Of Tyson Macaulay
> Sent: Tuesday, May 22, 2012 4:25 AM
> To: domainrep@ietf.org
> Subject: [domainrep] Reputation algorithms
> 
> Hi again,
> 
> In a soon-to-be-released Informational RFC,
> draft-macaulay-6man-reputation-intelligence-00, we propose criteria for
> reputation intelligence algorithms.  These appear to partially overlap
> with your complementary conclusions in draft-ietf-repute-media-type-02,
> Section 3.1, but we have some supplementary criteria.
>
> [...]
> 
> Below I have tried to map your reputation criteria against ours, or to
> indicate where there is no mapping:
> 
> REPUTON "Rater" = Implicit in the selection of the query destination in our model.

It's re-stated in the reply in case the service (host) you used in the query URI is an alias for something else, and you want to know who ultimately answered you.

> REPUTON "Application" = Implicit; the intended application is "packet staining" as per
> draft-macaulay-6man-packet-stain-00

Yep; you'll need to re-state it in our model in case the reputation service operates in many reputation application spaces, to be sure you got a meaningful answer.
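To illustrate why restating Rater and Application is useful, here is a minimal Python sketch of a client-side sanity check on a reply. It assumes the reply has already been parsed into a mapping; the keys "rater" and "application" and the label "packet-stain" are hypothetical stand-ins, not the normative field names from the media-type draft.

```python
from typing import List, Mapping, Optional


def _norm(host: str) -> str:
    # Compare host names case-insensitively and ignore a trailing root dot.
    return host.lower().rstrip(".")


def check_reply(query_host: str, expected_app: str,
                reply: Mapping[str, Optional[str]]) -> List[str]:
    """Return warnings about a parsed reputation reply.

    The keys "rater" and "application" are hypothetical stand-ins for the
    Rater and Application fields discussed in this thread.
    """
    warnings: List[str] = []
    rater = reply.get("rater")
    # A different rater is not necessarily an error: the query host may be
    # an alias for the service that actually produced the answer.
    if rater and _norm(rater) != _norm(query_host):
        warnings.append(f"answered by {rater}, not {query_host}")
    # A service spanning several application spaces must echo the one it
    # answered for; otherwise the rating may not mean what we think.
    if reply.get("application") != expected_app:
        warnings.append("application mismatch")
    return warnings
```

A differing rater is reported as a warning rather than rejected, since aliasing is the expected reason for it.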

> REPUTON "Assertion" = no mapping

For repute, the absence of this means "IS-GOOD" (Section 3.1 of our media-type document).

> REPUTON "Confidence" = no mapping at this time.
> 
> REPUTON "Authenticity" = no mapping.  This is a management control
> (business-level decision) in our model.

These are optional anyway.

> Algorithmic functions as per
> draft-macaulay-6man-reputation-intelligence-00
> 
> Function 1:  to account for large Internet portals with many
> independent URLs with good reputations, but also some proportion of dangerous (bad
> reputation) URLs sharing the same IP address = no REPUTON mapping?

If you have a source IP address producing several streams with distinct behaviours, but are unable to distinguish them, then the best you can do is deliver a reputon about the source IP address or the/a CIDR block containing it.
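A minimal sketch of choosing the subject in that situation, using Python's standard ipaddress module: either the single source address or the CIDR block containing it. The prefix lengths used in practice (/24, /48, etc.) are a local policy choice and are assumed here only for illustration.

```python
import ipaddress
from typing import Optional


def rating_subject(addr: str, prefix_len: Optional[int] = None) -> str:
    """Return the subject a reputon should describe.

    With no prefix length, the subject is the single source address; with
    one, it is the CIDR block of that length containing the address.
    """
    ip = ipaddress.ip_address(addr)
    if prefix_len is None:
        return str(ip)
    # strict=False accepts a host address and masks off the host bits.
    return str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))
```

For example, `rating_subject("192.0.2.77", 24)` yields `"192.0.2.0/24"`.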

> Function 2:  to account for the distance in time between the last
> observed suspicious or illicit behaviour and the present
> Function 3:  to account for the reputations of both adjacent IP addresses or domains
> = REPUTON "Updated" (approximately?)

Roughly.  "Updated" is meant to indicate "this is when this rating was calculated" so you can see how stale it is.  If that's not what you need, you can declare an extension.
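A minimal sketch of how a client might use "Updated" to judge staleness, assuming the value has already been parsed into a timezone-aware datetime. The maximum acceptable age is a hypothetical local policy parameter; the reply itself does not convey it.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def rating_age(updated: datetime, now: Optional[datetime] = None) -> timedelta:
    """How long ago the rating was calculated, per its "Updated" time.

    Both datetimes are assumed to be timezone-aware.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - updated


def is_stale(updated: datetime, max_age: timedelta,
             now: Optional[datetime] = None) -> bool:
    # max_age is a local policy choice, not something the reply conveys.
    return rating_age(updated, now) > max_age
```

Note that this measures the age of the rating, not the time since the behaviour was last observed; the latter would need its own extension.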

> Function 4: to account for the original, pre-processed source of the
> intelligence (open source, closed source, domain of control,
> uncontrolled domain) = no REPUTON mapping

I'm not sure what this is.  You might be able to relay this via the "RATER" parameter in the reply, or if that overloading is distasteful, you can simply declare an extension in your response set that indicates what the intelligence source is.

> Function 5:  to account for the velocity of suspicious or illicit
> behaviour (i.e., a high spam rate) = no REPUTON mapping

This would be an extension, maybe "RATE-OF-CHANGE" or "VELOCITY" or something.
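One way such a "VELOCITY" extension value might be computed is as a count of observed events over a trailing window, sketched below. The unit (events per hour) and the default one-hour window are assumptions for illustration only.

```python
from datetime import datetime, timedelta
from typing import Iterable


def velocity(events: Iterable[datetime], now: datetime,
             window: timedelta = timedelta(hours=1)) -> float:
    """Observed events per hour over the trailing window ending at now.

    This is one possible value for a hypothetical "VELOCITY" extension;
    the unit (events per hour) is an assumption.
    """
    start = now - window
    count = sum(1 for t in events if start <= t <= now)
    return count / (window.total_seconds() / 3600)
```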

> Function 6:  to account for the duration of suspicious or illicit
> behaviour (i.e., sustained spam at low velocity) = REPUTON "Updated"
> (approximately?)

If your rating scale is such that there's a threshold of interest (0.5, let's say), then you could have an extension to indicate how long the subject has been above that threshold.
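A minimal sketch of computing that duration from a history of ratings, assuming a 0.0-1.0 scale, the 0.5 threshold mentioned above, and a history of (timestamp, rating) samples kept by the rating service. The extension that would carry the result is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Sequence, Tuple


def time_above_threshold(history: Sequence[Tuple[datetime, float]],
                         now: datetime, threshold: float = 0.5) -> timedelta:
    """How long the subject has continuously been rated above threshold.

    history holds (timestamp, rating) pairs in ascending time order. The
    result is measured from the first sample of the current run, so it is a
    lower bound: the crossing happened somewhere between that sample and
    the one before it.
    """
    run_start = None
    # Walk backwards from the newest sample until the run is broken.
    for ts, rating in reversed(history):
        if rating > threshold:
            run_start = ts
        else:
            break
    if run_start is None:
        return timedelta(0)  # newest sample is not above the threshold
    return now - run_start
```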

> Function 7:  to account for the lifetime of domain-to-source-IP
> associations (i.e., newly minted domain names or previously unobserved/
> unassigned addresses) = REPUTON "Updated" (approximately)

Extension, although now your subject is a complex one (i.e., something like IP:domain).
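A minimal sketch of such a complex subject and of tracking how old each association is. The "IP:domain" string form follows the example above; the first-seen table is a hypothetical piece of local state. One detail worth noting: IPv6 addresses contain colons, so the subject has to be split on the last colon, which works because host names never contain one.

```python
import ipaddress
from datetime import datetime, timedelta
from typing import Dict, Tuple


def make_subject(ip: str, domain: str) -> str:
    # Normalize both halves so one association always yields one key.
    return f"{ipaddress.ip_address(ip)}:{domain.lower().rstrip('.')}"


def split_subject(subject: str) -> Tuple[str, str]:
    # Host names never contain ':', so splitting on the last one leaves
    # IPv6 addresses (which do contain ':') intact.
    ip, domain = subject.rsplit(":", 1)
    return ip, domain


def association_age(first_seen: Dict[str, datetime], ip: str, domain: str,
                    now: datetime) -> timedelta:
    """Age of an IP:domain association; records it if previously unseen."""
    key = make_subject(ip, domain)
    first_seen.setdefault(key, now)
    return now - first_seen[key]
```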

> Function 8:  to account for the proportion of traffic from this source
> which is benign versus demonstrably illicit = REPUTON "Sample size"
> (approximately?)

I would think that factors into the rating itself.
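As a sketch of folding that proportion into the rating, the function below returns the benign fraction as the rating and the total count as the sample size. It assumes a 0.0-1.0 scale where the rating expresses support for the default "IS-GOOD" assertion; that interpretation is an assumption, not something the draft mandates.

```python
from typing import Tuple


def rating_from_counts(benign: int, illicit: int) -> Tuple[float, int]:
    """Fold the benign/illicit split into a rating plus its sample size.

    Assumes a 0.0-1.0 scale where the rating expresses support for the
    default "IS-GOOD" assertion, i.e. the benign fraction of traffic.
    """
    if benign < 0 or illicit < 0:
        raise ValueError("counts must be non-negative")
    total = benign + illicit
    if total == 0:
        raise ValueError("no observations to rate")
    return benign / total, total
```

The proportion then shows up in the rating, while "Sample size" only says how much evidence backs it.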

> Function 9:  to account for the nature of the suspicious or illicit
> behaviour (automated port scanning versus malware drop) = no REPUTON
> mapping

This would be a set of assertions you define in your response set other than the default "IS-GOOD".
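A minimal sketch of such a response set on the client side. "IS-GOOD" is the default named above; "PORT-SCANNING" and "MALWARE-DROP" are hypothetical assertion names chosen only to mirror the two behaviours in Function 9.

```python
from enum import Enum
from typing import Optional


class Assertion(str, Enum):
    IS_GOOD = "IS-GOOD"              # default when no assertion is present
    PORT_SCANNING = "PORT-SCANNING"  # hypothetical
    MALWARE_DROP = "MALWARE-DROP"    # hypothetical


def parse_assertion(value: Optional[str]) -> Assertion:
    """Map a reply's assertion field onto the declared response set."""
    if value is None:
        return Assertion.IS_GOOD
    # Enum lookup by value raises ValueError for anything not declared.
    return Assertion(value)
```

Rejecting undeclared assertions keeps a client from silently misreading a rating made against an assertion it does not understand.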

-MSK
