Re: [Asrg] Development of an object assessment format/protocol

Rich Kulawiec <rsk@gsp.org> Mon, 04 March 2013 17:10 UTC

Date: Mon, 04 Mar 2013 12:10:10 -0500
From: Rich Kulawiec <rsk@gsp.org>
To: Anti-Spam Research Group - IRTF <asrg@irtf.org>
Message-ID: <20130304171010.GA3191@gsp.org>
References: <20130304132924.GA27928@gsp.org> <0D79787962F6AE4B84B2CC41FC957D0B20C05A58@abn-exch1b.green.sophos>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <0D79787962F6AE4B84B2CC41FC957D0B20C05A58@abn-exch1b.green.sophos>
User-Agent: Mutt/1.5.20 (2009-06-14)
Subject: Re: [Asrg] Development of an object assessment format/protocol
Precedence: list
Reply-To: Anti-Spam Research Group - IRTF <asrg@irtf.org>

On Mon, Mar 04, 2013 at 03:46:14PM +0000, Martijn Grooten wrote:
> Is the reason different sources use different ways to express
> information the fact that there is no suitable protocol? Or is it a
> mere consequence of the fact that sources have different things they
> are willing and able to share?

That's a pair of great questions, and I can see reasons to answer "yes"
to both.

On the one hand: there's no standardized way to do this (beyond DNSBLs
and RHSBLs, which we've piggybacked on DNS).  On the other hand, you're
right, different people are making different statements about different
entities -- IP addresses, domains, web pages, email addresses, etc. --
so *if* there existed some standardized way to express this, it would
have to let them say the same things they're saying now...because otherwise
they'd probably have no reason to use it.

So I dunno.

> Perhaps you can come up with examples of where such a protocol would be useful?

Sure.  Let me show these using some pseudocode, just to illustrate the
concept.  Let's presume that example.org is asking questions of example.com.

	Question:
		query-proto-version = 1.0
		query-to = blah.example.com
		query-time = Mon Mar  4 16:06:37 UTC 2013
		object type = ipv4
		object value = 192.168.0.3
		object query = spam source?
	Answer:
		answer-proto-version = 1.1
		answer-from = blah.example.com
		answer-time = Mon Mar  4 16:06:38 UTC 2013
		answer-valid-time = Fri Mar  1 13:05:00 UTC 2013
		answer-expiration-time = Fri Mar  8 13:05:00 UTC 2013
		answer = yes

This is the equivalent of a DNSBL check -- except that the answer
also contains two more items.  It includes an "answer-valid-time",
which could be "the time that we started giving out this answer",
and "answer-expiration-time", which could be the time that this
answer is scheduled to expire.  Thus the former could mean "we listed
this IP address at 1:05 PM last Friday, because that's when our sensors
told us to" and the latter could mean "unless we see a reason to
extend the listing, we're going to drop it at 1:05 PM this Friday".
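
To make the validity window concrete, here is a minimal Python sketch that parses the answer above and checks whether it is still current at a given moment. The "key = value" line format is just my pseudocode, not a proposed wire format, and the function names (parse_fields, parse_time, is_current) are made up for illustration.

```python
from datetime import datetime, timezone

# Timestamp layout used in the pseudocode, e.g. "Mon Mar  4 16:06:38 UTC 2013".
TIME_FORMAT = "%a %b %d %H:%M:%S UTC %Y"


def parse_time(text):
    """Parse a pseudocode timestamp into a timezone-aware UTC datetime."""
    normalized = " ".join(text.split())  # collapse the padding before 1-digit days
    return datetime.strptime(normalized, TIME_FORMAT).replace(tzinfo=timezone.utc)


def parse_fields(block):
    """Turn 'key = value' lines into a dict, skipping lines without '='."""
    fields = {}
    for line in block.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_current(answer, now):
    """True if 'now' is inside [answer-valid-time, answer-expiration-time)."""
    start = parse_time(answer["answer-valid-time"])
    end = parse_time(answer["answer-expiration-time"])
    return start <= now < end


ANSWER = """
answer-proto-version = 1.1
answer-from = blah.example.com
answer-time = Mon Mar  4 16:06:38 UTC 2013
answer-valid-time = Fri Mar  1 13:05:00 UTC 2013
answer-expiration-time = Fri Mar  8 13:05:00 UTC 2013
answer = yes
"""

answer = parse_fields(ANSWER)
print(answer["answer"], is_current(answer, parse_time(answer["answer-time"])))
```

Treating the expiration time as exclusive is my assumption; a real specification would have to say which way that goes.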

	Question:
		query-proto-version = 1.0
		query-to = blah.example.com
		query-time = Mon Mar  4 16:06:37 UTC 2013
		object type = URL
		object value = http://example.net/some/page.html
		object query = infected with malware?
	Answer:
		answer-proto-version = 1.1
		answer-from = blah.example.com
		answer-time = Mon Mar  4 16:06:38 UTC 2013
		answer-valid-time = Fri Mar  1 14:05:00 UTC 2013
		answer-expiration-time = Fri Mar  8 14:05:00 UTC 2013
		answer = no

This is a very similar Q/A: in this case the answer is negative,
but it also has an expiration time.  (Let's presume that example.com
is crawling sites at weekly intervals, so its answer can't change
until the next crawl is done.  The requestor might be okay with this
answer, or it might want a more recent one -- in which case it will
need to ask someone else.)
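
The requestor's side of that decision could look something like the following Python sketch. The fresh_enough function and the max_age policy are hypothetical, purely to show how "answer-valid-time" might be used; the timestamps are the ones from the URL example.

```python
from datetime import datetime, timedelta, timezone


def fresh_enough(valid_time, now, max_age):
    """True if the answer was established no more than max_age before 'now'."""
    return now - valid_time <= max_age


# Values taken from the URL example.
valid_time = datetime(2013, 3, 1, 14, 5, 0, tzinfo=timezone.utc)   # answer-valid-time
query_time = datetime(2013, 3, 4, 16, 6, 37, tzinfo=timezone.utc)  # query-time

# A requestor that is happy with weekly crawl data accepts the answer...
weekly_ok = fresh_enough(valid_time, query_time, timedelta(days=7))
# ...while one that wants data from the last day would ask someone else.
daily_ok = fresh_enough(valid_time, query_time, timedelta(days=1))
print(weekly_ok, daily_ok)
```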

	Question:
		query-proto-version = 1.0
		query-to = blah.example.com
		query-time = Mon Mar  4 16:06:37 UTC 2013
		object type = ASN
		object value = 123456789
		object query = hijacked?
	Answer:
		answer-proto-version = 1.1
		answer-from = blah.example.com
		answer-time = Mon Mar  4 16:06:38 UTC 2013
		answer-valid-time = Fri Mar  1 14:10:00 UTC 2013
		answer-expiration-time = Fri Apr  5 14:10:00 UTC 2013
		answer = yes
		answer-additional = http://example.com/hijacks/123456789

Also very similar.  I posited a much longer expiration time because
this is probably not going to be a quickly-remediated problem.  I've
also shown an addition to the answer, which in this case is just a URL
where something consumable by humans might be found.

To expand on those just a little bit: "object type" could probably
encompass things like:

	IPv4/IPv6 addresses
	networks (by handle?) (by CIDR?)
	ASNs
	domains, subdomains, hosts
	URLs
	email addresses

"object query" could include the examples above and, obviously, much
more, but should exclude those things that we already have ways to find
out: e.g., this should not be a way to query for a DNS A record, because
that's just kinda silly.
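
As a rough illustration of what typing the objects buys you, here is a Python sketch that checks whether an "object value" is well-formed for a few of the types listed above. The type names "ipv6" and "cidr" are my guesses at spellings (only "ipv4", "URL" and "ASN" appear in the examples), object_is_well_formed is a made-up name, and domains and email addresses are left out because validating them properly is a longer discussion.

```python
import ipaddress
from urllib.parse import urlsplit


def object_is_well_formed(object_type, value):
    """Return True if 'value' is syntactically valid for 'object_type'."""
    try:
        if object_type == "ipv4":
            ipaddress.IPv4Address(value)
        elif object_type == "ipv6":
            ipaddress.IPv6Address(value)
        elif object_type == "cidr":
            # strict=True rejects networks with host bits set, e.g. 192.168.0.3/24
            ipaddress.ip_network(value, strict=True)
        elif object_type == "ASN":
            # ASNs are 32-bit unsigned integers
            return 0 <= int(value) <= 2**32 - 1
        elif object_type == "URL":
            parts = urlsplit(value)
            return bool(parts.scheme and parts.netloc)
        else:
            return False  # unknown type: refuse rather than guess
    except ValueError:
        return False
    return True


print(object_is_well_formed("ipv4", "192.168.0.3"))
print(object_is_well_formed("ASN", "123456789"))
print(object_is_well_formed("URL", "http://example.net/some/page.html"))
```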

There are at least two ways to go with this: one would be to
make it concise and use UDP.  Another would be to make it verbose
and use TCP. (Insert long discussion here about performance tradeoffs.)
I'm not sure that this is worth getting into unless the high-level
idea flies: if we don't actually need a standard format and a standard
protocol that uses it, then those tradeoffs don't matter.
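
Just to give a feel for the size difference, here is a Python sketch that encodes the first question both ways. The binary layout and the numeric codes are entirely invented for this illustration and are not a proposal; "query-to" is dropped from the concise form on the assumption that the destination of the datagram implies it.

```python
import ipaddress
import struct
from datetime import datetime, timezone

OBJ_IPV4 = 1           # hypothetical code for "object type = ipv4"
QUERY_SPAM_SOURCE = 1  # hypothetical code for "object query = spam source?"

# Verbose form: the question as text, one field per line.
verbose = "\n".join([
    "query-proto-version = 1.0",
    "query-to = blah.example.com",
    "query-time = Mon Mar  4 16:06:37 UTC 2013",
    "object type = ipv4",
    "object value = 192.168.0.3",
    "object query = spam source?",
]).encode("ascii")

# Concise form, network byte order, no padding:
# version major, version minor, object type, query code (1 byte each),
# query time as 32-bit Unix seconds, then the 4-byte IPv4 address.
LAYOUT = "!BBBBI4s"
query_time = int(datetime(2013, 3, 4, 16, 6, 37, tzinfo=timezone.utc).timestamp())
address = ipaddress.IPv4Address("192.168.0.3").packed
concise = struct.pack(LAYOUT, 1, 0, OBJ_IPV4, QUERY_SPAM_SOURCE, query_time, address)

print(len(verbose), "bytes verbose,", len(concise), "bytes concise")
```

The concise form fits easily in a single UDP datagram, but every new object type or query needs an assigned code; the verbose form is self-describing at the cost of size.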

---rsk
