[Asrg] Development of an object assessment format/protocol

Rich Kulawiec <rsk@gsp.org> Mon, 04 March 2013 13:29 UTC

Date: Mon, 04 Mar 2013 08:29:24 -0500
From: Rich Kulawiec <rsk@gsp.org>
To: asrg@irtf.org
Message-ID: <20130304132924.GA27928@gsp.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
User-Agent: Mutt/1.5.20 (2009-06-14)
Subject: [Asrg] Development of an object assessment format/protocol
Precedence: list
Reply-To: Anti-Spam Research Group - IRTF <asrg@irtf.org>

I've been thinking about this for a long time, and would like to find
out what others have been doing in this area (if anything) and whether
this is a topic we can or should collectively pursue.

Here's the problem statement:

We've been using DNS to communicate information about the assessment
of certain objects -- IP addresses, host names, domain names -- and
while that has its advantages (notably that most of the software already
exists, is already installed, is reasonably well-understood, etc.) it
also limits the vocabulary we can use.  Moreover, there are objects
that we might want to talk about whose information isn't easily
communicated via DNS -- e.g., web pages, email addresses.  We use other
kinds of methods for communicating those, including downloadable files,
APIs, etc.

We need, I think, a mechanism via which we can ask more complex questions
and get more comprehensive answers.  We need a mechanism which isn't
a hack on top of DNS, but which has been developed from the ground up
specifically for this purpose.

At the moment, there are a number of ad hoc ways that this happens:
for example, Joe Wein maintains a rather large list of spammer/phisher
email addresses.  (And domains, too.)   The Malwaredomains folks have
lists of domains.  The Stopforumspam folks have lists of domains and
IP addresses.  There are DNSBLs and RHSBLs like the ones run by Spamhaus.
There are various projects to identify malicious web pages.  And so on.

And all of these are great, except: they all use different ways to
express information.  Some of them can be queried; some can't.  Some
of them carry metadata like "how did we decide this?" or "when did
we decide this?" or "for further reference, see:" and some don't.
Some of them support methods for asking narrower/broader questions,
some of them don't.

What I'm suggesting, therefore, is that we need (a) a standardized
way to express these things and (b) a standardized protocol by which
we can ask questions and get answers.  For instance:

	Does the web page at http://example.com/foo.html contain malware?

	Is the address fred@example.net associated with phishing?

	What can you tell me about the domain example.com?

	Has the IP address 192.168.0.20 sent spam recently?

Certainly all of these things are possible today, by asking various
information sources in various ways.  But not in an integrated,
unified fashion which would yield results that could be compared
to each other or integrated with each other programmatically.

(For example, I might wish to ask 5 different information sources
about 192.168.0.20 and weight their opinions.  Or I might want
to ask an open-ended question like "what do you know about example.com?")
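
As a rough illustration of the weighting idea, here is a minimal Python
sketch.  It assumes each source can reduce its opinion to a score
between 0 and 1 (0 = fine, 1 = bad), which is an assumption made only
for this example.  The function name, the source names and the weights
are all hypothetical.

```python
def combine(opinions, weights):
    """Return the weighted average of per-source scores.

    opinions: source name -> score in [0, 1] (assumed scale)
    weights:  source name -> how much the asker trusts that source
    """
    total = sum(weights[src] for src in opinions)
    if total <= 0:
        raise ValueError("no weighted opinions to combine")
    return sum(weights[src] * score for src, score in opinions.items()) / total


# Hypothetical sources and weights, purely for illustration.
opinions = {"source-a": 1.0, "source-b": 0.0, "source-c": 1.0}
weights = {"source-a": 2.0, "source-b": 1.0, "source-c": 1.0}
score = combine(opinions, weights)
```

This only works if every source answers in a comparable form, which is
the point of wanting a common format in the first place.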

In all these instances, opinions come with metadata: whose opinion
is this?  At what time was it rendered?  Is there a time at which
it should be considered no-longer-valid?  Is there a confidence
level associated with this opinion?  Is the answer specific to
the object that was asked about or does it apply more broadly?
(e.g., I asked about 192.168.0.20 but got back an opinion that
applies not only to that, but to all of 192.168.0.0/24.)
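
To make that list of metadata concrete, here is one possible shape for
a single opinion, sketched in Python.  The field names and the verdict
string are my own placeholders, not part of any existing format, and
the scope check covers only the IP address case.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Opinion:
    source: str                         # whose opinion is this?
    subject: str                        # the object that was asked about
    scope: str                          # what the answer actually applies to
    verdict: str                        # hypothetical label, e.g. "sent-spam"
    issued: datetime                    # when the opinion was rendered
    expires: Optional[datetime] = None  # when it stops being valid, if ever
    confidence: Optional[float] = None  # assumed 0..1 scale, if given

    def is_valid(self, now: datetime) -> bool:
        """True if the opinion has no expiry or has not yet expired."""
        return self.expires is None or now < self.expires

    def covers(self, address: str) -> bool:
        """True if an IP address falls inside the opinion's scope."""
        return ip_address(address) in ip_network(self.scope)


# Asked about one address, got an answer for the whole /24.
op = Opinion(
    source="example-source",
    subject="192.168.0.20",
    scope="192.168.0.0/24",
    verdict="sent-spam",
    issued=datetime(2013, 3, 4, tzinfo=timezone.utc),
    expires=datetime(2013, 4, 4, tzinfo=timezone.utc),
    confidence=0.9,
)
```

Keeping "subject" and "scope" as separate fields is what lets the asker
see that the answer is broader than the question.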

Where I'm going, probably predictably, is that the format for
both questions and answers may be XML-based in order to provide
sufficient expressive power.   (Yes, that's verbose.  Very much
the antithesis of the terse Q/A format we use with DNS.  I haven't
been able to decide if that's a good, bad or neutral thing,
other than noting that using XML has the advantage of making
information immediately palatable to a wide range of software.)
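
For a sense of what that might look like in practice, here is a small
Python sketch that builds a question and parses an answer with the
standard library's xml.etree.ElementTree.  Every element and attribute
name here ("query", "object", "opinion", and so on) is invented for the
example; no such schema exists as far as this message is concerned.

```python
import xml.etree.ElementTree as ET


def build_query(object_type, value, question):
    """Serialize a question about one object as a small XML document."""
    query = ET.Element("query")
    obj = ET.SubElement(query, "object", {"type": object_type})
    obj.text = value
    ET.SubElement(query, "question").text = question
    return ET.tostring(query, encoding="unicode")


def parse_opinion(xml_text):
    """Pull the metadata fields out of a (hypothetical) opinion document."""
    root = ET.fromstring(xml_text)
    return {
        "source": root.findtext("source"),
        "scope": root.findtext("scope"),
        "verdict": root.findtext("verdict"),
        "issued": root.findtext("issued"),
        "confidence": float(root.findtext("confidence")),
    }


query_xml = build_query("ip", "192.168.0.20", "sent-spam-recently")

answer_xml = """
<opinion>
  <source>example-source</source>
  <scope>192.168.0.0/24</scope>
  <verdict>sent-spam</verdict>
  <issued>2013-03-04T13:29:24Z</issued>
  <confidence>0.9</confidence>
</opinion>
"""
answer = parse_opinion(answer_xml)
```

A real deployment would be parsing XML from untrusted parties, so the
choice of parser would need more care than this sketch shows.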

So let me see if I can phrase the questions this way:

1. Is such a format needed?
2. Is a query-response protocol needed to transmit it?
3. If so, does anything already exist which would lend itself
to (1) and (2) with minimal changes?  If so, is it desirable
to run that experiment?
4. If not, then is there sufficient utility in this approach
that it's worth pursuing?
5. If this exists, will it be used?  Is there sufficient reason
for changes from what already exists?

(I'm aware of draft-dskoll-reputation-reporting, but it doesn't
cover all kinds of objects I have in mind here.  I'll note in
passing though that it attempts to be as concise as possible,
which is a good thing and a fine thing, but does limit the
scope of both questions and answers.)

---rsk
