Re: [weirds] Entity name searches

Dave Piscitello <dave.piscitello@icann.org> Mon, 09 November 2015 20:45 UTC

From: Dave Piscitello <dave.piscitello@icann.org>
To: Andrew Newton <andy@hxr.us>
Date: Mon, 9 Nov 2015 20:45:37 +0000
Message-ID: <29946A12-95CF-451E-8B7E-48A7EA51D62F@icann.org>
References: <CALRmJyi6i-fo12M0M-gx9Ds50P+0esmX1cRkaFwW_2_L=xcNvg@mail.gmail.com> <CAAQiQRcUvDBLhDEa0ssqmoTPJ18673RiU60mSx7CuxvMq+203w@mail.gmail.com> <CALRmJyixwJZ+pGRKjg6bSpbZEYkqqO3=-Zc46cBzfj8n6Uvv+w@mail.gmail.com> <DECCCEF3-9870-4599-BFD7-03200F20CEA3@icann.org> <CAAQiQRf3tZ7Vt2Xu79a6vb+hr9=FxsJRkHKOFP8SKaV_C1zJ-Q@mail.gmail.com>
In-Reply-To: <CAAQiQRf3tZ7Vt2Xu79a6vb+hr9=FxsJRkHKOFP8SKaV_C1zJ-Q@mail.gmail.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/weirds/hHyLnLhPI34YdOcXnFvtY6FxuJE>
Cc: Brian Mountford <mountford@google.com>, Justine Tunney <jart@google.com>, "weirds@ietf.org" <weirds@ietf.org>
Subject: Re: [weirds] Entity name searches
List-Id: "WHOIS-based Extensible Internet Registration Data Service \(WEIRDS\)" <weirds.ietf.org>

Can we clear up one question before we venture into extensions?

Q: What does such an extension offer beyond the capability one already has after collecting structured data into a local data store?

For my example, so long as I’m able to gather the 100,000 structured records, I and others will happily take the structured data, load it into SQL or other databases, and search to our hearts’ content. What has inhibited the kinds of _domain name_ investigations my example illustrates has been the combination of non-standard, unstructured data, rate limiting, and other matters entirely unrelated to RDAP.
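
For illustration only, here is a minimal sketch of that local-store approach. It assumes the records have already been collected and flattened into simple dictionaries. The table name, column names, helper functions, and sample values are all hypothetical and are not taken from any RDAP server or specification.

```python
import sqlite3

# Hypothetical, already-flattened registration records.
# Every value here is invented purely for illustration.
RECORDS = [
    {"domain": "aaa1.example", "registrant_org": "Acme Ltd",
     "created": "2015-10-01", "nameserver": "ns1.bad.example"},
    {"domain": "bbb2.example", "registrant_org": "Acme Ltd",
     "created": "2015-10-01", "nameserver": "ns1.bad.example"},
    {"domain": "ccc3.example", "registrant_org": "Other Co",
     "created": "2015-10-02", "nameserver": "ns1.bad.example"},
    {"domain": "ddd4.example", "registrant_org": "Solo Inc",
     "created": "2015-09-15", "nameserver": "ns9.ok.example"},
]

# Columns we allow pivoting on (assumed schema, not an RDAP-defined one).
PIVOT_COLUMNS = {"registrant_org", "created", "nameserver"}


def load(conn, records):
    """Create the table if needed and insert the flattened records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reg ("
        "domain TEXT PRIMARY KEY, registrant_org TEXT, "
        "created TEXT, nameserver TEXT)"
    )
    conn.executemany(
        "INSERT INTO reg VALUES "
        "(:domain, :registrant_org, :created, :nameserver)",
        records,
    )


def pivot(conn, column, min_shared=2):
    """Group domains by one shared attribute and return
    (value, count, sorted_domains) for groups of at least min_shared."""
    # Column names cannot be bound as SQL parameters, so check the
    # name against a fixed allow-list before interpolating it.
    if column not in PIVOT_COLUMNS:
        raise ValueError(f"cannot pivot on {column!r}")
    rows = conn.execute(
        f"SELECT {column}, COUNT(*), GROUP_CONCAT(domain, ',') "
        f"FROM reg GROUP BY {column} "
        f"HAVING COUNT(*) >= ? "
        f"ORDER BY COUNT(*) DESC, {column}",
        (min_shared,),
    ).fetchall()
    return [(value, n, sorted(domains.split(",")))
            for value, n, domains in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, RECORDS)
    for col in sorted(PIVOT_COLUMNS):
        print(col, pivot(conn, col))
```

The same grouping could be applied to any other element of the record (point of contact, registrar, and so on) by adding a column. The point is only that once the data is structured and local, this kind of pivot is ordinary SQL and needs nothing from the protocol.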

Perhaps I’m missing something, but the extension(s) required to do what I suggested at the query level would require federated {authentication, authorization, auditing, …}, which would be quite a remarkable objective for the numbers world and a monumental undertaking for the domain world, no?

> On Nov 9, 2015, at 3:32 PM, Andrew Newton <andy@hxr.us> wrote:
> 
> Ok. I understand the use case. It still requires an extension to RDAP.
> That type of search never came up during the standardization process.
> 
> At ARIN we do offer more targeted RESTful searches, but it has been my
> experience they are used almost exclusively for data mining. And I
> suspect that if left wide open, that's how it would be used in RDAP
> for DNRs. So to get a "system" to work, there would need to be an
> authorization system... and since we are talking multiple DNRs, that
> would be federated authorization. And because there are multiple DNRs,
> these queries would have to be run in parallel against each one (using
> the bootstrap to find them). And then you have the problem that some
> domain registries are thin. Those are a lot of problems to solve, and
> I'm not sure who has the will to do it.
> 
> -andy
> 
> On Mon, Nov 9, 2015 at 12:44 PM, Dave Piscitello
> <dave.piscitello@icann.org> wrote:
>> More generally, if you are looking at 100,000 registration records that have
>> been associated with spam, you’d want the ability to search or pivot on any
>> data/string that establish relationships among the domain strings (e.g.,
>> botnet DGA) but as importantly, _any_ element of the registration record
>> that subsets share: POC, creation date, name server…
>> 
>> On Nov 9, 2015, at 12:12 PM, Brian Mountford <mountford@google.com> wrote:
>> 
>> Well, for instance, one might want to search the organization. When I do a
>> WHOIS query for google.com, the contacts have an organization of Google Inc.
>> I might want to search for all contacts with that organization.
>> 
>> I can try to tailor the interpretation, but since the search string syntax
>> does not allow for arbitrary suffix searching, it's not clear what tailoring
>> is possible. Are you saying that I could take an entity name search string
>> of "McB*" and interpret that as a search for names any of whose words begin
>> with McB, so that it would find Joe McBride as well as McBurns Simpson? That
>> seems to be playing pretty loose with the partial search string rules, since
>> the RFC takes pains to define that syntax so precisely.
>> 
>> Brian
>> 
>> On Fri, Nov 6, 2015 at 4:47 PM, Andrew Newton <andy@hxr.us> wrote:
>>> 
>>> Welcome to the world of internationalization, where the concept of a
>>> last name vs a first name is not universal, and US ASCII is not
>>> representative of all the characters used.
>>> 
>>> Since you know the data in your database best, you should tailor the
>>> interpretation of the query input to that which works best with your
>>> registry.
>>> 
>>> On Sat, Nov 7, 2015 at 5:54 AM, Brian Mountford <mountford@google.com>
>>> wrote:
>>>> 
>>>> And only names? There is no provision for searching entities by address,
>>>> etc.?
>>>> 
>>> 
>>> I don't think that ever came up. It would require an RDAP extension. I
>>> do question how useful such a thing would be. Why does anybody care
>>> that a particular registry has contacts living on Mumford Lane in East
>>> Westover? Are they searching all the registries for that information?
>>> Do they need a telephone book instead?
>>> 
>>> -andy