Re: [weirds] Entity name searches

Dave Piscitello <dave.piscitello@icann.org> Tue, 10 November 2015 02:51 UTC

From: Dave Piscitello <dave.piscitello@icann.org>
To: Andrew Newton <andy@hxr.us>
Thread-Topic: [weirds] Entity name searches
Thread-Index: AQHRGNV/4iTet5vJAke73jAZnpXocZ6QDfAAgARqPACAAAj1gIAALtUAgAADnoCAABmqAIAATJwA
Date: Tue, 10 Nov 2015 02:51:42 +0000
Message-ID: <30152CCF-2CB0-45A1-9D3B-7A8A0F7706C2@icann.org>
References: <CALRmJyi6i-fo12M0M-gx9Ds50P+0esmX1cRkaFwW_2_L=xcNvg@mail.gmail.com> <CAAQiQRcUvDBLhDEa0ssqmoTPJ18673RiU60mSx7CuxvMq+203w@mail.gmail.com> <CALRmJyixwJZ+pGRKjg6bSpbZEYkqqO3=-Zc46cBzfj8n6Uvv+w@mail.gmail.com> <DECCCEF3-9870-4599-BFD7-03200F20CEA3@icann.org> <CAAQiQRf3tZ7Vt2Xu79a6vb+hr9=FxsJRkHKOFP8SKaV_C1zJ-Q@mail.gmail.com> <29946A12-95CF-451E-8B7E-48A7EA51D62F@icann.org> <CAAQiQRca1jghwg1cGKVzjoUx1nRD1KGhWhzN9R0m9Zmb00jYNw@mail.gmail.com>
In-Reply-To: <CAAQiQRca1jghwg1cGKVzjoUx1nRD1KGhWhzN9R0m9Zmb00jYNw@mail.gmail.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/weirds/tWuqRARY5PpdYOY98QF6rgFdFr4>
Cc: Brian Mountford <mountford@google.com>, Justine Tunney <jart@google.com>, "weirds@ietf.org" <weirds@ietf.org>
Subject: Re: [weirds] Entity name searches

> On Nov 9, 2015, at 5:17 PM, Andrew Newton <andy@hxr.us> wrote:
> 
> Yes, that is certainly doable but what is your initial query? "Give me
> all your data", even if it is 1000 results at a time, is probably not
> a good starting point.

I think we’re losing the thread. I was offering an example to illustrate how Whois “search” is used in security/operations today, as a counterpoint to what Brian was asking for (a search on data other than the domain).

Brian asked the original question so he’s best suited to answer “what is your initial query”.

> 
> -andy
> 
> On Mon, Nov 9, 2015 at 3:45 PM, Dave Piscitello
> <dave.piscitello@icann.org> wrote:
>> Can we clear up one question before we venture into extensions?
>> 
>> Q: What is the purpose of such an extension versus the ability that is naturally available if one is able to collect structured data into a local data store?
>> 
>> For my example, so long as I’m able to gather the 100,000 structured records, I and others will happily take structured data, incorporate these into SQL or other databases, and search to our hearts’ content. What has inhibited the kinds of _domain name_ investigations my example illustrates in the past was the combination of non-standard, unstructured data, rate limiting, and other completely non-RDAP related matters.
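A minimal sketch of the local-data-store approach described above, assuming the registration records have already been collected and flattened into rows. The table layout, column names, and sample values are hypothetical and chosen only for illustration; real RDAP responses are nested JSON and would need flattening first.

```python
import sqlite3

# Hypothetical, already-flattened registration records:
# (domain, name server, creation date, point of contact)
RECORDS = [
    ("spam-one.example", "ns1.badhost.example", "2015-10-01", "poc-17"),
    ("spam-two.example", "ns1.badhost.example", "2015-10-01", "poc-17"),
    ("spam-three.example", "ns2.other.example", "2015-09-12", "poc-42"),
]

# Columns an investigator may pivot on (assumed names, not an RDAP schema).
PIVOT_COLUMNS = ("name_server", "created", "poc")


def load(records):
    """Load the records into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE registration ("
        "domain TEXT PRIMARY KEY, name_server TEXT, created TEXT, poc TEXT)"
    )
    conn.executemany("INSERT INTO registration VALUES (?, ?, ?, ?)", records)
    return conn


def pivot(conn, column):
    """Return (value, count) pairs for values shared by more than one domain."""
    # Column names cannot be bound as SQL parameters, so check against a whitelist.
    if column not in PIVOT_COLUMNS:
        raise ValueError(f"cannot pivot on {column!r}")
    query = (
        f"SELECT {column}, COUNT(*) FROM registration "
        f"GROUP BY {column} HAVING COUNT(*) > 1 "
        f"ORDER BY COUNT(*) DESC, {column}"
    )
    return conn.execute(query).fetchall()


conn = load(RECORDS)
for column in PIVOT_COLUMNS:
    print(column, pivot(conn, column))
```

Once the data is local, every pivot is an ordinary query against your own store, and no search extension is needed on the server side.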
>> 
>> Perhaps I’m missing something, but the extension(s) required to do what I suggested on a query level would require federated {authentication, authorization, auditing…} that would be quite a remarkable objective for the numbers world and a monumental undertaking for the domain world, no?
>> 
>>> On Nov 9, 2015, at 3:32 PM, Andrew Newton <andy@hxr.us> wrote:
>>> 
>>> Ok. I understand the use case. It still requires an extension to RDAP.
>>> That type of search never came up during the standardization process.
>>> 
>>> At ARIN we do offer more targeted RESTful searches, but it has been my
>>> experience they are used almost exclusively for data mining. And I
>>> suspect that if left wide open, that's how it would be used in RDAP
>>> for DNRs. So to get a "system" to work, there would need to be an
>>> authorization system... and since we are talking multiple DNRs, that
>>> would be federated authorization. And because there are multiple DNRs,
>>> these queries would have to be run in parallel against each one (using
>>> the bootstrap to find them). And then you have the problem that some
>>> domain registries are thin. Those are a lot of problems to solve, and
>>> I'm not sure who has the will to do it.
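To make the fan-out concrete, here is a rough sketch of running the same entity name search against several registries in parallel. The server base URLs are hypothetical stand-ins for whatever the bootstrap would yield, and the fetch function is injected so the sketch runs without a network; a real client would issue HTTP GETs and, as noted above, would also need some form of authorization that is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import quote

# Hypothetical RDAP base URLs, standing in for the result of a bootstrap lookup.
RDAP_SERVERS = [
    "https://rdap.registry-a.example/",
    "https://rdap.registry-b.example/",
]


def entity_search_url(base_url, name_pattern):
    """Build an RDAP entity name search URL (the entities?fn= query form)."""
    # Keep the asterisk literal, since it is the partial-match wildcard.
    return f"{base_url.rstrip('/')}/entities?fn={quote(name_pattern, safe='*')}"


def fan_out(servers, name_pattern, fetch):
    """Run the same search against every server concurrently.

    `fetch` is any callable that takes a URL and returns a parsed response.
    Returns a dict mapping each queried URL to its response.
    """
    urls = [entity_search_url(base, name_pattern) for base in servers]
    with ThreadPoolExecutor(max_workers=len(urls) or 1) as pool:
        return dict(zip(urls, pool.map(fetch, urls)))


def fake_fetch(url):
    """Placeholder for an HTTP GET; returns an empty result set."""
    return {"queried": url, "entitySearchResults": []}


results = fan_out(RDAP_SERVERS, "McB*", fake_fetch)
for url, response in results.items():
    print(url, len(response["entitySearchResults"]))
```

Even in this toy form the caller has to merge and reconcile one response per registry, and the sketch does nothing about thin registries that do not hold the contact data at all.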
>>> 
>>> -andy
>>> 
>>> On Mon, Nov 9, 2015 at 12:44 PM, Dave Piscitello
>>> <dave.piscitello@icann.org> wrote:
>>>> More generally, if you are looking at 100,000 registration records that have
>>>> been associated with spam, you’d want the ability to search or pivot on any
>>>> data/string that establishes relationships among the domain strings (e.g.,
>>>> botnet DGA) but as importantly, _any_ element of the registration record
>>>> that subsets share: POC, creation date, name server…
>>>> 
>>>> On Nov 9, 2015, at 12:12 PM, Brian Mountford <mountford@google.com> wrote:
>>>> 
>>>> Well, for instance, one might want to search the organization. When I do a
>>>> WHOIS query for google.com, the contacts have an organization of Google Inc.
>>>> I might want to search for all contacts with that organization.
>>>> 
>>>> I can try to tailor the interpretation, but since the search string syntax
>>>> does not allow for arbitrary suffix searching, it's not clear what tailoring
>>>> is possible. Are you saying that I could take an entity name search string
>>>> of "McB*" and interpret that as a search for names any of whose words begin
>>>> with McB, so that it would find Joe McBride as well as McBurns Simpson? That
>>>> seems to be playing pretty loose with the partial search string rules, since
>>>> the RFC takes pains to define that syntax so precisely.
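The difference between the two readings can be sketched as follows, assuming a search pattern with a single trailing asterisk and names made of whitespace-separated words. Both function names are hypothetical, and neither is taken from the RFC; the first is one plausible strict reading, the second the looser per-word reading asked about above.

```python
def _prefix(pattern):
    """Extract the literal prefix from a pattern with one trailing asterisk."""
    if not pattern.endswith("*") or pattern.count("*") != 1:
        raise ValueError("this sketch only handles a single trailing asterisk")
    return pattern[:-1].casefold()


def matches_whole_name(pattern, name):
    """Strict reading: the full name string must begin with the prefix."""
    return name.casefold().startswith(_prefix(pattern))


def matches_any_word(pattern, name):
    """Loose reading: any word within the name may begin with the prefix."""
    prefix = _prefix(pattern)
    return any(word.casefold().startswith(prefix) for word in name.split())


NAMES = ["Joe McBride", "McBurns Simpson", "Bob Smith"]

strict = [n for n in NAMES if matches_whole_name("McB*", n)]
loose = [n for n in NAMES if matches_any_word("McB*", n)]
print(strict)
print(loose)
```

Under the strict reading only McBurns Simpson is found; under the loose reading Joe McBride is found as well, which is the behavior in question.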
>>>> 
>>>> Brian
>>>> 
>>>> On Fri, Nov 6, 2015 at 4:47 PM, Andrew Newton <andy@hxr.us> wrote:
>>>>> 
>>>>> Welcome to the world of internationalization, where the concept of a
>>>>> last name vs a first name is not universal, and US ASCII is not
>>>>> representative of all the characters used.
>>>>> 
>>>>> Since you know the data in your database best, you should tailor the
>>>>> interpretation of the query input to that which works best with your
>>>>> registry.
>>>>> 
>>>>> On Sat, Nov 7, 2015 at 5:54 AM, Brian Mountford <mountford@google.com>
>>>>> wrote:
>>>>>> 
>>>>>> And only names? There is no provision for searching entities by address,
>>>>>> etc.?
>>>>>> 
>>>>> 
>>>>> I don't think that ever came up. It would require an RDAP extension. I
>>>>> do question how useful such a thing would be. Why does anybody care
>>>>> that a particular registry has contacts living on Mumford Lane in East
>>>>> Westover? Are they searching all the registries for that information?
>>>>> Do they need a telephone book instead?
>>>>> 
>>>>> -andy