Re: [weirds] Entity name searches

Andrew Newton <andy@hxr.us> Mon, 09 November 2015 22:17 UTC

MIME-Version: 1.0
In-Reply-To: <29946A12-95CF-451E-8B7E-48A7EA51D62F@icann.org>
References: <CALRmJyi6i-fo12M0M-gx9Ds50P+0esmX1cRkaFwW_2_L=xcNvg@mail.gmail.com> <CAAQiQRcUvDBLhDEa0ssqmoTPJ18673RiU60mSx7CuxvMq+203w@mail.gmail.com> <CALRmJyixwJZ+pGRKjg6bSpbZEYkqqO3=-Zc46cBzfj8n6Uvv+w@mail.gmail.com> <DECCCEF3-9870-4599-BFD7-03200F20CEA3@icann.org> <CAAQiQRf3tZ7Vt2Xu79a6vb+hr9=FxsJRkHKOFP8SKaV_C1zJ-Q@mail.gmail.com> <29946A12-95CF-451E-8B7E-48A7EA51D62F@icann.org>
Date: Mon, 09 Nov 2015 17:17:20 -0500
Message-ID: <CAAQiQRca1jghwg1cGKVzjoUx1nRD1KGhWhzN9R0m9Zmb00jYNw@mail.gmail.com>
From: Andrew Newton <andy@hxr.us>
To: Dave Piscitello <dave.piscitello@icann.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <http://mailarchive.ietf.org/arch/msg/weirds/TmV088OvIqU2Tz6G9HKrd03A8sw>
Cc: Brian Mountford <mountford@google.com>, Justine Tunney <jart@google.com>, "weirds@ietf.org" <weirds@ietf.org>
Subject: Re: [weirds] Entity name searches
X-BeenThere: weirds@ietf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: "WHOIS-based Extensible Internet Registration Data Service \(WEIRDS\)" <weirds.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/weirds>, <mailto:weirds-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/weirds/>
List-Post: <mailto:weirds@ietf.org>
List-Help: <mailto:weirds-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/weirds>, <mailto:weirds-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 09 Nov 2015 22:17:23 -0000

Yes, that is certainly doable, but what is your initial query? "Give me
all your data", even if it is 1000 results at a time, is probably not
a good starting point.

-andy

On Mon, Nov 9, 2015 at 3:45 PM, Dave Piscitello
<dave.piscitello@icann.org> wrote:
> Can we clear up one question before we venture into extensions?
>
> Q: What is the purpose of such an extension versus the ability that is naturally available if one is able to collect structured data into a local data store?
>
> For my example, so long as I’m able to gather the 100,000 structured records, I and others will happily take structured data, incorporate these into SQL or other databases, and search to our hearts’ content. What has inhibited the kinds of _domain name_ investigations my example illustrates in the past was the combination of non-standard, unstructured data, rate limiting, and other completely non-RDAP related matters.
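
A minimal sketch of the local-data-store approach described above, assuming the structured records have already been collected. The table layout, column names, and sample rows are hypothetical and are not taken from RDAP or from any registry's schema.

```python
import sqlite3

# In-memory database standing in for a local store of collected records.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE registrations ("
    "domain TEXT PRIMARY KEY, contact_email TEXT, "
    "name_server TEXT, created TEXT)"
)

# Hypothetical sample data; real records would come from structured responses.
rows = [
    ("aaa1.example", "poc1@mail.example", "ns1.host.example", "2015-11-01"),
    ("bbb2.example", "poc1@mail.example", "ns1.host.example", "2015-11-01"),
    ("ccc3.example", "poc2@mail.example", "ns1.host.example", "2015-11-02"),
    ("ddd4.example", "poc3@mail.example", "ns9.other.example", "2015-10-15"),
]
conn.executemany("INSERT INTO registrations VALUES (?, ?, ?, ?)", rows)


def pivot(column: str) -> list[tuple[str, int]]:
    """Return values of one column shared by more than one record."""
    allowed = {"contact_email", "name_server", "created"}
    if column not in allowed:
        # Column names cannot be bound as parameters, so whitelist them.
        raise ValueError(f"unsupported column: {column}")
    query = (
        f"SELECT {column}, COUNT(*) FROM registrations "
        f"GROUP BY {column} HAVING COUNT(*) > 1 "
        f"ORDER BY COUNT(*) DESC, {column}"
    )
    return conn.execute(query).fetchall()


shared_name_servers = pivot("name_server")
shared_contacts = pivot("contact_email")
```

Once the records are local, any field can serve as the pivot without further queries to the registry, which is the point being made about search extensions not being needed for this use case.
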
>
> Perhaps I’m missing something, but the extension(s) required to do what I suggested at the query level would require federated {authentication, authorization, auditing…}, which would be quite a remarkable objective for the numbers world and a monumental undertaking for the domain world, no?
>
>> On Nov 9, 2015, at 3:32 PM, Andrew Newton <andy@hxr.us> wrote:
>>
>> Ok. I understand the use case. It still requires an extension to RDAP.
>> That type of search never came up during the standardization process.
>>
>> At ARIN we do offer more targeted RESTful searches, but it has been my
>> experience they are used almost exclusively for data mining. And I
>> suspect that if left wide open, that's how it would be used in RDAP
>> for DNRs. So to get a "system" to work, there would need to be an
>> authorization system... and since we are talking multiple DNRs, that
>> would be federated authorization. And because there are multiple DNRs,
>> these queries would have to be run in parallel against each one (using
>> the bootstrap to find them). And then you have the problem that some
>> domain registries are thin. Those are a lot of problems to solve, and
>> I'm not sure who has the will to do it.
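
The parallel fan-out described above might look roughly like the following sketch. The registry base URLs, the canned data, and the `query_registry` stub are all hypothetical; bootstrap discovery, HTTP requests, and federated authorization are deliberately left out, so this shows only the shape of running one search against several registries at once.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical registries with canned contact names, standing in for the
# servers a client would normally discover through the bootstrap.
CANNED = {
    "https://rdap.registry-a.example/": ["McBride, Joe"],
    "https://rdap.registry-b.example/": ["McBurns Simpson", "Ann Smith"],
    "https://rdap.registry-c.example/": [],
}


def query_registry(base_url: str, pattern: str) -> list[str]:
    """Stub for one authorized search; a real client would do HTTP here."""
    prefix = pattern.rstrip("*").casefold()
    return [n for n in CANNED[base_url] if n.casefold().startswith(prefix)]


def search_all(base_urls: list[str], pattern: str) -> dict[str, list[str]]:
    """Run the same search against every registry in parallel."""
    with ThreadPoolExecutor(max_workers=max(1, len(base_urls))) as pool:
        # pool.map preserves input order, so results line up with base_urls.
        results = list(
            pool.map(lambda url: query_registry(url, pattern), base_urls)
        )
    return dict(zip(base_urls, results))


merged = search_all(list(CANNED), "McB*")
```

Even in this toy form, the caller has to merge per-registry results itself, and a thin registry would return nothing useful for contact data.
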
>>
>> -andy
>>
>> On Mon, Nov 9, 2015 at 12:44 PM, Dave Piscitello
>> <dave.piscitello@icann.org> wrote:
>>> More generally, if you are looking at 100,000 registration records that have
>>> been associated with spam, you’d want the ability to search or pivot on any
>>> data/string that establishes relationships among the domain strings (e.g.,
>>> botnet DGA) but, as importantly, on _any_ element of the registration record
>>> that subsets share: POC, creation date, name server…
>>>
>>> On Nov 9, 2015, at 12:12 PM, Brian Mountford <mountford@google.com> wrote:
>>>
>>> Well, for instance, one might want to search the organization. When I do a
>>> WHOIS query for google.com, the contacts have an organization of Google Inc.
>>> I might want to search for all contacts with that organization.
>>>
>>> I can try to tailor the interpretation, but since the search string syntax
>>> does not allow for arbitrary suffix searching, it's not clear what tailoring
>>> is possible. Are you saying that I could take an entity name search string
>>> of "McB*" and interpret that as a search for names any of whose words begin
>>> with McB, so that it would find Joe McBride as well as McBurns Simpson? That
>>> seems to be playing pretty loose with the partial search string rules, since
>>> the RFC takes pains to define that syntax so precisely.
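
The two readings of "McB*" in question can be compared in a minimal sketch. The function names and sample names are hypothetical, and case-insensitive comparison, whitespace word splitting, and support for only a single trailing asterisk are assumptions. This is not the RFC's matching algorithm, only an illustration of a strict whole-string prefix match versus a looser any-word prefix match.

```python
def strict_prefix_match(pattern: str, name: str) -> bool:
    """Treat 'McB*' as a prefix of the whole name string."""
    if not pattern.endswith("*"):
        # No wildcard: require an exact (case-insensitive) match.
        return name.casefold() == pattern.casefold()
    return name.casefold().startswith(pattern[:-1].casefold())


def any_word_prefix_match(pattern: str, name: str) -> bool:
    """Looser reading: the pattern may match any whitespace-separated word."""
    return any(strict_prefix_match(pattern, word) for word in name.split())


names = ["Joe McBride", "McBurns Simpson", "Ann Smith"]
strict_hits = [n for n in names if strict_prefix_match("McB*", n)]
loose_hits = [n for n in names if any_word_prefix_match("McB*", n)]
```

With these sample names, the strict reading finds only "McBurns Simpson", while the looser reading also finds "Joe McBride".
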
>>>
>>> Brian
>>>
>>> On Fri, Nov 6, 2015 at 4:47 PM, Andrew Newton <andy@hxr.us> wrote:
>>>>
>>>> Welcome to the world of internationalization, where the concept of a
>>>> last name vs a first name is not universal, and US-ASCII does not
>>>> represent all the characters used.
>>>>
>>>> Since you know the data in your database best, you should tailor the
>>>> interpretation of the query input to that which works best with your
>>>> registry.
>>>>
>>>> On Sat, Nov 7, 2015 at 5:54 AM, Brian Mountford <mountford@google.com>
>>>> wrote:
>>>>>
>>>>> And only names? There is no provision for searching entities by address,
>>>>> etc.?
>>>>>
>>>>
>>>> I don't think that ever came up. It would require an RDAP extension. I
>>>> do question how useful such a thing would be. Why does anybody care
>>>> that a particular registry has contacts living on Mumford Lane in East
>>>> Westover? Are they searching all the registries for that information?
>>>> Do they need a telephone book instead?
>>>>
>>>> -andy
>>>
>>>
>>> _______________________________________________
>>> weirds mailing list
>>> weirds@ietf.org
>>> https://www.ietf.org/mailman/listinfo/weirds
>>>
>>>
>