[http-state] publicsuffix.org/list/#specification (aka list format, definitions, algorithm)

=JeffH <Jeff.Hodges@KingsMountain.com> Mon, 10 November 2014 00:25 UTC

Archived-At: http://mailarchive.ietf.org/arch/msg/http-state/vCLaOXByqMLBYbfDTacjQ0DEbiU

I am extracting and reproducing here the list format, definitions, and 
algorithm description from publicsuffix.org, for reference/convenience...


List format

A public suffix is a set of DNS names or wildcards concatenated with dots. 
It represents the part of a domain name which is not under the control of 
the individual registrant.

Specification

     The list is a set of rules, with one rule per line.

     Each line is only read up to the first whitespace; entire lines can 
also be commented using //.

     Each line which is not entirely whitespace and does not begin with a 
comment contains a rule.

     Each rule lists a public suffix, with the subdomain portions separated 
by dots (.) as usual. There is no leading dot.

     The wildcard character * (asterisk) matches any valid sequence of 
characters in a hostname part. (Note: the list uses Unicode, not Punycode 
forms, and is encoded using UTF-8.)

     Wildcards may only be used to wildcard an entire level. That is, they 
must be surrounded by dots (or implicit dots, at the beginning of a line).

     If a hostname matches more than one rule in the file, the longest 
matching rule (the one with the most levels) will be used.

     An exclamation mark (!) at the start of a rule marks an exception to a 
previous wildcard rule. An exception rule takes priority over any other 
matching rule.
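
As an illustration of the format rules above, here is a minimal Python sketch 
of a line parser. The name parse_rules is hypothetical (it is not from 
publicsuffix.org), and tolerating leading whitespace before a rule or a 
comment is my own assumption, made so that indented examples parse.

```python
def parse_rules(text):
    """Parse list-formatted text into (labels, is_exception) tuples.

    Hypothetical helper for illustration only.
    """
    rules = []
    for line in text.split("\n"):
        # Assumption: leading whitespace is ignored.
        stripped = line.strip()
        # Skip blank lines and whole-line comments.
        if not stripped or stripped.startswith("//"):
            continue
        # A line is only read up to the first whitespace.
        token = stripped.split()[0]
        # A leading "!" marks an exception rule; drop it after noting it.
        is_exception = token.startswith("!")
        if is_exception:
            token = token[1:]
        rules.append((token.split("."), is_exception))
    return rules


sample = """
com
// a comment line
*.jp
!pref.hokkaido.jp   trailing text is ignored
"""
parsed = parse_rules(sample)
```

Each parsed rule keeps its labels in left-to-right order, so later matching 
code can walk them from the right hand end.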


Example

Here is an example (incomplete) list section. The rules are numbered, but 
the numbers would not appear in the real file:

             1. com

             2. *.jp
             // Hosts in .hokkaido.jp can't set cookies below level 4...
             3. *.hokkaido.jp
             4. *.tokyo.jp
             // ...except hosts in pref.hokkaido.jp, which can set cookies
             // at level 3.
             5. !pref.hokkaido.jp
             6. !metro.tokyo.jp


The example above would be interpreted as follows, in the case of 
cookie-setting, and using "foo" and "bar" as generic hostnames:

     Cookies may be set for foo.com.
     Cookies may be set for foo.bar.jp.
     Cookies may not be set for bar.jp.
     Cookies may be set for foo.bar.hokkaido.jp.
     Cookies may not be set for bar.hokkaido.jp.
     Cookies may be set for foo.bar.tokyo.jp.
     Cookies may not be set for bar.tokyo.jp.
     Cookies may be set for pref.hokkaido.jp, because the exception overrides 
the previous rule.
     Cookies may be set for metro.tokyo.jp, because the exception overrides 
the previous rule.

Formal algorithm

Here is an algorithm for determining the Public Suffix of a domain. (Note: 
it may not be the most efficient algorithm.) The domain and all rules must 
be canonicalized in the normal way for hostnames - lower-case, Punycode (RFC 
3492).
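
A rough sketch of the canonicalization step in Python follows. The function 
name canonicalize is hypothetical. It relies on Python's built-in "idna" 
codec, which implements the older IDNA 2003 rules on top of Punycode; whether 
that is the right IDNA variant for a given application is an assumption to 
check, not something the text above specifies.

```python
def canonicalize(domain):
    """Lower-case a hostname and convert it to its Punycode (ASCII) form.

    Hypothetical helper; uses Python's built-in IDNA 2003 codec.
    """
    return domain.lower().encode("idna").decode("ascii")


ascii_name = canonicalize("EXAMPLE.com")
idn_name = canonicalize("Bücher.DE")
```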

Definitions

     The Public Suffix List consists of a series of lines, separated by \n.

     Each line is only read up to the first whitespace; entire lines can 
also be commented using //.

     Each line which is not entirely whitespace and does not begin with a 
comment contains a rule.

     A rule may begin with a "!" (exclamation mark). If it does, it is 
labelled as an "exception rule" and then treated as if the exclamation mark 
is not present.

     A domain or rule can be split into a list of labels using the separator 
"." (dot). The separator is not part of any of the labels.

     A domain is said to match a rule if, when the domain and rule are both 
split, and one compares the labels from the rule to the labels from the 
domain, beginning at the right hand end, one finds that for every pair 
either they are identical, or that the label from the rule is "*" (star). 
The domain may legitimately have labels remaining at the end of this 
matching process.
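
The matching definition above might be sketched in Python as follows. The 
function name matches is hypothetical. One point the definition leaves open 
is a rule with more labels than the domain; this sketch assumes such a rule 
does not match.

```python
def matches(domain, rule):
    """Return True if domain matches rule, comparing labels right to left.

    Hypothetical helper. The rule is given without any leading "!".
    """
    domain_labels = domain.split(".")
    rule_labels = rule.split(".")
    # Assumption: a rule longer than the domain cannot match.
    if len(rule_labels) > len(domain_labels):
        return False
    # Pair up labels starting at the right hand end; extra domain
    # labels on the left are simply left over.
    for d, r in zip(reversed(domain_labels), reversed(rule_labels)):
        if r != "*" and r != d:
            return False
    return True
```

For instance, matches("foo.bar.jp", "*.jp") is true, with "foo" as the 
remaining label, while matches("jp", "*.jp") is false under the assumption 
stated above.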

Algorithm

     Match domain against all rules and take note of the matching ones.

     If no rules match, the prevailing rule is "*".

     If more than one rule matches, the prevailing rule is the one which is 
an exception rule.

     If there is no matching exception rule, the prevailing rule is the one 
with the most labels.

     If the prevailing rule is an exception rule, modify it by removing the 
leftmost label.

     The public suffix is the set of labels from the domain which directly 
match the labels of the prevailing rule (joined by dots).

     The registered or registrable domain is the public suffix plus one 
additional label.
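
To make the steps concrete, here is a small Python sketch of the algorithm, 
run against the example rules from earlier in this message. All names 
(EXAMPLE_RULES, public_suffix, registrable_domain) are hypothetical, the 
input is assumed to be already canonicalized, and a rule longer than the 
domain is assumed not to match. As the note above says for the algorithm 
itself, this is not meant to be efficient.

```python
EXAMPLE_RULES = [
    "com", "*.jp", "*.hokkaido.jp", "*.tokyo.jp",
    "!pref.hokkaido.jp", "!metro.tokyo.jp",
]


def _matches(domain_labels, rule_labels):
    # Assumption: a rule longer than the domain cannot match.
    if len(rule_labels) > len(domain_labels):
        return False
    pairs = zip(reversed(domain_labels), reversed(rule_labels))
    return all(r == "*" or r == d for d, r in pairs)


def _suffix_length(domain_labels, rules):
    """Number of labels in the prevailing rule for this domain."""
    matching = []
    for rule in rules:
        is_exception = rule.startswith("!")
        rule_labels = (rule[1:] if is_exception else rule).split(".")
        if _matches(domain_labels, rule_labels):
            matching.append((rule_labels, is_exception))
    if not matching:
        return 1  # the prevailing rule is "*"
    exceptions = [labels for labels, exc in matching if exc]
    if exceptions:
        # An exception rule prevails; drop its leftmost label.
        return len(exceptions[0]) - 1
    # Otherwise the rule with the most labels prevails.
    return max(len(labels) for labels, _ in matching)


def public_suffix(domain, rules=EXAMPLE_RULES):
    labels = domain.split(".")
    n = _suffix_length(labels, rules)
    return ".".join(labels[len(labels) - n:])


def registrable_domain(domain, rules=EXAMPLE_RULES):
    """Public suffix plus one label, or None if the domain is too short."""
    labels = domain.split(".")
    n = _suffix_length(labels, rules)
    if len(labels) <= n:
        return None
    return ".".join(labels[len(labels) - (n + 1):])
```

With the example rules, public_suffix("foo.bar.hokkaido.jp") gives 
"bar.hokkaido.jp", and registrable_domain("bar.jp") gives None, which lines 
up with "Cookies may not be set for bar.jp" in the example section.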