Re: [alto] A unified approach to value schemas and ALTO maps

Gao Kai <gaok12@mails.tsinghua.edu.cn> Tue, 02 August 2016 03:21 UTC

To: Wendy Roome <wendy.roome@nokia-bell-labs.com>, Jensen Zhang <jingxuan.n.zhang@gmail.com>
From: Gao Kai <gaok12@mails.tsinghua.edu.cn>
Archived-At: <https://mailarchive.ietf.org/arch/msg/alto/2l31OYpqzREMe2vBMApZmdhrR80>
Cc: IETF ALTO <alto@ietf.org>
Subject: Re: [alto] A unified approach to value schemas and ALTO maps

Wendy, Jensen, Vijay and all,

I'm also interested in this approach and agree with most of it.  Please
see my comments below.

On 27/07/16 21:38, Wendy Roome wrote:
> Jensen,
>
> Thanks for commenting!  My responses are in-line.
>
>     From: Jensen Zhang <jingxuan.n.zhang@gmail.com>
>     Date: Tue, July 26, 2016 at 21:18
>     Subject: Re: [alto] A unified approach to value schemas and ALTO maps
>
>     Hi Wendy,
>
>     I want to support this idea. Please see comments inline.
>
>     On Wed, Jul 27, 2016 at 1:38 AM, Wendy Roome
>     <wendy.roome@nokia-bell-labs.com> wrote:
>
>         Recently we talked about using "value schemas" so that we can add
>         new value types without defining new media types and message
>         formats.
>
>         Here is another take on that approach. It unifies *everything* into
>         one common message format. Extensions simply require defining new
>         entity domains and (simple) value types.
>
>         So here's my idea. Every ALTO resource boils down to a mapping from
>         1-, 2-, 3- or 4-tuples to values. The tuple elements are names of
>         entities in a domain.
>
>         Examples:
>            Cost Map: (pid, pid)  =>  value
>            Endpoint Costs:  (addr, addr) =>  value
>            Endpoint Props:  (addr, prop-name)  =>  value
>            MultiCost Map:   (pid, pid, cost-type)  =>  value
>            MultiCost Calendar Map: (pid, pid, date-range, cost-type)  =>  value
>            Network Map:  (cidr)  =>  pid
>
I think the tuples can be seen as the cross product of two parts: the
first key represents the target (endpoint pairs, flows, etc.), and the
second key represents the semantics of the data (metrics, time ranges,
confidence, deviation, etc.).  The format would be something like:

Cost map: (pid, pid) => value
Endpoint cost: (addr, addr) => value
Endpoint props: (addr) => value
...

Multicost: key => { (cost-type) => value }
Multicost calendar: key => { (cost-type, date-range) => value }
...

Thus, the two parts can be decoupled.  This may simplify the
specification of extensions that affect only one of the two parts.
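To make the decoupled layout above concrete, here is a minimal Python
sketch.  It is only an illustration: the names (lookup, flatten,
multicost) and the in-memory layout are my assumptions, not part of any
ALTO specification.

```python
from typing import Any, Dict, Tuple

# Hypothetical type aliases, for illustration only.
TargetKey = Tuple[str, ...]    # e.g. (src_pid, dst_pid) or (addr,)
SemanticKey = Tuple[str, ...]  # e.g. (cost_type,) or (cost_type, date_range)
Resource = Dict[TargetKey, Dict[SemanticKey, Any]]


def lookup(resource: Resource, target: TargetKey, semantic: SemanticKey) -> Any:
    """Two-step lookup: first the target, then the semantics of the value."""
    return resource[target][semantic]


def flatten(resource: Resource) -> Dict[Tuple[str, ...], Any]:
    """Recover the flat tuple view: (target..., semantic...) => value."""
    return {
        target + semantic: value
        for target, inner in resource.items()
        for semantic, value in inner.items()
    }


# Multicost: key => { (cost-type) => value }
multicost: Resource = {
    ("pid1", "pid2"): {
        ("cost",): 123,
        ("delay",): "2ms",
    },
}
```

An extension that only adds a new semantic key (say, a date range)
changes the inner mapping and leaves the target keys untouched, and
flatten() shows the result is still equivalent to the flat tuples.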
>
>         Note that I flipped the network map around. I think cidr => pid is
>         cleaner than pid => cidr-array, and it enforces the rule that a
>         cidr is in only one pid.
>
>
>     Agree. And maybe another benefit from cidr => pid is for
>     incremental update. Once a cidr changes its pid, pid => cidr-array
>     will require an update of the whole cidr-array. But cidr => pid
>     only requires an update of one pid item.
>
>
> Exactly! 
>
>
>         With this approach, the meta section of a response would give the
>         domain for each tuple element, and the value type. For example,
>         here is a conventional cost map:
>
>            meta:
>               tuple-domains: [pid, pid]
>               value-type:
>                  specification: rfc7285
>                  format: cost-type
>                  parameters:
>                      metric: routingcost, mode: numerical
>            map:
>               pid1:
>                  pid1: ###,  pid2: ###, ...
>
>         The value-type field gives the document that specifies this value
>         format (or a registered name), a format name that is defined in
>         that document, and any additional parameters necessary to
>         understand this value.
>
>
>     The specification metadata is very interesting and I like this
>     design. But I don't think citing an RFC is a good solution for the
>     specification. An RFC is human-readable, but it is hard for programs
>     to parse RFC documents. Maybe we could find a schema language (or
>     JSON Schema) to specify the value format.
>
>
> Okay, here is where I respectfully disagree. I do not believe a
> program-readable schema will help.
I think the ultimate goal of a program-readable schema is that whenever
we propose a new type, we only need to define it in a unified schema
language, and all clients/servers would then be able to understand its
format, its valid values and, if possible, its semantics.

There was a draft (from Lyle I think) proposing that servers may provide
on-demand metrics based on user queries.  To distribute the new metrics,
we might need such a schema to make sure the servers/clients have the
same understanding of the costs.

However, I don't think the schema should be included as part of the
query.  Instead, it can be put in a remote file and the first time an
ALTO library sees a new schema URL, it should download the file and
parse the schema.
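A rough Python sketch of that "download on first sight" behaviour
follows.  Everything here is an assumption for illustration: the class
name SchemaCache, the example URL, and the idea that a schema is a JSON
document.  The fetch function is injected so that the sketch does not
depend on any particular HTTP library.

```python
import json
from typing import Callable, Dict


class SchemaCache:
    """Fetch and parse a schema the first time its URL is seen."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch                  # returns the schema text for a URL
        self._schemas: Dict[str, dict] = {}  # URL => parsed schema
        self.downloads = 0                   # counts actual fetches

    def get(self, url: str) -> dict:
        if url not in self._schemas:
            # First sighting of this schema URL: download and parse once.
            self._schemas[url] = json.loads(self._fetch(url))
            self.downloads += 1
        return self._schemas[url]


# Hypothetical URL and schema body, standing in for a real remote file.
SCHEMA_URL = "https://example.org/schemas/duration.json"


def fake_fetch(url: str) -> str:
    return json.dumps({"format": "duration", "units": ["s", "ms", "us"]})
```

With this, responses only need to carry the schema URL, and repeated
responses that use the same value type cost no extra downloads.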
>
> Why not? First, JSON is self-describing. An ALTO client does not need
> a schema to parse JSON. JSON libraries do that on their own. The JSON
> libraries I’ve used build a DOM from the JSON and allow client
> programs to explore that DOM.
>
> IMHO, formal schemas have two uses. One is to allow a program to
> verify that a given JSON tree matches the schema. However, if the
> server’s response does not match the schema, what can a client do
> about it, other than give up on that ALTO server?
>
> The other is to document the layout, and (maybe!) facilitate
> automatically creating a program, or at least a skeleton of a program,
> to program with that data. But no ALTO client is going to do that
> on-the-fly. That happens off-line, when the programmer decides how the
> client will use an ALTO server.
>
> I believe the important thing is to specify the semantics. E.g., what
> do these values mean? For example, take ordinal vs numerical mode. You
> cannot capture the difference in a schema.
>
> As another example, consider duration values. Frequently durations
> have a suffix with the units: 10s, 1.23ms, 15us, etc. Or they have
> colons, as in 1:23.4. A grammar can define the legal values, but
> unless the programmer knows "ms" means milliseconds, the grammar does
> not help.
>
> So I believe values will have to be described by some document, which
> may have a formal schema. The document will define some unique name to
> identify values of this kind, and ALTO clients will switch on that
> name. The name could be rfc####, or it could be yet another IANA
> registry, or whatever.
>
>      
>
>
>         Here is a multi-cost map:
>            meta:
>               tuple-domains: [pid, pid, cost-type-name]
>               value-types:
>                  3:
>
>
>     Is this a typo? Or what does '3:' mean here?
>
>
> The 3 means the value-type depends on the value of the third tuple
> element - the cost-type. But now that I look at it again, that is
> overly general. It is probably sufficient to say that either all
> values are the same type, or else the value type depends on the value
> of the last element in the tuple. E.g., cost-type, property-name, etc.
>
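The simplified rule (one type for all values, or a type selected by the
last tuple element) could be resolved by a client roughly as below.
This is only a sketch: the dict layout follows the meta examples in
this thread with the "3:" level dropped, and the function name
value_type_for is my own.

```python
def value_type_for(meta: dict, key: tuple) -> dict:
    """Return the value-type description for one tuple key."""
    if "value-type" in meta:
        # Case 1: every value in the map has the same type.
        return meta["value-type"]
    # Case 2: the type depends on the last tuple element
    # (e.g. cost-type-name or property name).
    return meta["value-types"][key[-1]]


cost_map_meta = {
    "tuple-domains": ["pid", "pid"],
    "value-type": {
        "specification": "rfc7285",
        "format": "cost-type",
        "parameters": {"metric": "routingcost", "mode": "numerical"},
    },
}

multicost_meta = {
    "tuple-domains": ["pid", "pid", "cost-type-name"],
    "value-types": {
        "cost": {
            "specification": "rfc7285",
            "format": "cost-type",
            "parameters": {"metric": "routingcost", "mode": "numerical"},
        },
        "delay": {
            "specification": "rfc####",  # placeholder kept from the example
            "format": "duration",
            "parameters": {"metric": "propagation-time"},
        },
    },
}
```

A client would then switch on the returned "specification" and "format"
names to decide how to interpret each value.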
>      
>
>                     cost:
>                        specification: rfc7285
>                        format: cost-type
>                        parameters:
>                          metric: routingcost, mode: numerical
>                     delay:
>                        specification: rfc####
>                        format: duration
>                        parameters:
>                           metric: propagation-time
>            map:
>               pid1:
>                  pid2:
>                     cost: 123, delay: 2ms
>
>         Here the different cost types have different value types.
>         meta.value-types means that the value depends on the third element
>         of the tuple - the cost-type-name - and gives the value format for
>         each cost-type-name.
>
>
>     It is also more flexible for incremental updates.
>      
>
>
>
>         So are people interested in pursuing this approach? The good news
>         is that it will unify everything. The bad news is that it will
>         replace everything in RFC 7285.
>
>             - Wendy Roome
>
>
>     I find this approach is also compatible with FCS, since FCS
>     returns a fid => value map currently. I'd like to push it forward.
>
>     One more question: I think a (src, dst) => cost map and a (dst, src)
>     => cost map will have the same "tuple-domains" value, but they are
>     different for clients. How can a client tell them apart?
>
>
> Because the domains in the tuples are ordered. E.g., in a cost map,
> the first domain is always the source and the second is the destination.
>
>
>     Best,
>     Jensen
>
>
> Thanks! Let's keep the discussion going. Anyone else like to comment?
> Richard??
>
> - Wendy Roome
>
Regards,
Kai