Re: [alto] A unified approach to value schemas and ALTO maps

Gao Kai <gaok12@mails.tsinghua.edu.cn> Tue, 02 August 2016 03:21 UTC

To: Wendy Roome <wendy.roome@nokia-bell-labs.com>, Jensen Zhang <jingxuan.n.zhang@gmail.com>
References: <D3BD183E.80D4FC%w.roome@alcatel-lucent.com> <CAAbpuyr8QemDbT+gvytkyWVJrnmK3XtocDfhkCwPi3bNFEQN3A@mail.gmail.com> <D3BE26A8.81086D%w.roome@alcatel-lucent.com>
From: Gao Kai <gaok12@mails.tsinghua.edu.cn>
Message-ID: <57A01196.1020309@mails.tsinghua.edu.cn>
Date: Tue, 02 Aug 2016 11:20:54 +0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.8.0
MIME-Version: 1.0
In-Reply-To: <D3BE26A8.81086D%w.roome@alcatel-lucent.com>
Content-Type: multipart/alternative; boundary="------------020008050108000902010809"
Archived-At: <https://mailarchive.ietf.org/arch/msg/alto/2l31OYpqzREMe2vBMApZmdhrR80>
Cc: IETF ALTO <alto@ietf.org>
Subject: Re: [alto] A unified approach to value schemas and ALTO maps
Precedence: list

Wendy, Jensen, Vijay and all,

I'm also interested in this approach and agree with most of it.  Please
see my comments below.

On 27/07/16 21:38, Wendy Roome wrote:
> Jensen,
>
> Thanks for commenting!  My responses are in-line.
>
>     From: Jensen Zhang <jingxuan.n.zhang@gmail.com>
>     Date: Tue, July 26, 2016 at 21:18
>     Subject: Re: [alto] A unified approach to value schemas and ALTO maps
>
>     Hi Wendy,
>
>     I want to support this idea. Please see comments inline.
>
>     On Wed, Jul 27, 2016 at 1:38 AM, Wendy Roome
>     <wendy.roome@nokia-bell-labs.com> wrote:
>
>         Recently we talked about using "value schemas" so that we can
>         add new value types without defining new media types and
>         message formats.
>
>         Here is another take on that approach. It unifies *everything*
>         into one common message format. Extensions simply require
>         defining new entity domains and (simple) value types.
>
>         So here's my idea. Every ALTO resource boils down to a mapping
>         from 1-, 2-, 3- or 4-tuples to values. The tuple elements are
>         names of entities in a domain.
>
>         Examples:
>            Cost Map:               (pid, pid)  =>  value
>            Endpoint Costs:         (addr, addr)  =>  value
>            Endpoint Props:         (addr, prop-name)  =>  value
>            MultiCost Map:          (pid, pid, cost-type)  =>  value
>            MultiCost Calendar Map: (pid, pid, date-range, cost-type)  =>  value
>            Network Map:            (cidr)  =>  pid
>
I think the tuples can be seen as the cross product of two parts: the
first key identifies the target (endpoint pairs, flows, etc.), and the
second key identifies the semantics of the data (metrics, time ranges,
confidence, deviation, etc.).  The format would be something like:

Cost map: (pid, pid) => value
Endpoint cost: (addr, addr) => value
Endpoint props: (addr) => value
...

Multicost: key => { (cost-type) => value }
Multicost calendar: key => { (cost-type, date-range) => value }
...

Thus, the two parts can be decoupled.  This may simplify the
specification of extensions that affect only one of the two aspects.
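
A minimal sketch of this decoupling in Python, assuming a nested
dictionary layout.  The names `multicost` and `lookup` and the sample
values are illustrative only and are not taken from any ALTO document:

```python
from typing import Any, Dict, Tuple

# Hypothetical layout: the outer key identifies the target, the inner key
# identifies the semantics of the value. Neither key format is standardized.
TargetKey = Tuple[str, ...]      # e.g. (src_pid, dst_pid) or (addr,)
SemanticKey = Tuple[str, ...]    # e.g. (cost_type,) or (cost_type, date_range)

multicost: Dict[TargetKey, Dict[SemanticKey, Any]] = {
    ("pid1", "pid2"): {
        ("routingcost",): 123,
        ("delay",): "2ms",
    },
}


def lookup(data, target, semantic):
    """Return the value for one (target, semantic) pair, or None if absent."""
    return data.get(target, {}).get(semantic)


print(lookup(multicost, ("pid1", "pid2"), ("delay",)))  # prints 2ms
```

An extension that only adds a new kind of semantic key (say, a date
range) would change the inner key type and leave the outer one alone.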
>
>         Note that I flipped the network map around. I think cidr => pid
>         is cleaner than pid => cidr-array, and it enforces the rule
>         that a cidr is in only one pid.
>
>
>     Agree. And maybe another benefit from cidr => pid is for
>     incremental update. Once a cidr changes its pid, pid => cidr-array
>     will require an update of the whole cidr-array. But cidr => pid
>     only requires an update of one pid item.
>
>
> Exactly! 
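
To make the update-size argument concrete, here is a small Python
sketch.  The PID names, prefixes, and both helper functions are made up
for illustration; they are not part of any ALTO update format:

```python
# Illustrative data only: the same network map in both orientations.
by_pid = {
    "pid1": ["10.0.0.0/8", "192.0.2.0/24"],
    "pid2": ["198.51.100.0/24"],
}
by_cidr = {
    "10.0.0.0/8": "pid1",
    "192.0.2.0/24": "pid1",
    "198.51.100.0/24": "pid2",
}


def update_for_pid_map(net_map, cidr, new_pid):
    """Entries to resend when a cidr moves in a pid => cidr-array map.

    Assumes the cidr is present and new_pid differs from its current pid.
    Both affected arrays have to be sent in full.
    """
    old_pid = next(p for p, cidrs in net_map.items() if cidr in cidrs)
    return {
        old_pid: [c for c in net_map[old_pid] if c != cidr],
        new_pid: net_map.get(new_pid, []) + [cidr],
    }


def update_for_cidr_map(cidr, new_pid):
    """Entries to resend when a cidr moves in a cidr => pid map: just one."""
    return {cidr: new_pid}


print(update_for_pid_map(by_pid, "192.0.2.0/24", "pid2"))
print(update_for_cidr_map("192.0.2.0/24", "pid2"))
```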
>
>
>         With this approach, the meta section of a response would give
>         the domain for each tuple element, and the value type. For
>         example, here is a conventional cost map:
>
>            meta:
>               tuple-domains: [pid, pid]
>               value-type:
>                  specification: rfc7285
>                  format: cost-type
>                  parameters:
>                      metric: routingcost, mode: numerical
>            map:
>               pid1:
>                  pid1: ###,  pid2: ###, ...
>
>         The value-type field gives the document that specifies this
>         value format (or a registered name), a format name that is
>         defined in that document, and any additional parameters
>         necessary to understand this value.
>
>
>     The specification metadata is very interesting and I like this
>     design. But I don't think an RFC is a good solution for the
>     specification citation. An RFC is human-readable, but it is hard
>     for programs to parse RFC documents. Maybe we can find a schema
>     language (or JSON Schema) to specify the value format.
>
>
> Okay, here is where I respectfully disagree. I do not believe a
> program-readable schema will help.
I think the ultimate goal of providing a program-readable schema is that
whenever we propose a new type, we only need to define it once in a
unified schema, and all clients and servers will then be able to
understand its format, its valid values and, if possible, its semantics.

There was a draft (from Lyle I think) proposing that servers may provide
on-demand metrics based on user queries.  To distribute the new metrics,
we might need such a schema to make sure the servers/clients have the
same understanding of the costs.

However, I don't think the schema should be included as part of the
query.  Instead, it can be put in a remote file and the first time an
ALTO library sees a new schema URL, it should download the file and
parse the schema.
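
A rough sketch of that fetch-once behaviour in Python.  It assumes the
schema file is JSON and that the library is handed a `fetch` callable;
`SchemaCache`, `fake_fetch`, and the example.com URL are all
hypothetical names chosen for this illustration:

```python
import json
from typing import Callable, Dict


class SchemaCache:
    """Fetch each schema URL once and reuse the parsed result afterwards."""

    def __init__(self, fetch: Callable[[str], str]):
        self._fetch = fetch              # injected, so no real network is needed
        self._schemas: Dict[str, dict] = {}

    def get(self, url: str) -> dict:
        if url not in self._schemas:     # first time this URL is seen
            self._schemas[url] = json.loads(self._fetch(url))
        return self._schemas[url]


# Stand-in for an HTTP download; it records every URL it is asked for.
fetched = []

def fake_fetch(url: str) -> str:
    fetched.append(url)
    return '{"format": "duration"}'


cache = SchemaCache(fake_fetch)
url = "https://example.com/schemas/duration.json"  # hypothetical URL
first = cache.get(url)
second = cache.get(url)
print(len(fetched))  # prints 1: the second call is served from the cache
```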
>
> Why not? First, JSON is self-describing. An ALTO client does not need
> a schema to parse JSON. JSON libraries do that on their own. The JSON
> libraries I’ve used build a DOM from the JSON and allow client
> programs to explore that DOM.
>
> IMHO, formal schemas have two uses. One is to allow a program to
> verify that a given JSON tree matches the schema. However, if the
> server’s response does not match the schema, what can a client do
> about it, other than give up on that ALTO server?
>
> The other is to document the layout, and (maybe!) facilitate
> automatically creating a program, or at least a skeleton of a program,
> to program with that data. But no ALTO client is going to do that
> on-the-fly. That happens off-line, when the programmer decides how the
> client will use an ALTO server.
>
> I believe the important thing is to specify the semantics. E.g., what
> do these values mean? For example, take ordinal vs numerical mode. You
> cannot capture the difference in a schema.
>
> As another example, consider duration values. Frequently durations
> have a suffix with the units: 10s, 1.23ms, 15us, etc. Or they have
> colons, as in 1:23.4. A grammar can define the legal values, but
> unless the programmer knows "ms" means milliseconds, the grammar does
> not help.
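
The duration example can be sketched in Python as follows.  The regular
expressions play the role of the grammar, while the unit table and the
reading of the colon form as minutes:seconds are assumptions that would
have to come from a document; `parse_duration` is a hypothetical helper:

```python
import re

# Assumed meaning of each suffix, in seconds. This table is the part a
# grammar alone cannot supply.
UNIT_SECONDS = {"s": 1.0, "ms": 1e-3, "us": 1e-6}

_SUFFIXED = re.compile(r"^(\d+(?:\.\d+)?)(s|ms|us)$")
_COLON = re.compile(r"^(\d+):(\d+(?:\.\d+)?)$")  # assumed: minutes:seconds


def parse_duration(text: str) -> float:
    """Convert a duration string such as '10s' or '1:23.4' to seconds."""
    m = _SUFFIXED.match(text)
    if m:
        return float(m.group(1)) * UNIT_SECONDS[m.group(2)]
    m = _COLON.match(text)
    if m:
        return int(m.group(1)) * 60 + float(m.group(2))
    raise ValueError(f"not a recognized duration: {text!r}")


for sample in ("10s", "1.23ms", "15us", "1:23.4"):
    print(sample, "->", parse_duration(sample), "seconds")
```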
>
> So I believe values will have to be described by some document, which
> may have a formal schema. The document will define some unique name to
> identify values of this kind, and ALTO clients will switch on that
> name. The name could be rfc####, or it could be yet another IANA
> registry, or whatever.
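
One way a client might "switch on that name", sketched in Python.  The
registry keyed by (specification, format) and the `decode` helper are
hypothetical; only the field names come from the meta example quoted
earlier in this thread:

```python
# Hypothetical client-side registry: one decoder per known value kind.
DECODERS = {
    ("rfc7285", "cost-type"): float,   # treat a numerical cost as a float
}


def decode(value_type: dict, raw):
    """Decode a raw value using the decoder registered for its value type."""
    key = (value_type["specification"], value_type["format"])
    try:
        decoder = DECODERS[key]
    except KeyError:
        # The client has no document-backed knowledge of this kind of value.
        raise ValueError(f"unknown value type: {key}") from None
    return decoder(raw)


print(decode({"specification": "rfc7285", "format": "cost-type"}, "12"))
```

A client that meets an unregistered name can only reject the value,
which is the same outcome as a failed schema validation.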
>
>      
>
>
>         Here is a multi-cost map:
>            meta:
>               tuple-domains: [pid, pid, cost-type-name]
>               value-types:
>                  3:
>
>
>     Is this a typo? Or what's the meaning of '3:' here?
>
>
> The 3 means the value-type depends on the value of the third tuple
> element - the cost-type. But now that I look at it again, that is
> overly general. It is probably sufficient to say that either all
> values are the same type, or else the value type depends on the value
> of the last element in the tuple. E.g., cost-type, property-name, etc.
>
>      
>
>                     cost:
>                        specification: rfc7285
>                        format: cost-type
>                        parameters:
>                          metric: routingcost, mode: numerical
>                     delay:
>                        specification: rfc####
>                        format: duration
>                        parameters:
>                           metric: propagation-time
>            map:
>               pid1:
>                  pid2:
>                     cost: 123, delay: 2ms
>
>         Here the different cost types have different value types.
>         meta.value-types means that the value depends on the 3rd
>         element of the tuple - the cost-type-name - and gives the
>         value format for each cost-type-name.
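
A small Python sketch of how a client could resolve the value type for
each entry.  It uses the simplified rule from Wendy's reply (the type
depends on the last tuple element), so the '3:' level is omitted; the
`flatten` and `typed_entries` helpers are hypothetical, and "rfc####"
is kept as the placeholder it is in the example:

```python
meta = {
    "tuple-domains": ["pid", "pid", "cost-type-name"],
    "value-types": {
        "cost": {"specification": "rfc7285", "format": "cost-type",
                 "parameters": {"metric": "routingcost", "mode": "numerical"}},
        "delay": {"specification": "rfc####", "format": "duration",
                  "parameters": {"metric": "propagation-time"}},
    },
}
cost_map = {"pid1": {"pid2": {"cost": 123, "delay": "2ms"}}}


def flatten(nested, depth):
    """Yield (tuple, value) pairs from a map nested `depth` levels deep."""
    if depth == 0:
        yield (), nested
        return
    for key, child in nested.items():
        for rest, value in flatten(child, depth - 1):
            yield (key,) + rest, value


def typed_entries(meta, nested):
    """Yield (tuple, value, format), picking the type by the last element."""
    depth = len(meta["tuple-domains"])
    for keys, value in flatten(nested, depth):
        yield keys, value, meta["value-types"][keys[-1]]["format"]


for entry in typed_entries(meta, cost_map):
    print(entry)
```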
>
>
>     It is also more flexible for incremental updates.
>      
>
>
>
>         So are people interested in pursuing this approach? The good
>         news is that it will unify everything. The bad news is that it
>         will replace everything in RFC 7285.
>
>             - Wendy Roome
>
>
>     I find this approach is also compatible with FCS, since FCS
>     returns a fid => value map currently. I'd like to push it forward.
>
>     One more question: I think a (src, dst) => cost map and a
>     (dst, src) => cost map will have the same "tuple-domains" value,
>     but they are different for clients. How do we differentiate them?
>
>
> Because the domains in the tuples are ordered. E.g., in a cost map,
> the first domain is always the source and the second is the destination.
>
>
>     Best,
>     Jensen
>
>
> Thanks! Let's keep the discussion going. Anyone else like to comment?
> Richard??
>
> - Wendy Roome
>
Regards,
Kai
