[Forces-protocol] Re: Data encoding -- first part

Alan,

Thanks for your quick but rather exhaustive comments.
See some feedback below.

On Mon, 2004-10-18 at 13:21, Alan DeKok wrote:
> Zsolt Haraszti wrote:
> 
> > - It is assumed that the type of the data can be inferred by the
> >   context in which data is used.  Hence, data will not include its
> >   type information. 
> 
>    It may be useful to also be able to explicitly describe a type, if 
> only for initialization purposes.  This ensures that the FE can 
> communicate implementation-specific extensions to the CE, and that the 
> CE can understand them.
> 
>    e.g. "ACL is array[n] of struct {ip, ip, proto, port, port}"
> 
>    These types SHOULD NOT be used outside of initialization, as they 
> will make the protocol complex, and the implementation problematic.
> 

I agree that this could be useful.  We shall find a way to
accommodate this in the protocol somewhere.

> > - Random sub-element access will not be guaranteed.  Specifically, random
> >   sub-element access will not be provided when the sub-elements include
> >   one or more sub-elements with variable-size encoding (these are ARRAYs
> >   and STRING[N]s).
> 
>    I would phrase that as "random sub-element access is provided only 
> when all sub-elements are of the same, fixed, size".  It means the same, 
> but phrasing it in a positive way means that you're defining what's OK. 
>   By definition, then, everything else is not OK, and you don't have to 
> enumerate the infinite weird ways that can't be used for access.
> 

OK

> > - The length of the encoded data block will always be a multiple of 32-bit
> >   words.
> 
>    <nods violently>
> 
> ...
> > 	PADDING := 1, 2 or 3 bytes of 0x00 values to round up the resulting
> > 		   data block to the next 32-bit boundary.
> 
>    I would say "0, 1, 2, or 3 bytes..."
> 
>    You've defined "DATA := PACK(type, value) + PADDING", so there should 
> be some provision for PADDING to be zero.
> 

Right.
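
To make the zero-padding case concrete, here is a minimal sketch of the
length arithmetic for DATA := PACK(type, value) + PADDING.  The helper
names forces_pad_len and forces_data_len are made up for illustration;
they are not from the draft.

```c
#include <assert.h>
#include <stddef.h>

/* Number of 0x00 bytes (0, 1, 2 or 3) needed to round len up to the
 * next 32-bit boundary.  Hypothetical helper name. */
static size_t forces_pad_len(size_t len)
{
    return (4 - (len % 4)) % 4;
}

/* Total length of DATA := PACK(type, value) + PADDING. */
static size_t forces_data_len(size_t packed_len)
{
    size_t total = packed_len + forces_pad_len(packed_len);

    assert(total % 4 == 0);   /* always a multiple of 32-bit words */
    return total;
}
```

A packed length that is already a multiple of 4 gets no padding at all,
which is exactly the "0 bytes" case.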

> 
> > PACK(type, integer) :=	1/2/4/8 bytes of binary data in network
> > 		 	byte order (big endian).
> 
>    I would say "network byte order", rather than "big endian".  It 
> avoids terminology from "religious wars".
> 

OK, we can delete the "(big endian)"
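
For reference, a sketch of what PACK(type, integer) amounts to, written
with shifts so that it does not depend on the byte order of the host.
The function name pack_uint is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write the low `size` bytes of v into out in network byte order
 * (most significant byte first).  size must be 1, 2, 4 or 8.
 * Returns the number of bytes written.  Hypothetical helper name. */
static size_t pack_uint(uint8_t *out, uint64_t v, size_t size)
{
    assert(size == 1 || size == 2 || size == 4 || size == 8);

    for (size_t i = 0; i < size; i++)
        out[i] = (uint8_t)(v >> (8 * (size - 1 - i)));
    return size;
}
```

Signed values can go through the same routine after a cast to the
unsigned type of the same width.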

> > PACK(type, float) :=	4/8 bytes of binary data in network byte
> > 			order, using IEEE floating point numbers
> > 			encoding.
> 
>    I think the reference is "ANSI/IEEE Standard 754-1985"
> 

I will check (or let me know when you know for sure :).
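
In the meantime, a sketch of the 4-byte case.  It assumes the host
"float" already uses the IEEE 754 single-precision format, which the C
standard does not guarantee; pack_float32 is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode f as 4 bytes in network byte order.  Assumes the host float
 * is IEEE 754 single precision.  Returns the number of bytes written. */
static size_t pack_float32(uint8_t *out, float f)
{
    uint32_t bits;

    assert(sizeof f == sizeof bits);
    memcpy(&bits, &f, sizeof bits);   /* reinterpret the bit pattern */

    out[0] = (uint8_t)(bits >> 24);
    out[1] = (uint8_t)(bits >> 16);
    out[2] = (uint8_t)(bits >> 8);
    out[3] = (uint8_t)bits;
    return 4;
}
```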

> 
> > 4.1.3 Encoding of STRING[N] Types
> > 
> > If type is STRING[N], its encoding is as follows:
> > 
> > PACK(STRING[N], string) := LEN + STRING + PADDING
> 
>    Where "+" is "concatenation".
> 

That's right.

> > where:
> >    
> >    	LEN := 2 byte string length indicator encoded as uint16
> > 	       (big endian).  Length includes only the consecutive
> > 	       non-zero characters from the start of the string.
> 
>    Can strings contain embedded NULs?
> 

My suggestion would be to follow C string conventions.
So the first NUL and anything after it is ignored.

And naturally, only the initial non-NUL content is carried
over the wire in ForCES messages.

> 
> > Examples:
> > 
> >    	string = "abcde", N = 16:   00 05 61 62
> > 				    63 64 65 00
> > 
> > 	string = "abcde", N = 128:  same as above, encoding does not care
> > 				    about max. size of string.
> 
>    I'm not sure I see why it is necessary to allow NUL terminated 
> strings which have many, many, NULs at the end.  Or, why is "N=128" here?
> 

There may be some misunderstanding here, so let me rewind a bit.

N in STRING[N] defines the max size of the string _in the model_.
This is analogous to the

	char 	if_name[16]

C construct (as opposed to the char *if_name).

What the above two examples illustrate is that max size N does
not really matter when sending the content over the wire (the
content still must be smaller than N), because the string
encoding will only transfer the useful part of the string,
i.e., everything till the terminating '\0'.

If people have a problem with this, we can define alternative string
model(s), but I thought this is sufficient.
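
To illustrate, a minimal sketch of the string encoding as I described
it: LEN + STRING + PADDING, where only the part before the first NUL is
sent.  The function name pack_string and the buffer-size convention are
assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode a model STRING[N] held in a fixed char[n] field.
 * out must have room for 2 + n + 3 bytes.
 * Returns the number of bytes written.  Hypothetical helper name. */
static size_t pack_string(uint8_t *out, const char *field, size_t n)
{
    /* Useful part: everything before the first NUL, at most n bytes. */
    size_t len = 0;
    while (len < n && field[len] != '\0')
        len++;
    assert(len <= UINT16_MAX);

    out[0] = (uint8_t)(len >> 8);      /* LEN as uint16, network order */
    out[1] = (uint8_t)(len & 0xff);
    memcpy(out + 2, field, len);       /* STRING, without the NUL */

    size_t total = 2 + len;
    while (total % 4 != 0)             /* PADDING: 0..3 bytes of 0x00 */
        out[total++] = 0x00;
    return total;
}
```

Calling this on a char[16] and on a char[128] that both hold "abcde"
produces the same 8 bytes, 00 05 61 62 63 64 65 00, which is the point
of the two examples.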

> (byte arrays)
> > Note that the four sub-32-bit integer types (INT8, UINT8, INT16 and UINT16)
> > preserve they sizes in PACK().  All other atomic types are packed
>             ^^^^
>    "their" (and other miscellaneous typos)
> 
> 
> 
> > 4.2.1  Struct Encoding
> > 
> > Struct encoding is rather straight-forward.  For struct with fixed-size
> > content the encoding follows the ANSI C struct representation
> > standards.
> 
>    What does that mean?
> 

I hoped this would be clear from the rest of the text, but apparently
not.

Let me try to clarify it a bit:  Suppose you define a STRUCT A data type
per ForCES data model.  Suppose furthermore that you implement that
struct in the FE/CE software by means of a C struct: struct A.  The C
struct will have a certain memory layout in the host memory.  That 
layout is what I referred to as "ANSI C struct representation" (I may
have used the wrong reference here; I shall look it up to be sure).

The statement above is that for structs which do not have variable
size fields, the ForCES encoded STRUCT will have the same layout as
the C struct in memory (minus endianness).  The obvious advantage of
this is that you can memcpy() between the message buffer and your
C struct.
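
A sketch of what I have in mind, using a hypothetical struct demo_fixed
with UINT8, UINT16 and UINT32 fields.  It assumes a POSIX host (for
htons/htonl) and a compiler that lays the struct out with offsets 0, 2
and 4; the asserts check that assumption instead of trusting it.

```c
#include <arpa/inet.h>   /* htons, htonl (POSIX) */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size model struct: UINT8, UINT16, UINT32. */
struct demo_fixed {
    uint8_t  a;   /* offset 0, then 1 byte of compiler padding */
    uint16_t b;   /* offset 2 */
    uint32_t c;   /* offset 4 */
};

/* Copy *src into the message buffer; returns bytes written (8). */
static size_t pack_demo_fixed(uint8_t *out, const struct demo_fixed *src)
{
    struct demo_fixed wire;

    /* The memcpy shortcut only holds if the compiler's layout matches
     * the encoding rules, so verify it. */
    assert(offsetof(struct demo_fixed, b) == 2);
    assert(offsetof(struct demo_fixed, c) == 4);
    assert(sizeof wire == 8);

    wire.a = src->a;
    wire.b = htons(src->b);          /* host to network byte order */
    wire.c = htonl(src->c);
    memcpy(out, &wire, sizeof wire);
    out[1] = 0x00;                   /* padding byte must be zero on the wire */
    return sizeof wire;
}
```

The explicit store to out[1] is there because C leaves the contents of
struct padding bytes unspecified.  Also note that many 64-bit ABIs align
8-byte types on 8 bytes, so structs with 64-bit fields need the same
kind of layout check before relying on memcpy().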

> > If type is STRUCT (more precisely a derived type using the STRUCT construct),
> > its content is encoded using the following rules:
> ...
> > 2  Each sub-element (field) is pre-packed with its respective PACK
> >    operation.
> 
>    What does that mean?
> 
>    I think you mean to say something like "the packing of the structure 
> is the simple concatenation of the packing of its elements".
> 
>    e.g.
> 
>    PACK(struct(foo)) = PACK(foo.element1) + PACK(foo.element2) ...

Almost, but not quite.  Since we want to make this layout compatible
with C implementations, we need to include padding between some of the
elements.  The stated alignment rules serve this purpose.

The formula would be more precise if stated like this:

PACK(STRUCT, foo) = PACK(..., foo.field1) + PADDING1 +
                    PACK(..., foo.field2) + PADDING2 +
                    ...

where PADDINGn is k = 0, 1, 2, or 3 all-zero bytes, where k is
chosen to make the next sub-element be positioned at its required
alignment.

I believe the provided example illustrates the rules well.
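
For those who prefer code, here is a rough sketch of the formula for a
struct made only of integer fields.  The struct field descriptor and
the function names are invented for this example, and the final pad to
a 32-bit boundary is my reading of the "multiple of 32-bit words" rule
rather than something the formula itself states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One integer field of a hypothetical fixed-size STRUCT. */
struct field {
    size_t   size;    /* 1, 2, 4 or 8 bytes */
    uint64_t value;
};

/* Alignment rule: own size below 32 bits, otherwise 32 bits. */
static size_t field_align(size_t size)
{
    return size < 4 ? size : 4;
}

/* PACK(STRUCT, foo) = PACK(field1) + PADDING1 + PACK(field2) + ...
 * Returns the number of bytes written to out. */
static size_t pack_struct(uint8_t *out, const struct field *f, size_t nfields)
{
    size_t off = 0;

    for (size_t i = 0; i < nfields; i++) {
        size_t size = f[i].size;
        size_t align = field_align(size);

        assert(size == 1 || size == 2 || size == 4 || size == 8);
        while (off % align != 0)       /* padding before this field */
            out[off++] = 0x00;
        for (size_t b = 0; b < size; b++)   /* network byte order */
            out[off++] = (uint8_t)(f[i].value >> (8 * (size - 1 - b)));
    }
    while (off % 4 != 0)               /* assumed: final pad to 32 bits */
        out[off++] = 0x00;
    return off;
}
```

With fields UINT8 = AA, UINT16 = 1234, UINT8 = BB, UINT32 = DEADBEEF
this yields AA 00 12 34 / BB 00 00 00 / DE AD BE EF, 12 bytes in total.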

> 
>    If that's true, then it will be possible to write a function which 
> takes a ForCES STRUCT definition and a pointer to a C structure 
> containing the relevant data, and return a PACKed DATA field.  While 
> this process often won't be necessary, it may be useful to give a sample 
> function in pseudo-code, or C, to do this packing.  That way people 
> implementing the protocol have a "known good" place to start from.

Are you suggesting putting pseudo-code into the description
here or into the final standard (or both)?

> 
> > 4  If a field is one of the 8-bit or 16-bit integer types, it is placed in the
> >    resulting data block aligned on its natural length.  For example,
> >    if the STRUCT includes a UINT16 field, the 2 bytes of a PACK(UINT16, int)
> >    will start on an even byte offset inside PACK(STRUCT, struct element).
> 
>    Systems which have difficulties addressing byte-aligned data may not 
> like this style of packing.  Aligning everything on 32-bit boundaries 
> may not be efficient, though.

One extreme would be to pack densely, not caring about alignment
at all.  The other would be to align everything on 32-bit boundaries,
and align the 64-bit float and int types on 64-bit.

A sweet spot is to align 32-bit or larger items on 32-bit, and
smaller items on their own size.  This follows the alignment rules of
C structs, which should be a huge merit.
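
To compare the three options, a small sketch that computes where the
last field ends under each policy (trailing padding not counted).  The
enum and function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

enum align_policy {
    ALIGN_DENSE,          /* no alignment at all */
    ALIGN_NATURAL_MAX32,  /* own size below 32 bits, else 32 bits */
    ALIGN_ALL32           /* everything on 32 bits, 64-bit types on 64 */
};

static size_t policy_align(enum align_policy p, size_t size)
{
    switch (p) {
    case ALIGN_DENSE:
        return 1;
    case ALIGN_ALL32:
        return size == 8 ? 8 : 4;
    default:              /* ALIGN_NATURAL_MAX32 */
        return size < 4 ? size : 4;
    }
}

/* Offset just past the last field for the given field sizes. */
static size_t layout_end(enum align_policy p, const size_t *sizes, size_t n)
{
    size_t off = 0;

    for (size_t i = 0; i < n; i++) {
        size_t a = policy_align(p, sizes[i]);

        assert(sizes[i] == 1 || sizes[i] == 2 ||
               sizes[i] == 4 || sizes[i] == 8);
        off = (off + a - 1) / a * a;   /* round up to the alignment */
        off += sizes[i];
    }
    return off;
}
```

For field sizes 1, 2, 1, 4, 8 this gives 16 bytes for dense packing,
20 for the middle option and 24 for the all-32-bit option.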

> 
> > 5  PACK-ed fields of 32-bit and larger sub-elements will be 32-bit
> >    aligned.
> > 
> > 6  If alignment requires padding, 1, 2 or 3 all-zero bytes will be appended
> >    to the already encoded block until the required alignment is achieved.
> 
>    I would say "zero, one, two, or three bytes of zero will be appended 
> to the encoded block, to align it to a 32-bit boundary".
> 

Again, not always 32-bit.  A uint16 or int16 will be aligned to the
next 16-bit boundary.

> > Example:
> > 
> >    STRUCT {
> >    	UINT8		A;
> > 	INT16		B;
> 
>    It would be great to have some sample code which read in simple 
> STRUCT definitions, and produced the packed DATA.  Having a function to 
> produce ASCII art would be even better...

You mean "STRUCT definition and actual value, and produced the packed
DATA"?                      ^^^^^^^^^^^^^^^^

We can have a contest on who could post that program first.
Input: a) the XML data specification per current ForCES model
          document
       b) value assignment for the fields
Output: ASCII art showing the packed data.
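
As a modest head start on the output side only, a sketch that renders
an already packed block as one 32-bit word per row.  It does not parse
the XML model; dump_words is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Render len bytes as two-digit hex, four bytes per line.
 * dst must hold at least 3 * len + 1 chars.
 * Returns the number of characters written, excluding the NUL. */
static size_t dump_words(char *dst, size_t dstsize,
                         const uint8_t *data, size_t len)
{
    size_t pos = 0;

    assert(dstsize >= 3 * len + 1);
    for (size_t i = 0; i < len; i++) {
        /* newline after every 4th byte and after the last byte */
        char sep = (i % 4 == 3 || i + 1 == len) ? '\n' : ' ';

        pos += (size_t)snprintf(dst + pos, dstsize - pos, "%02x%c",
                                (unsigned)data[i], sep);
    }
    dst[pos] = '\0';
    return pos;
}
```

Fed the 8 bytes of the "abcde" string example, it produces two rows,
"00 05 61 62" and "63 64 65 00".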

> > 4.2.2  Encoding of Array Types
> > 
> > This is the only non-trivial type in terms of encoding.  Before presenting
> > the encoding format and rules, let's state some requirements:
> > 
> > - Number of elements must be specified in a header
> > - If the type of the entries are not of fixed size, the encoded entries
> >   will have variable sizes, so the size of each entry must be provided.
> 
>    Variable-sized entries are severely problematic.
> 
I agree that they are more troublesome.  But I find it too restrictive
to prohibit them.  Arrays of arrays will have this.  Or, if your array
is based on a struct that has string field(s), it will have this problem.

This is one of those places where I would think good designs will avoid
such ARRAYs as much as possible, but in some cases they cannot be avoided,
so we must support them in the protocol.

/zsolt

>    Is there a requirement to have arrays of variable sized elements?  To 
> me, that sounds more like a big STRUCT.
> 
>    Alan DeKok.
-- 
Zsolt Haraszti                Phone:  +1 919-765-0027/2017
Modular Networks              Mobile:      +1 919-522-2337
                              Email:  zsolt@modularnet.com
