Z. Haraszti
S. Blake
Modular Networks, Inc.


                  Proposal for ForCES Data Encoding                   
	    
                             Version 1.0                              


1  Scope
========

This document specifies how data is encoded for the wire inside
ForCES messages.  Data refers to the content of ForCES LFB attributes
and their sub-elements, carried in various CONFIG-REQUEST operations
(SET/ADD) and QUERY-RESPONSE messages (e.g., response to a GET
operation).


2  Design objectives and assumptions
====================================

- It is assumed that the type of the data can be inferred from the
  context in which the data is used.  Hence, data will not include its
  type information.  The basis for the inference is typically the LFB
  class id and the path.  In other cases it may also include a KEY
  selector.  It is imperative that the CE and FE have identical
  assumptions on the data types.  This requires that they both
  use the same set of LFB class specifications, which is a must anyway.
  
- Not only must the type of the data element node be known to both
  parties; if the node is a complex data type, the types of all its
  sub-elements must also be known to both parties.
  
- Data encoding should be recursive, i.e., it should support encoding
  of nested data at an arbitrary level.  It should not only be possible to
  encode atomic data types, but also full structs, entire or partial
  arrays including arrays of structs, structs containing structs and arrays,
  arrays of arrays, etc., as dictated by the type of the target element.

- The encoding and decoding algorithms should be as simple as possible.
  Specifically, both should be a standard sequential depth-first operation
  with no look-ahead ever required.

- Random sub-element access will not be guaranteed.  Specifically, random
  sub-element access will not be provided when the sub-elements include
  one or more sub-elements with variable-size encoding (these are ARRAYs
  and STRING[N]s).  In such cases the receiving side will need to
  sequentially decode the received binary data block to locate the
  place and value of a given sub-element.

- Decoding and encoding should be friendly to C/C++ implementations,
  i.e., it should require little or no re-encoding of the data (other than
  changing endianness).  In particular, it should be possible to design
  C/C++ data structures in the CE/FE code such that their content can be
  simply block-copied when being transferred via ForCES.  This is critical
  to avoid compromising protocol processing speed.

- The encoding should be byte-efficient (as compact as sensible), so as
  not to waste bandwidth.

- The length of the encoded data block will always be a multiple of 32-bit
  words.


3  Data Types
=============

As a refresher, below we summarize the data types allowed in ForCES
(see the Model document for more explanation).

Built-in atomic types:
	
- Integers (INT8, UINT8, INT16, UINT16, INT32, UINT32, INT64, UINT64)
- Floats (FLOAT32, FLOAT64)
- String (STRING[N], null-terminated string of max length N)
- Simple byte array (BYTES[N], simple byte array of fixed size N with
  no indexing capabilities)

Compound type constructs:

ARRAY (also called table): array of identical elements, each
associated with a sticky/immutable index, but the index is not part
of the entry.
   
STRUCT: ordered sequence of fields, each can be of any data type,
including atomic and compound types.  Each field is associated
with an index, derived from its position inside the STRUCT
declaration.

[For now, we do not cover UNION and AUGMENTATION types.]

Derived types:

ForCES allows the derivation of new atomic types from the existing
(built-in) atomic types to allow specialization, i.e., via renaming,
range restricting, and/or defining keywords for special values.

New complex types can be defined using the compound type constructs
(ARRAY and STRUCT), building complex data structures from atomic data
types or from other complex types.


4  Data Encoding
================

The encoding process is defined as

	DATA = encode(path)

where path specifies the data element, element[path], to be encoded.

Path specifies the element inside an LFB.  Path is a sequence (list)
of uint32 values.  The first element of path (path[0]) identifies the
attribute within the LFB.  If the attribute is of ARRAY or STRUCT type,
the second element of path (path[1]) specifies the entry of the ARRAY
or the field in the STRUCT, respectively.  In general, as long as the
element selected by path[0..k] is of a complex type, path[k+1] refers
to the respective sub-element inside that element.
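
The path walk described above can be sketched as follows.  This is only
an illustration: the attribute ids, the entry indices, the values and the
name resolve() are hypothetical, ARRAYs are modelled as dictionaries keyed
by entry index, and STRUCT field indices are assumed to start at 0.

```python
# Hypothetical LFB content, keyed by attribute id (path[0]).
# ARRAYs are modelled as dicts keyed by entry index, STRUCTs as lists
# whose positions serve as the field indices (assumed 0-based here).
lfb_attributes = {
    1: 1500,                 # attribute 1: an atomic UINT32
    2: {                     # attribute 2: an ARRAY of STRUCTs
        3: [10, "eth0"],     # entry with index 3
        7: [20, "eth1"],     # entry with index 7
    },
}


def resolve(attributes, path):
    """Follow the path one uint32 at a time and return element[path]."""
    element = attributes
    for index in path:
        element = element[index]
    return element


print(resolve(lfb_attributes, [1]))        # atomic target
print(resolve(lfb_attributes, [2, 7]))     # complex target: a whole STRUCT
print(resolve(lfb_attributes, [2, 7, 1]))  # atomic field inside that STRUCT
```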

Note that the target element of the operation (element[path]) may itself
be an atomic type or a complex type.  If it is an atomic type, the
atomic type will be encoded.  If it is a complex type, then the entire
element including all its nested sub-elements will be encoded.

Let type = type_of(element[path]) denote the type identifier of the
target element.

DATA is built as follows:

	DATA := PACK(type, value) + PADDING 

where

	type = type_of(element[path])
	
	value = element[path]
	
	PACK(type, value) denotes the result of the binary encoding of value
			according to the specified type.  The encoding is
			type-dependent; see respective specification
			for each type below.
			
	PADDING := 0, 1, 2 or 3 bytes of 0x00 values, as needed to round up the
		   resulting data block to the next 32-bit boundary.
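
A minimal sketch of the padding step, assuming that no bytes are appended
when the packed block already ends on a 32-bit boundary.  The function
name add_padding() is illustrative only and is not part of this proposal.

```python
def add_padding(packed: bytes) -> bytes:
    """Append 0x00 bytes until the block length is a multiple of 4."""
    # (-n) % 4 is the distance to the next 32-bit boundary:
    # 0 when n is already a multiple of 4, otherwise 1, 2 or 3.
    missing = -len(packed) % 4
    return packed + b"\x00" * missing


print(add_padding(b"\x01").hex())   # one data byte plus three padding bytes
print(add_padding(b"abcd").hex())   # already aligned, returned unchanged
```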


4.1  Atomic type encodings
--------------------------


4.1.1 Encoding of Integer Types

If type is one of INT8, UINT8, INT16, UINT16, INT32, UINT32, INT64, UINT64,
its encoding is as follows:

PACK(type, integer) :=	1/2/4/8 bytes of binary data in network
		 	byte order (big endian).  The number of bytes
			reflects the length of the type.

     0             7
    +-+-+-+-+-+-+-+-+
    | INT8 / UINT8  |
    +-+-+-+-+-+-+-+-+

                                   1
     0                             5
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |         INT16 / UINT16        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                                                                   3
     0                                                             1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                        INT32 / UINT32                         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                                                                   3
     0                                                             1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                                                               |
    +-                       INT64 / UINT64                        -+
    |                                                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
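
As an illustration, the integer encodings can be reproduced with Python's
standard struct module.  The table INT_FORMATS and the function
pack_int() are hypothetical names chosen for this sketch.

```python
import struct

# Hypothetical mapping from ForCES integer type names to struct format
# strings.  The ">" prefix selects big-endian (network) byte order and
# disables any implicit alignment padding.
INT_FORMATS = {
    "INT8": ">b", "UINT8": ">B",
    "INT16": ">h", "UINT16": ">H",
    "INT32": ">i", "UINT32": ">I",
    "INT64": ">q", "UINT64": ">Q",
}


def pack_int(type_name: str, value: int) -> bytes:
    """Return the 1/2/4/8-byte big-endian encoding of value."""
    return struct.pack(INT_FORMATS[type_name], value)


print(pack_int("UINT16", 0x1234).hex())
print(pack_int("INT32", -1).hex())
```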


4.1.2 Encoding of Float Types

If type is FLOAT32 or FLOAT64, its encoding is as follows:

PACK(type, float) :=	4/8 bytes of binary data in network byte
			order, using IEEE floating point numbers
			encoding.

                                                                   3
     0                                                             1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                            FLOAT32                            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                                                                   3
     0                                                             1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                                                               |
    +-                           FLOAT64                           -+
    |                                                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
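
A comparable sketch for the float types, assuming that "IEEE floating
point" means the IEEE 754 single and double precision formats, which is
what Python's struct module emits for the "f" and "d" codes.  The name
pack_float() is illustrative.

```python
import struct

# ">" selects network byte order; "f" is IEEE 754 single precision
# (4 bytes) and "d" is IEEE 754 double precision (8 bytes).
FLOAT_FORMATS = {"FLOAT32": ">f", "FLOAT64": ">d"}


def pack_float(type_name: str, value: float) -> bytes:
    """Return the 4- or 8-byte big-endian IEEE 754 encoding of value."""
    return struct.pack(FLOAT_FORMATS[type_name], value)


print(pack_float("FLOAT32", 1.0).hex())
print(pack_float("FLOAT64", 1.0).hex())
```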


4.1.3 Encoding of STRING[N] Types

If type is STRING[N], its encoding is as follows:

PACK(STRING[N], string) := LEN + STRING + PADDING
	
where:
   
   	LEN := 2 byte string length indicator encoded as uint16
	       (big endian).  Length includes only the consecutive
	       non-zero characters from the start of the string.

	STRING := Non-zero characters of the string.

	PADDING := see above.

                                                                   3
     0                                                             1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |              LEN              |                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                              -+
    |                                                               |
    +-                                                             -+
   ...                   STRING + 1/2/3 bytes PADDING              ...
    +-                                                             -+
    |                                                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

	
Examples:

   	string = "abcde", N = 16:   00 05 61 62
				    63 64 65 00

	string = "abcde", N = 128:  same as above, encoding does not care
				    about max. size of string.

	string = ""		    00 00 00 00
	
	string = "abcdef"	    00 06 61 62
				    63 64 65 66
	
	string = "abcdefg"	    00 07 61 62
				    63 64 65 66
				    67 00 00 00
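
The examples above can be reproduced with the following sketch.  The
names pack_string() and unpack_string() are illustrative, and the decoder
assumes that the encoded string starts on a 32-bit boundary.

```python
import struct


def pack_string(value: bytes) -> bytes:
    """Encode a string as LEN + STRING + PADDING."""
    # Only the consecutive non-zero characters from the start count.
    chars = value.split(b"\x00", 1)[0]
    body = struct.pack(">H", len(chars)) + chars
    # Zero-fill up to the next 32-bit boundary (no fill if aligned).
    return body + b"\x00" * (-len(body) % 4)


def unpack_string(data: bytes, offset: int = 0):
    """Decode one string; return (characters, offset of the next item)."""
    (length,) = struct.unpack_from(">H", data, offset)
    chars = data[offset + 2:offset + 2 + length]
    consumed = 2 + length
    consumed += -consumed % 4  # skip the padding bytes as well
    return chars, offset + consumed


print(pack_string(b"abcde").hex())
print(unpack_string(pack_string(b"abcdefg")))
```

Note that the decoder has to read LEN before it can tell where the next
element starts, which is why sub-elements following a STRING[N] cannot
be accessed randomly.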


4.1.4 Encoding of BYTES[N] Types

If type is BYTES[N], its encoding is as follows:

PACK(BYTES[N], byte_array) := BYTE_ARRAY + PADDING
	
where:
   	
	BYTE_ARRAY := is all the N bytes of the byte_array in the same order
		      as stored
		      
	PADDING := see above
	
                                                                   3
     0                                                             1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                                                               |
    +-                                                             -+
   ...         N bytes of BYTE_ARRAY + 1/2/3 bytes PADDING         ...
    +-                                                             -+
    |                                                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


Note:

The four sub-32-bit integer types (INT8, UINT8, INT16 and UINT16)
preserve their sizes in PACK().  All other atomic types are packed
to integer multiples of 4-byte words.  I.e., even though a BYTES[6] is
only 6 bytes long, PACK(BYTES[6], value) will be 8 bytes long.
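
A short sketch of the byte array encoding, confirming the 8-byte result
for a 6-byte array.  The name pack_bytes() is illustrative, and rejecting
values whose length differs from N is an assumption of this sketch rather
than a stated rule.

```python
def pack_bytes(n: int, value: bytes) -> bytes:
    """Encode a BYTES[N] value as BYTE_ARRAY + PADDING."""
    if len(value) != n:
        # Assumption: a BYTES[N] value always holds exactly N bytes.
        raise ValueError("expected %d bytes, got %d" % (n, len(value)))
    # Bytes are copied in stored order, then zero-filled to 32 bits.
    return bytes(value) + b"\x00" * (-n % 4)


encoded = pack_bytes(6, bytes([1, 2, 3, 4, 5, 6]))
print(len(encoded), encoded.hex())
```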


4.2  Encoding of Compound Types
-------------------------------

4.2.1  Struct Encoding

Struct encoding is rather straightforward.  For a STRUCT with fixed-size
content the encoding follows the ANSI C struct representation
standards.

If type is STRUCT (more precisely a derived type using the STRUCT construct),
its content is encoded using the following rules:

1  The encoded STRUCT will always contain the entire content of the STRUCT.
   This includes all its fields, and all sub-elements of its fields.

2  Each sub-element (field) is pre-packed with its respective PACK
   operation.

3  PACK-ed sub-elements are placed in the resulting data block strictly
   in the same order in which the fields are specified in the definition
   of the STRUCT data type.

4  If a field is one of the 8-bit or 16-bit integer types, it is placed in the
   resulting data block aligned on its natural length.  For example,
   if the STRUCT includes a UINT16 field, the 2 bytes of a PACK(UINT16, int)
   will start on an even byte offset inside PACK(STRUCT, struct element).
   
5  PACK-ed fields of 32-bit and larger sub-elements will be 32-bit
   aligned.

6  If alignment requires padding, 1, 2 or 3 all-zero bytes will be appended
   to the already encoded block until the required alignment is achieved.

7  More than one sub-32-bit field may be packed into the same 32-bit
   word in a way that rules 3, 4, 5 and 6 are all satisfied.

Example:

   STRUCT {
   	UINT8		A;
	INT16		B;
	INT32		C;
	INT8		D;
	BYTES[6]	E;
	INT8		F;
	INT8		G;
	INT16		H;
	STRING[32]	I = "abcdef";
	ARRAY {
		...
		}	J;
   }
   
                                                                   3
     0                                                             1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |       A       |X X X X X X X X|               B               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                               C                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |       D       |X X X X X X X X X X X X X X X X X X X X X X X X|
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |      E[0]            E[1]            E[2]          E[3]       |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |      E[4]            E[5]     |0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |       F       |       G       |               H               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                                                               |
    +-                              I                              -+
    |                                                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                                                               |
    +-                                                             -+
    |                                                               |
   ...                              J                              ...
    |                                                               |
    +-                                                             -+
    |                                                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
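
The alignment rules can be sketched as follows for fields A through I of
the example.  Field J is left out because array encoding is not yet
specified.  The name pack_struct(), the type tags and all field values
except that of I are hypothetical, and padding bytes are written as zeros
as rule 6 requires.

```python
import struct

# Natural alignment (in bytes) of the sub-32-bit integer types (rule 4).
# Every other type is aligned on a 32-bit boundary (rule 5).
ALIGNMENT = {"INT8": 1, "UINT8": 1, "INT16": 2, "UINT16": 2}


def pack_struct(fields):
    """fields: (type tag, already PACK-ed bytes) in declaration order."""
    out = bytearray()
    for type_name, packed in fields:
        align = ALIGNMENT.get(type_name, 4)
        out += b"\x00" * (-len(out) % align)  # rule 6: zero padding
        out += packed                         # rule 3: declaration order
    return bytes(out)


fields = [
    ("UINT8", struct.pack(">B", 1)),                    # A
    ("INT16", struct.pack(">h", 2)),                    # B
    ("INT32", struct.pack(">i", 3)),                    # C
    ("INT8", struct.pack(">b", 4)),                     # D
    ("BYTES", bytes(range(0xE0, 0xE6)) + b"\x00\x00"),  # E, padded to 8
    ("INT8", struct.pack(">b", 5)),                     # F
    ("INT8", struct.pack(">b", 6)),                     # G
    ("INT16", struct.pack(">h", 7)),                    # H
    ("STRING", b"\x00\x06abcdef"),                      # I = "abcdef"
]

encoded = pack_struct(fields)
for i in range(0, len(encoded), 4):
    print(encoded[i:i + 4].hex(" "))  # one 32-bit word per line
```

Under these assumptions the result is eight 32-bit words, matching the
rows shown for A through I in the diagram above.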



4.2.2  Encoding of Array Types

This is the only non-trivial type in terms of encoding.  Before presenting
the encoding format and rules, let's state some requirements:

- The number of elements must be specified in a header.
- If the type of the entries is not of fixed size, the encoded entries
  will have variable sizes, so the size of each entry must be provided.
- The index of each entry may or may not need to be provided with the data.
  Both cases must be supported.
- Optional field specification ...

[TBF]