Z. Haraszti
S. Blake
Modular Networks, Inc.

Proposal for ForCES Data Encoding
Version 1.0

1 Scope
========

This document specifies how data is encoded on the wire inside ForCES
messages. Data refers to the content of ForCES LFB attributes and their
sub-elements, carried in the various CONFIG-REQUEST operations (SET/ADD)
and in QUERY-RESPONSE messages (e.g., the response to a GET operation).

2 Design objectives and assumptions
====================================

- It is assumed that the type of the data can be inferred from the
  context in which the data is used. Hence, the data will not carry its
  type information. The basis for the inference is typically the LFB
  class ID and the path; in other cases it may also include a KEY
  selector. It is imperative that the CE and FE make identical
  assumptions about the data types. This requires that they both use
  the same set of LFB class specifications, which is a must anyway.

- Not only must the type of the data element node be known to both
  parties; if the node is of a complex data type, the types of all its
  sub-elements must also be known to both parties.

- Data encoding should be recursive, i.e., it should support encoding
  of nested data to an arbitrary depth. It should be possible to encode
  not only atomic data types, but also full structs, entire or partial
  arrays (including arrays of structs), structs containing structs and
  arrays, arrays of arrays, etc., as dictated by the type of the target
  element.

- The encoding and decoding algorithms should be as simple as possible.
  Specifically, both should be standard sequential depth-first
  traversals with no look-ahead ever required.

- Random sub-element access will not be guaranteed. Specifically,
  random sub-element access will not be provided when the sub-elements
  include one or more sub-elements with variable-size encoding (these
  are ARRAYs and STRING[N]s).
  In such cases the receiving side will need to decode the received
  binary data block sequentially to locate the position and value of a
  given sub-element.

- Decoding and encoding should be friendly to C/C++ implementations,
  i.e., they should require little or no re-encoding of the data (other
  than changing endianness). In particular, it should be possible to
  design C/C++ data structures in the CE/FE code such that their
  content can simply be block-copied when transferred via ForCES. This
  is critical so as not to compromise protocol processing speed.

- The encoding should be byte-efficient (as compact as sensible), so as
  not to waste bandwidth.

- The length of the encoded data block will always be a multiple of 32
  bits (4 bytes).

3 Data Types
=============

As a refresher, below we summarize the data types allowed in ForCES
(see the Model document for more explanation).

Built-in atomic types:

- Integers (INT8, UINT8, INT16, UINT16, INT32, UINT32, INT64, UINT64)
- Floats (FLOAT32, FLOAT64)
- String (STRING[N], null-terminated string of max length N)
- Simple byte array (BYTES[N], simple byte array of fixed size N with
  no indexing capabilities)

Compound type constructs:

  ARRAY (also called table): array of identical elements, each
  associated with a sticky/immutable index; the index is not part of
  the entry.

  STRUCT: ordered sequence of fields, each of which can be of any data
  type, including atomic and compound types. Each field is associated
  with an index, derived from its position inside the STRUCT
  declaration.

  [For now, we do not cover UNION and AUGMENTATION types.]

Derived types:

  ForCES allows the derivation of new atomic types from the existing
  (built-in) atomic types to allow specialization, i.e., via renaming,
  range restricting, and/or defining keywords for special values.

  New complex types can be defined using the compound type constructs
  (ARRAY and STRUCT), building complex data structures from atomic data
  types or from other complex types.
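The type summary above can be captured in a small in-memory model. The
following Python sketch is illustrative only: the classes Atomic, Array
and Struct and the helper has_fixed_size_encoding are hypothetical names,
not defined by this proposal or by the Model document. It shows which
types have a fixed-size encoding, which (per the design objectives in
Section 2) decides whether a receiver can jump directly to a sub-element
or must decode sequentially: STRING[N] and ARRAY are variable-size,
BYTES[N] and the numeric types are not, and a STRUCT is fixed-size only
if all of its fields are.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical model of the ForCES data types summarized in Section 3.
# Names and structure are illustrative assumptions only.


@dataclass(frozen=True)
class Atomic:
    name: str   # "UINT16", "FLOAT32", "STRING", "BYTES", ...
    n: int = 0  # N for STRING[N] and BYTES[N]; unused otherwise


@dataclass(frozen=True)
class Array:
    entry: "ForcesType"  # every entry has this same type


@dataclass(frozen=True)
class Struct:
    fields: Tuple["ForcesType", ...]  # field index = declaration position


ForcesType = Union[Atomic, Array, Struct]


def has_fixed_size_encoding(t: ForcesType) -> bool:
    """Return True if every value of type t encodes to the same length."""
    if isinstance(t, Atomic):
        # STRING[N] is encoded with its actual length, not with N.
        return t.name != "STRING"
    if isinstance(t, Array):
        # The number of entries is not part of the type.
        return False
    return all(has_fixed_size_encoding(f) for f in t.fields)


# Usage: a STRUCT made only of fixed-size fields allows random access.
example = Struct((Atomic("UINT32"), Atomic("UINT8"), Atomic("BYTES", 6)))
print(has_fixed_size_encoding(example))  # True
```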
4 Data Encoding
================

The encoding process is defined as

   DATA = encode(path)

where path specifies the data element, element[path], to be encoded.

Path specifies the element inside an LFB. Path is a sequence (list) of
uint32 values. The first element of path (path[0]) identifies the
attribute within the LFB. If the attribute is of ARRAY or STRUCT type,
the second element of path (path[1]) specifies the entry of the ARRAY
or the field of the STRUCT, respectively. As long as the element
selected by path[0] ... path[k] is of a complex type, path[k+1] selects
the respective sub-element inside it.

Note that the target element of the operation (element[path]) may
itself be of an atomic type or of a complex type. If it is of an atomic
type, the atomic value will be encoded. If it is of a complex type,
then the entire element, including all its nested sub-elements, will be
encoded.

Let type = type_of(element[path]) denote the type identifier of the
target element. DATA is built as follows:

   DATA := PACK(type, value) + PADDING

where

   type  = type_of(element[path])
   value = element[path]

PACK(type, value) denotes the result of the binary encoding of value
according to the specified type. The encoding is type-dependent; see
the respective specification for each type below.

   PADDING := 1, 2 or 3 bytes of 0x00 that round the resulting data
              block up to the next 32-bit boundary. No padding is added
              if the block already ends on a 32-bit boundary.

4.1 Atomic type encodings
--------------------------

4.1.1 Encoding of Integer Types

If type is one of INT8, UINT8, INT16, UINT16, INT32, UINT32, INT64 or
UINT64, its encoding is as follows:

   PACK(type, integer) := 1/2/4/8 bytes of binary data in network byte
                          order (big endian).

The number of bytes reflects the length of the type.
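The steps above (path resolution, integer PACK, final PADDING) can be
sketched in Python as follows. This is a minimal illustration under
stated assumptions, not a reference implementation: it assumes that an
LFB's attributes are held in a dict keyed by attribute ID, that STRUCT
values are Python lists indexed by field position (whether field indices
start at 0 or 1 is not stated here; the sketch uses 0), and that ARRAY
values are dicts keyed by their sticky index. As Section 2 requires, the
type is not taken from the data; the caller supplies it, as a CE or FE
would from the LFB class specification. The names resolve, pack_int and
make_data are hypothetical.

```python
import struct

# Big-endian ("network byte order") struct formats for Section 4.1.1.
INT_FORMATS = {
    "INT8": ">b", "UINT8": ">B", "INT16": ">h", "UINT16": ">H",
    "INT32": ">i", "UINT32": ">I", "INT64": ">q", "UINT64": ">Q",
}


def resolve(attributes: dict, path: list):
    """Walk path[0], path[1], ... down to element[path]."""
    element = attributes[path[0]]  # path[0] selects the attribute
    for step in path[1:]:
        element = element[step]    # struct field or array entry
    return element


def pack_int(type_name: str, value: int) -> bytes:
    """PACK(type, integer): 1/2/4/8 bytes, big endian."""
    return struct.pack(INT_FORMATS[type_name], value)


def make_data(packed: bytes) -> bytes:
    """DATA := PACK(type, value) + PADDING (0 to 3 zero bytes)."""
    return packed + b"\x00" * (-len(packed) % 4)


# Hypothetical LFB: attribute 1 is a UINT16; attribute 2 is a STRUCT
# whose field 1 is an ARRAY of UINT32 with sticky indices 10 and 12.
lfb = {1: 0x1234, 2: [7, {10: 0x0A0B0C0D, 12: 5}]}

print(make_data(pack_int("UINT16", resolve(lfb, [1]))).hex())         # 12340000
print(make_data(pack_int("UINT32", resolve(lfb, [2, 1, 10]))).hex())  # 0a0b0c0d
```

Note that the UINT16 value occupies only 2 bytes in PACK() and gains its
2 bytes of padding only at the DATA level.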
    0             7
   +-+-+-+-+-+-+-+-+
   | INT8 / UINT8  |
   +-+-+-+-+-+-+-+-+

                                  1
    0                             5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        INT16 / UINT16         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                                                                  3
    0                                                             1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        INT32 / UINT32                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                                                                  3
    0                                                             1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-                       INT64 / UINT64                        -+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

4.1.2 Encoding of Float Types

If type is FLOAT32 or FLOAT64, its encoding is as follows:

   PACK(type, float) := 4/8 bytes of binary data in network byte order,
                        using the IEEE 754 single/double precision
                        floating point format.

                                                                  3
    0                                                             1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            FLOAT32                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                                                                  3
    0                                                             1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-                           FLOAT64                           -+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

4.1.3 Encoding of STRING[N] Types

If type is STRING[N], its encoding is as follows:

   PACK(STRING[N], string) := LEN + STRING + PADDING

where:

   LEN     := 2-byte string length indicator encoded as uint16 (big
              endian). The length counts only the consecutive non-zero
              characters from the start of the string; the terminating
              null character is not encoded.
   STRING  := the non-zero characters of the string.
   PADDING := see above.

                                                                  3
    0                                                             1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              LEN              |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                              -+
   |                                                               |
   +-                                                             -+
   ...               STRING + 1/2/3 bytes PADDING                ...
   +-                                                             -+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Examples:

   string = "abcde", N = 16:
      00 05 61 62 63 64 65 00

   string = "abcde", N = 128:
      same as above; the encoding does not depend on the maximum size
      of the string.
   string = "":
      00 00 00 00

   string = "abcdef":
      00 06 61 62 63 64 65 66

   string = "abcdefg":
      00 07 61 62 63 64 65 66 67 00 00 00

4.1.4 Encoding of BYTES[N] Types

If type is BYTES[N], its encoding is as follows:

   PACK(BYTES[N], byte_array) := BYTE_ARRAY + PADDING

where:

   BYTE_ARRAY := all N bytes of byte_array, in the same order as stored.
   PADDING    := see above.

                                                                  3
    0                                                             1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-                                                             -+
   ...        N bytes of BYTE_ARRAY + 1/2/3 bytes PADDING        ...
   +-                                                             -+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Note: The four sub-32-bit integer types (INT8, UINT8, INT16 and UINT16)
preserve their sizes in PACK(). All other atomic types are packed to
integer multiples of 4-byte words. E.g., even though a BYTES[6] is only
6 bytes long, PACK(BYTES[6], value) will be 8 bytes long.

4.2 Encoding of Compound Types
-------------------------------

4.2.1 Struct Encoding

Struct encoding is rather straightforward. For a struct with fixed-size
content, the encoding closely resembles the natural-alignment layout
that C compilers commonly use for structs.

If type is STRUCT (more precisely, a derived type using the STRUCT
construct), its content is encoded using the following rules:

1 The encoded STRUCT will always contain the entire content of the
  STRUCT. This includes all its fields, and all sub-elements of its
  fields.

2 Each sub-element (field) is pre-packed with its respective PACK
  operation.

3 PACK-ed sub-elements are placed in the resulting data block strictly
  in the same order in which the fields are specified in the definition
  of the STRUCT data type.

4 If a field is one of the 8-bit or 16-bit integer types, it is placed
  in the resulting data block aligned on its natural length. For
  example, if the STRUCT includes a UINT16 field, the 2 bytes of
  PACK(UINT16, int) will start at an even byte offset inside
  PACK(STRUCT, struct_element).

5 PACK-ed fields of 32-bit and larger sub-elements will be 32-bit
  aligned.
6 If alignment requires padding, 1, 2 or 3 all-zero bytes will be
  appended to the already encoded block until the required alignment is
  achieved.

7 Multiple sub-32-bit fields may be packed into the same 32-bit word,
  as long as rules 3, 4, 5 and 6 are all satisfied.

Example:

   STRUCT {
       UINT8      A;
       INT16      B;
       INT32      C;
       INT8       D;
       BYTES[6]   E;
       INT8       F;
       INT8       G;
       INT16      H;
       STRING[32] I = "abcdef";
       ARRAY { ... } J;
   }

                                                                  3
    0                                                             1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       A       |X X X X X X X X|               B               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               C                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       D       |X X X X X X X X X X X X X X X X X X X X X X X X|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     E[0]            E[1]            E[2]            E[3]      |
   |     E[4]            E[5]      |0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       F       |       G       |               H               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-                              I                              -+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-                                                             -+
   |                                                               |
   ...                             J                             ...
   |                                                               |
   +-                                                             -+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   (X marks alignment padding added by rule 6; 0 marks the PADDING that
   is part of PACK(BYTES[6], E). All padding bytes are 0x00.)

4.2.2 Encoding of Array Types

This is the only non-trivial type in terms of encoding. Before
presenting the encoding format and rules, let us state some
requirements:

- The number of elements must be specified in a header.

- If the entries are not of a fixed-size type, the encoded entries will
  have variable sizes, so the size of each entry must be provided.

- The index of each entry may or may not need to be provided with the
  data. Both cases must be supported.

- Optional field specification ... [TBF]
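Tying Sections 4.1 and 4.2.1 together, the following Python sketch
encodes a STRUCT whose fields are all atomic types. Nested STRUCTs and
ARRAYs are left out, the latter because their encoding is still to be
defined. The helpers pack, alignment and pack_struct are hypothetical
names, and the sketch assumes, purely for illustration, that STRING
content is ASCII (no character set is specified here). Run on fields A
through I of the example STRUCT, it reproduces the layout of the diagram
in Section 4.2.1 byte for byte (bytes.hex with a separator needs Python
3.8 or later).

```python
import struct

INT_FORMATS = {
    "INT8": ">b", "UINT8": ">B", "INT16": ">h", "UINT16": ">H",
    "INT32": ">i", "UINT32": ">I", "INT64": ">q", "UINT64": ">Q",
}
FLOAT_FORMATS = {"FLOAT32": ">f", "FLOAT64": ">d"}  # IEEE 754, big endian


def pad4(data: bytes) -> bytes:
    """Append 0 to 3 zero bytes to reach the next 32-bit boundary."""
    return data + b"\x00" * (-len(data) % 4)


def pack(type_name: str, value) -> bytes:
    """PACK() for the atomic types of Section 4.1 ("STRING"/"BYTES" omit N)."""
    if type_name in INT_FORMATS:
        return struct.pack(INT_FORMATS[type_name], value)
    if type_name in FLOAT_FORMATS:
        return struct.pack(FLOAT_FORMATS[type_name], value)
    if type_name == "STRING":
        # Only the non-zero characters before the first NUL are encoded.
        chars = value.split("\x00", 1)[0].encode("ascii")  # ASCII assumed
        return pad4(struct.pack(">H", len(chars)) + chars)
    if type_name == "BYTES":
        return pad4(bytes(value))
    raise ValueError(f"unsupported type: {type_name}")


def alignment(type_name: str) -> int:
    """Rules 4 and 5: 8/16-bit integers align naturally, all else on 32 bits."""
    if type_name in ("INT8", "UINT8"):
        return 1
    if type_name in ("INT16", "UINT16"):
        return 2
    return 4


def pack_struct(fields) -> bytes:
    """PACK(STRUCT, ...) for a list of (type_name, value) pairs."""
    out = bytearray()
    for type_name, value in fields:                      # rule 3: order
        out += b"\x00" * (-len(out) % alignment(type_name))  # rule 6
        out += pack(type_name, value)                    # rule 2
    return bytes(out)


# Fields A..I of the example STRUCT (J omitted), with arbitrary values.
example = [
    ("UINT8", 0x01), ("INT16", 0x0203), ("INT32", 0x04050607),
    ("INT8", 0x08), ("BYTES", bytes(range(0x09, 0x0F))),
    ("INT8", 0x0F), ("INT8", 0x10), ("INT16", 0x1112), ("STRING", "abcdef"),
]
print(pack_struct(example).hex(" ", 4))
# 01000203 04050607 08000000 090a0b0c 0d0e0000 0f101112 00066162 63646566
```

Each 8-hex-digit group is one 32-bit word of the diagram: the 00 after
A and the three 00 bytes after D are the X padding of rule 6, while the
0000 after E[5] is the PADDING inside PACK(BYTES[6], E).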