[P2PSIP] Review of RELOAD base draft

Jouni Mäenpää <jouni.maenpaa@ericsson.com> Fri, 27 November 2009 10:49 UTC

Message-ID: <4B0FAEAC.40104@ericsson.com>
Date: Fri, 27 Nov 2009 12:49:16 +0200
From: Jouni Mäenpää <jouni.maenpaa@ericsson.com>
User-Agent: Thunderbird 2.0.0.17 (X11/20080925)
MIME-Version: 1.0
To: "David A. Bryan" <dbryan@ethernot.org>, Brian Rosen <br@brianrosen.net>
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: 8bit
Cc: "p2psip@ietf.org" <p2psip@ietf.org>
Subject: [P2PSIP] Review of RELOAD base draft
Precedence: list

Hi,

I volunteered to review the RELOAD base draft in the P2PSIP session in
IETF 76. My collegue Ari Keranen also volunteered after the IETF
meeting. Since it's a long draft, we decided to do the review together.
Our comments are listed below.

Cheers,
Jouni Maenpaa
Ari Keranen

Page 13, Section “1.2.1. Usage Layer”
The text talks about the abstract Message Transport API. Could the
functionality that the API provides be described somewhere in more detail?

Page 17, Section “2. Terminology”
(1) There are some terms marked with a (*) that are new and “perhaps
should be added to the concepts document”. I suppose a decision needs to
be made whether to have the terms permanently in the base draft or
whether to move them to the concepts draft.
(2) Should the terms Kind and Kind-ID be defined here?

Page 22, Section “3.2.1. Client Routing”
(1) “Note that clients that choose this option MUST process Update
messages from the peer. Those updates can indicate that the peer no
longer owns the Client's Node-ID.”
Should the content of these Update requests be defined somewhere in the
draft or are the regular Updates enough for doing this?
(2) If I have understood correctly, a peer does not know whether a
particular node in its connection table is a peer or a client.
Therefore, it has to send stabilization messages (i.e., Updates) to all
nodes in its connection table. The cost of this can be very high in
overlays experiencing churn. I have more comments on this topic later
(in the comments I have for section 9). If a peer could know which nodes
in the connection table are clients and which are peers, Updates could
be sent only to those nodes (i.e., clients) that need to be informed
immediately (peers are informed through the periodic stabilization
process so there is no need to send a reactive Update).
(3) The description of the second routing option by which a client may
be located in an overlay contains the following text: “The Destination
List required to reach it [the client] must be learnable via other
mechanisms, such as being stored in the overlay by a usage...”
Since this alternative is given in the base draft, should the base draft
also describe the usage for storing Destination Lists? Without such a
usage, the second routing option is not really usable.
(4) Should the overlay configuration document contain information about
which client routing mode is used in the overlay?

Page 22, Section “3.2.2. Minimum Functionality Requirements for Clients”
“Implement RELOAD's connection-management connections”
What are connection-management connections?

Page 23, Section “3.3. Routing”
“Client promotion: RELOAD must support clients that become peers at a
later point as determined by the overlay algorithm and deployment.”
Should client promotion be described somewhere in the base draft?

Page 27, step (4) of the second operation required to join the overlay
“AP does Updates to JP and to other peers...”
Doing this is probably ok in a small and low-churn network (i.e., in
cases when it is ok to use reactive stablization). However, in a larger,
higher-churn overlay (in which periodic stabilization would be used
instead of reactive stabilization), sending Updates to all peers in the
AP's connection (or routing/neighbor) table might be too expensive. Such
Updates may not be needed since the other peers will learn about the JP
through the periodic stablization process.

Page 32, Section “4.3. Application Connectivity”
“Each usage specifies which types of connections can be initiated using
AppAttach.”
Should this requirement be listed also in Section “4.1.2. Usages”?

Page 36, third paragraph
(1) “When an originating node transmits a requests it MUST set a 3
second timer”
Should the value of the timer be configurable?
(2) Should there be a back-off mechanism for the retransmissions?
(3) “If a response has not been received when the timer fires, the
request is retransmitted...”
How is this related to the procedures in Section “5.6.2. Reliability for
Unreliable Links”? Is it intentional that both the originator of the
request and the intermediate peers retransmit the messages?
(4) “The request MAY be retransmitted up to 4 times...”
Should the maximum number of retransmissions be configurable? When
should one use less than four retransmissions?
(5) The P2PP protocol had a pretty nice way of describing the
transaction state machines, including the various timers and
retransmissions (see
http://tools.ietf.org/id/draft-baset-p2psip-p2pp-01.txt, figures 8-11).
I would like to see similar diagrams in the RELOAD base draft as well.
This would be very helpful when implementing RELOAD. P2PP also specified
different state machines for reliable and unreliable transports. This
could be done in the RELOAD base draft as well.

Page 36, fourth paragraph
“the maximum request lifetime (15s)”
Should the maximum request lifetime be configurable?

Page 38, first paragraph
There is an open question: “We consider it an open question what the
final protocol definition method and encodings use.”
I suppose this open question can be removed.

Page 38
(1) “Note:  the use of "typedef" here is an extension to the TLS
language, but its meaning should be relatively obvious”
Perhaps it would be a good idea to explain the meaning anyway.
(2) Is an opaque string of bytes always preceded by one or more length
bytes like in case of ResourceID? Could you clarify this in the draft
(it is currently not very clear)?

Page 41, “version” field in the forwarding header
Just wanted to verify that it is intentional to still use version 0.1.

Page 42, Section “5.3.2.1. Processing Configuration Sequence Numbers”
(1) Is the Config_Update request sent on a direct connection or across
the overlay? If it is sent across the overlay via multiple hops, then it
can cause additional load at intermediate peers. The Config_Update
request may also be considerably larger than other requests sent in the
overlay, because the configuration document can be rather large.. I
would like to see some additional discussion on the traffic load the
proposed mechanism can cause. If the configuration documents propagate
very quickly throughout the system, then there might be a considerable
short-term traffic peak. Perhaps there is a need for a mechanism that
can limit the additional traffic caused by Config_Update requests.
(2) As an optimization, intermediate peers could potentially inspect the
configuration documents in the Config_Update messages they forward and
update their own configuration settings if they see a newer document.
However, I am not sure whether this would reduce the load enough.

Page 46
Where is the format of the MessageCode defined?

Page 48, definition of error_info
What is the format of the text string? UTF-8?

Page 52, Section “5.4.1. Topology Plugin Requirements”
(1) “All overlay algorithms MUST specify maintenance procedures that
send Updates to all members of the Connection Table whenever the range
of Ids for which the peer is responsible changes”
Doesn't this mean that all overlay algorithms are mandated to use
reactive stablization? I would rephrase this to something like “All
overlay algorithms MUST specify maintenance procedures.”, that is, not
mandate reactive Updates.
(2) Further, sending to all members of the Connection Table might be an
overkill. Wouldn't it be enough to send only to the peers in the
Neighbor set (provided that the overlay algorithm that is used maintains
such a set)?

Page 54, Section “5.4.2.3. Update”
(1) “In general, peers MUST send Update messages to all their
adjacencies whenever they detect a topology shift”
Again, the draft is mandating the use of reactive stablization. So I
guess this sentence should be removed. Alternatively, the MUST could be
changed to MAY.
(2) “The UpdateAns response is expected to be either success or an error”
I think that in some cases, it might be beneficial to have some content
in the UpdateAns response (e.g., routing table entries). Therefore, I
think the draft should make it possible for UpdateAns responses to
contain overlay-specific data (in the same way it allows this for
UpdateReq requests).

Page 59, AttachReqAns struct
Is the “role” encoded as a string? Wouldn't an enum be sufficient?

Page 60, definition of “rel_addr_port”
“corresponds to the rel-addr and rel-port productions. Only present for
type "relay".”
Should this be present also for srflx and prflx types as in the struct
on the previous page?

Page 61, Section “5.5.1.3.  Using ICE With RELOAD”
“Because RELOAD always tries to use TCP and then UDP as a fallback [...]”
Is this normative? Can't a peer prefer UDP?

Page 70, Section “5.5.6.2. Response Definition”
“If the Config_UpdateReq is of type “kind” it MUST only be processed if
it is correctly digitally signed by an acceptable kind signer.”
Could you define what an “acceptable kind signer” is here? Perhaps I
missed something, but this was unclear to me.

Page 73, Section “5.6.2.2. Retransmission and Flow Control”
This section mentions multiple alternatives for the retransmission and
congestion control scheme. Should all the peers in the overlay use the
same retransmission and congestion control scheme? In that case, the
scheme to use should probably be specified in the overlay configuration
document.

Page 82, text about replica numbers
“If the replica number is zero, then the peer MUST check that it is
responsible for the resource and, if not, reject the request.  If the
replica number is nonzero, then the peer MUST check that it expects to
be a replica for the resource and that the request sender is consistent
with being the responsible node (i.e., that the receiving peer does not
know of a better node) and, if not, reject the request.”
This text may assume too much about the replication strategy that is
being used. As far as I know, in some replication strategies (e.g.,
multi publication key replication), a peer storing a replica may not be
able to check whether it is supposed to store a replica (in the
successor replication strategy that RELOAD uses, this can be done by
checking the predecessor list). Therefore, I think this text should be
made more generic so that it allows the use of other replication strategies.

Page 96, Section “8. TURN Server Usage”
Perhaps there could be some additional text in this section stating that
the overlay MAY use an alternative, more advanced service discovery
mechanism for locating TURN servers if such a mechanism is available.

Pages 97-98, Section “9. Chord Algorithm”
(1) One additional major difference to the original Chord algorithm that
should be mentioned here is that chord-reload can optionally use
reactive stabilization. The original Chord algorithm does not use
reactive stablization but only periodic stabilization. One could also
explain why reactive stabilization is allowed in chord-reload (i.e.,
that it has been shown to work well in small, low-churn overlays). One
should also add some information about previous results, which have
shown that reactive recovery can result in degraded performance or even
congestion collapse in large-scale (1000 peers or more) overlays
experiencing moderate or high levels of churn. Results proving that
reactive recovery is not optimal in such networks are available in [1].
(2) It should also be mentioned in here that reactive stabilization MAY
be disabled in chord-reload (the “if using reactive recovery” statements
in the text actually allow this). In this case, only periodic
stabilization is used. Use of periodic stabilization is more appropriate
in large-scale (1000 peers or more) overlays experiencing churn.

Page 98, Section “9.1. Overview”
(1) “The neighbor table typically contains the three peers before this
peer and the three peers after it in the DHT ring”
The draft should state here that the successor list and the predecessor
list can contain more entries if necessary. The Chord paper [2] states
that O(log2(N)) successors should be maintained to increase robustness.
Maintaining three successors is sufficient in very small overlays.
However, in a Chord ring with for instance 1000 peers, on the order of
10 successors should be maintained to make the system more robust.
Perhaps the number of predecessors and successors to maintain should be
made configurable?
(2) “There may not be three entreis...” => “There may not be three
*entries*...”

Page 99, Section “9.3. Redundancy”
(1) “When a peer receives a Store request for Resource-ID k, and it is
responsible for Resource-ID k, it stores the data and returns a success
response. It then sends a Store request to its successor in the neighbor
table and to that peer's successor.”
I would expect that in large-scale overlays experiencing churn, it may
be necessary to store more than 2 replica copies. Perhaps the number of
replicas to store should be configurable (of course the size of the
neighbor table limits the number of replicas that can be stored).
(2) This section is missing details on when replicas can be removed and
when the replicating peer needs to store further replicas. Those details
are specified in the end of Section “9.6.3. Receiving Updates”. Could
those details be moved to Section 9.3?

Page 99, Section “9.4. Joining”
(1) “3. JP SHOULD send Attach requests to initiate connections to each
of the peers in the neighbor table as well as to the desired 16 finger
table entries.”
Should it be mentioned here that also finger tables having more than 16
entries are allowed? Perhaps the number of fingers to maintain should be
made configurable.
(2) “8. AP MUST send an Update to all its neighbors with the new values
of its neighbor set (including JP).”
Informing all the neighbors of the AP may be unnecessary if periodic
recovery is being used. Could this sentence be modified as follows: “8.
If using reactive recovery, the AP MUST send an Update...” If periodic
recovery is used, then peridic stabilization will take care of informing
the neighbors of the AP that a new peer has joined.
(3) “In order to set up its neighbor table entry for peer i...”
I guess this should be “In order to set up its finger table entry for
peer i...”

Page 102, Section “9.6.1. Handling Neighbor Failures”
(1) This section and many other subsections contain statements like this:
“If using reactive recovery...”
I think it should be stated explicitly somewhere (in the beginning of
Section 9, as I am suggesting above) that reactive recovery can be
disabled, in which case only periodic recovery is used.
(2) Should the overlay configuration document indicate whether reactive
recovery is used or not in the overlay?

Page 104, second paragraph
“If peer N which is responsible for a Resource-ID R discovers that the
replica set for R (the next two nodes in its successor set)...”
Same comment as before: perhaps replica sets consisting of more than two
nodes shoud be allowed.

Page 104, Section “9.6.4.1. Updating Neighbor Table”
(1) “A peer MUST periodically send an Update request to every peer in
its Connection Table”
Should “Connection Table” be rather “Neighbor Table”? Sending to every
peer in the Connection Table means that an Update is sent also to the
fingers (i.e., at the very minimum to 3+3+16=22 peers, to all the
clients attached to the peer, and also to all of the peers that are in
the connection table but not in the routing table; such as peers that
are using this peer as a finger). This may cause too much traffic. Note
that the original Chord algorithm sends only an Update to the first
predecessor and to the first successor. I would rephrase this to
something like: “A peer MUST periodically send an Update request to all
or a subset of the peers in its neighbor table”
(2) “The default time is about every ten minutes.”
This is not sufficient in large-scale networks experiencing churn.
However, it may be sufficient in small-scale, low-churn overlays in
which reactive recovery is used. Therefore, I would modify this as
follows: “If reactive recovery is used, the default time is about every
ten minutes.”

Page 104-105, Section “9.6.4.2. Refreshing finger table”
(1) “A peer SHOULD NOT send Ping requests looking for new finger table
entries more often than the configuration element
“chord-reload-ping-interval”, which defaults to 3600 seconds (one per
hour)”.
This is not sufficient in large-scale networks experiencing churn.
However, it may be sufficient in small-scale, low-churn overlays in
which reactive recovery is used. Therefore, I would modify this as follows:
“A peer SHOULD NOT send Ping requests looking for new finger table
entries more often than the configuration element
“chord-reload-ping-interval”. If reactive recovery is used, this
defaults to 3600 seconds (one per hour).”
(2) Paragraph 4 on page 105 describing Alternative 2:
I would add a warning here stating that Alternative 2 should only be
used in small-scale, low-churn overlays because it can cause much more
traffic than Alternative 1. Alternative 1 is more appropriate for
large-scale overlays experiencing churn.

Page 112, last paragraph of Section 10.1.
“When a node receives a new configuration file, it MUST change its
configuration to meet the new requirements. This may require the node to
exit the DHT and re-join.”
Exiting and re-joining as a result of receiving a new configuration file
is potentially a huge problem. If the configuration file spreads rapidly
in the overlay, there is a short period of time during which all the
peers will exit and re-join. The resulting churn will most probably be
so high that the network will collapse.

Page 138, Section “15.2. Informative References”
[I-D.maenpaa-p2psip-self-tuning]: there is a newer version of the draft
available:
[I-D.maenpaa-p2psip-self-tuning]
Maenpaa, J., Camarillo, G., and J. Hautakorpi, "A Self- tuning
Distributed Hash Table (DHT) for REsource Location And  Discovery
(RELOAD)", draft-maenpaa-p2psip-self-tuning-01 (work in progress),
October 2009.

====
Nits
====

Node-ID is sometimes spelled as “Node-Id” and sometimes as “Node-ID”.
Further, sometimes "Peer-ID" or "peer ID" is used.

Page 10, Section “1.2. Architecture”
“RELOAD is fundamentally an overlay network”
I guess it would be better to say “The RELOAD network is fundamentally
an overlay network” to be consistent with the text in the previous
subsection.

Page 12, last paragraph
“the roles of various layer” => “the roles of various *layers*”

Page 13, the figure
There seem to be some indentation issues in the figure.

Page 25
“RELOAD does not support an intermediate peer returning a response that
it will not recursively route a normal request”
Is something missing from this sentence?

Page 35, Section “5.2.1 Request Origination”
“The simplest such destination list is a single entry containing the
peer or Resource-ID.” => “The simplest such destination list is a single
entry containing the *Node*-ID or Resource-ID.”

Page 43, Section 5.3.2.2
enum DestinationType uses term “node”, whereas the definition of “type”
on the following page uses “peer”.

Page 44, definition of “type”
Could you open the acronym PDU?

Page 58, Section “5.5.1. Attach”, first paragraph
“A node sends an Attach request when it wishes to establish a direct TCP
or UDP connection...”
Should this be “a direct TLS or DTLS connection...”?

Page 66, row 6
“Check the certificate receive in...” => “Check the certificate
*received* in...”

Page 74, first paragraph
“...if it is known that all nodes are are...” => “...if it is known that
all nodes *are*...”

Page 102, the description of the “fingers” field:
“The finger table if the Updating peer...” => “The finger table *of* the
updating peer...”

Page 111, definition of “multicast-bootstrap”
“This element represents the address of a multicast, boradcast...” =>
“This element represents the address of a multicast, *broadcast*...”

Page 116, second last paragraph of Section 10.3.
“...and have appropriate HTTP stats codes.” => “...and have appropriate
HTTP *status* codes.”

Page 124, Section “12.4. Certificate-based Security”
(1) “The user is also assigned one or more Node-IDs by the central
enrollment authority. Both the name  and the peer ID are placed in the
certificate,...” => “The user is also assigned one or more Node-IDs by
the central enrollment authority. Both the name and the *Node-ID* are
placed in the certificate,...”
(2) “o As a overlay peer with the peer ID(s)...” => “o As a overlay peer
with the *Node-ID(s)*...”

Page 137, Section “14. Acknowledgments”
“Hautakorp” => “Hautakorpi”

Page 143, “Appendix B. AIMD Retransmission Scheme”
“...and is based on the theAIMD...” => “...and is based on *the AIMD*...”


[1] Rhea, S., Geels, D., Roscoe, T., and J. Kubiatowicz, "Handling churn
in a DHT", In Proc. of the USENIX Annual Techincal Conference June 2004.

[2] Stoica, I., Morris, R., Liben-Nowell, D., Karger, D., Kaashoek, M.,
Dabek, F., and H. Balakrishnan, "Chord: A Scalable Peer-to-peer Lookup
Service for Internet Applications", IEEE/ACM Transactions on Networking
Volume 11, Issue 1, 17-32, Feb 2003.

[P2PSIP] Review of RELOAD base draft Jouni Mäenpää
Re: [P2PSIP] Review of RELOAD base draft Ari Keranen
Re: [P2PSIP] Review of RELOAD base draft Bruce Lowekamp
Re: [P2PSIP] Review of RELOAD base draft Ari Keranen
Re: [P2PSIP] Review of RELOAD base draft Jouni Mäenpää