[dhcwg] Comments on draft-ietf-dhc-dhcpv6-failover-design-02
Marcin Siodelski <msiodelski@gmail.com> Tue, 12 March 2013 19:43 UTC
Message-ID: <1363117384.2123.64.camel@marcin-lenovo>
From: Marcin Siodelski <msiodelski@gmail.com>
To: "dhcwg@ietf.org" <dhcwg@ietf.org>
Date: Tue, 12 Mar 2013 15:43:04 -0400
Cc: kkinnear@cisco.com
Subject: [dhcwg] Comments on draft-ietf-dhc-dhcpv6-failover-design-02

Hi,

I have read version -02 of the draft. This is useful work and I support moving it forward. Here is a first round of comments.

2. Glossary
- "Failover transmission" is defined in the glossary but is used nowhere in the draft. Perhaps you meant "Failover communication", which is common in the text.

3. Introduction
- The document says that the failover protocol is not suitable for lease times shorter than 30 seconds. Is it then suitable for lease times of 31 seconds? Why was that arbitrary number selected? I believe the document should simply mention that failover transmission places additional overhead on the server (which would justify the use of this term in the glossary) and that the protocol is therefore not efficient for short lease times.

3.1. Additional Requirements
- Perhaps it is worth changing the title from "Additional Requirements" to "Design Requirements" to indicate that the section does not extend protocol requirements described elsewhere.

4. Protocol Overview
- The hyperlink to "Section 5.1 of DHCPv6 Bulk Leasequery [RFC5460]" is broken: clicking on "Section 5.1" takes you to Section 5.1 of the local document.
- Is it worth mentioning that if the primary server can't establish communication with the secondary server (because the secondary is down), it will retry the connection at an implementation-dependent interval?

4.1. Failover State Machine Overview
- It is not clear which state the partner goes to when the DISCONNECT occurs. I presume it is the PARTNER-DOWN state, but the document should clarify it.
- Here is a sentence: "In case of a disagreement between the simplified and complete description, please follow Section 9". I think there must be no "disagreement" between those two sections. One provides supplementary information for the other, but they should remain consistent in the part they have in common.
- I don't understand the purpose of: "It frequently returns back to the state it was in before shutdown". Perhaps some context needs to be provided for it?
- In the 4th paragraph, "in unresponsive" should be replaced with "is unresponsive". However, I wonder whether the statement is correct, given that extensions may introduce an "active-active" mode in which it no longer holds. Perhaps it could say: "assuming that the failover pair is in the active-passive mode, the secondary server is unresponsive when in NORMAL state".
- Some clarification is needed on what "auto-partner-down" is (in the glossary?). Does the server bypass the COMMUNICATION-INTERRUPTED state and go straight to PARTNER-DOWN, or does it go to COMMUNICATION-INTERRUPTED first and then to PARTNER-DOWN? If the latter, would it stay in the COMMUNICATION-INTERRUPTED state for a while, trying to reconnect?
- I don't understand what is meant by "When a server does not have an intact lease state database (e.g. due to first time run or catastrophic failure) ...". First of all, is the lease state database the database which holds leases, or does it hold some additional information about the server state prior to the failure, which can be used to determine that the catastrophic failure occurred? Also, the "doesn't have an intact lease state database" formulation is misleading, because I think that a server on its first run DOES have an intact database.
- The RECOVER-WAIT to RECOVER-DONE transition could be better described. Specifically, what condition triggers it?
- Is there any way out of RESOLUTION-INTERRUPTED? If communication is reestablished, I presume that both servers continue resolving conflicts?

4.2. Messages
- I suggest that the following description of the CONNECT message: "The partner is expected to confirm by responding with CONNECTACK" be reworded to "The partner MUST confirm by responding with CONNECTACK".
- Perhaps the same applies to some other message types, but I did not check that thoroughly enough to include it in this review.
- Minor: it would be easier to search for a specific message description if the messages were ordered alphabetically.
- POOLRESP is used throughout the document, while it is called "POOLRSP" in the list of messages.

5.1. Creating Connections
- "If it has no secondary relationships with the connecting server, it SHOULD drop the connection". This sentence suggests that the secondary server will not respond to the initiating server if it has no relationship with it. A couple of paragraphs earlier, it was said that the server "is expected" to (or MUST) respond with CONNECTACK. I suggest rewording this to explain that the server MUST reject the connection by sending a CONNECTACK carrying the reason for rejection, instead of "dropping" it.

6.1. Proportional Allocation
- The paragraph which starts with "The initial allocation when..." is confusing. If I understand correctly, the purpose of POOLRESP is to deliver the actual resources to the secondary server. The document says that it informs the secondary server "how many resources it allocated". These do not sound like the same thing.
- I wonder why the POOLREQ/POOLRESP exchange has to be initiated by the secondary server at all. In the scenario where the failover configuration is handled through the primary server, it is the primary server that should notify the secondary that (for example) the pool balance has changed. This implies the primary could initiate the exchange by notifying the secondary server that rebalancing will start now. On the other hand, in the "Proportional Allocation" scenario, rebalancing may be started when the secondary has returned to the primary addresses which had been released by clients. However, in that case the primary should also be aware of the need to rebalance and could start the rebalancing procedure itself. So again, I don't understand why the secondary starts the rebalancing procedure here.
- Another related question: is it really required to do an extra POOLREQ/POOLRESP exchange when the actual transfer of addresses is later done via BNDUPD/BNDACK? What extra information is carried in the POOLREQ/POOLRESP messages that needs to be delivered before sending the BNDUPD/BNDACK messages, which actually delegate resources by changing the state from FREE to FREE_BACKUP or the other way around?
- "...and the secondary server is responsible for being configured in such a way". Well, the server is not responsible for anything. The administrator might be responsible for configuring the server, and the developer is responsible for ensuring a proper implementation, but the server is not responsible...

6.2. Independent Allocation
- It is mentioned that the resources are allocated/split as a part of initial connection establishment. How? Specifically, are resources delegated to the secondary server using the same messages as used for "Proportional Allocation"? If so, what happens if the primary server requests reallocation of resources later on? Is this rejected by the primary server? Is this request dropped?

6.3. Choosing Allocation Algorithm
- Typo in "indepentent".

7. Information Model
- Instead of referring to "DHCP RFC", a link to the actual RFC along with its number should be provided.
- Typo in "untile".

8. Failover Mechanisms
- The last sentence is redundant, as the previous one says pretty much the same.

8.4. MCLT Concept
- The paragraph which starts with "The fundamental relationship which..." is a little obscure. I suggest rewording it to something like: "The fundamental relationship on which much of the correctness of this protocol depends is that the lease expiration time known to a DHCPv6 client MUST NOT exceed the potential expiration time known to the server's partner by more than the MCLT."

8.10. Acknowledging Reception
- This section is empty. At least a TBD would be useful. Otherwise it suggests that this part has been accidentally omitted.

9.1. State Machine Operation
- One additional event that may cause a transition from one state to another is probably the administrator's intervention?

9.3.2. Transition Out of Startup State
- The part that describes Step 2 is intricate. I read it a couple of times and I still don't feel I understand it. Perhaps it could be split into a couple of simpler sentences.

9.4. PARTNER-DOWN State
- It may be worth giving a hint about how the server may end up in this state. Most likely it is because its partner has stopped responding. However, in that case the servers are in the COMMUNICATION-INTERRUPTED state, and some other event (what?) needs to occur for the transition to the PARTNER-DOWN state.

9.11. RESOLUTION-INTERRUPTED
- The introduction to this section points out that if the servers were in the POTENTIAL-CONFLICT state and the resolution of the conflict was interrupted, they transition to RESOLUTION-INTERRUPTED and are unresponsive to clients (I presume this is because they were unresponsive while in the POTENTIAL-CONFLICT state and the conflict is still assumed to be present). Section 9.11.1, however, says that the server MUST respond to all DHCP client requests. That looks contrary to the previous statement. My understanding is that the server can't respond in the RESOLUTION-INTERRUPTED state, and thus it requires administrative intervention.

Cheers,
Marcin