Re: [v6ops] Review of draft-lmhp-v6ops-transition-comparison-01

Hi Gabor,

Please see inline below.

Thanks,
Ian

> On 12. Nov 2018, at 11:18, Lencse Gábor <lencse@hit.bme.hu> wrote:
> 
> Dear Ian Farrer,
> 
> Thank you very much for reviewing our draft.
> 
> Please see my answers inline. (I was too busy last week, and even now I can answer only partially; I ask for your patience regarding the remaining parts.)
> 
> On 11/8/2018 8:47 AM, ianfarrer@gmx.com <mailto:ianfarrer@gmx.com> wrote:
>> Hi,
>> 
>> Thanks for tackling this task. It’s really useful to document the characteristics of the different mechanisms now there’s some real experience of them.
>> 
>> As promised in v6ops on Monday, here’s a review of the draft.
>> 
>> Thanks,
>> Ian
>> 
>> 2.1
>> 1, As it's mentioned that MAP-T and 464XLAT use double NAT, it’s worth mentioning that DSLite, lw4o6 and MAP-E are single NAT44.
> 
> Thank you for mentioning it. I have added it in parentheses, so as not to break the train of thought, as follows:
> DS-Lite, lw4o6 and MAP-E encapsulate the IPv4 packets into IPv6 packets (and use single NAT44 at the CE before doing encapsulation).
> 
[if - Well, they are all mechanisms that have a single NAT44 in them, but lw4o6 and MAP-E do NAT in the CPE, and DSLite has it in the AFTR]

> 
>> 
>> 2, The difference in the number of bytes of overhead is certainly worth mentioning, but in practice, I'm not sure it's that much of a differentiator. IME 1500 bytes is the magic number for a lot of networks' MTUs (especially for backhaul). If the network supports more than this then it's likely to be for at least 2k, so 1520 vs 1540 payload differences are unlikely to pose a real problem.
> 
> Yes, of course. Except for rare cases, the difference is only significant if it causes fragmentation.

[if - A more general point that’s worth making is that if you can’t backhaul the traffic without fragmentation (whatever the additional overhead), then really none of these solutions are a good fit for you. RFC7597, Sec 8.3.1 already states this, but as it’s a common consideration for all of these mechanisms, it’s worth emphasising.]
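
As a rough illustration of the overhead figures discussed above, here is a minimal sketch that checks whether a full-size IPv4 packet still fits the IPv6 backhaul MTU. It assumes a 20-byte IPv4 header without options and a 40-byte IPv6 header without extension headers (so no IPv6 fragment header is counted). The function names are made up for the example.

```python
IPV4_HEADER = 20  # bytes, assuming no IPv4 options
IPV6_HEADER = 40  # bytes, fixed IPv6 header, assuming no extension headers


def ipv6_packet_size(ipv4_packet_size: int, mode: str) -> int:
    """Size of the IPv6 packet that carries one IPv4 packet across the access network."""
    if mode == "encapsulation":   # DS-Lite, lw4o6, MAP-E: IPv6 header is prepended
        return ipv4_packet_size + IPV6_HEADER
    if mode == "translation":     # MAP-T, 464XLAT: IPv4 header is replaced by an IPv6 header
        return ipv4_packet_size - IPV4_HEADER + IPV6_HEADER
    raise ValueError(f"unknown mode: {mode}")


def needs_fragmentation(ipv4_packet_size: int, mode: str, ipv6_path_mtu: int) -> bool:
    """True if the resulting IPv6 packet exceeds the IPv6 path MTU."""
    return ipv6_packet_size(ipv4_packet_size, mode) > ipv6_path_mtu


if __name__ == "__main__":
    for mode in ("encapsulation", "translation"):
        for mtu in (1500, 2000):
            print(mode, mtu, needs_fragmentation(1500, mode, mtu))
```

With a 1500-byte backhaul MTU both modes overflow for a 1500-byte IPv4 packet. With 2000 bytes neither does, which is why the 20-byte difference rarely decides anything.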

> 
>> 3, DSLite doesn't do double translation (RFC6333 sec 4.2). The B4 only does encapsulation of the privately addressed traffic into v6. The AFTR uses the unique v6 B4 source address to track the flows and does stateful NAT44.
> 
> Yes, you are right, it is explicitly stated in 4.2 that:
> 
>    A DS-Lite CPE SHOULD NOT operate a NAT function between an internal
>    interface and a B4 interface, as the NAT function will be performed
>    by the AFTR in the service provider's network.  This will avoid
>    accidentally operating in a double-NAT environment.
> 
> However, a stateful NAT44 has to be done somewhere from the user's private address range to one of the addresses in 192.0.0.x/29. The next sentence of 4.2 is:
>    However, it SHOULD operate its own DHCP(v4) server handing out
>    [RFC1918] address space (e.g., 192.168.0.0/16) to hosts in the home.
> 
> And above, you have suggested a correction which means that there is a NAT44 function in the DS-Lite CE device:
> 
> DS-Lite, lw4o6 and MAP-E encapsulate the IPv4 packets into IPv6 packets (and use single NAT44 at the CE before doing encapsulation).
> 
> So can you help me understand where it is done?

[if - 
(In addition to Ole’s comment)

For DSLite, the AFTR is doing the stateful NAT44 function and the CPE is just acting as a router. The CPE is handing out private addresses to clients on its LAN side, but doesn’t need to NAT these - they just need to be unique for that customer’s home network (so that traffic can be routed by the CPE to the correct host). So the CPE takes the private addressed v4 traffic, encapsulates it and sends it to the AFTR.

What is unique at the AFTR is the source v6 address of the B4. When a customer v4 packet arrives, the AFTR does the NAT44 and this v6 B4 address is stored alongside the NAT44 in the NAT state table. This means many customers can source from e.g. 192.168.1.10 and return traffic can be routed back to the right CPE via the 4 in 6 softwire.] 
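
To make the state table idea concrete, here is a minimal sketch of an AFTR-style binding table keyed on the B4's IPv6 source address. It is heavily simplified and not taken from any real AFTR: it assumes one public IPv4 address, one transport protocol, no timeouts, and a naive sequential port allocator. All class and method names are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# (B4 IPv6 address, private IPv4 source, source port)
InsideKey = Tuple[str, str, int]
# (public IPv4 address, public port)
OutsideKey = Tuple[str, int]


@dataclass
class AftrNatTable:
    public_ip: str
    next_port: int = 1024  # hypothetical allocator: hand out ports sequentially
    inside_to_outside: Dict[InsideKey, OutsideKey] = field(default_factory=dict)
    outside_to_inside: Dict[OutsideKey, InsideKey] = field(default_factory=dict)

    def outbound(self, b4_addr: str, src_ip: str, src_port: int) -> OutsideKey:
        """Return the public binding for a decapsulated customer packet, creating it if needed."""
        key = (b4_addr, src_ip, src_port)
        if key not in self.inside_to_outside:
            if self.next_port > 65535:
                raise RuntimeError("port pool exhausted")
            outside = (self.public_ip, self.next_port)
            self.next_port += 1
            self.inside_to_outside[key] = outside
            self.outside_to_inside[outside] = key
        return self.inside_to_outside[key]

    def inbound(self, dst_ip: str, dst_port: int) -> Optional[InsideKey]:
        """Look up which softwire (B4 address) and private endpoint return traffic belongs to."""
        return self.outside_to_inside.get((dst_ip, dst_port))


if __name__ == "__main__":
    aftr = AftrNatTable(public_ip="192.0.2.1")
    # Two customers use the same private address and port behind different B4s.
    a = aftr.outbound("2001:db8::1", "192.168.1.10", 5000)
    b = aftr.outbound("2001:db8::2", "192.168.1.10", 5000)
    print(a, b)
    print(aftr.inbound(*a), aftr.inbound(*b))
```

Because the B4 address is part of the key, the overlapping private addresses never collide, and the reverse lookup tells the AFTR which softwire to use for the return packet.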

> 
>> 4, "Another consequence is that the solutions using double translation can carry only TCP, UDP or ICMP over IP, when they are used with IPv4 address sharing".
>> 
>> There are other protocols that have layer-4 ports (e.g. SCTP). In previous softwire docs, the term 'port-aware IP protocols' has been used, suggested re-word:
>> 
>> "Another consequence is that the solutions using double translation can only carry port-aware IP protocols (e.g. TCP, UDP) and ICMP when  they are used with IPv4 address sharing"
> 
> Thank you very much for pointing it out and also for suggesting a better wording. I have changed it as you recommended.
> 
>> 5, RFC8114 describes 4in6 multicast using encapsulation, so it would be worth mentioning this. Also, we've set up an lwAFTR with rules to perform encapsulation of IPv4 multicast to send to v6 multicast addresses (a subset of the RFC8114 functionality) and can confirm that this works with clients implementing IGMPv2/MLDv2 interworking. I haven't tested it, but I would assume that a MAP-E BR could be configured to do the same.
> 
> I would be happy to include your results. If you have already published them, we can cite them. 
> 
> Could you write up your experience in a few paragraphs, or even a full (sub)section, which we could include?

[if - I’ll propose some text for this. Are you thinking of including a section on handling multicast?]

> 
>> 
>> 3.1.1
>> 1, This states that 464XLAT without a dedicated translation /64 may have triple NAT (NAT44 stateful, NAT64 stateless in the CPE and NAT64 stateful in the PLAT). In section 2.1, DSLite with dual-NAT is 'the worst case' (although please see above regarding this), but the triple NAT for 464XLAT is then described as less problematic due to the distribution of state to CPEs. This seems inconsistent in evaluation.
>> 
> 
> Yes, it was definitely wrong. I have deleted the sentence:
> "The worst case is DS-Lite, which is also doing double stateful translation (NAT44 at the CE and NAT44 at the AFTR). "
> 
>> 
>> 3.1.2
>> See comment about DSLite above.
>> 
> 
> Correction is pending. We should clarify where NAT44 is done on the CE side.
> 
>> 
>> 3.1.3
>> 1, lw4o6 uses the same PSID address construction as MAP, but does not use the 'general' MAP algorithm linking the v4 and v6 addresses.
> 
> Do you recommend it as a text to be added as is? If yes, please specify after which sentence. 

[if - I can propose some text for this.]

> 
>> 2, lw4o6 uses (by default) a single contiguous block of ports per client (a-bits=0). WKPs are excluded (if required) through provisioning. MAP (by default) uses a-bits=6, meaning that a client's total ports are distributed across the total port space, and the WKPs are excluded from allocation to any customer. One by-product of this is that the netfilter problem (only using the first port set for NAPT, noted in section 3.2) is not experienced in a lw4o6 deployment.
>> 
>> 3, The last paragraph about allocating full IP addresses to clients is worth extending, as it adds some flexibility to the deployment. When a full address is allocated for a client, the lwAFTR functions as in RFC7040 - i.e. it no longer performs A+P based routing for that client. This means that non port-aware protocols can be routed. As this can be configured on a per-client basis, if a customer reports a problem with a particular application, they can be re-provisioned with a full address.
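
The difference between a-bits=0 and a-bits=6 in item 2 above can be sketched as follows. This is a minimal illustration of the port-set layout as I read RFC 7597, Sec 5.1 (16-bit port = a offset bits, then k PSID bits, then m remaining bits), not production code. The function name and argument names are made up for the example.

```python
from typing import List


def psid_ports(psid: int, psid_len: int, offset: int) -> List[int]:
    """List the ports in one client's port set.

    offset   -- the 'a' bits (PSID offset)
    psid_len -- the 'k' bits (PSID length)
    The remaining m = 16 - a - k bits enumerate ports inside each contiguous range.
    """
    m = 16 - offset - psid_len
    if m < 0 or not 0 <= psid < (1 << psid_len):
        raise ValueError("inconsistent PSID parameters")
    # With a > 0 the all-zero prefix is skipped, which keeps the lowest ports
    # (0-1023 for a=6) out of every port set. With a = 0 there is no prefix,
    # so any exclusion of well-known ports has to come from provisioning.
    first_prefix = 1 if offset > 0 else 0
    ports: List[int] = []
    for prefix in range(first_prefix, 1 << offset):
        base = (prefix << (16 - offset)) | (psid << m)
        ports.extend(range(base, base + (1 << m)))
    return ports


if __name__ == "__main__":
    spread = psid_ports(psid=0x34, psid_len=8, offset=6)
    block = psid_ports(psid=0x34, psid_len=8, offset=0)
    print(len(spread), spread[:4], spread[-4:])
    print(len(block), block[0], block[-1])
```

With a=6 the client gets many small ranges scattered across the port space. With a=0 the same PSID yields one contiguous block, so a NAPT that only uses the first range still sees the whole allocation.
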
>> 
>> 
>> 3.2 
>> 1, Port sizes for clients for MAP/lw4o6 are predetermined, but not fixed. A CGN's port block allocation policy is certainly easier to change, but this is not something that is done regularly:
>> lw4o6 clients are provisioned individually, so it is possible to change the port block size for single or multiple clients and distribute this to clients as quickly as their DHCPv6 gets renewed.
>> A MAP domain (and all of the clients covered in the domain) can be redimensioned to change the client's port space. Again, deploy time would be in line with DHCPv6 timers.
>> 
>> 2, I was unable to find a publicly available version of the Miy2010 reference. Can you provide a link to it in the references section?
> 
> Yes, this is a problem. (As an IEICE member I could download the paper, which is why I was not aware of it.)
> 
> I have just asked the author (Shin Miyakawa) in a message through ResearchGate to share a public copy of his paper. I hope that he will do so soon. 
> 
> Meanwhile, I will send you a copy of it in a private mail in a few minutes. (I think I may not send it to the mailing list.)

[if - Thanks]

> 
> Best regards,
> 
> Gábor
> 
>> 3, Do you have any information about how the testing in Miy2010 was done, and specifically if the NetFilter 3-tuple connection multiplexing described in section 3.2 was used?
>> 
>> 
>> 3.3
>> 1, The scope of this section needs some further explanation, as it’s not clear about whether this is describing IPv4 or IPv6 servers with IPv4 or IPv6 clients. It’s also worth describing whether making the server externally accessible can be configured purely by the customer, and which would need configuration by the operator in centralised infrastructure.
>> 
>> 2, MAP & lw4o6 both always expose L4 ports, so it is possible to run externally accessible IPv4 services for IPv4 clients on these. Generally, these will not be the WKP ports, unless those are being distributed for use by clients. But a customer can configure a server to run on a higher port (taken from the allocated block), which is then port translated (if needed) to e.g. 443.
>> 
>> 3, PCP would be worth including here.
>> 
>> 
>> 3.4 Support and implementations
>> 1, As NetFilter in the CPE is mentioned, it might be good to include information about FOSS operator side implementations as well. Off the top of my head:
>> 
>> MAP-BR, lwAFTR, CGN - VPP/fd.io https://gerrit.fd.io/r/#/admin/projects/
>> lwAFTR - https://github.com/Igalia/snabb
>> DSLite AFTR - https://www.isc.org/downloads/
>> 
>> 
>> 3.6 Load Sharing
>> 1, This section needs to discuss more about state table synchronisation between cluster members for stateful solutions as this is the major bottleneck in scaling such solutions (and whether they can support asymmetric v4 and v6 paths through the cluster).
>> 
>> 2, Not sure if this is the right section, but there is also a consideration about geographical resilience for CGN deployments (i.e. in a 1+1 resilience model, the size of public IPv4 address pools need to be scaled to allow for handling all customer traffic in event of a site failure). Stateless solutions can be more efficient here as they can have a common IPv4 range accessible via multiple geographical locations.
>> 
>> 
>> 3.7 Logging
>> 1, I think that a lot more needs to be said on this topic. It also needs to be weighed against points made in other parts of the document. E.g., in section 3.2, yes you can use the dynamic nature of port allocation in a CGN to allow for more flexible port allocations, but there is a cost in logging overhead. Having first hand experience of a network that does CGN per-flow logging to comply with local legal requirements, this can be a major consideration. In the total cost of the solution, the CGN infrastructure was a fraction of the logging and storage costs.
>> 
>> 
>> In addition, there are a couple of additional topics that the document should discuss:
>> IPv6 and IPv4 address planning / topology independence
>> 
>> 
>> 
>> 
>> 
>> _______________________________________________
>> v6ops mailing list
>> v6ops@ietf.org <mailto:v6ops@ietf.org>
>> https://www.ietf.org/mailman/listinfo/v6ops <https://www.ietf.org/mailman/listinfo/v6ops>