Problem statement: Traceroute doesn't work across encapsulation boundaries. That is, it does not report any information about the router hops between the encapsulator and the decapsulator of a tunneled connection. A further complication arises when the encapsulation uses a different IP version than the payload. In that case, the ICMP time/hop count exceeded message received by the encapsulator belongs to a different address family than the one it needs to send back to the source of the traceroute.

This is a general issue for any tunneling/encapsulation technique. One example is LISP, where an ITR encapsulates packets to an ETR, but it is desirable for user traceroutes to show the intermediate hops between the ITR and the ETR (see http://tools.ietf.org/html/draft-ietf-lisp-08#page-56). For the rest of this document I will use LISP terminology, but the issues apply to any IP tunneling/encapsulation technique.

Making traceroute work across encapsulation boundaries poses a series of challenges, which I have tried to summarize below. We feel this may be a good forum to start discussing how to address them.

An encapsulator (Ingress Tunnel Router, or ITR, in LISP terms) can copy the TTL/hop count from the payload to the added IP encapsulation header. The hops between the encapsulator (ITR) and the decapsulator (Egress Tunnel Router, or ETR, in LISP terms) will then decrement the probe's outer TTL to 0 and generate ICMP TTL/hop count exceeded messages back to the source of the outer packet, that is, the encapsulator (ITR). The following section describes the issues the ITR faces in translating a received ICMP TTL/hop count exceeded message into another ICMP TTL/hop count exceeded message that the end station running the traceroute can process.

For an IPv6 payload with IPv6 transport, RFC4443 section 2.4 mandates that the ICMP hop count exceeded message must contain as much of the invoking packet as possible, without making the ICMP error packet exceed the minimum IPv6 MTU of 1280 bytes.
This case is therefore relatively simple to handle: the ITR can strip the LISP encapsulation from the packet quoted in the ICMP hop count exceeded message and then proxy the message back to the end user. The source address of the new message must be spoofed to be that of the core router that decremented the hop count to 0. One issue is that this violates the rule that the ICMP message must contain as much of the invoking packet as possible up to 1280 bytes: the core router's quote already spent part of that budget on the encapsulation headers, so the regenerated message carries less of the user packet than the rule calls for. That seems like a minor issue.

For an IPv4 payload with IPv4 transport, RFC792 only mandates that 64 bits (8 bytes) of the original datagram's data, following its IP header, be included in the ICMP time exceeded message sent back to the ITR. Those 8 bytes are part of the encapsulation added by the ITR (in the case of LISP, the UDP header), so we have no way of generating an ICMP time exceeded message back to the end user, as we do not even have the inner IPv4 header of the original packet.

For an IPv4 payload with IPv6 transport, the ITR will receive an IPv6 ICMP hop count exceeded message and must generate an IPv4 time exceeded message back to the end user. The new message needs to convey the IPv6 source address of the core router that decremented the hop count to 0. This can possibly be achieved via an ICMP extension, similar to what MPLS did in RFC4950, where the ICMP extension carries the label information.

For an IPv6 payload with IPv4 transport, the ITR will receive an IPv4 time exceeded message and must generate an IPv6 hop count exceeded message. Here we have the same issue as with IPv4 in IPv4: we do not have enough of the original packet to know who the end user was. Additionally, we have the problem of how to convey the IPv4 source address of the core router that decremented the TTL to 0. That address can be conveyed to the IPv6 end host using RFC2373-style "IPv4-compatible IPv6 addresses" (a format since deprecated by RFC4291) or a similar method.
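A minimal Python sketch of the IPv6 in IPv6 proxying step, plus the IPv4-compatible address mapping mentioned for the IPv6 over IPv4 case. It assumes the quoted outer packet is a plain IPv6 header with no extension headers, followed by the 8-byte UDP and 8-byte LISP headers. The function names are hypothetical, and actually transmitting the message with the spoofed source address is left out.

```python
import ipaddress
import struct

IPV6_HEADER_LEN = 40
UDP_HEADER_LEN = 8
LISP_HEADER_LEN = 8
ICMPV6_HEADER_LEN = 8         # type, code, checksum, 4 unused bytes
ICMPV6_TIME_EXCEEDED = 3      # RFC 4443 type 3
CODE_HOP_LIMIT_EXCEEDED = 0   # code 0: hop limit exceeded in transit
IPV6_MIN_MTU = 1280
NEXT_HEADER_ICMPV6 = 58


def strip_lisp_encap(quoted: bytes) -> bytes:
    """Return the user packet from the invoking packet quoted by the core router.

    Assumes an outer IPv6 header without extension headers, then the outer
    UDP header, then the 8-byte LISP header."""
    offset = IPV6_HEADER_LEN + UDP_HEADER_LEN + LISP_HEADER_LEN
    if len(quoted) <= offset:
        raise ValueError("quote too short to contain any of the user packet")
    return quoted[offset:]


def checksum(data: bytes) -> int:
    """Internet checksum: one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_time_exceeded(core_router: str, end_user: str, user_packet: bytes) -> bytes:
    """Build an ICMPv6 Hop Limit Exceeded message (without its IPv6 header).

    The quoted user packet is truncated so the whole IPv6 packet stays within
    1280 bytes. The checksum pseudo-header uses the spoofed core router address
    as source and the end user's address as destination."""
    room = IPV6_MIN_MTU - IPV6_HEADER_LEN - ICMPV6_HEADER_LEN
    msg = struct.pack("!BBHI", ICMPV6_TIME_EXCEEDED, CODE_HOP_LIMIT_EXCEEDED, 0, 0)
    msg += user_packet[:room]
    pseudo = (ipaddress.IPv6Address(core_router).packed
              + ipaddress.IPv6Address(end_user).packed
              + struct.pack("!I3xB", len(msg), NEXT_HEADER_ICMPV6))
    csum = checksum(pseudo + msg)
    return msg[:2] + struct.pack("!H", csum) + msg[4:]


def ipv4_compatible(core_router_v4: str) -> ipaddress.IPv6Address:
    """RFC 2373-style IPv4-compatible IPv6 address (::a.b.c.d), deprecated by RFC 4291."""
    return ipaddress.IPv6Address(int(ipaddress.IPv4Address(core_router_v4)))
```

For example, `build_time_exceeded("2001:db8::1", "2001:db8::2", strip_lisp_encap(quoted))` yields a message the ITR could send from the core router's address, and `ipv4_compatible("192.0.2.1")` gives `::192.0.2.1`.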
Section 9.2 of draft-ietf-lisp-08 suggests a way to address the IPv4 in IPv4 case, but it suffers from significant problems:

- The signature of a traceroute probe is at best very loosely defined, so the ITR will either fail to identify some probes or treat too many user packets as traceroute probes, potentially leading to resource starvation.
- Implementing this algorithm in a forwarding engine is difficult at best, and if packets identified as probes are punted to the software forwarding plane, you risk out-of-order delivery and/or overwhelming the software forwarding plane.

We have an alternative algorithm that doesn't require the forwarding plane to treat traceroute probes specially. It instead relies on the fact that the ITR is in the path of the traceroute and thus itself decrements the TTL to 0 for some of the probes. The ITR can record information about these probes and use it to construct fake ICMP time exceeded messages back to the end user when it receives an ICMP time exceeded message from the core. This method relies on guessing how the client varies fields in the probes, such as UDP port numbers for UDP-based traceroute or ICMP sequence numbers for ICMP-based traceroute.

All we currently have are various degrees of hacks. It seems the only way to avoid having to fake data is to change the IPv4 ICMP time exceeded message. At a minimum it must contain enough of the user packet to preserve the user's L3 header and the first 8 bytes of its L4 header, plus any encapsulation added by the encapsulator. For LISP with an IPv6 payload, that means at least 64 bytes after the quoted outer IPv4 header: 8-byte UDP header + 8-byte LISP header + 40-byte inner IPv6 header + 8 bytes of user payload. I would propose text similar to IPv6 (RFC4443 section 2.4), stating that the ICMP message must contain as much of the invoking packet as possible without making the ICMP packet larger than 576 bytes. This would allow us to use an algorithm like the one for IPv6 in IPv6.
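The 64-byte figure and the headroom a 576-byte cap would give can be checked with a little arithmetic. This Python sketch assumes no IPv4 options and reads the proposed 576-byte cap as applying to the whole IP datagram carrying the ICMP message; the constant and function names are illustrative only.

```python
IPV4_HEADER = 20       # no options assumed
IPV6_HEADER = 40
ICMP_HEADER = 8        # type, code, checksum, 4 unused bytes
UDP_HEADER = 8         # outer LISP UDP header
LISP_HEADER = 8
USER_L4_BYTES = 8      # first 8 bytes of the user's transport header

RFC792_QUOTE = 8       # data bytes RFC 792 requires after the quoted IP header
PROPOSED_MAX = 576     # proposed cap on the ICMP datagram


def needed_quote(inner_header_len: int) -> int:
    """Bytes needed after the quoted outer IPv4 header to recover the user's
    L3 header and the first 8 bytes of its L4 header."""
    return UDP_HEADER + LISP_HEADER + inner_header_len + USER_L4_BYTES


def available_quote(max_datagram: int) -> int:
    """Bytes left after the quoted outer IPv4 header when the ICMP datagram,
    including its own IPv4 header, is capped at max_datagram."""
    return max_datagram - IPV4_HEADER - ICMP_HEADER - IPV4_HEADER


for name, hdr in (("IPv4 payload", IPV4_HEADER), ("IPv6 payload", IPV6_HEADER)):
    print(f"{name}: need {needed_quote(hdr)}, RFC 792 gives {RFC792_QUOTE}, "
          f"576-byte cap gives {available_quote(PROPOSED_MAX)}")
```

Under these assumptions RFC792's 8 bytes fall short even of the 44 bytes an IPv4 payload needs, while a 576-byte cap leaves 528 bytes, comfortably above the 64 bytes an IPv6 payload needs.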
For cross-AF traceroutes, we'll need to standardize a way to convey addresses of an address family different from that of the ICMP message's own IP header, possibly via an ICMP extension similar to the MPLS one in RFC4950.
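As a rough illustration of the extension approach, the Python sketch below builds an RFC4884-style ICMP extension structure (a version 2 header with a checksum, followed by one object) carrying the core router's address. The Class-Num, C-Type and payload layout (AFI, reserved field, address) are hypothetical placeholders rather than assigned values. The sketch also omits RFC4884's rules on padding the quoted datagram and setting the length field in the ICMP header.

```python
import ipaddress
import struct

ICMP_EXT_VERSION = 2          # RFC 4884 extension header version
HYPOTHETICAL_CLASS_NUM = 250  # placeholder only, NOT an IANA-assigned Class-Num
HYPOTHETICAL_C_TYPE = 1       # placeholder only
AFI_IPV4, AFI_IPV6 = 1, 2     # IANA Address Family Numbers


def ones_complement_checksum(data: bytes) -> int:
    """Internet checksum: one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def address_object(addr: str) -> bytes:
    """Hypothetical object: 4-byte object header, then AFI, 2 reserved bytes, address."""
    ip = ipaddress.ip_address(addr)
    afi = AFI_IPV4 if ip.version == 4 else AFI_IPV6
    payload = struct.pack("!HH", afi, 0) + ip.packed
    length = 4 + len(payload)  # object length counts the object header too
    header = struct.pack("!HBB", length, HYPOTHETICAL_CLASS_NUM, HYPOTHETICAL_C_TYPE)
    return header + payload


def extension_structure(objects: list[bytes]) -> bytes:
    """Extension header (version in top 4 bits, 12 reserved bits, checksum) + objects."""
    body = b"".join(objects)
    first_byte = ICMP_EXT_VERSION << 4
    csum = ones_complement_checksum(struct.pack("!BBH", first_byte, 0, 0) + body)
    return struct.pack("!BBH", first_byte, 0, csum) + body
```

For the IPv4 payload over IPv6 transport case, an ITR could append `extension_structure([address_object("2001:db8::1")])` to the IPv4 time exceeded message it generates, so the end host can learn the IPv6 core router address even though the message itself is IPv4.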