Re: [Int-area] AD evaluation: draft-ietf-intarea-nat-reveal-analysis

Brian Haberman <brian@innovationslab.net> Wed, 13 February 2013 15:03 UTC

Message-ID: <511BAB5B.8010702@innovationslab.net>
Date: Wed, 13 Feb 2013 10:03:55 -0500
From: Brian Haberman <brian@innovationslab.net>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:17.0) Gecko/20130107 Thunderbird/17.0.2
MIME-Version: 1.0
To: mohamed.boucadair@orange.com
References: <51195E93.4090103@innovationslab.net> <94C682931C08B048B7A8645303FDC9F36EAEE11CD9@PUEXCB1B.nanterre.francetelecom.fr>
In-Reply-To: <94C682931C08B048B7A8645303FDC9F36EAEE11CD9@PUEXCB1B.nanterre.francetelecom.fr>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 8bit
Cc: "draft-ietf-intarea-nat-reveal-analysis@tools.ietf.org" <draft-ietf-intarea-nat-reveal-analysis@tools.ietf.org>, "int-area@ietf.org" <int-area@ietf.org>
Subject: Re: [Int-area] AD evaluation: draft-ietf-intarea-nat-reveal-analysis

On 2/12/13 5:34 AM, mohamed.boucadair@orange.com wrote:
> Dear Brian,
>
> Many thanks for the detailed review.
>
> Please see inline.
>
> Cheers, Med
>
>> -----Original Message----- From: int-area-bounces@ietf.org
>> [mailto:int-area-bounces@ietf.org] On Behalf Of Brian Haberman
>> Sent: Monday, 11 February 2013 22:12 To: int-area@ietf.org;
>> draft-ietf-intarea-nat-reveal-analysis@tools.ietf.org Subject:
>> [Int-area] AD evaluation: draft-ietf-intarea-nat-reveal-analysis
>>
>> All, I have completed my AD evaluation for the above draft and
>> have some feedback for the group.  I will focus on the substantive
>> comments for the time being since some of them may result in
>> re-written text in places.  I will follow up with the document
>> authors on editorial nits and such at a later time.
>>
>> 1. It is obvious from the way certain sections of text are written
>> that the original intent was to make a recommendation on which of
>> the described approaches should be used to disambiguate between
>> multiple hosts behind a NAT/CGN.  Given that the document is now
>> simply a characterization of those mechanisms, I would suggest
>> spending some time cleaning up the Abstract, Section 1.1, and
>> Section 2 so that they focus on the task of describing the
>> mechanisms, rather than mentioning abstract requirements for those
>> mechanisms. There are concrete suggestions a little later in this
>> note.
>
> Med: It is true there was a version of the document which included a
> recommendation, but the initial intent of the document was to analyze
> candidate solutions. I updated the text to make this explicit: I changed
> the text in the sections you mentioned. In particular, the change in
> the abstract is:
>
> OLD:
>
> This document analyzes a set of solution candidates to mitigate some
> of the issues encountered when address sharing is used.  In
> particular, this document focuses on means to reveal a host
> identifier (HOST_ID) when a Carrier Grade NAT (CGN) or application
> proxies are involved in the path.  This host identifier must be
> unique to each host under the same shared IP address.
>
> NEW:
>
> This document is a collection of solutions to reveal a host
> identifier (denoted as HOST_ID) when a Carrier Grade NAT (CGN) or
> application proxies are involved in the path.  This host identifier
> is used by a remote server to sort out the packets by sending host.
> The host identifier must be unique to each host under the same
> shared IP address.
>
> This document analyzes a set of solution candidates to reveal a host
> identifier; no recommendation is sketched in the document.
>

The above text is a good addition to the document.

>>
>> 2. The mechanisms described in this draft fall into two broad
>> categories, deployed and proposed. Those in the former category can
>> be characterized based on actual usage scenarios, which would
>> benefit the table shown in Figure 3. The latter should be described
>> in terms of what they are proposed to do, but cannot be assessed
>> against the same metrics as the other groups.
>
> Med: Figure 3 includes this note:
>
> (2)  This solution is widely deployed.
>
> Can you make explicit what change you want added? Thanks.
>

Isn't the IP-ID approach also used in some deployments?

I would suggest re-ordering the table so that deployed approaches are 
collected together (and labeling them as deployed).  If the HTTP 
Forwarded header is the only deployed approach, I would simply add a 
note for the others stating whether they are a documented proposal or a 
theoretical construct.

>>
>> 3. The "Requirements Language" section should be removed.  As an
>> Informational document describing mechanisms, there is no need to
>> leverage 2119 keywords.
>
> Med: Done.
>

Ok.

>>
>> 4. It would be useful if the third paragraph of Section 1.1 was
>> expanded to discuss the risks in more detail.  In fact, it may be
>> clearer to understand this draft if the problem statement came
>> before the context (Section 2).
>
> Med: I changed the text as follows:
>
> OLD:
>
> The sole use of the IPv4 address is not sufficient to uniquely
> distinguish a host.  As a mitigation, it is tempting to investigate
> means which would help in disclosing an information to be used by
> the remote server as a means to uniquely disambiguate packets of
> hosts using the same IPv4 address.
>
> NEW:
>
> In particular, some servers use the source IPv4 address as an
> identifier to treat some incoming connections differently.  Due to
> the deployment of CGNs (e.g., NAT44 [RFC3022], NAT64 [RFC6146]),
> that address will be shared.  In particular, when a server receives
> packets from the same source address, because this address is
> shared, the server does not know which host is the sending host
> [RFC6269]. The sole use of the IPv4 address is not sufficient to
> uniquely distinguish a host.  As a mitigation, it is tempting to
> investigate means which would help in disclosing an information to be
> used by the remote server as a means to uniquely disambiguate packets
> of hosts using the same IPv4 address.
>

That sounds better.

>>
>> 5. Section 2
>>
>> * The Observation text should provide some brief examples of how
>> and why special treatment is needed/provided.
>
> Med: I updated the text with an example:
>
> Policies relying on source IP address which are enforced by some
> servers will be applied to all hosts sharing the same IP address. For
> example, blacklisting the IP address of a spammer host will result in
> all other hosts sharing that address having their access to the
> requested service restricted.  [RFC6269] describes the issues in
> detail.  Therefore, due to address sharing, servers need an extra
> information than the source IP address to differentiate the sending
> host.  We call HOST_ID this information.
>

Ok.
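
To make the blacklisting example concrete, here is a minimal sketch of the 
collateral effect.  Everything in it is hypothetical and chosen only for 
illustration (the shared address, the host labels used as HOST_ID values, 
and the function names); none of it comes from the draft.

```python
# Hypothetical illustration: a server-side blocklist keyed on the source IP
# only, versus one keyed on (source IP, HOST_ID). All values are made up.

SHARED_IP = "192.0.2.1"  # documentation address standing in for a CGN's public IP


def allowed_by_ip(blocklist, src_ip):
    """Policy keyed on the source address alone."""
    return src_ip not in blocklist


def allowed_by_host(blocklist, src_ip, host_id):
    """Policy keyed on the address plus a per-host identifier."""
    return (src_ip, host_id) not in blocklist


# Three hosts behind the same shared address; host "b" is the spammer.
hosts = ["a", "b", "c"]

ip_blocklist = {SHARED_IP}
host_blocklist = {(SHARED_IP, "b")}

blocked_ip_only = [h for h in hosts if not allowed_by_ip(ip_blocklist, SHARED_IP)]
blocked_with_host_id = [
    h for h in hosts if not allowed_by_host(host_blocklist, SHARED_IP, h)
]

print(blocked_ip_only)       # every host behind the shared address
print(blocked_with_host_id)  # only the offending host
```

With the address-only policy all three hosts lose access; with the extra 
identifier only the offending host does, which is the differentiation the 
new text describes.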

>
> Is it sufficient to
>> identify the sending host? application? user?
>
> Med: I added this sentence:
>
> HOST_ID does not reveal the identity of a user, a subscriber or an
> application.
>

That should suffice.

>
> It should also note that there may be
>> issues with the fact that some IP addresses will be shared and
>> others may not.  How does that impact the performance of these
>> mechanisms?
>
> Med: The document assumes the address sharing function injects the
> host identifier. BTW, there is already a performance criterion listed
> in Figure 3.
>

I was thinking more generically than the performance criterion.  Suppose 
a server employs the IP-ID approach.  If several packets arrive with the 
same source IP address and the same value in the IP-ID field, there is 
no way to know if the IP-ID value was injected by a NAT/CGN box.  Or is 
your response saying that scenario is covered by the metrics used in 
Figure 3?  If so, which metric?  None of the descriptive text in Section 
5 talks about this type of issue.

>>
>> * I would like some text in the Objective text to explain why such
>> sorting is needed.  This relates back to the Context description
>> in Section 1.1.
>
> Med: The new text is:
>
> Policies relying on source IP address which are enforced by some
> servers will be applied to all hosts sharing the same IP address. For
> example, blacklisting the IP address of a spammer host will result in
> all other hosts sharing that address having their access to the
> requested service restricted.  [RFC6269] describes the issues in
> detail.  Therefore, due to address sharing, servers need an extra
> information than the source IP address to differentiate the sending
> host.  We call HOST_ID this information.
>

Ok.

>>
>> * I don't think there needs to be a description of a Requirement in
>> this document any more, so that text can be removed.
>
> Med: Done.
>

Ok.

>>
>> 6. Section 3.1 should be removed.  This is simply an analysis of
>> the mechanisms, so there is no new work which needs requirements
>> defined at this point.
>
> Med: Section 3.1 was added as a result of a review from privacy
> people. I do think it is useful to maintain it. Perhaps, move the
> text to the security considerations?
>

If anything, these are privacy considerations that may be impacted by 
these types of functions.  They can't be requirements at this point. 
Keeping the text is a good idea in that light, but don't call them 
Requirements.  Moving them to the Security Considerations section would 
work.

>>
>> 7. In Section 4.1.2, it would be good to describe any issues that
>> the approach has with the original use of the Identification field
>> for fragmentation reassembly.  If a middlebox changes the ID field,
>> weird things can/will happen if those packets are fragmented
>> somewhere.
>
> Med: We thought having a reference to
> draft-ietf-intarea-ipv4-id-update (now RFC6864) is sufficient. The
> impact of Middleboxes is already discussed in that document (see
> section 5.3).
>

So maybe the way to clarify this is to re-word the text in 4.1.2.  How 
about:

OLD:
This usage is not compliant with what is recommended in
    [I-D.ietf-intarea-ipv4-id-update].

NEW:
This usage is not consistent with the fragment reassembly use of the 
Identification field [RFC791] or the updated handling rules for the 
Identification field [I-D.ietf-intarea-ipv4-id-update].

>>
>> 8. I don't see a need for a forward reference in Section 4.2.2. I
>> would suggest simply stating that the IP Option approach will
>> support any/all transport protocols.
>
> Med: Done.
>

Ok.

>
>>
>> 9. In Section 4.3.2...
>>
>> * I would like to see some description of what risk(s) may arise
>> with a TCP option, even though they are apparently low
>> probability.
>
> Med: The main risk we had in mind is session failure due to handling
> an unknown TCP option. Are you suggesting this text should be
> expanded?
>
> The risk related to handling a new TCP Option is low as measured in
> [Options].
>

It would be good to mention at least one risk, like session failure, in 
the text to give the readers some clue as to the type of risks being 
considered.

>>
>> * Additionally, the text contains "0,103%", which I assume should
>> be "0.103%" (i.e., 1/10th of 1%).
>
> Med: Fixed. Thanks.
>

Ok.

>>
>> * The third bullet mentions that having several NATs in the path
>> may cause issues for a TCP option.  Isn't this true for other
>> approaches discussed in the document?  These should be identified
>> as well.
>
> Med: There are some proposals (e.g., XFF, Forwarded-For) which allow
> prepending several host-ids. This is already mentioned in the text:
>
> When several address sharing devices are crossed, XFF/Forwarded-For
> header can convey the list of IP addresses (e.g., Figure 1).  The
> origin HOST_ID can be exposed to the target server.
>
> For some proposals (e.g., IP Option), this point is not mentioned as
> the analysis shows these proposals are non-starters.
>
> For the TCP option, the loss of the original host_id may not be a
> problem as the target usage is between proxies of a CGN and server.
> Only the information leaked in the last leg is likely to be useful.
>

Ok.  I can see how this is covered.
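
For readers less familiar with the header-based approach, the following is 
a rough sketch of how a server might recover the list of addresses from a 
Forwarded header value.  It assumes each address sharing device adds its 
own "for=" element in order, and it deliberately ignores commas or 
semicolons inside quoted values; the function name and the sample header 
are illustrative only.

```python
def forwarded_for_chain(header_value):
    """Return the 'for' values of a Forwarded header value, in order.

    Simplified sketch: assumes no commas or semicolons appear inside
    quoted values.
    """
    chain = []
    for element in header_value.split(","):
        for pair in element.split(";"):
            name, sep, value = pair.strip().partition("=")
            if sep and name.lower() == "for":
                chain.append(value.strip().strip('"'))
    return chain


# Hypothetical value after two address sharing devices were crossed.
chain = forwarded_for_chain("for=192.0.2.43, for=198.51.100.17;proto=http")

# Under the ordering assumption above, the first entry is the origin.
origin = chain[0] if chain else None
print(chain, origin)
```

A server that only trusts the last hop would look at the final entry 
instead of the first; which entry is meaningful depends on which devices 
in the path are trusted.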

>>
>> 10. In Section 4.5.1, I would suggest adding some text that
>> describes how to interpret Figure 2.
>
> Med: Done.
>

Ok.

>>
>> 11. Is Section 4.6 theoretical or is there a specific reference
>> that can be added for this technique?
>
> Med: Added a ref to RFC6346.
>

Ok.

>>
>> 12. Section 4.7.2 should clearly state that HIP is an ideal
>> solution for this identification problem, even though the document
>> states there is a high cost for deployment. I would also like to
>> see some description of why HIP does not work if "the address
>> sharing function is required to act as a UDP/TCP-HIP relay".
>
> Med: The current text says:
>
> "If the address sharing function is required to act as a UDP/TCP-HIP
> relay, this is not a viable option."
>
> This requires that ALL servers in the Internet be HIP-enabled. It is
> obvious this is not a viable option for a deployable solution.
>

That is understood.  It is not clear why the "UDP/TCP-HIP relay" aspect 
is mentioned.  Is there something special about that deployment model 
that has additional issues (other than needing all servers to understand 
HIP)?

>>
>> 13. Section 4.8.2
>>
>> * The text says that the ICMP approach is viable for TCP and UDP.
>> Any reason why it may be an issue for other transport protocols
>> (e.g., SCTP or RTP)?
>
> Med: The ICMP approach can work for any transport protocol making use
> of a port number. We mentioned TCP and UDP as these are the widely
> deployed transport protocols. I updated the text as follows:
>
> OLD:
>
> o  This ICMP proposal is valid for both UDP and TCP.  Address
> sharing function may be configurable with the transport protocol
> which is allowed to trigger those ICMP messages.
>
> NEW:
>
> o  This ICMP proposal is valid for any transport protocol that uses
> a port number.  Address sharing function may be configurable with the
> transport protocol which is allowed to trigger those ICMP messages.
>

That works.

>
>>
>> * I would also like to see some text describing why the approach is
>> not compatible with cascading NATs.
>
> Med: The main reason is that each NAT in the path will generate an
> ICMP message. These messages will be translated by the downstream
> NATs. The remote server will receive multiple ICMP messages and will
> need to decide which host identifier to use.
>

The above text, or something similar, should be added to that bullet.
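
As a toy illustration of the ambiguity, the sketch below models two 
cascaded NATs.  It assumes each NAT advertises, as HOST_ID, the source 
address it saw before translating; that encoding, the addresses, and the 
function name are assumptions for illustration, not something the draft 
specifies.

```python
def icmp_reports(pre_translation_sources):
    """Model one ICMP message per NAT in the path.

    pre_translation_sources[i] is the source address NAT i+1 sees before
    it translates the packet; each NAT advertises that address as HOST_ID.
    """
    return [
        {"from_nat": i + 1, "host_id": src}
        for i, src in enumerate(pre_translation_sources)
    ]


# Host 10.0.0.5 -> NAT1 (external 100.64.0.7) -> NAT2 (external 192.0.2.1) -> server
reports = icmp_reports(["10.0.0.5", "100.64.0.7"])

# The server ends up with more than one candidate identifier for one flow.
candidates = {r["host_id"] for r in reports}
ambiguous = len(candidates) > 1
print(reports, ambiguous)
```

The server receives one message per NAT and has no in-band rule telling it 
which of the two candidates identifies the original host.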

>>
>> * The last bullet mentions FMC and Open WiFi with no context or
>> references.  These should either have references or their mention
>> should be removed since they don't add much to the description.
>> The same goes for their mention in Section 4.9.2 (8th bullet).
>
> Med: I updated the text with a reference to a document where the
> problem is described in detail:
>
> OLD:
>
> o  In some scenarios (e.g., Fixed-Mobile Convergence, Open WiFi,
> etc.), HOST_ID should be interpreted by intermediate devices which
> embed Policy Enforcement Points (PEP, [RFC2753]) responsible for
> granting access to some services.  These PEPs need to inspect all
> received packets in order to find the companion (traffic) messages to
> be correlated with ICMP messages conveying HOST_IDs.  This induces
> more complexity to these intermediate devices.
>
> NEW:
>
> o  In some scenarios (e.g., Section 3 of
> [I-D.boucadair-pcp-nat-reveal]), HOST_ID should be interpreted by
> intermediate devices which embed Policy Enforcement Points (PEP,
> [RFC2753]) responsible for granting access to some services. These
> PEPs need to inspect all received packets in order to find the
> companion (traffic) messages to be correlated with ICMP messages
> conveying HOST_IDs.  This induces more complexity to these
> intermediate devices.
>
> I updated also the text in Section 4.9.2.
>

Ok.

>>
>> 14. In Section 4.9.2 (3rd bullet), is the solution to publish this
>> info in DNS or is that just an example approach?  This should be
>> clarified.
>
> Med: DNS is mentioned as an example. I updated the text as follows:
>
> OLD:
>
> o  A hint should be provided to the ultimate server (or intermediate
> nodes) the address sharing function implements IDENT protocol. This
> can be achieved by publishing this capability using DNS.
>
> NEW:
>
> o  A hint should be provided to the ultimate server (or intermediate
> nodes) the address sharing function implements IDENT protocol.  A
> solution example is to publish this capability using DNS; other
> solutions can be envisaged.
>

Ok.

>
>>
>> 15. Section 5
>>
>> * Shouldn't there be an additional metric that covers the
>> impact/cost of needing client or middlebox code changes?
>
> Med: For almost all solutions, the host identifier is not injected by
> the client. Host_id injection is done by an address sharing
> function. The cost of the change in the address sharing function will
> depend on the capabilities supported by that device: a NAT device
> re-writing packets can inject (in theory) L3/4 information without
> extra cost, but inspecting packets to inject application-related headers
> would require new features. We focused on the expected performance
> impact rather than the expected induced cost.
>

Ok.

>>
>> * Where did the 100% success ratio for IP-ID come from?  There have
>> been documented cases of OSes setting the Identification field to
>> zero.  If that is true, the success ratio can't be 100% can it?
>
> Med: the IP-ID tweaking is implemented in the address sharing
> function, not the host/OS. In theory, if the address sharing function
> follows the rules for the IP-ID field, failure is unlikely.
>

Even in the case where packets are fragmented after the middlebox sets 
the IP-ID?  It seems that the success ratio ignores those types of 
errors.  Are those errors counted in the "Possible Perf Impact" metric?
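
The fragmentation concern can be sketched as follows.  The sketch assumes 
the middlebox writes one fixed per-host value into the Identification 
field (a simplification for illustration; the constant, addresses, and 
function name are made up) and uses the reassembly key from RFC 791: 
source, destination, protocol, and identification.

```python
from collections import defaultdict


def reassembly_buckets(fragments):
    """Group fragments the way a receiver would, by the RFC 791 key."""
    buckets = defaultdict(list)
    for f in fragments:
        key = (f["src"], f["dst"], f["proto"], f["ip_id"])
        buckets[key].append(f["datagram"])
    return buckets


HOST_ID = 0x0042  # hypothetical per-host value written into IP-ID by the middlebox

# Two distinct datagrams (A and B) from the same host, each fragmented in
# two pieces somewhere after the middlebox.
frags = [
    {"src": "192.0.2.1", "dst": "198.51.100.9", "proto": 6, "ip_id": HOST_ID, "datagram": "A"},
    {"src": "192.0.2.1", "dst": "198.51.100.9", "proto": 6, "ip_id": HOST_ID, "datagram": "B"},
    {"src": "192.0.2.1", "dst": "198.51.100.9", "proto": 6, "ip_id": HOST_ID, "datagram": "A"},
    {"src": "192.0.2.1", "dst": "198.51.100.9", "proto": 6, "ip_id": HOST_ID, "datagram": "B"},
]

buckets = reassembly_buckets(frags)

# Any bucket holding fragments of more than one datagram is a reassembly hazard.
mixed = {k: set(v) for k, v in buckets.items() if len(set(v)) > 1}
print(len(buckets), mixed)
```

Because both datagrams carry the same Identification value, their 
fragments land in the same reassembly bucket, which is the type of error 
a 100% success ratio would not account for.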

>>
>> * Given the goal of this document to describe these identification
>> mechanisms, I don't see the need for the last bulleted list.
>
> Med: The intent of that text is to provide a kind of conclusion. No
> problem to remove it if you think so.

I would prefer that type of discussion be done as prose, rather than a 
list.  I will not object if the authors want to leave it as a list.

I do have one other issue...

The discussion in 4.4.1 inter-mixes two different HTTP headers.  The XFF 
header relies on the "X-" prefix convention, which is now deprecated (RFC 
6648).  It has been replaced by the Forwarded: header defined in the 
referenced draft.  Figure 1 uses the correct header name, but the 
supporting text references XFF in several places.  All uses of XFF should 
be replaced by Forwarded: to be consistent with the current specs.

Regards,
Brian