Re: [mile] FW: IDN issue

Hi Peter,

I'll try to respond in line below and let David fill in any gaps.

Thanks,
Kathleen

-----Original Message-----
From: mile-bounces@ietf.org [mailto:mile-bounces@ietf.org] On Behalf Of Peter Saint-Andre
Sent: Thursday, January 26, 2012 3:05 PM
To: Black, David
Cc: mile@ietf.org; presnick@qualcomm.com
Subject: Re: [mile] FW: IDN issue

Hi David,

On 1/25/12 5:06 PM, david.black@emc.com wrote:
> Hello Peter,
> 
>> In short, there be dragons.
> 
> Indeed (as I'm entirely too well-aware of from the précis WG), however, the goal
> here is to find a suitable cave in which to confine the dragons :-).

Nice way to put it. And yes, the PRECIS WG is a fun place to hang out.
Speaking of which, how's that iSCSI profile of the precis framework
coming along? ;-)

> The place to start looking for that cave is an understanding of how NodeName is
> used. The quick summary is that there are three cases:
> 
> (1) RIDPolicy class, MsgDestination=RIDSystem.
> 
> The NodeName is the destination of the RID message, so it can be required to work
> in the DNS, which already has to deal with figuring out whether a U-Label is valid.
> Text could be added to require that the NodeName either be used to send the RID
> message via DNS lookup or that it has been looked up at some point in the past and
> resolves to the IP address to which the RID message is sent.  Both of these
> approaches rely on the DNS implementation to validate the U-label.

I think that might be fine. I see that version -09 says:

         1.  RIDSystem.  The address listed in the Node element of the
             RIDPolicy class is the next upstream RID system that will
             receive the RID message.  If NodeName element of the Node
             class is used, it contains a DNS domain name.  The
             originating RID system is required to check that this
             domain name resolves to the IP address to which the RID
             message is sent.  This check may be performed in advance of
             sending the message and the result saved for future use
             with additional RID messages.

However, the spec right now says that the domain name MUST NOT contain
A-labels:

   The Node class identifies a host or network device.  This document
   re-uses the definition of Node from the IODEF specification
   [RFC5070], Section 3.16.  However, that document did not clearly
   specify whether a NodeName could be an Internationalized Domain Name
   (IDN).  RID systems MUST treat the NodeName class as a domain name
   slot [RFC5890].  RID systems SHOULD support IDNs in the NodeName
   class; if they do so, the UTF-8 representation of the domain name
   MUST be used, i.e., all of the domain name's labels MUST be U-labels
   expressed in UTF-8 or NR-LDH labels [RFC5890]; A-labels MUST NOT be
   used.  A RID system can convert between A-labels and U-labels by
   using the Punycode encoding [RFC3492] for A-labels as described in
   the protocol specification for Internationalized Domain Names in
   Applications [RFC5891].

If the application is not checking this, how will it know whether any
A-labels have snuck in? Will it depend on its DNS resolver to return the
IDN in the appropriate format? The foregoing text implies that the RID
system will do the conversion, so it's not clear to me where the
responsibility lies (and if it's not clear to me, it might not be clear
to an implementer).
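
To make the question concrete, here is a minimal sketch of what an
application-side check could look like, assuming the RID system itself (not
the DNS resolver) is the one that looks for A-labels. The function names are
hypothetical and not taken from the draft. The conversion uses only the
Punycode step of RFC 3492 via Python's built-in "punycode" codec; it performs
none of the IDNA2008 validity checks from RFC 5891, so it illustrates where
the responsibility would sit rather than being a complete implementation.

```python
from typing import List

ACE_PREFIX = "xn--"  # prefix that marks an A-label (RFC 5890)


def has_ace_prefix(label: str) -> bool:
    """True if the label starts with the ACE prefix (case-insensitive)."""
    return label.lower().startswith(ACE_PREFIX)


def find_a_labels(node_name: str) -> List[str]:
    """Return every label in node_name that looks like an A-label."""
    return [label for label in node_name.split(".") if has_ace_prefix(label)]


def a_label_to_unicode(label: str) -> str:
    """Decode the Punycode part of an A-label; no IDNA2008 validation."""
    return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


def unicode_to_a_label(label: str) -> str:
    """Encode a non-ASCII label as ACE prefix plus Punycode.

    ASCII-only labels (NR-LDH labels) are returned unchanged.
    """
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")
```

With this sketch, find_a_labels("xn--bcher-kva.example.com") flags the first
label, and a_label_to_unicode("xn--bcher-kva") gives "bücher". A receiver
that enforces "A-labels MUST NOT be used" would reject or convert any
NodeName for which find_a_labels returns a non-empty list.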
_________
KMM: We state that the document must be UTF-8 compliant ("The use of UTF-8 is REQUIRED").  Does that cover the checking, since A-labels are not UTF-8 formatted?  Would the question be resolved by changing:
"A RID system can convert between A-labels and U-labels by"
TO: "An application communicating via RID can convert between A-labels and U-labels by"

This would put the burden within an application associated with the API that is IODEF/RID/RID Transport.
What we care about is consistent transport so that applications can share data.  As long as the data received is the same for every implementation, we shouldn't care how they use the domain names in their application, right?
__________
And if only U-labels are permitted, then comparison proceeds using the
U-labels as input. However, if so we might need to specify mappings of
the kind discussed in RFC 5895 (e.g., mapping uppercase and titlecase to
lowercase so that PRÉCIS.example.com is treated "the same" as
précis.example.com).
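
As an illustration of the kind of mapping meant here, the following is a
minimal sketch covering only two of the steps discussed in RFC 5895:
lowercasing and NFC normalization. It deliberately leaves out the other
steps (for example the fullwidth/halfwidth mapping), and the function names
are hypothetical.

```python
import unicodedata


def map_for_comparison(name: str) -> str:
    """Lowercase the name, then normalize it to NFC.

    This is a simplified subset of the RFC 5895 mappings, not the
    full procedure.
    """
    return unicodedata.normalize("NFC", name.lower())


def same_name(a: str, b: str) -> bool:
    """Compare two domain names after applying the simplified mapping."""
    return map_for_comparison(a) == map_for_comparison(b)
```

Under this mapping, same_name("PRÉCIS.example.com", "précis.example.com")
is true, while "precis.example.com" (no accent) remains a different name.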

BUT...

Will RID systems in fact be doing such comparisons? The text from the
first paragraph quoted above indicates that RID systems will be checking
the association between a DNS domain name and an IP address, not
directly comparing DNS domain names (including IDNs). If that is true,
then we probably *don't* need to care much about comparison and any talk
of mappings is moot. Can you please verify that RID systems will be
checking the association with the IP address, rather than directly
comparing DNS domain names?
_______
KMM: I see your point here.  How about the following text to make it clear that the IP address is required and the DNS name is derived from the IP address:

         1.  RIDSystem.  The IP address of the next upstream system
             accepting RID communications is REQUIRED and is listed in
             the Node element of the RIDPolicy class.  If NodeName
             element of the Node class is used, it contains a DNS domain
             name.  The originating RID system is required to check that
             this domain name resolves to the IP address to which the
             RID message is sent.  This check may be performed in
             advance of sending the message and the result saved for
             future use with additional RID messages.
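
The check described in this text could be sketched roughly as follows. This
is only an illustration under stated assumptions: the function names and the
in-memory cache are hypothetical, and the resolver is passed in as a
parameter so that the lookup can be swapped or tested without network
access. The default resolver uses socket.getaddrinfo from the Python
standard library; how a NodeName containing U-labels is validated depends on
the resolver library in use, which is the point of relying on the DNS
implementation.

```python
import ipaddress
import socket


def system_resolver(node_name):
    """Return the set of IP addresses the system resolver gives for node_name."""
    infos = socket.getaddrinfo(node_name, None)
    # info[4][0] is the address string; drop any IPv6 scope id ("%eth0").
    return {ipaddress.ip_address(info[4][0].split("%")[0]) for info in infos}


def name_matches_address(node_name, address, resolver=system_resolver):
    """True if node_name resolves to the address the message is sent to."""
    target = ipaddress.ip_address(address)
    try:
        return target in resolver(node_name)
    except (socket.gaierror, UnicodeError):
        # Lookup failed or the name could not be encoded for the DNS.
        return False


_verified = set()  # (node_name, address) pairs that already passed the check


def check_once(node_name, address, resolver=system_resolver):
    """Run the check, saving a positive result for later RID messages."""
    key = (node_name, str(ipaddress.ip_address(address)))
    if key in _verified:
        return True
    if name_matches_address(node_name, address, resolver):
        _verified.add(key)
        return True
    return False
```

A sender would call check_once(node_name, destination_ip) before the first
message to a peer and reuse the saved result afterwards. No direct
comparison of domain names is involved; only IP addresses are compared.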
______
> (2) RIDPolicy class, MsgDestination=SourceOfIncident, and also the IncidentSource
> class.
> 
> In both of these cases the IP address is the primary identifier (IncidentSource
> needs additional text to state that the IP address is required - that appears to
> have been overlooked).  There are a couple of options here:
> 
> a) The simplest is to prohibit NodeName usage.
> b) NodeName could be required to correspond to the IP address via DNS lookup.

In -09 that is:

         2.  SourceOfIncident.  The Address element of the Node element
             contains the IP address of the incident source, and the
             NodeName element of the Node class is not used.  The IP
             address is used to determine the path of systems accepting
             RID communications that will be used to find the closest
             RID system to the source of an attack in which the IP
             address used by the source is believed to be valid and a
             Request message with MsgDst set to InvestigationRequest is
             used.  This is not to be confused with the IncidentSource
             class, as the defined value here is from an initial trace
             or investigation Request, not the source used in a Result
             message.

I think it's fine to use IP address here.

NOTE: you said that IncidentSource needs additional text to state that
the IP address is required, but I do not see that here:
_____
KMM: Thanks, if we are fixing text, let's just add that now.  This one should be easy.

Proposed:
         2.  SourceOfIncident.  The Address element of the Node element
             contains the IP address of the incident source, and the
             NodeName element of the Node class is not used.  The IP
             address is REQUIRED when this option is selected.  The IP
             address is used to determine the path of systems accepting
             RID communications that will be used to find the closest
             RID system to the source of an attack in which the IP
             address used by the source is believed to be valid and a
             Request message with MsgDst set to InvestigationRequest is
             used.  This is not to be confused with the IncidentSource
             class, as the defined value here is from an initial trace
             or investigation Request, not the source used in a Result
             message.
_____
http://tools.ietf.org/rfcdiff?url1=draft-ietf-mile-rfc6045-bis-08&difftype=--html&submit=Go!&url2=draft-ietf-mile-rfc6045-bis-09

However, I think that can be added before sending this document to the
RFC Editor, or during AUTH48.

> (3) RIDPolicy class, MsgDestination=ext-value.  This is an escape that allows
> extensions to MsgDestination.  The description of this should include a "there
> be dragons" warning about IDNs in NodeName.

In -09 that is:

         3.  ext-value.  An escape value used to extend this attribute.
             All extensions shall specify the contents and meaning of
             the Node element of RIDPolicy.  If the NodeName element of
             Node is used by an extension, NodeName may contain an
             Internationalized Domain Name (IDN) and that IDN is
             required to satisfy the requirements in [RFC5890].  It is
             strongly recommended that RID Systems satisfy those IDN
             requirements via appropriate use of the DNS as opposed to
             implementing their own checks for these requirements.  For
             example, an extension could use both a NodeName and an IP
             address to which the NodeName resolves.  See IODEF
             [RFC5070], Section 5.1 on extensibility.

That seems slightly underspecified: what is meant by "to satisfy the
requirements in [RFC5890]" and "appropriate use of the DNS"? Does that
mean "ensuring that the IDN retrieved from the DNS consists only of
U-labels and NR-LDH labels"? If so, is that done by asking the DNS
resolver for U-labels, or is it done by actively performing a check in
the RID system itself (see above about "MUST NOT be A-labels")?

Is this something that is left for each extension to define? If so, we
should come out and say precisely that.
_______
KMM: The important part for RID is that we communicate and exchange messages in a way that everyone can accept and process.  Application handling for this translation, whether it be by DNS or some other mechanism, really shouldn't matter.  As long as the internationalization is handled in a way that lets us consistently exchange information, I think the real issues are solved, no?
If I add "See Section 11 of this document for Internationalization considerations.", does that resolve the issue, since it provides consistent guidance that enables the exchange of information?
_______
> So, the answer to the somewhat rhetorical question:
> 
>> Do IODEF/RID implementations want to sign up for all that?
> 
> Is approximately "Yes, by relying on the DNS implementation," with the addition
> of a "there be dragons" warning to the MsgDestination ext-value extension text
> to communicate that to designers of extensions, and a possible prohibition of
> use of NodeName when the IP address is the primary identifier.

I think we're getting close to reflecting that in text, but a few
clarifications would help, as would validation from the WG that this
direction is agreeable.

I realize that these are thorny issues and that folks probably just want
someone to come from on high and say "yea verily, here is the one true
path to internationalization nirvana". Unfortunately, there is no such
path, so the WG needs to weigh the costs and benefits, clearly
understand what it's doing, and then make a decision one way or the
other. We've had too many WGs (some of which I've been involved with)
just take some i18n advice without knowledge, and the results have been
less than positive.

Thanks for your patience!

Peter
_________
Thank you for your detailed reviews!
Kathleen

-- 
Peter Saint-Andre
https://stpeter.im/

_______________________________________________
mile mailing list
mile@ietf.org
https://www.ietf.org/mailman/listinfo/mile