PP15: Does Applicability Matter for Applications?

Lisa Dusseault <lisa@osafoundation.org> Sun, 27 January 2008 20:21 UTC

To: Apps Discuss <discuss@apps.ietf.org>
Message-Id: <BB5ABA7B-23FE-4AB7-9AC9-41CB1334C292@osafoundation.org>
References: <31D151A3D66E404AACBBB0247ACA54A7029D3A@STNTEXCH11.cis.neustar.com>
From: Lisa Dusseault <lisa@osafoundation.org>
Subject: PP15: Does Applicability Matter for Applications?
Date: Sun, 27 Jan 2008 12:21:00 -0800


Begin forwarded message:

> From: "Peterson, Jon" <jon.peterson@neustar.biz>
> Does Applicability Matter for Applications?
> Jon Peterson
>
>
> There is a longstanding maxim of Internet protocol design that  
> successful application protocols invariably devolve into generic  
> transports. The canonical example would be HTTP; the parody in  
> RFC3093 (the "Firewall Enhancement Protocol") is frighteningly  
> close to the truth. A number of protocols which have little to do  
> with the delivery of hypertext take advantage of the ubiquitous  
> deployment of HTTP, piggybacking on it or masquerading as it on the  
> wire.
>
> It is widely believed that this slide into the generic is  
> undesirable. Most application protocols have a specific intended  
> sphere of applicability, and this applicability informs the design.  
> HTTP was certainly not intended to be a transport for arbitrary data.
>
> So why does this occur?
>
> One reason is that there is a certain barrier to entry for new  
> protocols on the Internet. Once a protocol enjoys widespread  
> implementation and deployment, the Internet has adapted to its  
> presence: endpoints support it, middleboxes allow it, servers are  
> optimized for it. It becomes difficult to sell devices that inhibit  
> a dominant protocol. When a new application goes shopping for a  
> protocol, the level of effort required to overcome this barrier to  
> entry is a significant consideration, and riding on the coat-tails  
> of an existing dominant protocol appears an attractive path to  
> rapid deployment.
>
> This paper argues further that a class of tools exists that can  
> render application protocols generic, a class that will be referred  
> to here as "genericizers". In the case of HTTP, SOAP would be a  
> prime example of this sort of tool. The effect of a genericizer is  
> to broaden the applicability of a protocol by enabling its fields  
> or payloads to carry unanticipated material with different,  
> potentially radically different, characteristics than the protocol  
> designers intended. From a standardization perspective,  
> genericizers furthermore perform this function without requiring  
> any modification to the underlying protocol. These tools thus allow  
> implementers, and designers of derivative specifications, to  
> reinterpret the applicability of a protocol entirely.
>
> As a case study of the phenomenon of genericizers, this paper  
> explores the data URL (RFC2397) and considers the manner in which  
> proposed uses of the data URL impact and genericize protocols in  
> the RAI Area, particularly ENUM and SIP.
>
> Broadly, the purpose of the data URL is to provide literal data by- 
> value when the use of a URI is required, but a reference is  
> undesirable. RFC2397 suggests that the intended use was to provide  
> relatively small chunks of inline data, ranging from text strings  
> (the default MIME type for data) to encoded binary representation  
> of modestly-sized image files. The intended applicability of the  
> data URL is more or less unlimited; RFC2397 says little more than  
> "Some applications that use URLs also have a need to embed (small)  
> media type data directly inline", and that "The "data:" URL scheme  
> is only useful for short values."
>
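
To make the by-value idea above concrete, here is a minimal Python
sketch of building and parsing a data URL along the lines of RFC2397.
The helper names make_data_url and parse_data_url are illustrative
only (they come from neither the RFC nor the paper), and the parser
assumes well-formed input rather than validating it.

```python
import base64
from typing import Tuple
from urllib.parse import quote, unquote_to_bytes

# Default media type that RFC2397 assigns when none is given.
DEFAULT_TYPE = "text/plain;charset=US-ASCII"


def make_data_url(payload: bytes, media_type: str = "",
                  use_base64: bool = False) -> str:
    """Embed payload by value in a data URL (hypothetical helper)."""
    if use_base64:
        encoded = base64.b64encode(payload).decode("ascii")
        return f"data:{media_type};base64,{encoded}"
    # Percent-encode anything outside the URL-safe set.
    return f"data:{media_type},{quote(payload)}"


def parse_data_url(url: str) -> Tuple[str, bytes]:
    """Return (media type, payload); there is nothing to fetch."""
    scheme, _, rest = url.partition(":")
    if scheme.lower() != "data":
        raise ValueError("not a data URL")
    header, comma, data = rest.partition(",")
    if not comma:
        raise ValueError("data URL has no comma separator")
    is_base64 = header.lower().endswith(";base64")
    if is_base64:
        header = header[: -len(";base64")]
    media_type = header or DEFAULT_TYPE
    payload = base64.b64decode(data) if is_base64 else unquote_to_bytes(data)
    return media_type, payload


url = make_data_url(b"A brief note")
print(url)
print(parse_data_url(url))
```

Note that "resolving" such a URL involves no network access at all:
the payload is recovered entirely from the string itself.
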
> It is arguable whether or not the data URL meets the definition of a  
> URI - not because it fails to yield a resource, given the term  
> 'resource' is defined quite liberally in RFC2396, but because there  
> is no meaningful sense in which it serves as an identifier. As  
> RFC2396 says, "An identifier is an object that can act as a  
> reference to something that has identity." The data URL is not a  
> reference, it is a literal. Similarly, a URL is defined as "the  
> subset of URI that identify resources via a representation of their  
> primary access mechanism", and a data URL clearly does not  
> constitute a representation nor does it reflect any sort of access.  
> Already these deviations from the conventional purpose of URIs and  
> URLs suggest that use of the data scheme might have unintended  
> consequences.
>
> Consider the use of the data URI in ENUM (RFC3761). ENUM is a  
> mechanism for using the DNS to discover URIs associated with  
> telephone numbers. For this purpose ENUM builds off the DDDS  
> framework, although ENUM benefits from only a subset of DDDS's  
> capabilities (for instance, it has no pressing need for the order  
> v. preference distinction, the replacement field, or a non-greedy  
> LHS of the regular expression, not to mention that lookups target  
> a pre-established "golden root" domain).
>
> The data URL entered the ENUM community through draft-ietf-enum- 
> cnam, a document which proposes a way to look up a text string  
> associated with a telephone number (the text string contains the  
> name one would see when Caller ID is displayed on a telephone).  
> This string is stored as a data URL within a NAPTR record. On some  
> level, the motivation for this work is obvious. ENUM as a  
> query-response protocol is implemented on the target devices for  
> this application, and the devices need some additional  
> query-response functionality; ENUM is in other words perceived as a  
> dominant protocol, and enum-cnam proposes to piggyback additional  
> data onto it.
>
> But once a genericizer has been introduced, ENUM no longer  
> necessarily shows "how DNS can be used for identifying available  
> services connected to one E.164 number", since the data URL in no  
> way identifies services. Instead, it renders ENUM a generic  
> database protocol whose keys are telephone numbers and values are  
> arbitrary data. Once the capability to parse data URLs is present  
> in ENUM resolvers, arbitrary data then can be served via the DNS.  
> Effectively this allows domain administrators to embed TXT RRs  
> within NAPTR RRs, with the added bonus of MIME typing to allow  
> various binary data types. Proposals that have been informally  
> discussed in this regard include embedding public keys, ringtones  
> and vCards within data URLs in NAPTR RRs. It would not be much of a  
> stretch to suggest that HTML documents could be served directly  
> from the DNS in a similar fashion. All of these proposals have  
> familiar implications for DNS response message size, caching,  
> security, privacy, and so on. None of these problems arise in the  
> use of typical URIs, which identify resources in the network and  
> thus provide a layer of indirection. ENUM assumes the existence of  
> that layer of indirection; without it, ENUM's architectural  
> underpinnings look increasingly suspect.
>
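
The shift from references to literals described above can be sketched
in a few lines of Python. This is a toy model, not a resolver: the
ZONE dictionary stands in for the DNS, the telephone number, SIP URI
and name are invented for illustration, and the NAPTR regexp handling
is simplified (flags are ignored and the delimiter must not appear
inside the fields).

```python
import re

DIGITS = "0123456789"


def enum_domain(e164_number, root="e164.arpa"):
    """Map a telephone number to its ENUM domain: digits reversed and
    dot-separated under the root."""
    digits = [c for c in e164_number if c in DIGITS]
    return ".".join(reversed(digits)) + "." + root


def apply_naptr_regexp(regexp_field, aus):
    """Apply a '!pattern!replacement!flags' field to the lookup string."""
    delim = regexp_field[0]
    _, pattern, replacement, _flags = regexp_field.split(delim)
    return re.sub(pattern, replacement, aus, count=1)


# Hypothetical zone contents: one conventional record, one data URL.
ZONE = {
    "3.2.1.0.5.5.5.2.0.2.1.e164.arpa": [
        "!^.*$!sip:jane@example.com!",
        "!^.*$!data:,Jane%20Doe!",
    ],
}


def lookup(e164_number):
    """Return the URIs produced by every record stored for the number."""
    aus = "+" + "".join(c for c in e164_number if c in DIGITS)
    records = ZONE.get(enum_domain(e164_number), [])
    return [apply_naptr_regexp(r, aus) for r in records]


results = lookup("+1-202-555-0123")
for uri in results:
    kind = "literal value" if uri.lower().startswith("data:") else "reference"
    print(f"{kind}: {uri}")
```

With the first record the client still has to contact the identified
service; with the second, the lookup itself has already delivered the
data, which is the key-value behaviour the paper warns about.
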
> The current version of draft-ietf-enum-cnam has abandoned the data  
> URL, due to pushback - it does however propose a new 'pstndata' URL  
> scheme, with more or less identical properties but a slightly  
> constrained applicability to PSTN-related data. However, despite  
> this setback the data URL still enjoys a vogue in ENUM circles. One  
> current notable draft is draft-ietf-enum-unused-02, which proposes  
> the use of a text string within a data URL to indicate that a  
> particular telephone number is not in service in the PSTN.
>
> The data URL has also begun to make a few appearances in the SIP  
> WG, as a manner of transporting large chunks of data in headers.  
> The SIP (RFC3261) architecture distinguishes envelope from body in  
> a manner similar to email; intermediaries inspect and modify  
> headers in the process of routing requests, whereas message bodies  
> are payloads that are delivered to applications at the endpoints.  
> While intermediaries are not strictly forbidden from inspecting SIP  
> message bodies, there is no standard routing procedure that relies  
> on them doing so, and intermediaries are forbidden explicitly from  
> modifying SIP message bodies. However, there are members of the SIP  
> community who would like intermediaries to have control over the  
> bodies of SIP messages, mostly so that SDP can be modified to  
> enforce policies familiar to operators in the PSTN. So, as is the  
> case with email, there has for some time been an impetus in the SIP  
> WG from this contingent to end the tyranny of endpoints over bodies,  
> but there has not been a standard way to permit unilateral  
> modification of bodies by intermediaries.
>
> However, given that SIP message bodies could conceivably be encoded  
> as data URLs, and numerous SIP header fields permit arbitrary URIs  
> as their value, there is apparently an easy workaround. Ordinarily,  
> location information such as PIDF-LO (RFC4119) is carried as a body  
> within SIP; for the Location header field proposed by draft-ietf- 
> sip-location-conveyance, it has been proposed that a data URI be  
> used when an intermediary needs to insert location information.  
> Similar suggestions have been made related to some bodies used for  
> security properties, and for SDP itself.
>
> Leaving aside practical concerns about the length of SIP headers  
> that parsers can withstand, moving information from  
> bodies to headers fundamentally changes the SIP architecture. For  
> example, SIP's security model is focused on providing end-to-end  
> security services for bodies, but not for headers. Moreover,  
> applications at the endpoints that want to consume bodies should  
> have some reasonable sense of who created them. That is clear in  
> the RFC3261 architecture, but much less so if bodies migrate into  
> headers. The acceptance of more or less any proposal to encode data  
> URLs in SIP headers would open the doors to numerous
>
> So what's to be done?
>
> This study sheds light on genericizers in order to make them easier  
> to identify when they arise in future standardization efforts, and  
> also to illustrate that they have significant architectural  
> implications. Given how long a protocol like RFC2397 has been  
> around, it is unlikely that it will be deprecated; nor is it  
> reasonable for protocols that use URIs to single out the data  
> URI as inappropriate for use in a particular field - as the  
> enum-cnam "pstndata" URI proposal illustrates, it is quite easy to  
> circumvent such a prohibition. However, the lessons learned from  
> studying the manner in which genericizers are leveraged may assist  
> in preventing similar architectural loopholes in the future.
>
> Unfortunately, the primary reason why the data URL is not more  
> widely used in the IETF is, in all likelihood, that relatively few  
> participants are aware of it.
>
> Finally, it is important to note that the data URL did not create  
> the impetus in ENUM or SIP to genericize their architectures; the  
> data URL is merely an enabler used by the advocates of those  
> positions to advance more generic architectures without having to  
> contest the underlying design choices of ENUM and SIP.
>
