[Int-area] WPAD Evolution
joshco@gmail.com Fri, 27 October 2023 20:33 UTC
From: joshco@gmail.com
To: int-area@ietf.org
Cc: 'Mark Nottingham' <mnot@mnot.net>, 'Tommy Pauly' <tpauly@apple.com>, "'Marc C. Dacier'" <marc.dacier@kaust.edu.sa>, elyssa.boulila@amadeus.com, 'Dragana Damjanovic' <ddamjanovic@microsoft.com>
Date: Fri, 27 Oct 2023 16:33:08 -0400
Archived-At: <https://mailarchive.ietf.org/arch/msg/int-area/IHzF0V7nPNlfgQjiU7utUo85ngU>
Subject: [Int-area] WPAD Evolution
[PDF version attached if the text formatting has problems]

Hi Folks,

For those wondering how the current WPAD solution came to be, I’m one of the original co-authors, and served as Program Manager for WinInet during Internet Explorer 5, where it was first implemented. The DNS Devolution discovery scheme was supposed to be a short-term solution until SVRLOC and DHCP became more widely deployed. 25 years later, it remains unfinished business.

Recently I came across the ACM paper, humorously titled “Waiting Patiently for an Announced Disaster”[1], written by Marc Dacier and Elyssa Boulila, cc’d on this email, the latest in a series of security papers on WPAD vulnerabilities. I reached out to them and connected with the httpwg chairs. I learned that “What to do about WPAD?” has been a perennial question, and Tommy Pauly had been noodling on it as well.

Let’s see if we can make change before we get another spanking from the security police 😊

I’ve prepared some thoughts on its origins and suggestions for evolution.

Origins

I started thinking about this while working at UPS in the early 1990s, when we were setting up web browsing from corporate desktops. In UPS's case, that meant not just IT workers, but also the labor workforce, truck drivers, shipping center supervisors, etc., in various UPS centers.

We needed to avoid drowning the IT help desk with calls complaining "The Internet is down!", responding with "No, the Internet is not down. Your proxy settings just aren't configured correctly. Here's how you fix ...". We also needed proxy auth, which didn't exist in HTTP/1.0, but that's another story.

Even for IT workers, when they were at work, they needed to use the corporate proxy, and then when they took their laptops home, they needed to go direct (or at least stop using the corporate proxy, which was unavailable). They shouldn't have to manually flip back and forth every day.
Or, when they went to a meeting at a different company, a WG meeting hosted by another company, or an IETF meeting, they should automatically pick up the relevant proxy settings for that network.

WPAD was focused on the discovery mechanism to find the URL for the configuration file. Netscape Navigator 3.0 already had an option for the user to manually configure a URL for a PAC file. Configuration file formats were JavaScript PAC and IE configuration files, which were essentially Windows INI files. For those who grimace at JavaScript, it is important to consider that things could have been worse 😊

Initial Proposals

In 1997, I was a developer on the Netscape Proxy Server team. I began drafting a proposal. At the same time, Stuart Kwan @ Microsoft was doing the same based on DHCP. We coordinated, and I drafted an ID which included the DHCP option as well as SVRLOC, which were the preferred discovery choices, and sent it to the httpwg list in March 1997.
https://lists.w3.org/Archives/Public/ietf-http-wg/1997JanMar/0686.html

Reading the thread itself also provides insights into the thinking of the other httpwg members. You'll note that in the original thread, the discovery proposals were based on SVRLOC and DHCP.

Stuart Kwan’s DHCP ID is:
https://datatracker.ietf.org/doc/html/draft-kwan-proxy-client-conf-00

Others weighed in on scenarios:

Joel N Weber @ MIT stated:
> When people at my school screwed the routing so that the machines behind the firewall could talk to the mail server, but not outside machines, I changed about three teacher's machines manually. Those teachers didn't have a clue what a proxy is; and it would be no easier for them to tell the browser to use automagic* setup than it is for them to say to use 204.130.130.62 port 80.
> [snip]
> I personally would rather see DHCP used instead of DNS I think. But I need to read the DHCP spec before I comment further...
https://lists.w3.org/Archives/Public/ietf-http-wg/1997JanMar/0698.html
* Refers to pre-WPAD manual PAC configuration in Navigator 3.0

Stuart Kwan @ MSFT stated:
> …I unplug my laptop and take it on a visit to Netscape. When I plug into the Netscape corporate network…
https://lists.w3.org/Archives/Public/ietf-http-wg/1997JanMar/0713.html

SVRLOC

My initial preference was to use SVRLOC. However, at the time SVRLOC was new, the WG was still figuring things out, and it was pre-adoption. Furthermore, the teams that would build implementations were far away from deciding what shape their application-facing APIs would take.

Josh Cohen stated:
> Another fundamental issue here is if you buy into the service location rotocol ideas. I do. They are trying to solve the problem of "how do I use a standard method to look up arbitrary services on a network". Granted, by following their DNS recommendations ( which is only one of many ways to advertise services in their world ), its not perfect. In time, I hope that we could support their multicast discovery protocol as well. I suppose I could have specified that in the draft, but since the rest of the serverloc stuff is so new, and virtually undeployed as of yet, I figured this gives us a solution to a big problem, today, with protocols and APIs software implementors in the realm of the 'world wide web' are commonly using.
https://lists.w3.org/Archives/Public/ietf-http-wg/1997JanMar/0708.html

DHCP

DHCP was well-specified, and deployment was increasing; however, APIs for applications to ask for arbitrary DHCP options were not yet available. Similarly, emerging DHCP server products didn’t always allow an IT admin to configure arbitrary options easily, or at all.
Vinod Valloppillil @ MSFT stated:
> On windows, for ex., you can look at DHCP options quite easily in the registyr but there aren't API's (yet) for querying RR's from the DNS (unless you want to count on NSLOOKUP output on the commandline)
https://lists.w3.org/Archives/Public/ietf-http-wg/1997JanMar/0712.html

Stuart Kwan @ MSFT stated:
> 3) It is true that DHCP is not necessarily widely available today, but if anything SRVLOC is less available. The DHCP method at least gives you something you can use now.
> 4) The fact that there is no cross-platform API to retrieve DHCP options is interesting, but does not block implementation. While the DHCP WG investigates this problem, use the platform-specific method for retrieving options. Please note that there is no cross-platform standard API for retrieving TXT RRs from DNS.
https://lists.w3.org/Archives/Public/ietf-http-wg/1997JanMar/0713.html

Stuart Kwan @ MSFT stated:
> I am also not opposed to storing this information in two places. I am only concerned that we solve the automatic configuration problem.
https://lists.w3.org/Archives/Public/ietf-http-wg/1997JanMar/0715.html

DNS

Due to the above constraints, DNS was another discovery option. TXT records were considered a cleaner approach than A records, but they were new and not widely deployed.

Joel N Weber @ MIT stated:
> It should be noted that albert.gnu.ai.mit.edu (the mail server that processes all my incoming mail) had a version of named which couldn't handle TXT records up until about a month ago. So even that solution is not really completely compatible with existing sites. However, if you want to go the name server route, I wonder why you couldn't make the URL something hardcoded like
> http://www-ns-pac/proxy.ins
> That would mean that you have to use port 80 with a standardized path, but that's less strange than the TXT or SRVRLOC records.
https://lists.w3.org/Archives/Public/ietf-http-wg/1997JanMar/0714.html

A hardcoded domain name A record and path was the only solution that would always be available, and became the “fallback” case.

CARP/ICP

At the time, we were also unsure of how proxy deployments would evolve for caching efficiency, as many people were using dialup, and IP-capable smartphones were still in the future. Contemporary efforts were Internet Cache Protocol (ICP)[2] and Cache Array Routing Protocol (CARP)[3].

Josh Cohen stated:
> I want to make the point that as caches as being deployed more frequently and with more complexity, ie hierarchies and dynamic ICP type protocols, the configurations need to be dynamic and complex. Well beyond what a nontechnical user should need to know, and extremely difficult to provide a UI to let a user specify it manually.
https://lists.w3.org/Archives/Public/ietf-http-wg/1997JanMar/0705.html

CARP is what we call sharding today. An algorithm would compute a hash of the URL and treat the cache array as a hash table. Requests would be forwarded to the cache for that hash bucket.

ICP is a UDP-based protocol between caches that resembles how hierarchical DNS caching works. Most of the functionality was inter-cache via UDP, but a client function could determine the best first hop in the hierarchy.

We thought these scenarios were important enough to support, and the generality of JavaScript solved the browser’s side of the problem. One could implement CARP hashing or ICP bootstrap in JavaScript.

Implementation in IE5

Internet Explorer 5 released Beta 1 in June 1998, Beta 2 in November 1998, and final in March 1999. At the time, the workable discovery solutions were DHCP and DNS A. The implementation prioritized DHCP and fell back to DNS A.

DNS Devolution

Somewhere along this path, we came up with the DNS Devolution scheme. We were wise enough to know that new TLDs would come, someday, but naïve in thinking it would be a rare occurrence.
It’s not like we’re ever going to see http://ietf.rocks. Oops.

UI

We wanted to include UI that would provide clarity to the user. For example, a pop-up or interstitial screen that informed the user of the proxy discovered and potentially asked the user to accept or deny. However, making the signaling work between WinInet (thread pool) and Trident (HTML renderer) would have required architectural changes that the schedule did not permit.

Looking Forward

The world has changed a lot in 25 years. WPAD functioned as a framework which separated discovery schemes and configuration file formats.

Scenarios

IT Admins, who need to deal with disparate kinds of networks, will choose the combination of discovery and config format they use. There will need to be a transition period for any migration. I think it makes sense to continue the framework model.

I suggest we be deliberate in soliciting current practice scenarios of IT Admins with respect to proxy servers. Apple, Microsoft, Google et al could also have their enterprise account managers ask their enterprise customers to complete a short survey.

SVRLOC

SVRLOC is now widely adopted and in use for similar service discovery scenarios. I suggest that the SVRLOC discovery mechanism be fleshed out and made to align with current practices for using SVRLOC. One potential path suggested by an author of the security paper[4], Marc Dacier, is the use of DNS-SD, possibly including mDNS, and DNSSEC for security.

PVD Proxy Config

A new draft of Communicating Proxy Configurations in Provisioning Domains[5] has been published by Tommy Pauly/Apple and Dragana Damjanovic/Microsoft.

The first good thing is that Apple and Microsoft are teaming up. I would suggest adding co-authors from other potential implementers, whether browsers like Chrome, Mozilla et al, or OS-level services such as those for Linux, Android, and Arduino/ESP32 IoT.
This draft provides a JSON format configuration file, which provides a modern solution that is consistent with common practices. The JSON content re-uses PVDDATA[6], which I’ll admit I have little experience with. Aside from observing that it can provide the relevant config information for simpler scenarios, I look to others on the question of whether PVDDATA is an emerging common practice.

The draft introduces a new discovery mechanism, Router Advertisements. The draft re-uses Well-Known URIs, something that didn’t exist in 1997.

DNS TXT Records & Sunset DNS Devolution

I suggest we flesh out the DNS TXT discovery scheme and pick a target date for deprecation of the DNS devolution scheme.

Continue the Framework

I suggest that the new solution continue the framework approach, leaving multiple choices of discovery mechanisms and config formats. Browsers that can accept PVD can signal that with Accept headers and fall back to existing PAC if needed.

User Experience

In 2023, we can do a better job with the user experience than we were able to do in 1997. The HTTP specification has numerous MUSTs prescribing user agent behavior. If we can come to agreement on the UX, that should be specified. For example, let’s say a MUST that browsers provide a notification and a choice for the user to reject (via an interstitial like SSL errors, a choice to continue unsafely, a modal, or another method).

Venue

To focus this work, and attract others to provide their input, the WPAD evolution work item needs a home. When discussing this with the httpwg chairs, Mark and Tommy, potential venues for this evolution could be an HTTP workshop, or an intarea work stream, or both.
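
For readers unfamiliar with DNS devolution, the scheme proposed for sunset above, here is a minimal sketch of how a client might derive candidate WPAD hosts from its own fully qualified name. It is an illustration only, not the IE5 implementation: the function names, the min_labels cutoff, and the example hostnames are assumptions made for this sketch; wpad.dat is the conventional file name.

```python
def devolution_candidates(client_fqdn: str, min_labels: int = 2) -> list[str]:
    """Return WPAD host names to try, most specific first.

    Hypothetical helper: drops the host label, then repeatedly strips the
    leftmost label until fewer than min_labels labels remain.
    """
    labels = client_fqdn.strip(".").lower().split(".")
    domain = labels[1:]  # everything after the host's own label
    candidates = []
    while len(domain) >= min_labels:
        candidates.append("wpad." + ".".join(domain))
        domain = domain[1:]  # devolve one level toward the root
    return candidates


def pac_urls(client_fqdn: str, min_labels: int = 2) -> list[str]:
    """Turn each candidate host into the URL a client would fetch."""
    return [
        f"http://{host}/wpad.dat"
        for host in devolution_candidates(client_fqdn, min_labels)
    ]


if __name__ == "__main__":
    for name in ("laptop.eng.corp.example.com", "pc.corp.example.co.uk"):
        print(name)
        for url in pac_urls(name):
            print("  ", url)
```

With a fixed label cutoff, pc.corp.example.co.uk devolves as far as wpad.co.uk, a name outside the organization. A label count cannot tell where an organization's namespace ends, and that boundary gets harder to guess as new TLDs appear.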

[1] https://dl.acm.org/doi/10.1145/3565361 “Waiting Patiently for an Announced Disaster”, Dacier, Boulila
[2] https://en.wikipedia.org/wiki/Internet_Cache_Protocol
[3] https://en.wikipedia.org/wiki/Cache_Array_Routing_Protocol
[4] https://dl.acm.org/doi/10.1145/3565361 “Waiting Patiently for an Announced Disaster”, Dacier, Boulila
[5] https://www.ietf.org/archive/id/draft-pauly-intarea-proxy-config-pvd-01.txt
[6] https://www.rfc-editor.org/rfc/rfc8801