[Int-area] WPAD Evolution

joshco@gmail.com Fri, 27 October 2023 20:33 UTC

From: joshco@gmail.com
To: int-area@ietf.org
Cc: 'Mark Nottingham' <mnot@mnot.net>, 'Tommy Pauly' <tpauly@apple.com>, "'Marc C. Dacier'" <marc.dacier@kaust.edu.sa>, elyssa.boulila@amadeus.com, 'Dragana Damjanovic' <ddamjanovic@microsoft.com>
Date: Fri, 27 Oct 2023 16:33:08 -0400
Message-ID: <002d01da0914$cc58ab90$650a02b0$@gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/int-area/IHzF0V7nPNlfgQjiU7utUo85ngU>
Subject: [Int-area] WPAD Evolution

[PDF version attached if the text formatting has problems]

Hi Folks,

For those wondering how the current WPAD solution came to be, I’m one of
the original co-authors, and served as Program Manager for WinInet
during Internet Explorer 5, where it was first implemented. The DNS
Devolution discovery scheme was supposed to be a short-term solution
until SVRLOC and DHCP became more widely deployed. 25 years later, it
remains unfinished business.

Recently I came across the ACM paper, humorously titled “Waiting
Patiently for an Announced Disaster”[1], written by Marc Dacier and
Elyssa Boulila, cc’d on this email, the latest in a series of security
papers on WPAD vulnerabilities. I reached out to them and connected with
the httpwg chairs. I learned that “What to do about WPAD?” has been a
perennial question, and Tommy Pauly had been noodling on it as well.

Let’s see if we can make a change before we get another spanking from the
security police 😊

I’ve prepared some thoughts on its origins and suggestions for
evolution.

Origins

I started thinking about this while working at UPS, in the early 1990s
when we were setting up web browsing from corporate desktops.

In UPS's case, that meant not just IT workers, but also the labor
workforce: truck drivers, shipping center supervisors, etc. in various
UPS centers. We needed to avoid drowning the IT help desk in calls
complaining "The Internet is down!", each answered with "No, the
Internet is not down. Your proxy settings just aren't configured
correctly. Here's how you fix ...". We also needed proxy auth, which
didn't exist in HTTP 1.0, but that's another story.

Even IT workers needed to use the corporate proxy at work and then,
when they took their laptops home, needed to go direct (or at least stop
using the corporate proxy, which was unavailable). They shouldn't have
to manually flip back and forth every day. And when they went to a
meeting at a different company, a WG meeting hosted by another company,
or an IETF meeting, they should automatically pick up the relevant proxy
settings for that network.

WPAD was focused on the discovery mechanism to find the URL for the
configuration file. Netscape Navigator 3.0 already had an option for the
user to manually configure a URL for a PAC file.

Configuration file formats were JavaScript PAC, and IE configuration
files, which were essentially Windows INI files. For those who grimace
at JavaScript, it is important to consider that things could have been
worse 😊

Initial Proposals

In 1997, I was a developer on the Netscape Proxy Server team. I began
drafting a proposal. At the same time, Stuart Kwan @ Microsoft was doing
the same based on DHCP.

We coordinated, and I drafted an ID which included the DHCP option as
well as SVRLOC, which were the preferred discovery choices, and sent it
to the httpwg list in March 1997.

https://lists.w3.org/Archives/Public/ietf-http-wg/1997JanMar/0686.html

Reading the thread itself also provides insights into the thinking of
the other httpwg members. You'll note in the original thread, the
discovery proposals were based on SVRLOC and DHCP.

Stuart Kwan’s DHCP ID is:

https://datatracker.ietf.org/doc/html/draft-kwan-proxy-client-conf-00

Others weighed in on scenarios:

Joel N Weber @ MIT stated:

  When people at my school screwed the routing so that the machines
  behind the firewall could talk to the mail server, but not outside
  machines, I changed about three teacher's machines manually. Those
  teachers didn't have a clue what a proxy is; and it would be no easier
  for them to tell the browser to use automagic* setup than it is for
  them to say to use 204.130.130.62 port 80.

  [snip]

  I personally would rather see DHCP used instead of DNS I think.

  But I need to read the DHCP spec before I comment further...

https://lists.w3.org/Archives/Public/ietf-http-wg/1997JanMar/0698.html

* Refers to pre-WPAD manual PAC configuration in Navigator 3.0

Stuart Kwan @ MSFT stated:

  …I unplug my laptop and take it on a visit to Netscape. When I plug
  into the Netscape corporate network…

https://lists.w3.org/Archives/Public/ietf-http-wg/1997JanMar/0713.html

SVRLOC

My initial preference was to use SVRLOC. However, at the time SVRLOC was
new, the WG was still figuring things out, and it was pre-adoption.
Furthermore, the teams that would build implementations were far away
from deciding what shape their application facing APIs would take.

Josh Cohen stated:

  Another fundamental issue here is if you buy into the service location
  protocol ideas. I do. They are trying to solve the problem of "how do I
  use a standard method to look up arbitrary services on a network".
  Granted, by following their DNS recommendations ( which is only one of
  many ways to advertise services in their world ), its not perfect. In
  time, I hope that we could support their multicast discovery protocol
  as well.

  I suppose I could have specified that in the draft, but since the rest
  of the serverloc stuff is so new, and virtually undeployed as of yet,
  I figured this gives us a solution to a big problem, today, with
  protocols and APIs software implementors in the realm of the 'world
  wide web' are commonly using.

https://lists.w3.org/Archives/Public/ietf-http-wg/1997JanMar/0708.html

DHCP

DHCP was well-specified, and deployment was increasing; however, APIs
for applications to request arbitrary DHCP options were not yet available.
Similarly, emerging DHCP server products didn’t always allow an IT admin
to configure arbitrary options easily or at all.

Vinod Valloppillil @ MSFT stated:

  On windows, for ex., you can look at DHCP options quite easily in the
  registyr but there aren't API's (yet) for querying RR's from the DNS
  (unless you want to count on NSLOOKUP output on the commandline)

https://lists.w3.org/Archives/Public/ietf-http-wg/1997JanMar/0712.html

Stuart Kwan @ MSFT stated:

  3) It is true that DHCP is not necessarily widely available today, but
  if anything SRVLOC is less available. The DHCP method at least gives
  you something you can use now.

  4) The fact that there is no cross-platform API to retrieve DHCP
  options is interesting, but does not block implementation. While the
  DHCP WG investigates this problem, use the platform-specific method
  for retrieving options. Please note that there is no cross-platform
  standard API for retrieving TXT RRs from DNS.

https://lists.w3.org/Archives/Public/ietf-http-wg/1997JanMar/0713.html

Stuart Kwan @ MSFT stated:

  I am also not opposed to storing this information in two places. I am
  only concerned that we solve the automatic configuration problem.

https://lists.w3.org/Archives/Public/ietf-http-wg/1997JanMar/0715.html

DNS

Due to the above constraints, DNS was another discovery option. TXT
records were considered a cleaner approach than A records, but they were
new and not widely deployed.

Joel N Weber @ MIT stated:

  It should be noted that albert.gnu.ai.mit.edu (the mail server that
  processes all my incoming mail) had a version of named which couldn't
  handle TXT records up until about a month ago. So even that solution
  is not really completely compatible with existing sites.

  However, if you want to go the name server route, I wonder why you
  couldn't make the URL something hardcoded like
  http://www-ns-pac/proxy.ins That would mean that you have to use port
  80 with a standardized path, but that's less strange than the TXT or
  SRVRLOC records.

https://lists.w3.org/Archives/Public/ietf-http-wg/1997JanMar/0714.html

A hardcoded domain name A record and path was the only solution that
would always be available, and became the “fallback” case.

CARP/ICP

At the time, we were also unsure of how proxy deployments would evolve
for caching efficiency as many people were using dialup, and IP capable
smartphones were still in the future. Contemporary efforts were Internet
Cache Protocol (ICP)[2], and Cache Array Routing Protocol (CARP)[3].

Josh Cohen stated:

  I want to make the point that as caches as being deployed more
  frequently and with more complexity, ie hierarchies and dynamic ICP
  type protocols, the configurations need to be dynamic and complex.
  Well beyond what a nontechnical user should need to know, and
  extremely difficult to provide a UI to let a user specify it manually.

https://lists.w3.org/Archives/Public/ietf-http-wg/1997JanMar/0705.html

CARP is what we call sharding today. An algorithm would compute a hash
of the URL and treat the cache array as a hash table. Requests would be
forwarded to the cache for that hash bucket.
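
As a rough illustration of the hash-bucket idea described above, here is
a minimal Python sketch. It is not the hash function from the CARP
specification; the use of SHA-256, the function name pick_cache, and the
cache host names are all assumptions made for the example.

```python
import hashlib


def pick_cache(url: str, caches: list[str]) -> str:
    """Map a URL to one cache by hashing it into a bucket.

    Simplified sketch: hash the URL, then index into the cache array
    as if it were a hash table.
    """
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(caches)
    return caches[bucket]


# Hypothetical cache array, for illustration only.
caches = [
    "cache-a.example.net:8080",
    "cache-b.example.net:8080",
    "cache-c.example.net:8080",
]

for url in ("http://example.com/", "http://example.com/logo.png"):
    print(url, "->", pick_cache(url, caches))
```

The same URL always lands on the same cache, so each object is stored
once across the array. A plain modulo like this reshuffles most URLs
when the array size changes, which is one reason real schemes are more
elaborate.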

ICP is a UDP based protocol between caches that resembles how
hierarchical DNS caching works. Most of the functionality was
inter-cache via UDP, but a client function could determine the best
first-hop in the hierarchy.

We thought these scenarios were important enough to support, and the
generalities of JavaScript solved the browser’s side of the problem. One
could implement CARP hashing, or ICP bootstrap in JavaScript.

Implementation in IE5

Internet Explorer 5 released Beta 1 in June 1998, Beta 2 in November
1998, and final in March 1999.

At the time, the workable discovery solutions were DHCP and DNS A. The
implementation prioritized DHCP and fell back to DNS A.
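
The ordering can be sketched in a few lines of Python. This is only an
illustration of "DHCP first, then DNS A", not the WinInet code. It
assumes DHCP option 252 carries the configuration URL and that the DNS
fallback uses the host name "wpad" with the path "/wpad.dat"; the
function name and the resolves callback are hypothetical.

```python
from typing import Callable, Mapping, Optional

WPAD_DHCP_OPTION = 252  # assumed option code carrying the config URL


def discover_pac_url(
    dhcp_options: Mapping[int, str],
    resolves: Callable[[str], bool],
    domain: str,
) -> Optional[str]:
    """Return a config URL, preferring DHCP over the DNS A fallback."""
    # 1. Prefer a URL handed out by the DHCP server.
    url = dhcp_options.get(WPAD_DHCP_OPTION)
    if url:
        return url
    # 2. Fall back to a hardcoded host name and path in the local domain.
    host = f"wpad.{domain}"
    if resolves(host):
        return f"http://{host}/wpad.dat"
    # 3. Nothing discovered: the caller goes direct.
    return None
```

Passing the resolver in as a callback keeps the sketch free of any
platform-specific DHCP or DNS API, which was exactly the sticking point
in 1997.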

DNS Devolution

Somewhere along this path, we came up with the DNS Devolution scheme. We
were wise enough to know that new TLDs would come, someday, but naïve in
thinking it would be a rare occurrence. It’s not like we’re ever going
to see http://ietf.rocks. Oops.
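
For readers who have not seen devolution in action, here is a small
Python sketch of the candidate-name walk. It reflects my summary of the
scheme, not the IE5 code; the function name and the min_labels parameter
are assumptions made for the example.

```python
def devolution_candidates(fqdn: str, min_labels: int = 2) -> list[str]:
    """List the wpad host names tried while walking up the DNS tree.

    The client drops its own host label, then keeps stripping the
    leftmost label until fewer than min_labels remain.
    """
    labels = fqdn.rstrip(".").split(".")
    parents = labels[1:]  # drop the host's own label
    candidates = []
    while len(parents) >= min_labels:
        candidates.append("wpad." + ".".join(parents))
        parents = parents[1:]
    return candidates


print(devolution_candidates("laptop.eng.corp.example.com"))
# ['wpad.eng.corp.example.com', 'wpad.corp.example.com', 'wpad.example.com']

print(devolution_candidates("pc.example.co.uk"))
# ['wpad.example.co.uk', 'wpad.co.uk']
```

The second call shows the weakness: a label count cannot tell where the
organization's own domain ends, so the walk can leave the company and
ask for a name that anyone may register. More TLDs mean more such
boundaries to get wrong.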

UI

We wanted to include UI that would provide clarity to the user. For
example, a pop up or interstitial screen that informed the user of the
proxy discovered and potentially ask the user to accept or deny.
However, making the signaling work between WinInet (thread pool) and
Trident (HTML renderer) would have required architectural changes that
schedule did not permit.

Looking Forward

The world has changed a lot in 25 years. WPAD functioned as a framework
which separated discovery schemes and configuration file formats.

Scenarios

IT Admins, who need to deal with disparate kinds of networks, will
choose the combination of discovery and config format they use. There
will need to be a transition period for any migration. I think it makes
sense to continue the framework model.

I suggest we be deliberate in soliciting current practice scenarios of
IT Admins with respect to proxy servers.

Apple, Microsoft, Google et al could also have their enterprise account
managers ask their enterprise customers to complete a short survey.

SVRLOC

SVRLOC is now widely adopted and in use for similar service discovery
scenarios. I suggest that the SVRLOC discovery mechanism be fleshed out
and aligned with current practices for using SVRLOC.

One potential path suggested by an author of the security paper[4], Marc
Dacier, is the use of DNS SD, possibly including mDNS, and DNSSEC for
security.

PVD Proxy Config

A new draft of Communicating Proxy Configurations in Provisioning
Domains[5] has been published by Tommy Pauly/Apple and Dragana
Damjanovic/Microsoft.

The first good thing is that Apple and Microsoft are teaming up. I would
suggest adding co-authors from other potential implementers, whether
browsers like Chrome, Mozilla et al., or OS-level services such as those
for Linux, Android, and Arduino/ESP32 IoT.

This draft provides a JSON format configuration file, which provides a
modern solution that is consistent with common practices.

The JSON content re-uses PVDDATA[6], which I’ll admit I have little
experience with. Aside from observing that it can provide the relevant
config information for simpler scenarios, I look to others on the
question of whether PVDDATA is an emerging common practice.

The draft introduces a new discovery mechanism, Router Advertisements.

The draft re-uses Well-Known URIs, something that didn’t exist in 1997.

DNS TXT Records & Sunset DNS Devolution

I suggest we flesh out the DNS TXT discovery scheme and pick a target
date for deprecation of the DNS devolution scheme.

Continue the Framework

I suggest that the new solution continue the framework approach, leaving
a choice of multiple discovery mechanisms and config formats. Browsers
that can accept PVD can signal that with Accept headers and fall back to
existing PAC if needed.
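
To make the Accept header idea concrete, here is a minimal Python sketch
of the server side. It is a suggestion, not something taken from the
draft. It assumes "application/pvd+json" (the RFC 8801 media type) for
the JSON format and the conventional "application/x-ns-proxy-autoconfig"
for PAC; the function name is hypothetical, and q-values are ignored to
keep it short.

```python
PVD_JSON = "application/pvd+json"
PAC = "application/x-ns-proxy-autoconfig"


def choose_config_format(accept_header: str) -> str:
    """Pick the config format to serve for a given Accept header.

    Serves PvD JSON only when the client lists it explicitly;
    every other client gets the existing PAC format.
    """
    offered = {
        part.split(";")[0].strip().lower()
        for part in accept_header.split(",")
    }
    return PVD_JSON if PVD_JSON in offered else PAC
```

A client that sends "Accept: application/pvd+json, */*;q=0.1" would get
JSON, while a legacy client that sends "*/*" or nothing at all keeps
getting PAC, which gives IT admins a transition path from one URL.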

User Experience

In 2023, we can do a better job with the user experience than we were
able to do in 1997. The HTTP specification has numerous MUSTs
prescribing user agent behavior. If we can come to agreement on the UX,
that should be specified. For example, a MUST that browsers provide a
notification and a choice for the user to reject (via an interstitial
like the one for SSL errors with its choice to continue unsafely, a
modal, or another method).

Venue

To focus this work, and attract others to provide their input, the WPAD
evolution work item needs a home.

In discussion with the httpwg chairs, Mark and Tommy, the potential
venues that came up were an HTTP workshop, an intarea work stream, or
both.

[1] https://dl.acm.org/doi/10.1145/3565361 “Waiting Patiently for an
Announced Disaster”, Dacier, Boulila

[2] https://en.wikipedia.org/wiki/Internet_Cache_Protocol

[3] https://en.wikipedia.org/wiki/Cache_Array_Routing_Protocol

[4] https://dl.acm.org/doi/10.1145/3565361 “Waiting Patiently for an
Announced Disaster”, Dacier, Boulila

[5] https://www.ietf.org/archive/id/draft-pauly-intarea-proxy-config-pvd-01.txt

[6] https://www.rfc-editor.org/rfc/rfc8801