More proxy documentation.

* Add a table of contents
* Use "proxy server identifier" throughout
* Rework the proxy server identifier section
* Use URI formatted proxy server identifiers throughout document
* Add some links between sections

Change-Id: Ie61212697cf70a2532d971113dbd550639ae9f7d
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/1636259
Commit-Queue: Eric Roman <eroman@chromium.org>
Reviewed-by: Matt Menke <mmenke@chromium.org>
Cr-Commit-Position: refs/heads/master@{#665608}

Commit 12667034 authored Jun 03, 2019 by Eric Roman Committed by Commit Bot Jun 03, 2019

net/docs/proxy.md net/docs/proxy.md +145 -119

--- a/net/docs/proxy.md
+++ b/net/docs/proxy.md
 # Proxy support in Chrome

-This document establishes basic proxy terminology, as well as describing
-behaviors specific to Chrome.
+This document establishes basic proxy terminology and describes Chrome-specific
+proxy behaviors.

-## Proxy Server
+[TOC]

-A proxy server is an intermediary used for network requests. It can be
-identified by the 3-tuple (scheme, host, port) where:
+## Proxy server identifiers

-* scheme - protocol used to communicate with the proxy (ex: SOCKSv5, HTTPS).
-* host - IP or hostname of the proxy server (ex: 192.168.0.1)
-* port - TCP/UDP port number (ex: 443)
+A proxy server is an intermediary used for network requests. A proxy server can
+be described by its address, along with the proxy scheme that should be used to
+communicate with it.

-There are a variety of proxy server schemes supported by Chrome. When using an
-explicit proxy in the browser, multiple layers of the network request are
-impacted.
+This can be written as a string using either the "PAC format" or the "URI
+format".

-Difference between proxy server schemes include:
+The PAC format is how one names a proxy server in [Proxy
+auto-config](https://en.wikipedia.org/wiki/Proxy_auto-config) scripts. For
+example:
+* `PROXY foo:2138`
+* `SOCKS5 foo:1080`
+* `DIRECT`

-* Is communication to the proxy done over a secure channel?
-* Is name resolution (ex: DNS) done client side, or proxy side?
-* What authentication schemes to the proxy server are supported?
-* What network traffic can be sent through the proxy?
-
-Identifiers for proxy servers are often written as strings, using either the
-PAC format (ex: `PROXY foo`) or Chrome's URI format (ex: `http://foo`).
+The "URI format" instead encodes the information as a URL. For example:
+* `foo:2138`
+* `http://foo:2138`
+* `socks5://foo:1080`
+* `direct://`

-When a proxy server's scheme is not stated, it's assumed to be HTTP in most
-contexts.
+The port number is optional in both formats. When omitted, a per-scheme default
+is used.
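A minimal sketch of how a URI-formatted identifier might be split into scheme, host, and port. It assumes the scheme defaults to HTTP when omitted and assumes per-scheme default ports of 80 (HTTP), 443 (HTTPS, QUIC), and 1080 (SOCKS). `parseUriProxy` is a hypothetical helper, not Chrome's parser, and it does not handle bracketed IPv6 literals.

```javascript
// Assumed per-scheme default ports.
const DEFAULT_PORTS = { http: 80, https: 443, quic: 443, socks4: 1080, socks5: 1080 };

// Hypothetical parser for URI-formatted proxy server identifiers such as
// "foo:2138", "socks5://foo:1080", or "direct://".
function parseUriProxy(identifier) {
  let rest = identifier.trim();
  let scheme = "http"; // No "scheme://" prefix: assume an HTTP proxy.
  const sep = rest.indexOf("://");
  if (sep !== -1) {
    scheme = rest.slice(0, sep).toLowerCase();
    rest = rest.slice(sep + 3);
  }
  if (scheme === "direct") {
    return { scheme, host: null, port: null };
  }
  if (!Object.prototype.hasOwnProperty.call(DEFAULT_PORTS, scheme)) {
    throw new Error(`Unsupported proxy scheme: ${scheme}`);
  }
  // Split "host[:port]" on the last colon (bracketed IPv6 is not handled).
  const colon = rest.lastIndexOf(":");
  if (colon === -1) {
    return { scheme, host: rest, port: DEFAULT_PORTS[scheme] };
  }
  const port = Number(rest.slice(colon + 1));
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid port in: ${identifier}`);
  }
  return { scheme, host: rest.slice(0, colon), port };
}

console.log(parseUriProxy("foo:2138"));     // { scheme: 'http', host: 'foo', port: 2138 }
console.log(parseUriProxy("socks5://foo")); // { scheme: 'socks5', host: 'foo', port: 1080 }
```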

-This can lead to some confusion, particularly when discussing system proxy
-settings. Major platform UIs have converged on the term "Secure proxy" to mean
-the host:port for an (insecure) HTTP proxy to use for proxying https:// URLs.
+See the [Proxy server schemes](#Proxy-server-schemes) section for details on
+what schemes Chrome supports, and how to write them in the PAC and URI formats.
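To illustrate how the two formats relate, the hypothetical helper below converts a PAC result string (a `;`-separated list of PAC entries) into URI-formatted identifiers. The keyword-to-scheme table is an assumption based on Chrome's interpretation, in which a bare `SOCKS` keyword means SOCKSv4; the sketch rejects malformed entries rather than guessing.

```javascript
// Assumed mapping from PAC keywords to URI schemes.
const PAC_TO_URI_SCHEME = {
  PROXY: "http",
  HTTPS: "https",
  SOCKS: "socks4", // A bare SOCKS keyword is treated as SOCKSv4.
  SOCKS4: "socks4",
  SOCKS5: "socks5",
  QUIC: "quic",
};

// Hypothetical helper: "PROXY foo:2138; DIRECT" -> ["http://foo:2138", "direct://"].
function pacResultToUriList(pacResult) {
  const identifiers = [];
  for (const entry of pacResult.split(";")) {
    const tokens = entry.trim().split(/\s+/).filter((t) => t !== "");
    if (tokens.length === 0) continue; // Tolerate empty entries, e.g. a trailing ";".
    const keyword = tokens[0].toUpperCase();
    if (keyword === "DIRECT" && tokens.length === 1) {
      identifiers.push("direct://");
      continue;
    }
    const scheme = PAC_TO_URI_SCHEME[keyword];
    if (scheme === undefined || tokens.length !== 2) {
      throw new Error(`Malformed PAC entry: "${entry.trim()}"`);
    }
    identifiers.push(`${scheme}://${tokens[1]}`);
  }
  return identifiers;
}

console.log(pacResultToUriList("PROXY proxy1:80; HTTPS proxy2:443; SOCKS5 proxy3:1080"));
```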

-So when someone refers to their "HTTPS proxy" be aware of this ambiguity. The
-intended meaning could be either "an HTTP proxy for https:// URLs", or "a proxy
-using the HTTPS scheme".
+Most UI surfaces in Chrome (including command lines and policy) expect URI
+formatted proxy server identifiers. However, outside of Chrome, proxy servers
+are generally identified less precisely by just an address; the proxy
+scheme is assumed based on context.

-In this document when we say "an HTTPS proxy", we always mean "a proxy
-that the browser speaks HTTPS to", and not "an (HTTP) proxy used to proxy
-https:// URLs".
+In Windows' proxy settings there are host and port fields for the
+"HTTP", "Secure", "FTP", and "SOCKS" proxies. With the exception of "SOCKS",
+those are all identifiers for insecure HTTP proxy servers (the proxy scheme is
+assumed to be HTTP).

 ## Proxy resolution

@@ -49,31 +51,42 @@ When the browser is asked to fetch a URL, it needs to decide which IP endpoint
 to send the request to. This can be either a proxy server, or the target host.

 This is called proxy resolution. The input to proxy resolution is a URL, and
-the output is an ordered list of proxy server options.
+the output is an ordered list of [proxy server
+identifiers](#Proxy-server-identifiers).

 What proxies to use can be described using either:

-* Manual proxy settings - proxy resolution is defined using a declarative set
-  of rules. These rules are expressed as a mapping from URL scheme to proxy
-  server(s), and a list of proxy bypass rules for when to go DIRECT instead of
-  using the mapped proxy.
+* [Manual proxy settings](#Manual-proxy-settings) - proxy resolution is defined
+  using a declarative set of rules. These rules are expressed as a mapping from
+  URL scheme to proxy server identifier(s), and a list of proxy bypass rules for
+  when to go DIRECT instead of using the mapped proxy.

 * PAC script - proxy resolution is defined using a JavaScript program that is
-  invoked whenever fetching a URL to get the list of proxy servers to use.
+  invoked whenever fetching a URL to get the list of proxy server identifiers
+  to use.

 * Auto-detect - the WPAD protocol is used to probe the network (using DHCP/DNS)
  and possibly discover the URL of a PAC script.

 ## Proxy server schemes

-Chrome supports the following proxy server schemes:
+When using an explicit proxy in the browser, multiple layers of the network
+request are impacted, depending on the scheme that is used. Some implications
+of the proxy scheme are:

-* DIRECT
-* HTTP
-* HTTPS
-* SOCKSv4
-* SOCKSv5
-* QUIC
+* Is communication to the proxy done over a secure channel?
+* Is name resolution (ex: DNS) done client side, or proxy side?
+* What authentication schemes does the proxy server support?
+* What network traffic can be sent through the proxy?
+
+Chrome supports these proxy server schemes:
+
+* [DIRECT](#DIRECT-proxy-scheme)
+* [HTTP](#HTTP-proxy-scheme)
+* [HTTPS](#HTTPS-proxy-scheme)
+* [SOCKSv4](#SOCKSv4-proxy-scheme)
+* [SOCKSv5](#SOCKSv5-proxy-scheme)
+* [QUIC](#QUIC-proxy-scheme)

 ### DIRECT proxy scheme

@@ -107,7 +120,7 @@ tunnel, the hostname of the target URL is sent to the proxy server in the
 clear.

 HTTP proxies in Chrome support the same HTTP authentication schemes as for
-target servers: Basic, Digest, Negotiate/NTLM.
+target servers: Basic, Digest, Negotiate, NTLM.

 ### HTTPS proxy scheme

@@ -115,12 +128,21 @@ target servers: Basic, Digest, Negotiate/NTLM.
 * Example identifier (PAC): `HTTPS proxy:8080`
 * Example identifier (URI): `https://proxy:8080`

-This works exactly like an HTTP proxy, except the communication to the proxy
-server is protected by TLS. Hence `http://` requests, and hostnames for
-`https://` requests are not sent in the clear as with HTTP proxies.
+This works like an [HTTP proxy](#HTTP-proxy-scheme), except the
+communication to the proxy server is protected by TLS, and may negotiate
+HTTP/2.
+
+Because the connection to the proxy server is secure, http:// requests
+sent through the proxy are not sent in the clear as with an HTTP proxy.
+Similarly, since CONNECT requests are sent over a protected channel, the
+hostnames for proxied https:// URLs are also not revealed.
+
+In addition to the usual HTTP authentication methods, HTTPS proxies also
+support client certificates.

-In addition to HTTP authentication methods, one can also use client
-certificates to authenticate to HTTPS proxies.
+HTTPS proxies using HTTP/2 can offer better performance in Chrome than a
+regular HTTP proxy due to higher connection limits (HTTP/1.1 proxies in Chrome
+are limited to 32 simultaneous connections across all domains).

 ### SOCKSv4 proxy scheme

@@ -183,18 +205,18 @@ TODO
 The simplest way to configure proxy resolution is by providing a static list of
 rules comprised of:

-1. A mapping of URL schemes to proxy servers
-2. A list of proxy bypass rules
+1. A mapping of URL schemes to [proxy server identifiers](#Proxy-server-identifiers)
+2. A list of [proxy bypass rules](#Proxy-bypass-rules)

 We refer to this mode of configuration as "manual proxy settings".

 Manual proxy settings can succinctly describe setups like:

-* Use HTTPS proxy `foo:8080` for all requests
-* Use HTTP proxy `foo:8080` for all requests except those to a `google.com`
+* Use proxy `http://foo:8080` for all requests
+* Use proxy `http://foo:8080` for all requests except those to a `google.com`
  subdomain.
-* Use HTTP proxy `foo:8080` for all `https://` requests, and the SOCKSv5 proxy
-  `mysocks:90` for everything else
+* Use proxy `http://foo:8080` for all `https://` requests, and proxy
+  `socks5://mysocks:90` for everything else

 Although manual proxy settings are a ubiquitous way to configure proxies
 across platforms, there is no standard representation or feature set.
@@ -205,14 +227,14 @@ reversing the bypass list, or Gnome's interpretation of bypass patterns as
 suffix matches.

 When defining manual proxy settings in Chrome, we specify three (possibly
-empty) lists of proxy servers:
+empty) lists of [proxy server identifiers](#Proxy-server-identifiers).

-  * proxies for HTTP - A list of proxy servers to use for `http://` requests,
-    if non-empty.
-  * proxies for HTTPS - A list of proxy servers to use for `https://` requests,
-    if non-empty.
-  * other proxies - A list of proxy servers to use for everything else
-    (whatever isn't matched by the other two lists)
+  * proxies for HTTP - A list of proxy server identifiers to use for `http://`
+    requests, if non-empty.
+  * proxies for HTTPS - A list of proxy server identifiers to use for
+    `https://` requests, if non-empty.
+  * other proxies - A list of proxy server identifiers to use for everything
+    else (whatever isn't matched by the other two lists)
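The choice among these three lists can be sketched as follows. The field names `proxiesForHttp`, `proxiesForHttps`, and `otherProxies` are hypothetical, bypass rules are not modeled, and returning `direct://` when the selected list is empty is an assumption made for the sketch.

```javascript
// Hypothetical sketch: pick which of the three manual-settings lists
// applies to a URL (bypass rules are not modeled here).
function selectManualProxies(url, settings) {
  const scheme = new URL(url).protocol; // e.g. "http:" or "https:"
  let list;
  if (scheme === "http:" && settings.proxiesForHttp.length > 0) {
    list = settings.proxiesForHttp;
  } else if (scheme === "https:" && settings.proxiesForHttps.length > 0) {
    list = settings.proxiesForHttps;
  } else {
    // Whatever isn't matched by the other two lists.
    list = settings.otherProxies;
  }
  // Assumption: an empty list means "go DIRECT".
  return list.length > 0 ? [...list] : ["direct://"];
}

const settings = {
  proxiesForHttp: ["https://foo:443"],
  proxiesForHttps: [],
  otherProxies: ["socks5://mysocks:1080"],
};
console.log(selectManualProxies("http://example.com/", settings));  // [ 'https://foo:443' ]
console.log(selectManualProxies("https://example.com/", settings)); // [ 'socks5://mysocks:1080' ]
```

Note that an empty "proxies for HTTPS" list does not mean `https://` URLs go DIRECT; they fall through to "other proxies".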

 There are a lot of ways to end up with manual proxy settings in Chrome
 (discussed in other sections).
@@ -220,8 +242,8 @@ There are a lot of ways to end up with manual proxy settings in Chrome
 The following examples use the command line method: launching Chrome with
 `--proxy-server=XXX` (and optionally `--proxy-bypass-list=YYY`).

-Example: To use the HTTP proxy `foo:8080` for all requests we can launch
-Chrome with `--proxy-server="http://foo:8080"`. This translates into:
+Example: To use proxy `http://foo:8080` for all requests we can launch
+Chrome with `--proxy-server="http://foo:8080"`. This translates to:

  * proxies for HTTP - *empty*
  * proxies for HTTPS - *empty*
@@ -238,8 +260,8 @@ This command line means:
  * other proxies - `http://foo:8080`, `direct://`

 If instead we wanted to proxy only `http://` URLs through the
-HTTPS proxy `foo:443`, and have everything else use the SOCKSv5 proxy
-`mysocks:1080` we could launch Chrome with
+HTTPS proxy `https://foo:443`, and have everything else use the SOCKSv5 proxy
+`socks5://mysocks:1080`, we could launch Chrome with
 `--proxy-server="http=https://foo:443;socks=socks5://mysocks:1080"`. This now
 expands to:

@@ -247,18 +269,20 @@ expands to:
  * proxies for HTTPS - *empty*
  * other proxies - `socks5://mysocks:1080`

-The command line above uses WinInet's proxy map format, with two modifications:
+The command line above uses WinInet's proxy map format, with some additional
+features:

-* Proxy servers can be optionally prefixed with a scheme (i.e. Chrome's "URI
-  format" for proxy server identifiers)
-* The `socks=` mapping is understood as "other proxies". The subsequent proxy
-  list can include proxies of any scheme, however if the scheme is unspecified
-  it is understood to be `socks4://`.
+* Instead of naming proxy servers by just a hostname:port, you can use Chrome's
+  URI format for proxy server identifiers. In other words, you can prefix the
+  proxy scheme so it doesn't default to HTTP.
+* The `socks=` mapping is understood more broadly as "other proxies". The
+  subsequent proxy list can include proxies of any scheme; however, if the
+  scheme is omitted it will be understood as SOCKSv4 rather than HTTP.
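A rough sketch of how a `--proxy-server` value could be expanded into the three lists under the rules above. It assumes `;` separates mappings and `,` separates fallback entries within a list, and it handles only the `http=`, `https=`, and `socks=` mappings. `parseProxyServerFlag` and its helpers are hypothetical names, not Chrome's implementation.

```javascript
// Hypothetical: prefix a scheme when an identifier omits one.
function withDefaultScheme(identifier, defaultScheme) {
  const id = identifier.trim();
  return id.includes("://") ? id : `${defaultScheme}://${id}`;
}

// Parses a comma-separated fallback list of proxy server identifiers.
function parseProxyList(list, defaultScheme) {
  return list
    .split(",")
    .filter((s) => s.trim() !== "")
    .map((s) => withDefaultScheme(s, defaultScheme));
}

// Hypothetical sketch of expanding a --proxy-server value into the three
// lists of manual proxy settings.
function parseProxyServerFlag(value) {
  const rules = { proxiesForHttp: [], proxiesForHttps: [], otherProxies: [] };
  if (!value.includes("=")) {
    // No mappings: the list applies to everything ("other proxies").
    rules.otherProxies = parseProxyList(value, "http");
    return rules;
  }
  for (const mapping of value.split(";")) {
    if (mapping.trim() === "") continue;
    const eq = mapping.indexOf("=");
    if (eq === -1) throw new Error(`Expected "scheme=proxies", got: ${mapping}`);
    const key = mapping.slice(0, eq).trim().toLowerCase();
    const list = mapping.slice(eq + 1);
    if (key === "http") {
      rules.proxiesForHttp = parseProxyList(list, "http");
    } else if (key === "https") {
      rules.proxiesForHttps = parseProxyList(list, "http");
    } else if (key === "socks") {
      // socks= means "other proxies"; a missing scheme defaults to SOCKSv4.
      rules.otherProxies = parseProxyList(list, "socks4");
    } else {
      throw new Error(`Mapping not handled by this sketch: ${key}=`);
    }
  }
  return rules;
}
```

For instance, `parseProxyServerFlag("http=https://foo:443;socks=socks5://mysocks:1080")` yields the same three lists as the example above.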

-## Mapping WebSockets URLs to a proxy
+### Mapping WebSockets URLs to a proxy

-Manual proxy settings don't have mappings for `ws://` or `wss://` URLs - you
-can't specify a separate proxy to use for those schemes.
+[Manual proxy settings](#Manual-proxy-settings) don't have mappings for `ws://`
+or `wss://` URLs.

 Selecting a proxy for these URL schemes is a bit different from other URL
 schemes. The algorithm that Chrome uses is:
@@ -272,19 +296,22 @@ This is per the recommendation in section 4.1.3 of [RFC

 It is possible to route `ws://` and `wss://` separately using a PAC script.

-## Proxy credentials in manual proxy settings
+### Proxy credentials in manual proxy settings

-Most platforms' manual proxy settings allow specifying a cleartext
-username/password for proxy sign in. Chrome does not implement this, and will
-not use any credentials embedded in the proxy settings.
+Most platforms' [manual proxy settings](#Manual-proxy-settings) allow
+specifying a cleartext username/password for proxy sign-in. Chrome does not
+implement this, and will not use any credentials embedded in the proxy
+settings.

 Proxy authentication will instead go through the ordinary flow to find
 credentials.

 ## Proxy bypass rules

-In addition to specifying three lists of proxy servers, Chrome's manual proxy
-settings also lets you specify a list of "proxy bypass rules".
+In addition to specifying three lists of [proxy server
+identifiers](#Proxy-server-identifiers), Chrome's [manual proxy
+settings](#Manual-proxy-settings) lets you specify a list of "proxy bypass
+rules".

 This ruleset determines whether a given URL should skip the proxy altogether,
 even when a proxy is otherwise defined for it.
@@ -369,8 +396,8 @@ IPV4_LITERAL "/" PREFIX_LENGTH_IN_BITS
 Matches any URL whose hostname is an IPv4 literal that falls within the given
 address range.

-Only applies to URLs that are IP literals - see "Meaning of IP address range
-bypass rules".
+Note this [only applies to URLs that are IP
+literals](#Meaning-of-IP-address-range-bypass-rules).

 Examples:

@@ -385,8 +412,8 @@ IPV6_LITERAL "/" PREFIX_LENGTH_IN_BITS
 Matches any URL whose hostname is an IPv6 literal that falls within the
 given range.
 Note that IPv6 literals must *not* be bracketed.

-Only applies to URLs that are IP literals - see "Meaning of IP address range
-bypass rules".
+Note this [only applies to URLs that are IP
+literals](#Meaning-of-IP-address-range-bypass-rules).

 Examples:

@@ -408,8 +435,8 @@ the "Don't use proxy server for local (intranet) addresses" on Windows.

 The rule name comes from WinInet, and can easily be confused with the concept
 of localhost. However the two concepts are completely orthogonal. In practice
-one wouldn't add rules to bypass localhost, as it is already done implicitly
-(see "Implicit bypass rules").
+one wouldn't add rules to bypass localhost, as it is [already done
+implicitly](#Implicit-bypass-rules).

 ### Bypass rule: Subtract implicit rules

@@ -417,10 +444,9 @@ one wouldn't add rules to bypass localhost, as it is already done implicitly
 <-loopback>
 ```

-*Subtracts* the implicit proxy bypass rules (localhost and link local
-addresses). See the "Implicit bypass rules" section for details on when/why to
-use this, and the security caveats to doing so. Generally this is used for test
-setups.
+*Subtracts* the [implicit proxy bypass rules](#Implicit-bypass-rules)
+(localhost and link local addresses). This is generally only needed for test
+setups. Beware of the security implications of proxying localhost.
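The sketch below models the implicit rules and `<-loopback>` under several assumptions: explicit rules are evaluated in order with the first match deciding, plain hostnames are the only explicit rule type supported, and the implicit rules are taken to cover `localhost`, `*.localhost`, `127.0.0.0/8`, `::1`, `169.254.0.0/16`, and `fe80::/10`. All function names are hypothetical.

```javascript
// Hypothetical helper: dotted-quad IPv4 literal -> integer, else null.
function ipv4ToInt(host) {
  const parts = host.split(".");
  if (parts.length !== 4) return null;
  let n = 0;
  for (const p of parts) {
    if (!/^\d{1,3}$/.test(p) || Number(p) > 255) return null;
    n = n * 256 + Number(p);
  }
  return n;
}

// Assumed implicit rules: localhost names, loopback, and link-local ranges.
function matchesImplicitRules(host) {
  const h = host.toLowerCase();
  if (h === "localhost" || h.endsWith(".localhost")) return true;
  const v4 = ipv4ToInt(h);
  if (v4 !== null) {
    const firstOctet = Math.floor(v4 / 2 ** 24);
    const firstTwoOctets = Math.floor(v4 / 2 ** 16);
    return firstOctet === 127 || firstTwoOctets === 169 * 256 + 254;
  }
  // IPv6 literals are expected without brackets in this sketch.
  return h === "::1" || /^fe[89ab][0-9a-f]:/.test(h);
}

// Rules are evaluated in order; the first matching rule decides.
function shouldBypassProxy(host, bypassRules) {
  for (const rule of bypassRules) {
    if (rule === "<-loopback>") {
      // Subtract implicit rules: force use of the proxy for these hosts.
      if (matchesImplicitRules(host)) return false;
    } else if (rule.toLowerCase() === host.toLowerCase()) {
      return true; // Explicit hostname rule: go DIRECT.
    }
  }
  // No explicit rule matched: fall back to the implicit rules.
  return matchesImplicitRules(host);
}
```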

 Whereas regular bypass rules instruct the browser about URLs that should *not*
 use the proxy, this rule has the opposite effect and tells the browser to
@@ -432,8 +458,8 @@ than `127.0.0.1;<-loopback>`.

 ### Meaning of IP address range bypass rules

-The IP address range bypass rules in manual proxy settings applies ONLY TO URL
-LITERALS. This is not what one would intuitively expect!
+The IP address range bypass rules in manual proxy settings apply only to URLs
+whose host is an IP literal. This is not what one would intuitively expect.
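To make the IP-literal restriction concrete, here is a hypothetical matcher for an IPv4 range rule such as `192.168.0.0/16`. It assumes the rule is compared only against the URL's hostname and that no DNS resolution takes place, so a hostname that would resolve into the range still does not match.

```javascript
// Hypothetical helper: dotted-quad IPv4 literal -> integer, else null.
function parseIpv4(host) {
  const parts = host.split(".");
  if (parts.length !== 4) return null;
  let n = 0;
  for (const p of parts) {
    if (!/^\d{1,3}$/.test(p) || Number(p) > 255) return null;
    n = n * 256 + Number(p);
  }
  return n;
}

// Hypothetical matcher for an IPv4 range bypass rule like "192.168.0.0/16".
function matchesIpv4RangeRule(url, rule) {
  const [base, prefixText] = rule.split("/");
  const baseInt = parseIpv4(base);
  const prefix = Number(prefixText);
  if (baseInt === null || !Number.isInteger(prefix) || prefix < 0 || prefix > 32) {
    throw new Error(`Invalid IPv4 range rule: ${rule}`);
  }
  const hostInt = parseIpv4(new URL(url).hostname);
  // A hostname that is not an IPv4 literal never matches; no DNS lookup is done.
  if (hostInt === null) return false;
  // Compare the top `prefix` bits using arithmetic to avoid 32-bit sign issues.
  const blockSize = 2 ** (32 - prefix);
  return Math.floor(hostInt / blockSize) === Math.floor(baseInt / blockSize);
}
```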

 Example:

@@ -497,7 +523,7 @@ Historical support in Chrome:
 * In M72 Chrome generalized the implicit proxy bypass rules to manually
  configured proxies

-## Overriding the implicit bypass rules
+### Overriding the implicit bypass rules

 If you want traffic to `localhost` to be sent through a proxy despite the
 security concerns, it can be done by adding the special proxy bypass rule
@@ -516,8 +542,9 @@ proxy for localhost URLs.

 ## Evaluating proxy lists (proxy fallback)

-Proxy resolution results in a _list_ of proxy servers to use for a given
-request, not just a single proxy server.
+Proxy resolution results in a _list_ of [proxy server
+identifiers](#Proxy-server-identifiers) to use for a
+given request, not just a single proxy server identifier.

 For instance, consider this PAC script:

@@ -532,12 +559,13 @@ function FindProxyForURL(url, host) {
 ```

 What proxy will Chrome use for connections to `www.example.com`, given that
-we have a choice of 3 separate proxies, each of different type?
+we have a choice of three separate proxy server identifiers
+{`http://proxy1:80`, `https://proxy2:443`, `socks5://proxy3:1080`}?

-Initially, Chrome will try the proxies in order. This means first attempting the
-request through the HTTP WebProxy `proxy1`. If that "fails", the request is
-next attempted through the HTTPS proxy `proxy2`. Lastly if that fails, the
-request is attempted through the SOCKSv5 proxy `proxy3`.
+Initially, Chrome will try the proxies in order. This means first attempting
+the request through `http://proxy1:80`. If that "fails", the request is
+next attempted through `https://proxy2:443`. Lastly if that fails, the
+request is attempted through `socks5://proxy3:1080`.

 This process is referred to as _proxy fallback_. What constitutes a
 "failure" is described later.
@@ -546,26 +574,24 @@ Proxy fallback is stateful. The actual order of proxy attempts made by Chrome
 is influenced by the past responsiveness of proxy servers.

 Let's say we request `http://www.example.com/`. Per the PAC script this
-resolves to:
+resolves to a list of three proxy server identifiers:

-```
-"PROXY proxy1; HTTPS proxy2; SOCKS5 proxy3"
-```
+{`http://proxy1:80`, `https://proxy2:443`, `socks5://proxy3:1080`}

 Chrome will first attempt to issue the request through these proxies in the
-left-to-right order (`proxy1`, `proxy2`, `proxy3`).
+left-to-right order.

-Let's say that the attempt through `proxy1` fails, but then the attempt through
-`proxy2` succeeds. Chrome will mark `proxy1` as _bad_ for the next 5 minutes.
-Being marked as _bad_ means that `proxy1` is de-prioritized with respect to
-other proxies options (including DIRECT) that are not marked as bad.
+Let's say that the attempt through `http://proxy1:80` fails, but then the
+attempt through `https://proxy2:443` succeeds. Chrome will mark
+`http://proxy1:80` as _bad_ for the next 5 minutes. Being marked as _bad_
+means that `http://proxy1:80` is de-prioritized with respect to
+other proxy server identifiers (including `direct://`) that are not marked as
+bad.

 That means the next time `http://www.example.com/` is requested, the effective
 order for proxies to attempt will be:

-```
-HTTPS proxy2; SOCKS5 proxy3; "PROXY proxy1"
-```
+{`https://proxy2:443`, `socks5://proxy3:1080`, `http://proxy1:80`}

 Conceptually, _bad_ proxies are moved to the end of the list, rather than being
 removed from consideration altogether.
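The deprioritization described above can be sketched as a stable partition over the resolved list. The class and method names are hypothetical; the 5-minute duration comes from the text, while keeping bad proxies in their original relative order is an assumption.

```javascript
const BAD_PROXY_DURATION_MS = 5 * 60 * 1000; // 5 minutes, per the text.

// Hypothetical tracker of proxy server identifiers currently marked bad.
class ProxyRetryInfo {
  constructor() {
    this.badUntil = new Map(); // identifier -> expiry time in ms
  }

  markBad(identifier, nowMs) {
    this.badUntil.set(identifier, nowMs + BAD_PROXY_DURATION_MS);
  }

  isBad(identifier, nowMs) {
    const expiry = this.badUntil.get(identifier);
    return expiry !== undefined && nowMs < expiry;
  }

  // Stable partition: good proxies keep their order; bad ones move to the
  // end instead of being dropped.
  effectiveOrder(identifiers, nowMs) {
    const good = identifiers.filter((id) => !this.isBad(id, nowMs));
    const bad = identifiers.filter((id) => this.isBad(id, nowMs));
    return [...good, ...bad];
  }
}
```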
@@ -615,7 +641,7 @@ button on
 will not give feedback that the bad proxies were cleared, however capturing a
 new NetLog dump can confirm it was cleared.

-## Arguments passed to `FindProxyForURL(url, host)` in PAC scripts
+## Arguments passed to FindProxyForURL() in PAC scripts

 PAC scripts in Chrome are expected to define a JavaScript function
 `FindProxyForURL`.
@@ -668,7 +694,7 @@ type, since future versions of Chrome may [deprecate that
 capability](https://bugs.chromium.org/p/chromium/issues/detail?id=882536) in
 favor of a consistent policy.

-## Resolving client's IP address within a PAC script using `myIpAddress()`
+## Resolving client's IP address within a PAC script using myIpAddress()

 PAC scripts can invoke `myIpAddress()` to obtain the client's IP address. This
 function returns a single IP literal, or `"127.0.0.1"` on failure.
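A typical use of `myIpAddress()` is to choose a proxy based on the client's network. The sketch below defines stand-ins for the PAC built-ins `myIpAddress()` and `isInNet()` so it can run outside a PAC environment; those stand-ins, the `simulatedIp` variable, and the proxy `corp-proxy:8080` are all hypothetical. Because the failure value `"127.0.0.1"` lies outside `10.0.0.0/8`, the script goes DIRECT in that case.

```javascript
// Stand-ins for PAC built-ins (hypothetical, so this runs outside a PAC
// environment). In a real PAC script the browser provides these.
let simulatedIp = "10.1.2.3";
function myIpAddress() {
  return simulatedIp;
}
function isInNet(ip, pattern, mask) {
  // Simplified: assumes `ip` is already an IPv4 literal (no DNS lookup).
  const toInt = (s) => s.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
  return ((toInt(ip) & toInt(mask)) >>> 0) === ((toInt(pattern) & toInt(mask)) >>> 0);
}

// Example PAC logic.
function FindProxyForURL(url, host) {
  // myIpAddress() returns "127.0.0.1" on failure, which is outside
  // 10.0.0.0/8, so that case falls through to DIRECT.
  if (isInNet(myIpAddress(), "10.0.0.0", "255.0.0.0")) {
    return "PROXY corp-proxy:8080";
  }
  return "DIRECT";
}
```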
@@ -705,10 +731,10 @@ This sequence of steps explicitly favors IPv4 over IPv6 results.
 *Historical note*: Prior to M72, Chrome's implementation of `myIpAddress()` was
 effectively just `getaddrinfo(gethostname)`. This is now step 2 of the heuristic.

-### What about `var pacUseMultihomedDNS`?
+### What about pacUseMultihomedDNS?

-In Firefox, if you define a global named `pacUseMultihomedDNS` in your PAC
-script, it causes `myIpAddress()` to report the IP address of the interface
+In Firefox, if you define a global variable named `pacUseMultihomedDNS` in your
+PAC script, it causes `myIpAddress()` to report the IP address of the interface
 that would (likely) have been used had we connected to it DIRECT.

 In particular, it will do a DNS resolution of the target host (the hostname of
@@ -720,7 +746,7 @@ meaning. A PAC script is free to define such a global, and it won't have
 side-effects. Chrome has no APIs or settings to change `myIpAddress()`'s
 algorithm.

-## Resolving client's IP address within a PAC script using `myIpAddressEx()`
+## Resolving client's IP address within a PAC script using myIpAddressEx()

 Chrome supports the [Microsoft PAC
 extension](https://docs.microsoft.com/en-us/windows/desktop/winhttp/myipaddressex)