Commit b3353499 authored by Charlie Reis, committed by Commit Bot

Expand navigation markdown documentation.

This CL revises navigation.md and splits out navigation_concepts.md
for describing several concepts that are important for
understanding the navigation logic.

BUG=1015882

Change-Id: I0397d2906aec9c24a658d96cf67123dc6bd7c3af
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2225745
Commit-Queue: Charlie Reis <creis@chromium.org>
Reviewed-by: Alex Moshchuk <alexmos@chromium.org>
Reviewed-by: Arthur Sonzogni <arthursonzogni@chromium.org>
Reviewed-by: Avi Drissman <avi@chromium.org>
Reviewed-by: Nasko Oskov <nasko@chromium.org>
Cr-Commit-Position: refs/heads/master@{#782858}
parent e53980fb
# Life of a Navigation

Navigation is one of the main functions of a browser. It is the process through
which the user loads documents. This documentation traces the life of a
navigation from the time a URL is typed in the URL bar to the time the web page
is completely loaded. This is one example of many types of navigations, some of
which may start in different places (e.g., in the renderer process).

See also:
* [Life of a Navigation tech talk](https://youtu.be/mX7jQsGCF6E) and
  [slides](https://docs.google.com/presentation/d/1YVqDmbXI0cllpfXD7TuewiexDNZYfwk6fRdmoXJbBlM/edit),
  for an overview from Chrome University.
* [Navigation Concepts](navigation_concepts.md), for useful notes on
  navigation-related concepts in Chromium.
[TOC]

## BeforeUnload

Once a URL is entered, the first step of a navigation is to execute the
beforeunload event handler of the previous document, if a document is already
loaded. This allows the previous document to prompt the user whether they want
to leave, to avoid losing any unsaved data. In this case, the user can cancel
the navigation and no more work will be performed.
## Network Request and Response

If there is no beforeunload handler registered, or the user agrees to proceed,
the next step is making a network request to the specified URL to retrieve the
contents of the document to be rendered. (Note that not all navigations will go
to the actual network, for cases like ServiceWorkers, WebUI, cache, data:, etc.)
Assuming no network error is encountered (e.g. DNS resolution error, socket
connection timeout, etc.), the server will respond with data, with the response
headers coming first. The parsed headers give enough information to determine
what needs to be done next.

The HTTP response code allows the browser process to know whether one of the
following conditions has occurred:

...@@ -33,27 +44,27 @@ following conditions has occurred:
* An HTTP level error has occurred (response 4xx, 5xx)

There are two cases where a navigation network request can complete without
resulting in a new document being rendered. The first one is HTTP response code
204 or 205, which tells the browser that the response was successful, but there
is no content that follows, and therefore the current document must remain
active. The other case is when the server responds with a `Content-Disposition`
response header indicating that the response must be treated as a download
instead of a navigation.
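The outcomes described above can be sketched as a small classifier. This is an illustrative Python model, not Chromium code: the names `Outcome` and `classify_response` are hypothetical, and treating only `Content-Disposition: attachment` as a download is a simplifying assumption.

```python
from enum import Enum
from typing import Mapping


class Outcome(Enum):
    NO_CONTENT = "current document stays active"
    REDIRECT = "request the URL from the Location header"
    DOWNLOAD = "treat the response as a download"
    COMMIT = "commit a document (success or server error page)"


def classify_response(status: int, headers: Mapping[str, str]) -> Outcome:
    """Hypothetical helper: decide what a navigation response leads to."""
    # Header names are case-insensitive, so normalise them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    if status in (204, 205):
        return Outcome.NO_CONTENT
    if 300 <= status < 400 and "location" in lowered:
        return Outcome.REDIRECT
    # Assumption: only an "attachment" disposition marks a download here.
    disposition = lowered.get("content-disposition", "").strip().lower()
    if disposition.startswith("attachment"):
        return Outcome.DOWNLOAD
    # 2xx documents and 4xx/5xx error bodies both proceed to commit.
    return Outcome.COMMIT
```

For example, `classify_response(204, {})` yields `Outcome.NO_CONTENT`, while a 404 with an HTML body still yields `Outcome.COMMIT`, because the server's error page is rendered like any other document.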

If the server responds with a redirect, Chromium makes another request based on
the HTTP response code and the Location header. The browser continues following
redirects until either an error or a successful response is encountered.

Once there are no more redirects, the network stack determines if MIME type
sniffing is needed to detect what type of response the server has sent. This is
only needed if the response is neither a 204/205 nor a download, doesn't already
have a `Content-Type` response header, and doesn't include an
`X-Content-Type-Options: nosniff` response header. If MIME type sniffing is
needed, the network stack will read a small chunk of the actual response data
before proceeding with the commit.
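The sniffing condition in the paragraph above can be written as a single predicate. This is a minimal Python sketch of the stated rule only; `needs_mime_sniffing` and its parameters are hypothetical names, and the real network stack may apply further checks.

```python
from typing import Mapping


def needs_mime_sniffing(status: int, headers: Mapping[str, str],
                        is_download: bool) -> bool:
    """Hypothetical helper mirroring the conditions listed in the text."""
    lowered = {name.lower(): value.strip().lower()
               for name, value in headers.items()}
    # 204/205 responses and downloads never render a new document.
    if status in (204, 205) or is_download:
        return False
    # A declared Content-Type makes sniffing unnecessary.
    if lowered.get("content-type"):
        return False
    # The server has explicitly opted out of sniffing.
    if lowered.get("x-content-type-options") == "nosniff":
        return False
    return True
```

Only when this predicate is true would a small chunk of the response body need to be read before the commit.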
## Commit

At this point the response is passed from the network stack to the browser
process to be used for rendering a new document. The browser process selects
...@@ -77,18 +88,17 @@ receives the commit message from the renderer process, the navigation is
complete.
## Loading

Even once navigation is complete, the user doesn't actually see the new page
yet. Most people use the word navigation to describe the act of moving from
one page to another, but in Chromium we separate that process into two phases.
So far we have described the _navigation_ phase; once the navigation has been
committed, Chromium moves into the _loading_ phase. Loading consists of
reading the remaining response data from the server, parsing it, rendering the
document so it is visible to the user, executing any script accompanying it,
and loading any subresources specified by the document.

The main reason for splitting into these two phases is that errors are treated
differently before and after a navigation commits. Consider the case where the
server responds with an HTTP error code. When this happens, the browser still
...@@ -101,74 +111,54 @@ times out. In that case the browser displays as much of the new document as it
can, without showing an error page.
## WebContentsObserver

Chromium exposes the various stages of navigation and document loading through
methods on the [WebContentsObserver] interface.

### Navigation

* `DidStartNavigation` - invoked after executing the beforeunload event handler
  and before making the initial network request.
* `DidRedirectNavigation` - invoked every time a server redirect is encountered.
* `ReadyToCommitNavigation` - invoked at the time the browser process has
  determined that it will commit the navigation and has picked a renderer
  process for it, but before it has sent it to the renderer process. It is not
  invoked for same-document navigations.
* `DidFinishNavigation` - invoked once the navigation has committed. The commit
  can be either an error page if the server responded with an error code or a
  successful document.
### Loading

* `DidStartLoading` - invoked once per WebContents, when a navigation is about
  to start, after executing the beforeunload handler. This is equivalent to the
  browser UI starting to show a spinner or other visual indicator for
  navigation and is invoked before the `DidStartNavigation` method for the
  navigation.
* `DOMContentLoaded` - invoked per RenderFrameHost, when the document itself
  has completed loading, but before subresources may have completed loading.
* `DidFinishLoad` - invoked per RenderFrameHost, when the document and all of
  its subresources have finished loading.
* `DidStopLoading` - invoked once per WebContents, when the top-level document,
  all of its subresources, all subframes, and their subresources have completed
  loading. This is equivalent to the browser UI no longer showing a spinner or
  other visual indicator for navigation and loading.
* `DidFailLoad` - invoked per RenderFrameHost, when the document load failed,
  for example due to network connection termination before reading all of the
  response data.
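To make the ordering implied by the two lists above concrete, here is a small Python sketch that returns the expected callback sequence for one successful navigation. It is not Chromium code: `expected_callbacks` is a hypothetical helper, and it assumes a cross-document navigation in a page with a single frame and no load failure.

```python
from typing import List


def expected_callbacks(redirect_count: int = 0) -> List[str]:
    """Hypothetical helper: callback order for one successful
    cross-document navigation in a single-frame page."""
    # The loading indicator starts before the navigation itself is reported.
    events = ["DidStartLoading", "DidStartNavigation"]
    # One notification per server redirect encountered.
    events += ["DidRedirectNavigation"] * redirect_count
    # Navigation phase ends with the commit.
    events += ["ReadyToCommitNavigation", "DidFinishNavigation"]
    # Loading phase: document first, then subresources, then the whole tab.
    events += ["DOMContentLoaded", "DidFinishLoad", "DidStopLoading"]
    return events
```

With subframes, `DOMContentLoaded` and `DidFinishLoad` would appear once per RenderFrameHost, whereas `DidStartLoading` and `DidStopLoading` would still appear once per WebContents.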
## NavigationThrottles

NavigationThrottles allow observing, deferring, blocking, and canceling a given
navigation. They should not generally be used for modifying a navigation (e.g.,
simulating a redirect), as discussed in
[Navigation Concepts](navigation_concepts.md#rules-for-canceling-navigations).
They are typically registered in
`NavigationThrottleRunner::RegisterNavigationThrottles` or
`ContentBrowserClient::CreateThrottlesForNavigation`.

[WebContentsObserver]: https://source.chromium.org/chromium/chromium/src/+/master:content/public/browser/web_contents_observer.h
# Navigation Concepts

This documentation covers a set of important navigation-related topics. For a
timeline of how a given navigation proceeds, see [Life of a
Navigation](navigation.md).

[TOC]
## Same-Document and Cross-Document Navigations
Chromium defines two types of navigations based on whether the navigation
results in a new document or not. A _cross-document_ navigation is one that
results in creating a new document to replace an existing document. This is
the type of navigation that most users are familiar with. A _same-document_
navigation does not create a new document, but rather keeps the same document
and changes state associated with it. A same-document navigation does create a
new session history entry, even though the same document remains active. This
can be the result of one of the following cases:
* Navigating to a fragment within an existing document (e.g.
`https://foo.com/1.html#fragment`).
* A document calling the `history.pushState()` or `history.replaceState()` APIs.
* A session history navigation that stays in the same document, such as going
back/forward to an existing entry for the same document.
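The first case in the list above, a fragment navigation, can be approximated by comparing URLs with their fragments removed. This is a rough Python sketch: `is_fragment_navigation` is a hypothetical helper, and Chromium's actual same-document check considers more state than the two URLs.

```python
from urllib.parse import urldefrag


def is_fragment_navigation(current_url: str, target_url: str) -> bool:
    """Hypothetical helper: True when the target differs from the current
    URL only by pointing at a fragment of the same document."""
    current_base, _ = urldefrag(current_url)
    target_base, target_fragment = urldefrag(target_url)
    # Same document URL once fragments are stripped, and a fragment is given.
    return current_base == target_base and target_fragment != ""
```

For instance, navigating from `https://foo.com/1.html` to `https://foo.com/1.html#fragment` satisfies the check, while navigating to `https://foo.com/2.html#fragment` does not.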
## Browser-Initiated and Renderer-Initiated Navigations
Chromium also defines two types of navigations based on which process started
the navigation: _browser-initiated_ and _renderer-initiated_. This distinction
is useful when making decisions about navigations, for example whether an
ongoing navigation needs to be cancelled or not when a new navigation is
starting. It is also used for some security decisions, such as whether to
display the target URL of the navigation in the address bar or not.
Browser-initiated navigations are more trustworthy, as they are usually in
response to a user interaction with the UI of the browser. Renderer-initiated
navigations originate in the renderer process, which may be under the control of
an attacker. Note that some renderer-initiated navigations may be considered
user-initiated, if they were performed with a [user
activation](https://mustaqahmed.github.io/user-activation-v2/) (e.g., links),
while others are not user-initiated (e.g., script navigations).
## Last Committed, Pending, and Visible URLs
Many features care about the URL or Origin of a given document, or about a
pending navigation, or about what is showing in the address bar. These are all
different concepts with different security implications, so be sure to use the
correct value for your use case.

See [Origin vs URL](security/origin-vs-url.md) when deciding whether to check
the Origin or URL. In many cases where the security context matters, Origin
should be preferred.

The _last committed_ URL or Origin represents the document that is currently in
the frame, regardless of what is showing in the address bar. This is almost
always what should be used for feature-related state, unless a feature is
explicitly tied to the address bar (e.g., padlock icon). See
`RenderFrameHost::GetLastCommittedOrigin` (or URL) and
`NavigationController::GetLastCommittedEntry`.
The _pending_ URL exists when a main frame navigation has started but has not
yet committed. This URL is only sometimes shown to the user in the address bar;
see the description of visible URLs below. Features should rarely need to care
about the pending URL, unless they are probing for a navigation they expect to
have started. See `NavigationController::GetPendingEntry`.
The _visible_ URL is what the address bar displays. This is carefully managed to
show the main frame's last committed URL in most cases, and the pending URL in
cases where it is safe and unlikely to be abused for a _URL spoof attack_ (where
an attacker is able to display content as if it came from a victim URL). In
general, the visible URL is:
* The pending URL for browser-initiated navigations like typed URLs or
bookmarks, excluding session history navigations.
* The last committed URL for renderer-initiated navigations, where an attacker
might have control over the contents of the document and the pending URL.
* A renderer-initiated navigation's URL is only visible while pending if it
opens in a new unmodified tab (so that an unhelpful `about:blank` URL is not
displayed), but only until another document tries to access the initial empty
document of the new tab. For example, an attacker window might open a new tab
to a slow victim URL, then inject content into the initial `about:blank`
document as if the slow URL had committed. If that occurs, the visible URL
reverts to `about:blank` to avoid a URL spoof scenario. Once the initial
navigation commits in the new tab, pending renderer-initiated navigation URLs
are no longer displayed.
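The rules above can be condensed into one decision function. The following Python sketch is a simplified model under stated assumptions: `PendingNavigation`, `visible_url`, and the two boolean flags are hypothetical names, and the real logic in Chromium handles more cases.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingNavigation:
    url: str
    browser_initiated: bool
    is_session_history: bool = False


def visible_url(last_committed: str,
                pending: Optional[PendingNavigation],
                in_new_unmodified_tab: bool = False,
                initial_document_accessed: bool = False) -> str:
    """Hypothetical helper: pick the URL the address bar should show."""
    if pending is None:
        return last_committed
    if pending.browser_initiated:
        # Typed URLs and bookmarks show immediately; back/forward does not.
        return last_committed if pending.is_session_history else pending.url
    # Renderer-initiated: only shown in a fresh tab whose initial empty
    # document has not been accessed by another document.
    if in_new_unmodified_tab and not initial_document_accessed:
        return pending.url
    return last_committed
```

In the spoofing scenario from the last bullet, `initial_document_accessed` becomes true and the function falls back to the last committed URL, which is `about:blank` for a new tab.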
## Virtual URLs
Virtual URLs are a way for features to change how certain URLs are displayed to
the user (whether visible or committed). They are generally implemented using
BrowserURLHandlers. Examples include:
* View Source URLs, where the `view-source:` prefix is not present in the
actual committed URL.
* DOM Distiller URLs, where the original URL is displayed to the user rather
than the more complex distiller URL.
## Redirects
Navigations can redirect to other URLs in two different ways.
A _server redirect_ happens when the browser receives a 300-level HTTP response
code before the document commits, telling it to request a different URL,
possibly cross-origin. The new request will usually be an HTTP GET request,
unless the redirect is triggered by a 307 or 308 response code, which preserves
the original request method and body. Server redirects are managed by a single
NavigationRequest. No document is committed to session history, but the original
URL remains in the redirect chain.
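The method rule for server redirects can be sketched as follows. This Python example covers only the GET and POST requests that navigations typically use; `request_after_server_redirect` is a hypothetical helper, and other methods (such as HEAD) follow additional rules that are not modeled here.

```python
from typing import Tuple


def request_after_server_redirect(status: int, method: str) -> Tuple[str, bool]:
    """Hypothetical helper: return (new_method, keeps_body) for the request
    that follows a 3xx response to a GET or POST navigation."""
    if method not in ("GET", "POST"):
        raise ValueError("this sketch only models GET and POST navigations")
    if status in (307, 308):
        # These codes preserve the original request method and body.
        return method, method == "POST"
    # Other redirects are followed with a body-less GET.
    return "GET", False
```

So a form POST answered with a 302 is followed by a GET to the new URL, whereas the same POST answered with a 307 is re-sent as a POST with its body.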
In contrast, a _client redirect_ happens after a document has committed, when
the HTML in the document instructs the browser to request a new document (e.g.,
via meta tags or JavaScript). Blink classifies the navigation as a client
redirect based partly on how much time has passed. In this case, a session
history item is created for the redirecting document, but it gets replaced when
the actual destination document commits. A separate NavigationRequest is used
for the second navigation.
## Concurrent Navigations
Many navigations can be in progress simultaneously. In general, every frame is
considered independent and may have its own navigation(s), with each tracked by
a NavigationRequest. Within a frame, it is possible to have multiple concurrent
navigations:
* **A cross-document navigation waiting for its final response (at most one per
frame).** The NavigationRequest is owned by FrameTreeNode during this stage,
which can take several seconds. Some special case navigations do not use a
network request and skip this stage (e.g., `about:blank`, `about:srcdoc`,
MHTML).
* **A queue of cross-document navigations that are between "ready to commit"
and "commit," while the browser process waits for a commit acknowledgement
from the renderer process.** While rare, it is possible for multiple
navigations to be in this stage concurrently if the renderer process is slow.
The NavigationRequests are owned by the RenderFrameHost during this stage,
which is usually short-lived.
* **Same-document navigations.** These can be:
    * Renderer-initiated (e.g., `pushState`, fragment link click). In this case,
      the browser process creates and destroys a NavigationRequest in the same
      task.
    * Browser-initiated (e.g., omnibox fragment change). In this case, the
      browser process creates a NavigationRequest owned by the RenderFrameHost
      and immediately tells the renderer to commit.

Note that the navigation code is not re-entrant. Callers must not start a new
navigation while a call to `NavigateWithoutEntry` or
`NavigateToExistingPendingEntry` is on the stack, to avoid a CHECK that guards
against use-after-free for `pending_entry_`.
## Rules for Canceling Navigations
We generally do not want an abusive page to prevent the user from navigating
away, such as by endlessly starting new navigations that interrupt or cancel the
user's attempts. Generally, a new navigation will cancel an existing one in a
frame, but we make the following exception: a renderer-initiated navigation is
ignored if and only if there is an ongoing browser-initiated navigation and the new
navigation lacks a user activation. (This is implemented in
`Navigator::ShouldIgnoreIncomingRendererRequest`.)
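The exception described above reduces to a two-input predicate. This Python sketch only restates the rule from the text; it is not the body of `Navigator::ShouldIgnoreIncomingRendererRequest`, and the function and parameter names are hypothetical.

```python
from typing import Optional


def should_ignore_renderer_request(ongoing_is_browser_initiated: Optional[bool],
                                   new_has_user_activation: bool) -> bool:
    """Hypothetical helper. `ongoing_is_browser_initiated` is None when the
    frame has no ongoing navigation at all."""
    if ongoing_is_browser_initiated is None:
        return False  # Nothing to protect, so the new navigation proceeds.
    # Ignore only when a browser-initiated navigation is in flight and the
    # incoming renderer-initiated one lacks a user activation.
    return ongoing_is_browser_initiated and not new_has_user_activation
```

In every other combination the new navigation proceeds and, per the general rule, cancels the existing one in that frame.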
NavigationThrottles also have an ability to cancel navigations when desired by a
feature. Keep in mind that it is problematic to simulate a redirect by canceling
a navigation and starting a new one, since this may lose relevant context from
the original navigation (e.g., ReloadType, CSP state, Sec-Fetch-Metadata state,
redirect chain, etc.), and it will lead to unexpected observer events and metrics
(e.g., extra navigation starts, inflated numbers of canceled navigations, etc.).
Feature authors that want to simulate redirects may want to consider using a
URLLoaderRequestInterceptor instead.
## Error Pages
There are several types of error pages that can be displayed when a navigation
is not successful.
The server can return a custom error page, such as a 400 or 500 level HTTP
response code page. These pages are rendered much like a successful navigation
to the site (and go into an appropriate process for that site), but the error
code is available and `NavigationHandle::IsErrorPage()` is true.
If the navigation fails to get a response from the server (e.g., the DNS lookup
fails), then Chromium will display an error page. For main frames, this error
page will be in a special error page process, not affiliated with any site or
containing any untrustworthy content from the web. In these failed cases,
NetErrorHelperCore may try to reload the URL at a later time (e.g., if a network
connection comes back online), to load the document in an appropriate process.
If instead the navigation is blocked (e.g., by an extension API or a
NavigationThrottle), then Chromium will similarly display an error page in a
special error page process. However, in blocked cases, Chromium will not attempt
to reload the URL at a later time.
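The three main-frame cases above differ in which process hosts the error page and whether a later reload is attempted. The Python sketch below tabulates that; `Failure`, `ErrorPagePlan`, and `plan_main_frame_error_page` are hypothetical names, and the text only describes automatic reloads for failed (no response) navigations.

```python
from dataclasses import dataclass
from enum import Enum


class Failure(Enum):
    HTTP_ERROR = "server returned a 4xx/5xx page"
    NO_RESPONSE = "no response from the server (e.g., DNS lookup failed)"
    BLOCKED = "blocked by an extension API or a NavigationThrottle"


@dataclass(frozen=True)
class ErrorPagePlan:
    process: str
    may_reload_later: bool


def plan_main_frame_error_page(failure: Failure) -> ErrorPagePlan:
    """Hypothetical helper summarising the main-frame cases in the text."""
    if failure is Failure.HTTP_ERROR:
        # Rendered like a normal document from that site; the text does not
        # describe any automatic reload for this case.
        return ErrorPagePlan("process for the site", False)
    if failure is Failure.NO_RESPONSE:
        # NetErrorHelperCore may retry, e.g. when connectivity returns.
        return ErrorPagePlan("error page process", True)
    return ErrorPagePlan("error page process", False)
```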
## Interstitial Pages
Interstitial pages are implemented as committed error pages. (Prior to
[issue 448486](https://crbug.com/448486), they were implemented as overlays.)
The original in-progress navigation is canceled when the interstitial is
displayed, and Chromium repeats the navigation if the user chooses to proceed.
Note that some interstitials can be shown after a page has committed (e.g., when
a subresource load triggers a Safe Browsing error). In this case, Chromium
navigates away from the original page to the interstitial page, with the intent
of replacing the original NavigationEntry. However, the original NavigationEntry
is preserved in `NavigationControllerImpl::entry_replaced_by_post_commit_error_`
in case the user chooses to dismiss the interstitial and return to the original
page.