Reorganize isolated origins into a map indexed by site URLs.

Previously isolated origins were stored in a set. This resulted in large overheads when URLs needed to be checked against isolated origins, since the set had to be walked through linearly, running a fairly slow comparison function (IsolatedOriginUtil::DoesOriginMatchIsolatedOrigin()) on each isolated origin, and this lookup turned out to be fairly common. This CL reorganizes the underlying data structure into a map keyed by site URLs. For example, if we add {https://foo.com, https://bar.foo.com, https://www.bar.com} as isolated origins, the map will look like this: https://foo.com -> { https://foo.com, https://bar.foo.com } https://bar.com -> { https://www.bar.com } When checking a URL against isolated origins, we now can just look up the map using the URL's site and then look for the most specific isolated origins as before. The site can be found in O(log n) time, and the corresponding list of origins to search using DoesOriginMatchIsolatedOrigin() changes from O(n) to O(originsPerSite), which is expected to be ~1. Bug: 877653 Change-Id: I41667f94cea7f1f8e5b53fa23443d83cfbb3c0ed Reviewed-on: https://chromium-review.googlesource.com/1208942 Commit-Queue: Alex Moshchuk <alexmos@chromium.org> Reviewed-by: Łukasz Anforowicz <lukasza@chromium.org> Reviewed-by: Charlie Reis <creis@chromium.org> Cr-Commit-Position: refs/heads/master@{#590063}

Reorganize isolated origins into a map indexed by site URLs.
Previously isolated origins were stored in a set. This resulted in large overheads when URLs needed to be checked against isolated origins, since the set had to be walked through linearly, running a fairly slow comparison function (IsolatedOriginUtil::DoesOriginMatchIsolatedOrigin()) on each isolated origin, and this lookup turned out to be fairly common. This CL reorganizes the underlying data structure into a map keyed by site URLs. For example, if we add {https://foo.com, https://bar.foo.com, https://www.bar.com} as isolated origins, the map will look like this: https://foo.com -> { https://foo.com, https://bar.foo.com } https://bar.com -> { https://www.bar.com } When checking a URL against isolated origins, we now can just look up the map using the URL's site and then look for the most specific isolated origins as before. The site can be found in O(log n) time, and the corresponding list of origins to search using DoesOriginMatchIsolatedOrigin() changes from O(n) to O(originsPerSite), which is expected to be ~1. Bug: 877653 Change-Id: I41667f94cea7f1f8e5b53fa23443d83cfbb3c0ed Reviewed-on: https://chromium-review.googlesource.com/1208942 Commit-Queue: Alex Moshchuk <alexmos@chromium.org> Reviewed-by: Łukasz Anforowicz <lukasza@chromium.org> Reviewed-by: Charlie Reis <creis@chromium.org> Cr-Commit-Position: refs/heads/master@{#590063}
4e19b363 · Alex Moshchuk · Commit Bot · 27357612 · 4e19b363 · 4e19b363
Commit 4e19b363 authored Sep 10, 2018 by Alex Moshchuk Committed by Commit Bot Sep 10, 2018
5 changed files
--- a/content/browser/child_process_security_policy_impl.cc
+++ b/content/browser/child_process_security_policy_impl.cc
@@ -1230,10 +1230,18 @@ void ChildProcessSecurityPolicyImpl::AddIsolatedOrigins(
    return true;  // Remove.
  });
-  // Taking the lock once and doing a batch insertion via base::flat_set::insert
-  // is important because of performance characteristics of base::flat_set.
  base::AutoLock lock(lock_);
-  isolated_origins_.insert(origins_to_add.begin(), origins_to_add.end());
+  for (const url::Origin& origin : origins_to_add) {
+    // GetSiteForOrigin() is used to look up the site URL of |origin| to speed
+    // up the isolated origin lookup.  This only performs a straightforward
+    // translation of an origin to eTLD+1; it does *not* take into account
+    // effective URLs, isolated origins, and other logic that's not needed
+    // here, but *is* typically needed for making process model decisions. Be
+    // very careful about using GetSiteForOrigin() elsewhere, and consider
+    // whether you should be using GetSiteForURL() instead.
+    GURL key(SiteInstanceImpl::GetSiteForOrigin(origin));
+    isolated_origins_[key].insert(origin);
+  }
 }
 bool ChildProcessSecurityPolicyImpl::IsIsolatedOrigin(
@@ -1245,20 +1253,53 @@ bool ChildProcessSecurityPolicyImpl::IsIsolatedOrigin(
 bool ChildProcessSecurityPolicyImpl::GetMatchingIsolatedOrigin(
    const url::Origin& origin,
    url::Origin* result) {
+  // GetSiteForOrigin() is used to look up the site URL of |origin| to speed
+  // up the isolated origin lookup.  This only performs a straightforward
+  // translation of an origin to eTLD+1; it does *not* take into account
+  // effective URLs, isolated origins, and other logic that's not needed
+  // here, but *is* typically needed for making process model decisions. Be
+  // very careful about using GetSiteForOrigin() elsewhere, and consider
+  // whether you should be using GetSiteForURL() instead.
+  return GetMatchingIsolatedOrigin(
+      origin, SiteInstanceImpl::GetSiteForOrigin(origin), result);
+}
+bool ChildProcessSecurityPolicyImpl::GetMatchingIsolatedOrigin(
+    const url::Origin& origin,
+    const GURL& site_url,
+    url::Origin* result) {
  *result = url::Origin();
  base::AutoLock lock(lock_);
+  // Look up the list of origins corresponding to |origin|'s site.
+  auto it = isolated_origins_.find(site_url);
+  // Subtle corner case: if the site's host ends with a dot, do the lookup
+  // without it.  A trailing dot shouldn't be able to bypass isolated origins:
+  // if "https://foo.com" is an isolated origin, "https://foo.com." should
+  // match it.
+  if (it == isolated_origins_.end() && site_url.host().back() == '.') {
+    GURL::Replacements replacements;
+    std::string host = site_url.host();
+    host.pop_back();
+    replacements.SetHostStr(host);
+    it = isolated_origins_.find(site_url.ReplaceComponents(replacements));
+  }
  // If multiple isolated origins are registered with a common domain suffix,
  // return the most specific one.  For example, if foo.isolated.com and
-  // isolated.com are both isolated origins, bar.foo.isolated.com should return
+  // isolated.com are both isolated origins, bar.foo.isolated.com should
-  // foo.isolated.com.
+  // return foo.isolated.com.
  bool found = false;
-  for (auto isolated_origin : isolated_origins_) {
+  if (it != isolated_origins_.end()) {
-    if (IsolatedOriginUtil::DoesOriginMatchIsolatedOrigin(origin,
+    for (const url::Origin& isolated_origin : it->second) {
-                                                          isolated_origin)) {
+      if (IsolatedOriginUtil::DoesOriginMatchIsolatedOrigin(origin,
-      if (!found || result->host().length() < isolated_origin.host().length()) {
+                                                            isolated_origin)) {
-        *result = isolated_origin;
+        if (!found ||
-        found = true;
+            result->host().length() < isolated_origin.host().length()) {
+          *result = isolated_origin;
+          found = true;
+        }
      }
    }
  }
@@ -1268,8 +1309,11 @@ bool ChildProcessSecurityPolicyImpl::GetMatchingIsolatedOrigin(
 void ChildProcessSecurityPolicyImpl::RemoveIsolatedOriginForTesting(
    const url::Origin& origin) {
+  GURL key(SiteInstanceImpl::GetSiteForOrigin(origin));
  base::AutoLock lock(lock_);
-  isolated_origins_.erase(origin);
+  isolated_origins_[key].erase(origin);
+  if (isolated_origins_[key].empty())
+    isolated_origins_.erase(key);
 }
 }  // namespace content
--- a/content/browser/child_process_security_policy_impl.h
+++ b/content/browser/child_process_security_policy_impl.h
@@ -12,6 +12,7 @@
 #include <vector>
 #include "base/compiler_specific.h"
+#include "base/containers/flat_map.h"
 #include "base/containers/flat_set.h"
 #include "base/gtest_prod_util.h"
 #include "base/macros.h"
@@ -97,6 +98,15 @@ class CONTENT_EXPORT ChildProcessSecurityPolicyImpl
  bool GetMatchingIsolatedOrigin(const url::Origin& origin,
                                 url::Origin* result) override;
+  // A version of GetMatchingIsolatedOrigin that takes in both the |origin| and
+  // the |site_url| that |origin| corresponds to.  |site_url| is the key by
+  // which |origin| will be looked up in |isolated_origins_|; this function
+  // allows it to be passed in when it is already known to avoid recomputing it
+  // internally.
+  bool GetMatchingIsolatedOrigin(const url::Origin& origin,
+                                 const GURL& site_url,
+                                 url::Origin* result);
  // Returns if |child_id| can read all of the |files|.
  bool CanReadAllFiles(int child_id, const std::vector<base::FilePath>& files);
@@ -361,7 +371,17 @@ class CONTENT_EXPORT ChildProcessSecurityPolicyImpl
  // when making process model decisions, rather than the origin's scheme and
  // eTLD+1. Each of these origins requires a dedicated process.  This set is
  // protected by |lock_|.
-  base::flat_set<url::Origin> isolated_origins_;
+  //
+  // The origins are stored in a map indexed by a site URL computed for each
+  // origin.  For example, adding https://foo.com, https://bar.foo.com, and
+  // https://www.bar.com would result in the following structure:
+  //   https://foo.com -> { https://foo.com, https://bar.foo.com }
+  //   https://bar.com -> { https://www.bar.com }
+  // This organization speeds up lookups of isolated origins. The site can be
+  // found in O(log n) time, and the corresponding list of origins to search
+  // using the expensive DoesOriginMatchIsolatedOrigin() comparison is
+  // typically small.
+  base::flat_map<GURL, base::flat_set<url::Origin>> isolated_origins_;
  DISALLOW_COPY_AND_ASSIGN(ChildProcessSecurityPolicyImpl);
 };

--- a/content/browser/child_process_security_policy_unittest.cc
+++ b/content/browser/child_process_security_policy_unittest.cc
@@ -9,6 +9,7 @@
 #include "base/logging.h"
 #include "base/test/mock_log.h"
 #include "content/browser/child_process_security_policy_impl.h"
+#include "content/browser/site_instance_impl.h"
 #include "content/public/common/bindings_policy.h"
 #include "content/public/common/url_constants.h"
 #include "content/test/test_content_browser_client.h"
@@ -1086,12 +1087,32 @@ TEST_F(ChildProcessSecurityPolicyTest, OriginGranting) {
  p->Remove(kRendererID);
 }
+namespace {
+// Helpers to construct (key, value) entries used to validate the
+// isolated_origins_ map.
+auto IsolatedOriginEntry(const url::Origin& origin) {
+  return std::pair<GURL, base::flat_set<url::Origin>>(
+      SiteInstanceImpl::GetSiteForOrigin(origin), {origin});
+}
+auto IsolatedOriginEntry(const url::Origin& origin1,
+                         const url::Origin& origin2) {
+  EXPECT_EQ(SiteInstanceImpl::GetSiteForOrigin(origin1),
+            SiteInstanceImpl::GetSiteForOrigin(origin2));
+  return std::pair<GURL, base::flat_set<url::Origin>>(
+      SiteInstanceImpl::GetSiteForOrigin(origin1), {origin1, origin2});
+}
+}  // namespace
 // Verifies ChildProcessSecurityPolicyImpl::AddIsolatedOrigins method.
 TEST_F(ChildProcessSecurityPolicyTest, AddIsolatedOrigins) {
  url::Origin foo = url::Origin::Create(GURL("https://foo.com/"));
  url::Origin bar = url::Origin::Create(GURL("https://bar.com/"));
  url::Origin baz = url::Origin::Create(GURL("https://baz.com/"));
-  url::Origin foobar = url::Origin::Create(GURL("https://foobar.com/"));
+  url::Origin quxfoo = url::Origin::Create(GURL("https://qux.foo.com/"));
  url::Origin baz_http_8000 = url::Origin::Create(GURL("http://baz.com:8000/"));
  url::Origin baz_https_8000 =
      url::Origin::Create(GURL("https://baz.com:8000/"));
@@ -1104,28 +1125,39 @@ TEST_F(ChildProcessSecurityPolicyTest, AddIsolatedOrigins) {
  // Verify deduplication of the argument.
  p->AddIsolatedOrigins({foo, bar, bar});
-  EXPECT_THAT(p->isolated_origins_, testing::UnorderedElementsAre(foo, bar));
+  EXPECT_THAT(p->isolated_origins_,
+              testing::UnorderedElementsAre(IsolatedOriginEntry(foo),
+                                            IsolatedOriginEntry(bar)));
  // Verify that the old set is extended (not replaced).
  p->AddIsolatedOrigins({baz});
  EXPECT_THAT(p->isolated_origins_,
-              testing::UnorderedElementsAre(foo, bar, baz));
+              testing::UnorderedElementsAre(IsolatedOriginEntry(foo),
+                                            IsolatedOriginEntry(bar),
+                                            IsolatedOriginEntry(baz)));
  // Verify deduplication against the old set.
  p->AddIsolatedOrigins({foo});
  EXPECT_THAT(p->isolated_origins_,
-              testing::UnorderedElementsAre(foo, bar, baz));
+              testing::UnorderedElementsAre(IsolatedOriginEntry(foo),
+                                            IsolatedOriginEntry(bar),
+                                            IsolatedOriginEntry(baz)));
-  // Verify deduplication considers scheme and port differences.
+  // Verify deduplication considers scheme and port differences.  Note that
+  // origins that differ only in ports map to the same key.
  p->AddIsolatedOrigins({baz, baz_http_8000, baz_https_8000});
  EXPECT_THAT(p->isolated_origins_,
-              testing::UnorderedElementsAre(foo, bar, baz, baz_http_8000,
+              testing::UnorderedElementsAre(
-                                            baz_https_8000));
+                  IsolatedOriginEntry(foo), IsolatedOriginEntry(bar),
+                  IsolatedOriginEntry(baz, baz_https_8000),
+                  IsolatedOriginEntry(baz_http_8000)));
  // Verify that adding an origin that is invalid for isolation will 1) log a
  // warning and 2) won't CHECK or crash the browser process, 3) will not add
  // the invalid origin, but will add the remaining origins passed to
-  // AddIsolatedOrigins.
+  // AddIsolatedOrigins.  Note that the new |quxfoo| origin should map to the
+  // same key (i.e., the https://foo.com/ site URL) as the existing |foo|
+  // origin.
  {
    base::test::MockLog mock_log;
    EXPECT_CALL(mock_log,
@@ -1134,10 +1166,12 @@ TEST_F(ChildProcessSecurityPolicyTest, AddIsolatedOrigins) {
        .Times(1);
    mock_log.StartCapturingLogs();
-    p->AddIsolatedOrigins({foobar, invalid_etld});
+    p->AddIsolatedOrigins({quxfoo, invalid_etld});
    EXPECT_THAT(p->isolated_origins_,
-                testing::UnorderedElementsAre(foo, bar, baz, baz_http_8000,
+                testing::UnorderedElementsAre(
-                                              baz_https_8000, foobar));
+                    IsolatedOriginEntry(foo, quxfoo), IsolatedOriginEntry(bar),
+                    IsolatedOriginEntry(baz, baz_https_8000),
+                    IsolatedOriginEntry(baz_http_8000)));
  }
 }

--- a/content/browser/site_instance_impl.cc
+++ b/content/browser/site_instance_impl.cc
@@ -401,7 +401,11 @@ bool SiteInstanceImpl::IsSameWebSite(BrowserContext* browser_context,
  }
  // If the sites are the same, check isolated origins.  If either URL matches
-  // an isolated origin, compare origins rather than sites.
+  // an isolated origin, compare origins rather than sites.  As an optimization
+  // to avoid unneeded isolated origin lookups, shortcut this check if the two
+  // origins are the same.
+  if (src_origin == dest_origin)
+    return true;
  auto* policy = ChildProcessSecurityPolicyImpl::GetInstance();
  url::Origin src_isolated_origin;
  url::Origin dest_isolated_origin;
@@ -449,27 +453,21 @@ GURL SiteInstanceImpl::GetSiteForURL(BrowserContext* browser_context,
                 : real_url;
  url::Origin origin = url::Origin::Create(url);
-  // Isolated origins should use the full origin as their site URL. A subdomain
-  // of an isolated origin should also use that isolated origin's site URL. It
-  // is important to check |url| rather than |real_url| here, since some
-  // effective URLs (such as for NTP) need to be resolved prior to the isolated
-  // origin lookup.
-  auto* policy = ChildProcessSecurityPolicyImpl::GetInstance();
-  url::Origin isolated_origin;
-  if (policy->GetMatchingIsolatedOrigin(origin, &isolated_origin))
-    return isolated_origin.GetURL();
  // If the url has a host, then determine the site.  Skip file URLs to avoid a
  // situation where site URL of file://localhost/ would mismatch Blink's origin
  // (which ignores the hostname in this case - see https://crbug.com/776160).
  if (!origin.host().empty() && origin.scheme() != url::kFileScheme) {
-    // Only keep the scheme and registered domain of |origin|.
+    GURL site_url(GetSiteForOrigin(origin));
-    std::string domain = net::registry_controlled_domains::GetDomainAndRegistry(
-        origin.host(),
+    // Isolated origins should use the full origin as their site URL. A
-        net::registry_controlled_domains::INCLUDE_PRIVATE_REGISTRIES);
+    // subdomain of an isolated origin should also use that isolated origin's
-    std::string site = origin.scheme();
+    // site URL. It is important to check |origin| (based on |url|) rather than
-    site += url::kStandardSchemeSeparator;
+    // |real_url| here, since some effective URLs (such as for NTP) need to be
-    site += domain.empty() ? origin.host() : domain;
+    // resolved prior to the isolated origin lookup.
+    auto* policy = ChildProcessSecurityPolicyImpl::GetInstance();
+    url::Origin isolated_origin;
+    if (policy->GetMatchingIsolatedOrigin(origin, site_url, &isolated_origin))
+      return isolated_origin.GetURL();
    // If an effective URL was used, augment the effective site URL with the
    // underlying web site in the hash.  This is needed to keep
@@ -479,12 +477,16 @@ GURL SiteInstanceImpl::GetSiteForURL(BrowserContext* browser_context,
    // TODO(https://crbug.com/734722): Consider replacing this hack with
    // a proper security principal.
    if (should_use_effective_urls && url != real_url) {
-      site += "#" + GetSiteForURL(browser_context, real_url,
+      std::string non_translated_site_url(
-                                  false /* should_use_effective_urls */)
+          GetSiteForURL(browser_context, real_url,
-                        .spec();
+                        false /* should_use_effective_urls */)
+              .spec());
+      GURL::Replacements replacements;
+      replacements.SetRefStr(non_translated_site_url.c_str());
+      site_url = site_url.ReplaceComponents(replacements);
    }
-    return GURL(site);
+    return site_url;
  }
  // If there is no host but there is a scheme, return the scheme.
@@ -535,6 +537,18 @@ GURL SiteInstanceImpl::GetSiteForURL(BrowserContext* browser_context,
  return GURL();
 }
+// static
+GURL SiteInstanceImpl::GetSiteForOrigin(const url::Origin& origin) {
+  // Only keep the scheme and registered domain of |origin|.
+  std::string domain = net::registry_controlled_domains::GetDomainAndRegistry(
+      origin.host(),
+      net::registry_controlled_domains::INCLUDE_PRIVATE_REGISTRIES);
+  std::string site = origin.scheme();
+  site += url::kStandardSchemeSeparator;
+  site += domain.empty() ? origin.host() : domain;
+  return GURL(site);
+}
 // static
 GURL SiteInstanceImpl::GetEffectiveURL(BrowserContext* browser_context,
                                       const GURL& url) {

--- a/content/browser/site_instance_impl.h
+++ b/content/browser/site_instance_impl.h
@@ -127,12 +127,19 @@ class CONTENT_EXPORT SiteInstanceImpl final : public SiteInstance,
  // Returns the site for the given URL, which includes only the scheme and
  // registered domain.  Returns an empty GURL if the URL has no host.
-  // |use_effective_urls| specifies whether to resolve |url| to an effective
+  // |should_use_effective_urls| specifies whether to resolve |url| to an
-  // URL (via ContentBrowserClient::GetEffectiveURL()) before determining the
+  // effective URL (via ContentBrowserClient::GetEffectiveURL()) before
-  // site.
+  // determining the site.
  static GURL GetSiteForURL(BrowserContext* context,
                            const GURL& url,
-                            bool use_effective_urls);
+                            bool should_use_effective_urls);
+  // Returns the site of a given |origin|.  Unlike GetSiteForURL(), this does
+  // not utilize effective URLs, isolated origins, or other special logic.  It
+  // only translates an origin into a site (i.e., scheme and eTLD+1) and is
+  // used internally by GetSiteForURL().  For making process model decisions,
+  // GetSiteForURL() should be used instead.
+  static GURL GetSiteForOrigin(const url::Origin& origin);
  // Returns the URL to which a process should be locked for the given URL.
  // This is computed similarly to the site URL (see GetSiteForURL), but