Reland "build: In LTO links, run just one link at once, but give it all cores"

This is a reland of 174d9faa For the reland, tolerate explicitly set number of concurrent links in thinlto builds as long as it's explicitly set to 1. This is needed for some cros bots. Original change's description: > build: In LTO links, run just one link at once, but give it all cores > > Normally, we allow running several links in parallel if there's enough > memory, since links tend to be memory- and io-bound, not compute bound. > However, in LTO builds the linker does codegen for every linked > translation unit, which needs a lot of CPU. > > Previously, we used to hardcode 8 threads per link for thinlto builds, > which makes builds needlessly slow on machines with many cores. > > Instead, automatically use all cores for a thinlto link, but only run > one link in parallel at once. > > chromeos-chrome-9999.ebuild in the Chrome OS build and link_jobs_32 in > mb_config.pyl currently manually set the number of threads per lto link > to the number of cores. Once this change here is in, we should stop > explicitly setting the arg there and then we can turn this from a gn arg > to a regular variable that's not settable in args.gn > > Bug: 1132930 > Change-Id: If2fda3e38b2b1827beb2da844f024f52211210db > Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2464782 > Reviewed-by: Hans Wennborg <hans@chromium.org> > Commit-Queue: Nico Weber <thakis@chromium.org> > Auto-Submit: Nico Weber <thakis@chromium.org> > Cr-Commit-Position: refs/heads/master@{#816129} Bug: 1132930 Change-Id: I40840f971e2a87afe33e77f537776ba78cf027a6 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2464913Reviewed-by: Hans Wennborg <hans@chromium.org> Commit-Queue: Nico Weber <thakis@chromium.org> Cr-Commit-Position: refs/heads/master@{#816267}

Reland "build: In LTO links, run just one link at once, but give it all cores"
This is a reland of 174d9faa For the reland, tolerate explicitly set number of concurrent links in thinlto builds as long as it's explicitly set to 1. This is needed for some cros bots. Original change's description: > build: In LTO links, run just one link at once, but give it all cores > > Normally, we allow running several links in parallel if there's enough > memory, since links tend to be memory- and io-bound, not compute bound. > However, in LTO builds the linker does codegen for every linked > translation unit, which needs a lot of CPU. > > Previously, we used to hardcode 8 threads per link for thinlto builds, > which makes builds needlessly slow on machines with many cores. > > Instead, automatically use all cores for a thinlto link, but only run > one link in parallel at once. > > chromeos-chrome-9999.ebuild in the Chrome OS build and link_jobs_32 in > mb_config.pyl currently manually set the number of threads per lto link > to the number of cores. Once this change here is in, we should stop > explicitly setting the arg there and then we can turn this from a gn arg > to a regular variable that's not settable in args.gn > > Bug: 1132930 > Change-Id: If2fda3e38b2b1827beb2da844f024f52211210db > Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2464782 > Reviewed-by: Hans Wennborg <hans@chromium.org> > Commit-Queue: Nico Weber <thakis@chromium.org> > Auto-Submit: Nico Weber <thakis@chromium.org> > Cr-Commit-Position: refs/heads/master@{#816129} Bug: 1132930 Change-Id: I40840f971e2a87afe33e77f537776ba78cf027a6 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2464913Reviewed-by: Hans Wennborg <hans@chromium.org> Commit-Queue: Nico Weber <thakis@chromium.org> Cr-Commit-Position: refs/heads/master@{#816267}
a2ebd151 · Nico Weber · Commit Bot · 6d6dde7e · a2ebd151 · a2ebd151
Commit a2ebd151 authored Oct 12, 2020 by Nico Weber Committed by Commit Bot Oct 12, 2020
3 changed files
--- a/build/config/compiler/BUILD.gn
+++ b/build/config/compiler/BUILD.gn
@@ -621,7 +621,7 @@ config("compiler") {
      # files on Windows (https://crbug.com/871962).
      ldflags += [
        "/opt:lldlto=" + lto_opt_level,
-        "/opt:lldltojobs=" + max_jobs_per_link,
+        "/opt:lldltojobs=all",
        # Experimentally determined to yield a reasonable trade-off between
        # build time, run-time performance, and binary size.
@@ -637,10 +637,11 @@ config("compiler") {
      # usage in crbug.com/1038040. Note this will increase build time in
      # Chrome OS.
-      # Limit the parallelism to avoid too aggressive competition between
+      # In ThinLTO builds, we run at most one link process at a time,
-      # linker jobs. This is still suboptimal to a potential dynamic
+      # and let it use all cores.
-      # resource allocation scheme, but should be good enough.
+      # TODO(thakis): Check if '=0' (that is, number of cores, instead
-      ldflags += [ "-Wl,--thinlto-jobs=" + max_jobs_per_link ]
+      # of "all" which means number of hardware threads) is faster.
+      ldflags += [ "-Wl,--thinlto-jobs=all" ]
      # Limit the size of the ThinLTO cache to the lesser of 10% of
      # available disk space, 10GB and 100000 files.

--- a/build/config/compiler/compiler.gni
+++ b/build/config/compiler/compiler.gni
@@ -73,8 +73,8 @@ declare_args() {
  # If true, use Goma for ThinLTO code generation where applicable.
  use_goma_thin_lto = false
-  # Limit the number of jobs (threads/processes) the linker is allowed
+  # Ignored, only exists temporarily for backwards compat.
-  # to use (for linkers that support this).
+  # TODO(thakis): Remove this once bots no longer set it.
  max_jobs_per_link = 8
  # Whether we're using a sample profile collected on an architecture different

--- a/build/toolchain/concurrent_links.gni
+++ b/build/toolchain/concurrent_links.gni
@@ -76,16 +76,25 @@ if (concurrent_links == -1) {
  # so that we can compute better values.
  _command_dict = exec_script("get_concurrent_links.py", _args, "scope")
-  concurrent_links = _command_dict.primary_pool_size
+  if (use_thin_lto) {
-  concurrent_links_logs = _command_dict.explanation
+    concurrent_links = 1
-  if (_command_dict.secondary_pool_size >= concurrent_links) {
+    concurrent_links_logs =
-    # Have R8 / Lint share the link pool unless we would safely get more
+        [ "thinlto -- only one link at once but every link can use all cores" ]
-    # concurrency out of using a separate one.
+  } else {
-    # On low-RAM machines, this allows an apk's native library to link at the
+    concurrent_links = _command_dict.primary_pool_size
-    # same time as its java is optimized with R8.
+    concurrent_links_logs = _command_dict.explanation
-    java_cmd_pool_size = _command_dict.secondary_pool_size
+    if (_command_dict.secondary_pool_size >= concurrent_links) {
+      # Have R8 / Lint share the link pool unless we would safely get more
+      # concurrency out of using a separate one.
+      # On low-RAM machines, this allows an apk's native library to link at the
+      # same time as its java is optimized with R8.
+      java_cmd_pool_size = _command_dict.secondary_pool_size
+    }
  }
 } else {
+  assert(!use_thin_lto || concurrent_links == 1,
+         "can't explicitly set concurrent_links with thinlto")
  concurrent_links_logs =
      [ "concurrent_links set by GN arg (value=$concurrent_links)" ]
 }