Commit 174d9faa authored by Nico Weber's avatar Nico Weber Committed by Commit Bot

build: In LTO links, run just one link at once, but give it all cores

Normally, we allow running several links in parallel if there's enough
memory, since links tend to be memory- and io-bound, not compute bound.
However, in LTO builds the linker does codegen for every linked
translation unit, which needs a lot of CPU.

Previously, we used to hardcode 8 threads per link for thinlto builds,
which makes builds needlessly slow on machines with many cores.

Instead, automatically use all cores for a thinlto link, but only run
one link in parallel at once.

chromeos-chrome-9999.ebuild in the Chrome OS build and link_jobs_32 in
mb_config.pyl currently manually set the number of threads per lto link
to the number of cores. Once this change here is in, we should stop
explicitly setting the arg there and then we can turn this from a gn arg
to a regular variable that's not settable in args.gn

Bug: 1132930
Change-Id: If2fda3e38b2b1827beb2da844f024f52211210db
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2464782Reviewed-by: default avatarHans Wennborg <hans@chromium.org>
Commit-Queue: Nico Weber <thakis@chromium.org>
Auto-Submit: Nico Weber <thakis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#816129}
parent 5d66c672
...@@ -621,7 +621,7 @@ config("compiler") { ...@@ -621,7 +621,7 @@ config("compiler") {
# files on Windows (https://crbug.com/871962). # files on Windows (https://crbug.com/871962).
ldflags += [ ldflags += [
"/opt:lldlto=" + lto_opt_level, "/opt:lldlto=" + lto_opt_level,
"/opt:lldltojobs=" + max_jobs_per_link, "/opt:lldltojobs=all",
# Experimentally determined to yield a reasonable trade-off between # Experimentally determined to yield a reasonable trade-off between
# build time, run-time performance, and binary size. # build time, run-time performance, and binary size.
...@@ -637,10 +637,11 @@ config("compiler") { ...@@ -637,10 +637,11 @@ config("compiler") {
# usage in crbug.com/1038040. Note this will increase build time in # usage in crbug.com/1038040. Note this will increase build time in
# Chrome OS. # Chrome OS.
# Limit the parallelism to avoid too aggressive competition between # In ThinLTO builds, we run at most one link process at a time,
# linker jobs. This is still suboptimal to a potential dynamic # and let it use all cores.
# resource allocation scheme, but should be good enough. # TODO(thakis): Check if '=0' (that is, number of cores, instead
ldflags += [ "-Wl,--thinlto-jobs=" + max_jobs_per_link ] # of "all" which means number of hardware threads) is faster.
ldflags += [ "-Wl,--thinlto-jobs=all" ]
# Limit the size of the ThinLTO cache to the lesser of 10% of # Limit the size of the ThinLTO cache to the lesser of 10% of
# available disk space, 10GB and 100000 files. # available disk space, 10GB and 100000 files.
......
...@@ -73,8 +73,8 @@ declare_args() { ...@@ -73,8 +73,8 @@ declare_args() {
# If true, use Goma for ThinLTO code generation where applicable. # If true, use Goma for ThinLTO code generation where applicable.
use_goma_thin_lto = false use_goma_thin_lto = false
# Limit the number of jobs (threads/processes) the linker is allowed # Ignored, only exists temporarily for backwards compat.
# to use (for linkers that support this). # TODO(thakis): Remove this once bots no longer set it.
max_jobs_per_link = 8 max_jobs_per_link = 8
# Whether we're using a sample profile collected on an architecture different # Whether we're using a sample profile collected on an architecture different
......
...@@ -76,16 +76,24 @@ if (concurrent_links == -1) { ...@@ -76,16 +76,24 @@ if (concurrent_links == -1) {
# so that we can compute better values. # so that we can compute better values.
_command_dict = exec_script("get_concurrent_links.py", _args, "scope") _command_dict = exec_script("get_concurrent_links.py", _args, "scope")
concurrent_links = _command_dict.primary_pool_size if (use_thin_lto) {
concurrent_links_logs = _command_dict.explanation concurrent_links = 1
if (_command_dict.secondary_pool_size >= concurrent_links) { concurrent_links_logs =
# Have R8 / Lint share the link pool unless we would safely get more [ "thinlto -- only one link at once but every link can use all cores" ]
# concurrency out of using a separate one. } else {
# On low-RAM machines, this allows an apk's native library to link at the concurrent_links = _command_dict.primary_pool_size
# same time as its java is optimized with R8. concurrent_links_logs = _command_dict.explanation
java_cmd_pool_size = _command_dict.secondary_pool_size
if (_command_dict.secondary_pool_size >= concurrent_links) {
# Have R8 / Lint share the link pool unless we would safely get more
# concurrency out of using a separate one.
# On low-RAM machines, this allows an apk's native library to link at the
# same time as its java is optimized with R8.
java_cmd_pool_size = _command_dict.secondary_pool_size
}
} }
} else { } else {
assert(!use_thin_lto, "can't explicitly set concurrent_links with thinlto")
concurrent_links_logs = concurrent_links_logs =
[ "concurrent_links set by GN arg (value=$concurrent_links)" ] [ "concurrent_links set by GN arg (value=$concurrent_links)" ]
} }
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment