Commit a2ebd151 authored by Nico Weber's avatar Nico Weber Committed by Commit Bot

Reland "build: In LTO links, run just one link at once, but give it all cores"

This is a reland of 174d9faa
For the reland, tolerate explicitly set number of concurrent links
in thinlto builds as long as it's explicitly set to 1. This is needed
for some cros bots.

Original change's description:
> build: In LTO links, run just one link at once, but give it all cores
>
> Normally, we allow running several links in parallel if there's enough
> memory, since links tend to be memory- and io-bound, not compute bound.
> However, in LTO builds the linker does codegen for every linked
> translation unit, which needs a lot of CPU.
>
> Previously, we used to hardcode 8 threads per link for thinlto builds,
> which makes builds needlessly slow on machines with many cores.
>
> Instead, automatically use all cores for a thinlto link, but only run
> one link in parallel at once.
>
> chromeos-chrome-9999.ebuild in the Chrome OS build and link_jobs_32 in
> mb_config.pyl currently manually set the number of threads per lto link
> to the number of cores. Once this change here is in, we should stop
> explicitly setting the arg there and then we can turn this from a gn arg
> to a regular variable that's not settable in args.gn
>
> Bug: 1132930
> Change-Id: If2fda3e38b2b1827beb2da844f024f52211210db
> Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2464782
> Reviewed-by: Hans Wennborg <hans@chromium.org>
> Commit-Queue: Nico Weber <thakis@chromium.org>
> Auto-Submit: Nico Weber <thakis@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#816129}

Bug: 1132930
Change-Id: I40840f971e2a87afe33e77f537776ba78cf027a6
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2464913Reviewed-by: default avatarHans Wennborg <hans@chromium.org>
Commit-Queue: Nico Weber <thakis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#816267}
parent 6d6dde7e
...@@ -621,7 +621,7 @@ config("compiler") { ...@@ -621,7 +621,7 @@ config("compiler") {
# files on Windows (https://crbug.com/871962). # files on Windows (https://crbug.com/871962).
ldflags += [ ldflags += [
"/opt:lldlto=" + lto_opt_level, "/opt:lldlto=" + lto_opt_level,
"/opt:lldltojobs=" + max_jobs_per_link, "/opt:lldltojobs=all",
# Experimentally determined to yield a reasonable trade-off between # Experimentally determined to yield a reasonable trade-off between
# build time, run-time performance, and binary size. # build time, run-time performance, and binary size.
...@@ -637,10 +637,11 @@ config("compiler") { ...@@ -637,10 +637,11 @@ config("compiler") {
# usage in crbug.com/1038040. Note this will increase build time in # usage in crbug.com/1038040. Note this will increase build time in
# Chrome OS. # Chrome OS.
# Limit the parallelism to avoid too aggressive competition between # In ThinLTO builds, we run at most one link process at a time,
# linker jobs. This is still suboptimal to a potential dynamic # and let it use all cores.
# resource allocation scheme, but should be good enough. # TODO(thakis): Check if '=0' (that is, number of cores, instead
ldflags += [ "-Wl,--thinlto-jobs=" + max_jobs_per_link ] # of "all" which means number of hardware threads) is faster.
ldflags += [ "-Wl,--thinlto-jobs=all" ]
# Limit the size of the ThinLTO cache to the lesser of 10% of # Limit the size of the ThinLTO cache to the lesser of 10% of
# available disk space, 10GB and 100000 files. # available disk space, 10GB and 100000 files.
......
...@@ -73,8 +73,8 @@ declare_args() { ...@@ -73,8 +73,8 @@ declare_args() {
# If true, use Goma for ThinLTO code generation where applicable. # If true, use Goma for ThinLTO code generation where applicable.
use_goma_thin_lto = false use_goma_thin_lto = false
# Limit the number of jobs (threads/processes) the linker is allowed # Ignored, only exists temporarily for backwards compat.
# to use (for linkers that support this). # TODO(thakis): Remove this once bots no longer set it.
max_jobs_per_link = 8 max_jobs_per_link = 8
# Whether we're using a sample profile collected on an architecture different # Whether we're using a sample profile collected on an architecture different
......
...@@ -76,16 +76,25 @@ if (concurrent_links == -1) { ...@@ -76,16 +76,25 @@ if (concurrent_links == -1) {
# so that we can compute better values. # so that we can compute better values.
_command_dict = exec_script("get_concurrent_links.py", _args, "scope") _command_dict = exec_script("get_concurrent_links.py", _args, "scope")
concurrent_links = _command_dict.primary_pool_size if (use_thin_lto) {
concurrent_links_logs = _command_dict.explanation concurrent_links = 1
if (_command_dict.secondary_pool_size >= concurrent_links) { concurrent_links_logs =
# Have R8 / Lint share the link pool unless we would safely get more [ "thinlto -- only one link at once but every link can use all cores" ]
# concurrency out of using a separate one. } else {
# On low-RAM machines, this allows an apk's native library to link at the concurrent_links = _command_dict.primary_pool_size
# same time as its java is optimized with R8. concurrent_links_logs = _command_dict.explanation
java_cmd_pool_size = _command_dict.secondary_pool_size
if (_command_dict.secondary_pool_size >= concurrent_links) {
# Have R8 / Lint share the link pool unless we would safely get more
# concurrency out of using a separate one.
# On low-RAM machines, this allows an apk's native library to link at the
# same time as its java is optimized with R8.
java_cmd_pool_size = _command_dict.secondary_pool_size
}
} }
} else { } else {
assert(!use_thin_lto || concurrent_links == 1,
"can't explicitly set concurrent_links with thinlto")
concurrent_links_logs = concurrent_links_logs =
[ "concurrent_links set by GN arg (value=$concurrent_links)" ] [ "concurrent_links set by GN arg (value=$concurrent_links)" ]
} }
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment