Commit 8e92b17e authored by Yuly Novikov, committed by Commit Bot

Update GPU bots documentation.

Bug: 962876
Change-Id: I3d4ff17f242eb1e481bd4dddfc82f8a2914f521d
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2040136
Commit-Queue: Yuly Novikov <ynovikov@chromium.org>
Reviewed-by: Brian Sheedy <bsheedy@chromium.org>
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#739446}
parent 13c39dd2
@@ -35,7 +35,7 @@ of its jobs with the Swarming parameters:
 ```json
 {
-"gpu": "10de:1cb3-23.21.13.8816",
+"gpu": "nvidia-quadro-p400-win10-stable",
 "os": "Windows-10",
 "pool": "chromium.tests.gpu"
 }
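As a rough illustration of why the `gpu` value in these parameters matters, the sketch below models how a task's requested dimensions could be matched against the dimensions a bot advertises. It assumes a bot reports a list of values per dimension key and that a task is only eligible for bots advertising every requested value. The `bot_matches` helper and the sample bot are hypothetical and are not part of Swarming's API.

```python
def bot_matches(bot_dimensions, requested):
    """Return True if the bot advertises every requested key/value pair."""
    return all(
        value in bot_dimensions.get(key, [])
        for key, value in requested.items()
    )


# Hypothetical bot. It is assumed here to advertise both the raw
# vendor:device-driver string and the symbolic alias for the same GPU.
bot = {
    "gpu": [
        "10de",
        "10de:1cb3",
        "10de:1cb3-23.21.13.8816",
        "nvidia-quadro-p400-win10-stable",
    ],
    "os": ["Windows", "Windows-10"],
    "pool": ["chromium.tests.gpu"],
}

# The new-style parameters from the hunk above.
request = {
    "gpu": "nvidia-quadro-p400-win10-stable",
    "os": "Windows-10",
    "pool": "chromium.tests.gpu",
}

print(bot_matches(bot, request))
```

Under this model, a symbolic name such as `nvidia-quadro-p400-win10-stable` lets the driver version change on the bots without touching every task definition.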
@@ -54,8 +54,9 @@ queries of the bots and see, for example, which GPUs are available.
 The waterfall bots run tests on a single GPU type in order to make it easier to
 see regressions or flakiness that affect only a certain type of GPU.
+'Mac FYI GPU ASAN Release' is an exception, running both on Intel and AMD GPUs.
-The tryservers like `win_chromium_rel_ng` which include GPU tests, on the other
+The tryservers like `win10_chromium_x64_rel_ng` which include GPU tests, on the other
 hand, run tests on more than one GPU type. As of this writing, the Windows
 tryservers ran tests on NVIDIA and AMD GPUs; the Mac tryservers ran tests on
 Intel and NVIDIA GPUs. The way these tryservers' tests are specified is simply
@@ -67,12 +68,11 @@ tryserver must almost inherently be working as well.
 [chromium_trybot.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipes/chromium_trybot.py
-There are a few one-off GPU configurations on the waterfall where the tests are
-run locally on physical hardware, rather than via Swarming. A few examples are:
+There are some GPU configurations on the waterfall backed by only one machine,
+or a very small number of machines in the Swarming pool. A few examples are:
 <!-- XXX: update this list -->
 * [Mac Pro Release (AMD)](https://luci-milo.appspot.com/p/chromium/builders/luci.chromium.ci/Mac%20Pro%20FYI%20Release%20%28AMD%29)
-* [Linux Release (Intel HD 630)](https://luci-milo.appspot.com/p/chromium/builders/luci.chromium.ci/Linux%20FYI%20Release%20%28Intel%20HD%20630%29)
 * [Linux Release (AMD R7 240)](https://luci-milo.appspot.com/p/chromium/builders/luci.chromium.ci/Linux%20FYI%20Release%20%28AMD%20R7%20240%29/)
 There are a couple of reasons to continue to support running tests on a
@@ -84,11 +84,12 @@ begin scaling it up.
 Adding a new test step to the bots requires that the test run via an isolate.
 Isolates describe both the binary and data dependencies of an executable, and
-are the underpinning of how the Swarming system works. See the [LUCI wiki] for
-background on Isolates and Swarming.
-<!-- XXX: broken link -->
-[LUCI wiki]: https://github.com/luci/luci-py/wiki
+are the underpinning of how the Swarming system works. See the [LUCI] documentation for
+background on [Isolates] and [Swarming].
+[LUCI]: https://github.com/luci/luci-py
+[Isolates]: https://github.com/luci/luci-py/blob/master/appengine/isolate/doc/README.md
+[Swarming]: https://github.com/luci/luci-py/blob/master/appengine/swarming/doc/README.md
 ### Adding a new isolate
@@ -96,17 +97,16 @@ background on Isolates and Swarming.
 [`src/testing/test.gni`][testing/test.gni]. See `test("gl_tests")` in
 [`src/gpu/BUILD.gn`][gpu/BUILD.gn] for an example. For a more complex
 example which invokes a series of scripts which finally launches the
-browser, see [`src/chrome/telemetry_gpu_test.isolate`][telemetry_gpu_test.isolate].
+browser, see `telemetry_gpu_integration_test` in [`chrome/test/BUILD.gn`][chrome/test/BUILD.gn].
 2. Add an entry to [`src/testing/buildbot/gn_isolate_map.pyl`][gn_isolate_map.pyl] that refers to
 your target. Find a similar target to yours in order to determine the
-`type`. The type is referenced in [`src/tools/mb/mb_config.pyl`][mb_config.pyl].
+`type`. The type is referenced in [`src/tools/mb/mb.py`][mb.py].
 [testing/test.gni]: https://chromium.googlesource.com/chromium/src/+/master/testing/test.gni
 [gpu/BUILD.gn]: https://chromium.googlesource.com/chromium/src/+/master/gpu/BUILD.gn
-<!-- XXX: broken link -->
-[telemetry_gpu_test.isolate]: https://chromium.googlesource.com/chromium/src/+/master/chrome/telemetry_gpu_test.isolate
-[gn_isolate_map.pyl]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/gn_isolate_map.pyl
-[mb_config.pyl]: https://chromium.googlesource.com/chromium/src/+/master/tools/mb/mb_config.pyl
+[chrome/test/BUILD.gn]: https://chromium.googlesource.com/chromium/src/+/master/chrome/test/BUILD.gn
+[gn_isolate_map.pyl]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/gn_isolate_map.pyl
+[mb.py]: https://chromium.googlesource.com/chromium/src/+/master/tools/mb/mb.py
 At this point you can build and upload your isolate to the isolate server.
@@ -135,20 +135,11 @@ See [Adding new steps to the GPU bots] for details on this process.
 ## Relevant files that control the operation of the GPU bots
-In the [tools/build] workspace:
+In the [`tools/build`][tools/build] workspace:
-* [masters/master.chromium.gpu] and [masters/master.chromium.gpu.fyi]:
-* builders.pyl in these two directories defines the bots that show up on
-the waterfall. If you are adding a new bot, you need to add it to
-builders.pyl and use go/bug-a-trooper to request a restart of either
-master.chromium.gpu or master.chromium.gpu.fyi.
-* Only changes under masters/ require a waterfall restart. All other
-changes – for example, to scripts/slave/ in this workspace, or the
-Chromium workspace – do not require a master restart (and go live the
-minute they are committed).
 * `scripts/slave/recipe_modules/chromium_tests/`:
-* <code>[chromium_gpu.py]</code> and
-<code>[chromium_gpu_fyi.py]</code> define the following for
+* [`chromium_gpu.py`][chromium_gpu.py] and
+[`chromium_gpu_fyi.py`][chromium_gpu_fyi.py] define the following for
 each builder and tester:
 * How the workspace is checked out (e.g., this is where top-of-tree
 ANGLE is specified)
@@ -158,8 +149,8 @@ In the [tools/build] workspace:
 video codecs, and enabling compilation of certain tests, like the
 dEQP tests, that can't be built on all of the Chromium builders)
 * Note that the GN configuration of the bots is also controlled by
-<code>[mb_config.pyl]</code> in the Chromium workspace; see below.
-* <code>[trybots.py]</code> defines how try bots *mirror* one or more
+[`mb_config.pyl`][mb_config.pyl] in the Chromium workspace; see below.
+* [`trybots.py`][trybots.py] defines how try bots *mirror* one or more
 waterfall bots.
 * The concept of try bots mirroring waterfall bots ensures there are
 no differences in behavior between the waterfall bots and the try
@@ -167,67 +158,107 @@ In the [tools/build] workspace:
 and then break on the waterfall.
 * This file defines the behavior of the following GPU-related try
 bots:
-* `linux-rel`, `mac-rel`, and `win7-rel`, which run against every
+* `linux-rel`, `mac-rel`, `win10_chromium_x64_rel_ng` and
+`android-marshmallow-arm64-rel`, which run against every
 Chromium CL, and which mirror the behavior of bots on the
 chromium.gpu waterfall.
 * The ANGLE try bots, which run against ANGLE CLs, and mirror the
 behavior of the chromium.gpu.fyi waterfall (including using
 top-of-tree ANGLE, and running additional tests not run by the
 regular Chromium try bots)
 * The optional GPU try servers `linux_optional_gpu_tests_rel`,
-`mac_optional_gpu_tests_rel` and
-`win_optional_gpu_tests_rel`, which are triggered manually and
-run some tests which can't be run on the regular Chromium try
-servers mainly due to lack of hardware capacity.
-[tools/build]: https://chromium.googlesource.com/chromium/tools/build/
-[masters/master.chromium.gpu]: https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.chromium.gpu/
-[masters/master.chromium.gpu.fyi]: https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.chromium.gpu.fyi/
-[chromium_gpu.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipe_modules/chromium_tests/chromium_gpu.py
-[chromium_gpu_fyi.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipe_modules/chromium_tests/chromium_gpu_fyi.py
-[trybots.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipe_modules/chromium_tests/trybots.py
-In the [chromium/src] workspace:
-* [src/testing/buildbot]:
-* <code>[chromium.gpu.json]</code> and
-<code>[chromium.gpu.fyi.json]</code> define which steps are run on
-which bots. These files are autogenerated. Don't modify them directly!
-* <code>[gn_isolate_map.pyl]</code> defines all of the isolates' behavior in the GN
-build.
+`mac_optional_gpu_tests_rel`, `win_optional_gpu_tests_rel` and
+`android_optional_gpu_tests_rel`, which are added automatically
+to CLs which modify a selected set of subdirectories and
+run some tests which can't be run on the regular Chromium try
+servers mainly due to lack of hardware capacity.
+* Manual GPU trybots, starting with `gpu-try-` and `gpu-fyi-try-`
+prefixes, which can be added manually to CLs targeting a
+specific hardware configuration.
+[tools/build]: https://chromium.googlesource.com/chromium/tools/build/
+[chromium_gpu.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipe_modules/chromium_tests/chromium_gpu.py
+[chromium_gpu_fyi.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipe_modules/chromium_tests/chromium_gpu_fyi.py
+[trybots.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipe_modules/chromium_tests/trybots.py
+In the [`chromium/src`][chromium/src] workspace:
+* [`src/testing/buildbot`][src/testing/buildbot]:
+* [`chromium.gpu.json`][chromium.gpu.json] and
+[`chromium.gpu.fyi.json`][chromium.gpu.fyi.json] define which steps are
+run on which bots. These files are autogenerated. Don't modify them
+directly!
+* [`waterfalls.pyl`][waterfalls.pyl],
+[`test_suites.pyl`][test_suites.pyl], [`mixins.pyl`][mixins.pyl] and
+[`test_suite_exceptions.pyl`][test_suite_exceptions.pyl] define the
+confugation for the autogenerated json files above.
+Run [`generate_buildbot_json.py`][generate_buildbot_json.py] to
+generate the json files after you modify these pyl files.
+* [`generate_buildbot_json.py`][generate_buildbot_json.py]
+* The generator script for all the waterfalls, including
+`chromium.gpu.json` and `chromium.gpu.fyi.json`.
+* See the [README for generate_buildbot_json.py] for documentation
+on this script and the descriptions of the waterfalls and test
+suites.
+* When modifying this script, don't forget to also run it, to
+regenerate the JSON files. Don't worry; the presubmit step will
+catch this if you forget.
+* See [Adding new steps to the GPU bots] for more details.
+* [`gn_isolate_map.pyl`][gn_isolate_map.pyl] defines all of the isolates'
+behavior in the GN build.
 * [`src/tools/mb/mb_config.pyl`][mb_config.pyl]
 * Defines the GN arguments for all of the bots.
-* [`src/testing/buildbot/generate_buildbot_json.py`][generate_buildbot_json.py]
-* The generator script for all the waterfalls, including `chromium.gpu.json` and
-`chromium.gpu.fyi.json`. It defines on which GPUs various tests run.
-* See the [README for generate_buildbot_json.py] for documentation
-on this script and the descriptions of the waterfalls and test suites.
-* When modifying this script, don't forget to also run it, to regenerate
-the JSON files. Don't worry; the presubmit step will catch this if you forget.
-* See [Adding new steps to the GPU bots] for more details.
-[chromium/src]: https://chromium.googlesource.com/chromium/src/
-[src/testing/buildbot]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot
-[chromium.gpu.json]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/chromium.gpu.json
-[chromium.gpu.fyi.json]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/chromium.gpu.fyi.json
-[gn_isolate_map.pyl]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/gn_isolate_map.pyl
-[mb_config.pyl]: https://chromium.googlesource.com/chromium/src/+/master/tools/mb/mb_config.pyl
-[generate_buildbot_json.py]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/generate_buildbot_json.py
-[mixins.pyl]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/mixins.pyl
-[waterfalls.pyl]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/waterfalls.pyl
+* [`src/tools/mb/mb_config_buckets.pyl`][mb_config_buckets.pyl]
+* A new version of [`mb_config.pyl`][mb_config.pyl] that should supersede
+it.
+* [`src/infra/config`][src/infra/config]:
+* Definitions of how bots are organized on the waterfall,
+how builds are triggered, which VMs or machines are used for the
+builder itself, i.e. for compilation and scheduling swarmed tasks
+on GPU hardware. See
+[README.md](https://chromium.googlesource.com/chromium/src/+/master/infra/config/README.md)
+in this directory for up to date information.
+[chromium/src]: https://chromium.googlesource.com/chromium/src/
+[src/testing/buildbot]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot
+[src/infra/config]: https://chromium.googlesource.com/chromium/src/+/master/infra/config
+[chromium.gpu.json]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/chromium.gpu.json
+[chromium.gpu.fyi.json]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/chromium.gpu.fyi.json
+[gn_isolate_map.pyl]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/gn_isolate_map.pyl
+[mb_config.pyl]: https://chromium.googlesource.com/chromium/src/+/master/tools/mb/mb_config.pyl
+[mb_config_buckets.pyl]: https://chromium.googlesource.com/chromium/src/+/master/tools/mb/mb_config_buckets.pyl
+[generate_buildbot_json.py]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/generate_buildbot_json.py
+[mixins.pyl]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/mixins.pyl
+[waterfalls.pyl]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/waterfalls.pyl
+[test_suites.pyl]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/test_suites.pyl
+[test_suite_exceptions.pyl]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/test_suite_exceptions.pyl
 [README for generate_buildbot_json.py]: ../../testing/buildbot/README.md
-In the [infradata/config] workspace (Google internal only, sorry):
+In the [`infradata/config`][infradata/config] workspace (Google internal only,
+sorry):
-* [gpu.star]
-* Defines a `chromium.tests.gpu` Swarming pool which contains most of the
-specialized hardware: as of this writing, the Windows and Linux NVIDIA
+* [`gpu.star`][gpu.star]
+* Defines a `chromium.tests.gpu` Swarming pool which contains all of the
+specialized hardware, except some hardware shared with Chromium:
+for example, the Windows and Linux NVIDIA
 bots, the Windows AMD bots, and the MacBook Pros with NVIDIA and AMD
 GPUs. New GPU hardware should be added to this pool.
+* Also defines the GCEs, Mac VMs and Mac machines used for CI builders
+on GPU and GPU.FYI waterfalls and trybots.
+* [`chromium.star`][chromium.star]
+* Defines Swarming pools of GCEs, shared with Chromium, which are used
+for CI builders on GPU and GPU.FYI waterfalls and trybots.
+* [`pools.cfg`][pools.cfg]
+* Defines the Swarming pools for GCEs and Mac VMs used for manually
+triggered trybots.
+* [`bot_config.py`][bot_config.py]
+* Defines the stable GPU driver and OS versions in GPU Swarming pools.
 [infradata/config]: https://chrome-internal.googlesource.com/infradata/config
-[bot_config.py]: https://chrome-internal.googlesource.com/infradata/config/+/master/configs/chromium-swarm/scripts/bot_config.py
 [gpu.star]: https://chrome-internal.googlesource.com/infradata/config/+/master/configs/chromium-swarm/starlark/bots/chromium/gpu.star
+[chromium.star]: https://chrome-internal.googlesource.com/infradata/config/+/master/configs/chromium-swarm/starlark/bots/chromium/chromium.star
+[pools.cfg]: https://chrome-internal.googlesource.com/infradata/config/+/master/configs/chromium-swarm/pools.cfg
+[bot_config.py]: https://chrome-internal.googlesource.com/infradata/config/+/master/configs/chromium-swarm/scripts/bot_config.py
 [main.star]: https://chrome-internal.googlesource.com/infradata/config/+/master/main.star
 [vms.cfg]: https://chrome-internal.googlesource.com/infradata/config/+/master/configs/gce-provider/vms.cfg
@@ -240,62 +271,91 @@ maintaining the GPU bots, and how they'd be addressed.
 This is described in [Adding new tests to the GPU bots].
-[Adding new tests to the GPU bots]: https://www.chromium.org/developers/testing/gpu-testing/#TOC-Adding-New-Tests-to-the-GPU-Bots
+[Adding new tests to the GPU bots]: https://chromium.googlesource.com/chromium/src/+/master/docs/gpu/gpu_testing.md#Adding-New-Tests-to-the-GPU-Bots
 ### How to set up new virtual machine instances
 The tests use virtual machines to build binaries and to trigger tests on
-physical hardware. VMs don't run any tests themselves. Nevertheless the OS
-of the VM must match the OS of the physical hardware. Android uses Linux VMs
-for the hosts.
-1. If you need a Mac VM:
-1. File a Chrome Infrastructure Labs ticket requesting 2 virtual machines
-for the testers. See this [example ticket](http://crbug.com/838975).
-1. Follow the instructions below to add an association between those VM
-names and the bot names you're adding to [`gpu.star`][gpu.star] and
-regenerate the auto-generated files.
-1. If you need a non-Mac VM, VMs are allocated using the GCE Provider APIs:
-1. Create a CL in the [`infradata/config`][infradata/config] (Google
-internal) workspace which does the following. Git configure your
-user.email to @google.com if necessary. For reference, see these example
-CLs:
-1. [Adding both Linux and Windows
-VMs](https://chrome-internal-review.googlesource.com/1068669) for
-trybots.
-1. [Adding a Linux
-VM](https://chrome-internal-review.googlesource.com/1095060) for
-a waterfall bot.
-1. [Adding a Windows
-VM](https://chrome-internal-review.googlesource.com/1111456) for a
-waterfall bot.
-1. Edit [gpu.star] to add an entry for the new bot. Currently, the only way
-to limit the number of concurrent builds per bot is to limit the number
-of VMs associated with it. This means that each new bot requires a new
-prefix. Add your new entry to the correct block:
-1. Put waterfall bots under `gpu_ci_bots`. For example: <br>
-`gce_thin_trusty('linux-fyi-skiarenderer-vulkan-nvidia', 'us-east1-c')`
-or <br> `gce_thin_win10('win10-fyi-release-amd-rx-550')`.
-1. Put trybots under the appropriate `gpu_try_bots` block (optional GPU
-trybots, ANGLE trybots, etc.). For example: <br>
-`gce_trusty_pair('gpu-fyi-try-linux-intel-exp')`.
-1. Run [main.star] to regenerate `configs/chromium-swarm/bots.cfg` and
-'configs/gce-provider/vms.cfg'. Double-check your work there.
-Note that previously [vms.cfg] had to be editted manually. Part of the
-difficulty was in choosing a zone. This should soon no longer be
-necessary per [crbug.com/942301](http://crbug.com/942301), but consult
-with the Chrome Infra team to find out which of the
-[zones](https://cloud.google.com/compute/docs/regions-zones/) has
-available capacity.
-1. Get this reviewed and landed. This step associates the VM or pool of VMs
-with the bot's name on the waterfall.
+physical hardware. VMs don't run any tests themselves. There are 3 types of
+bots:
+* Builders - these bots build test binaries, upload them to storage and trigger
+tester bots (see below). Builds must be done on the same OS on which the
+tests will run, except for Android tests, which are built on Linux.
+* Testers - these bots trigger tests to execute in Swarming and merge results
+from multiple shards. 2-core Linux GCEs are sufficient for this task.
+* Builder/testers - these are the combination of the above and have same OS
+constraints as builders. All trybots are of this type, while for CI bots
+it is optional.
+The process is:
+1. Follow [go/request-chrome-resources](go/request-chrome-resources) to get
+approval for the VMs. Use `GPU` project resource group.
+See this [example ticket](http://crbug.com/1012805).
+You'll need to determine how many VMs are required, which OSes, how many
+cores and in which swarming pools they will be (see below for different
+scenarios).
+* If setting up a new GPU hardware pool, some VMs will also be needed
+for manual trybots, usually 2 VMs as of this writing.
+* Additional action is needed for Mac VMs, the GPU resource owner will
+assign the bug to Labs to deploy them. See this
+[example ticket](http://crbug.com/964355).
+1. Once GCE resource request is approved / Mac VMs are deployed, the VMs need
+to be added to the right Swarming pools in a CL in the
+[`infradata/config`][infradata/config] (Google internal) workspace.
+1. GCEs for Windows CI builders and builder/testers should be added to
+`luci-chromium-ci-win10-8` group in [`chromium.star`][chromium.star].
+[Example](https://chrome-internal-review.googlesource.com/c/infradata/config/+/2077803).
+1. GCEs for Linux and Android CI builders and builder/testers should be added to
+one of `luci-chromium-ci-xenial-*-8` groups (but not `*ssd-8`) in
+[`chromium.star`][chromium.star].
+[Example](https://chrome-internal-review.googlesource.com/c/infradata/config/+/2077803).
+1. VMs for Mac CI builders and builder/testers should be added to
+`gpu_ci_bots` group in [`gpu.star`][gpu.star].
+[Example](https://chrome-internal-review.googlesource.com/c/infradata/config/+/1166889).
+1. GCEs for CI testers for all OSes should be added to
+`luci-chromium-ci-xenial-2` group in [`chromium.star`][chromium.star].
+[Example](https://chrome-internal-review.googlesource.com/c/infradata/config/+/2016410).
+1. GCEs and VMs for CQ and optional CQ GPU trybots for should be added to
+a corresponding `gpu_try_bots` group in [`gpu.star`][gpu.star].
+[Example](https://chrome-internal-review.googlesource.com/c/infradata/config/+/1561384).
+These trybots are "builderful", i.e. these GCEs can't be shared among
+different bots. This is done in order to limit the number of concurrent
+builds on these bots (until [crbug.com/949379](crbug.com/949379) is
+fixed) to prevent oversubscribing GPU hardware.
+`win_optional_gpu_tests_rel` is an exception, its GCEs come from
+`luci-chromium-try-win10-*-8` groups in
+[`chromium.star`][chromium.star], see
+[CL](https://chrome-internal-review.googlesource.com/c/infradata/config/+/1708723).
+This can cause oversubscription to Windows GPU hardware, however,
+Chrome Infra insisted on making this bot builderless due to frequent
+interruptions they get from limiting the number of concurrent builds on
+it, see discussion in
+[CL](https://chromium-review.googlesource.com/c/chromium/src/+/1775098).
+1. GCEs and VMs for manual GPU trybots should be added to a corresponding
+pool in "Manually-triggered GPU trybots" in [`gpu.star`][gpu.star].
+If adding a new pool, it should also be added to
+[`pools.cfg`][pools.cfg].
+[Example](https://chrome-internal-review.googlesource.com/c/infradata/config/+/2433332).
+This is a different mechanism to limit the load on GPU hardware,
+by having a small pool of GCEs which corresponds to some GPU hardware
+resource, and all trybots that target this GPU hardware compete for
+GCEs from this small pool.
+1. Run [`main.star`][main.star] to regenerate
+`configs/chromium-swarm/bots.cfg` and `configs/gce-provider/vms.cfg`.
+Double-check your work there.
+Note that previously [`vms.cfg`][vms.cfg] had to be edited manually.
+Part of the difficulty was in choosing a zone. This should soon no
+longer be necessary per [crbug.com/942301](http://crbug.com/942301),
+but consult with the Chrome Infra team to find out which of the
+[zones](https://cloud.google.com/compute/docs/regions-zones/) has
+available capacity.
+1. Get this reviewed and landed. This step associates the VM or pool of VMs
+with the bot's name on the waterfall for "builderful" bots or increases
+swarmed pool capacity for "builderless" bots.
+Note: CR+1 is not sticky in this repo, so you'll have to ping for
+re-review after every change, like rebase.
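The pool assignments in the new version of the steps above can be condensed into a lookup, which may help when deciding where a VM belongs. The sketch below is only a summary of that list. The `swarming_group` function and its `kind` and `os_name` labels are hypothetical names chosen for illustration, and the `win_optional_gpu_tests_rel` exception is deliberately not modeled.

```python
def swarming_group(kind, os_name=None):
    """Return (group, config file) for a new VM, following the list above.

    kind: 'ci_builder', 'ci_builder_tester', 'ci_tester', 'cq_trybot'
          or 'manual_trybot' (labels invented for this sketch).
    os_name: 'win', 'linux', 'android' or 'mac'; only used for CI builders.
    """
    if kind == 'ci_tester':
        # Testers only trigger and merge swarmed tests, so any OS uses
        # the 2-core Linux group.
        return ('luci-chromium-ci-xenial-2', 'chromium.star')
    if kind == 'cq_trybot':
        return ('gpu_try_bots', 'gpu.star')
    if kind == 'manual_trybot':
        # A new pool here would also need an entry in pools.cfg.
        return ('Manually-triggered GPU trybots', 'gpu.star')
    if kind in ('ci_builder', 'ci_builder_tester'):
        if os_name == 'win':
            return ('luci-chromium-ci-win10-8', 'chromium.star')
        if os_name in ('linux', 'android'):
            # One of the xenial-*-8 groups, but not *ssd-8.
            return ('luci-chromium-ci-xenial-*-8', 'chromium.star')
        if os_name == 'mac':
            return ('gpu_ci_bots', 'gpu.star')
    raise ValueError('unsupported combination: %r, %r' % (kind, os_name))


print(swarming_group('ci_builder', 'android'))
print(swarming_group('ci_tester', 'mac'))
```

Android maps to the Linux groups because Android tests are built on Linux, as noted in the description of builders.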
 ### How to add a new tester bot to the chromium.gpu.fyi waterfall
@@ -326,23 +386,25 @@ Builder].
 need to be updated for Android bots which don't have PCI buses.)
 1. Make sure to add these new machines to the chromium.tests.gpu Swarming
-pool by creating a CL against [gpu.star] in the [infradata/config]
-(Google internal) workspace. Git configure your user.email to
-@google.com if necessary. Here is one [example
-CL](https://chrome-internal-review.googlesource.com/913528) and a
-[second
-example](https://chrome-internal-review.googlesource.com/1111456).
-1. Run [main.star] to regenerate `configs/chromium-swarm/bots.cfg`.
-Double-check your work there.
+pool by creating a CL against [`gpu.star`][gpu.star] in the
+[`infradata/config`][infradata/config] (Google internal) workspace.
+Git configure your user.email to @google.com if necessary. Here is one
+[example CL](https://chrome-internal-review.googlesource.com/913528)
+and a
+[second example](https://chrome-internal-review.googlesource.com/1111456).
+1. Run [`main.star`][main.star] to regenerate
+`configs/chromium-swarm/bots.cfg`. Double-check your work there.
 1. Allocate new virtual machines for the bots as described in [How to set up
 new virtual machine
 instances](#How-to-set-up-new-virtual-machine-instances).
 1. Create a CL in the Chromium workspace which does the following. Here's an
-[example CL](https://chromium-review.googlesource.com/1041164).
-1. Adds the new machines to [waterfalls.pyl].
+[example CL](https://chromium-review.googlesource.com/c/chromium/src/+/1752291).
+1. Adds the new machines to [`waterfalls.pyl`][waterfalls.pyl] directly or
+to [`mixins.pyl`][mixins.pyl], referencing the new mixin in
+[`waterfalls.pyl`][waterfalls.pyl].
 1. The swarming dimensions are crucial. These must match the GPU and
 OS type of the physical hardware in the Swarming pool. This is what
 causes the VMs to spawn their tests on the correct hardware. Make
@@ -360,31 +422,33 @@ Builder].
 OS description (`Windows-2012ServerR2-SP0`).
 1. If you're deploying a new bot that's similar to another existing
 configuration, please search around in
-`src/testing/buildbot/test_suite_exceptions.pyl` for references to
-the other bot's name and see if your new bot needs to be added to
-any exclusion lists. For example, some of the tests don't run on
-certain Win bots because of missing OpenGL extensions.
-1. Run [generate_buildbot_json.py] to regenerate
-`src/testing/buildbot/chromium.gpu.fyi.json`.
-1. Updates [`cr-buildbucket.cfg`][cr-buildbucket.cfg]:
-* Add the two new machines (Release and Debug) inside the
-luci.chromium.ci bucket. This sets up storage for the builds in the
-system. Use the appropriate mixin; for example, "win-gpu-fyi-ci" has
-already been set up for Windows GPU FYI bots on the waterfall.
-1. Updates [`luci-scheduler.cfg`][luci-scheduler.cfg]:
-* Add new "job" blocks for your new Release and Debug test bots. They
-should go underneath the builder which triggers them (like "GPU Win
-FYI Builder"), in alphabetical order. Make sure the "id" and
-"builer" entries match. This job block should use the acl_sets
-"triggered-by-parent-builders", because it's triggered by the
-builder, and not by changes to the git repository.
-1. Updates [`luci-milo.cfg`][luci-milo.cfg]:
-* Add new "builders" blocks for your new testers (Release and Debug)
-on the [`chromium.gpu.fyi`][chromium.gpu.fyi] console. Look at the
+[`test_suite_exceptions.pyl`][test_suite_exceptions.pyl] for
+references to the other bot's name and see if your new bot needs
+to be added to any exclusion lists. For example, some of the tests
+don't run on certain Win bots because of missing OpenGL extensions.
+1. Run [`generate_buildbot_json.py`][generate_buildbot_json.py] to
+regenerate `src/testing/buildbot/chromium.gpu.fyi.json`.
+1. Updates [`ci.star`][ci.star] and its related generated files
+[`cr-buildbucket.cfg`][cr-buildbucket.cfg] and
+[`luci-scheduler.cfg`][luci-scheduler.cfg]:
+* Use the appropriate definition for the type of the bot being added,
+for example, `ci.gpu_fyi_thin_tester()` should be used for all CI
+tester bots on GPU FYI waterfall.
+* Make sure to set `triggered_by` property to the builder which
+triggers the testers (like `'GPU Win FYI Builder'`).
+1. Updates [`chromium.gpu.star`][chromium.gpu.star] or
+[`chromium.gpu.fyi.star`][chromium.gpu.fyi.star] and their related
+generated file [`luci-milo.cfg`][luci-milo.cfg]:
+* Add new `luci.console_view_entry()` definitions for your new
+testers (Release and Debug) on the
+[`chromium.gpu.fyi`][chromium.gpu.fyi] console. Look at the
 short names and categories and try to come up with a reasonable
 organization.
+1. Run `main.star` in [`src/infra/config`][src/infra/config] to update the
generated files. Double-check your work there.
1. If you were adding a new builder, you would need to also add the new 1. If you were adding a new builder, you would need to also add the new
machine to [`src/tools/mb/mb_config.pyl`][mb_config.pyl]. machine to [`src/tools/mb/mb_config.pyl`][mb_config.pyl] and
[`src/tools/mb/mb_config_buckets.pyl`][mb_config_buckets.pyl].
1. After the Chromium-side CL lands it will take some time for all of 1. After the Chromium-side CL lands it will take some time for all of
the configuration changes to be picked up by the system. The bot the configuration changes to be picked up by the system. The bot
...@@ -396,7 +460,7 @@ Builder]. ...@@ -396,7 +460,7 @@ Builder].
in the [`tools/build`][tools/build] workspace which does the in the [`tools/build`][tools/build] workspace which does the
following. Here's an [example following. Here's an [example
CL](https://chromium-review.googlesource.com/1041145). CL](https://chromium-review.googlesource.com/1041145).
1. Adds the new VMs to [`chromium_gpu_fyi.py`][chromium_gpu_fyi.py] in 1. Adds the new bot to [`chromium_gpu_fyi.py`][chromium_gpu_fyi.py] in
`scripts/slave/recipe_modules/chromium_tests/`. Make sure to set the `scripts/slave/recipe_modules/chromium_tests/`. Make sure to set the
`serialize_tests` property to `True`. This is specified for waterfall `serialize_tests` property to `True`. This is specified for waterfall
bots, but not trybots, and helps avoid overloading the physical bots, but not trybots, and helps avoid overloading the physical
...@@ -406,10 +470,10 @@ Builder]. ...@@ -406,10 +470,10 @@ Builder].
1. Get this reviewed and landed. This step tells the Chromium recipe about 1. Get this reviewed and landed. This step tells the Chromium recipe about
the newly-deployed waterfall bot, so it knows which JSON file to load the newly-deployed waterfall bot, so it knows which JSON file to load
out of src/testing/buildbot and which entry to look at. out of src/testing/buildbot and which entry to look at.
1. It used to be necessary to retrain recipe expectations 1. Sometimes it is necessary to retrain recipe expectations
(`scripts/slave/recipes.py --use-bootstrap test train`). This doesn't (`scripts/slave/recipes.py test train`). This is usually needed only
appear to be necessary any more, but it's something to watch out for if if the bot adds untested code flow in a recipe, but it's something
your CL fails presubmit for some reason. to watch out for if your CL fails presubmit for some reason.
1. Note that it is crucial that the bot be deployed before hooking it up in the 1. Note that it is crucial that the bot be deployed before hooking it up in the
tools/build workspace. In the new LUCI world, if the parent builder can't tools/build workspace. In the new LUCI world, if the parent builder can't
...@@ -417,82 +481,94 @@ Builder]. ...@@ -417,82 +481,94 @@ Builder].
will cause the builders to fail. You can and should prepare the tools/build will cause the builders to fail. You can and should prepare the tools/build
CL in advance, but make sure it doesn't land until the bot's on the console. CL in advance, but make sure it doesn't land until the bot's on the console.
[infradata/config]: https://chrome-internal.googlesource.com/infradata/config/ 1. If the number of physical machines for the new bot permits, you should also
[cr-buildbucket.cfg]: https://chromium.googlesource.com/chromium/src/+/master/infra/config/cr-buildbucket.cfg add a manually-triggered trybot at the same time that the CI bot is added.
[luci-milo.cfg]: https://chromium.googlesource.com/chromium/src/+/master/infra/config/luci-milo.cfg This is described in [How to add a new manually-triggered trybot].
[luci-scheduler.cfg]: https://chromium.googlesource.com/chromium/src/+/master/infra/config/luci-scheduler.cfg
[GPU FYI Win Builder]: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/GPU%20FYI%20Win%20Builder [How to add a new manually-triggered trybot]: https://chromium.googlesource.com/chromium/src/+/master/docs/gpu/gpu_testing_bot_details.md#How-to-add-a-new-manually_triggered-trybot
[ci.star]: https://chromium.googlesource.com/chromium/src/+/master/infra/config/buckets/ci.star
[chromium.gpu.star]: https://chromium.googlesource.com/chromium/src/+/master/infra/config/consoles/chromium.gpu.star
[chromium.gpu.fyi.star]: https://chromium.googlesource.com/chromium/src/+/master/infra/config/consoles/chromium.gpu.fyi.star
[cr-buildbucket.cfg]: https://chromium.googlesource.com/chromium/src/+/master/infra/config/generated/cr-buildbucket.cfg
[luci-scheduler.cfg]: https://chromium.googlesource.com/chromium/src/+/master/infra/config/generated/luci-scheduler.cfg
[luci-milo.cfg]: https://chromium.googlesource.com/chromium/src/+/master/infra/config/generated/luci-milo.cfg
[GPU FYI Win Builder]: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/GPU%20FYI%20Win%20Builder
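
The point about swarming dimensions can be sketched in a few lines of Python. This is an illustration only, not code from the Chromium tree: it assumes a `.pyl` file is a plain Python literal that `ast.literal_eval` can read, and the mixin name, the `swarming`/`dimensions` layout and the `missing_dimensions` helper are all hypothetical. The dimension values are copied from the Swarming parameters example near the top of this document.

```python
import ast

# Hypothetical mixin text in the style of mixins.pyl. The mixin name and the
# nesting are assumptions; the values come from the Swarming example above.
MIXINS_PYL = """
{
  'win10_nvidia_quadro_p400_stable': {
    'swarming': {
      'dimensions': {
        'gpu': 'nvidia-quadro-p400-win10-stable',
        'os': 'Windows-10',
        'pool': 'chromium.tests.gpu',
      },
    },
  },
}
"""

# Dimensions that must match the physical hardware in the Swarming pool.
REQUIRED_DIMENSIONS = ('gpu', 'os', 'pool')


def missing_dimensions(mixins_text, mixin_name):
    """Returns the required Swarming dimensions that the mixin does not set."""
    mixins = ast.literal_eval(mixins_text)
    dimensions = mixins[mixin_name].get('swarming', {}).get('dimensions', {})
    return [key for key in REQUIRED_DIMENSIONS if key not in dimensions]
```

A check like this only confirms that the keys are present; whether the values match real machines still has to be verified against the Swarming pool itself.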
### How to start running tests on a new GPU type on an existing try bot ### How to start running tests on a new GPU type on an existing try bot
Let's say that you want to cause the `win_chromium_rel_ng` try bot to run tests Let's say that you want to cause the `win10_chromium_x64_rel_ng` try bot to run
on CoolNewGPUType in addition to the types it currently runs (as of this tests on CoolNewGPUType in addition to the types it currently runs (as of this
writing, NVIDIA and AMD). To do this: writing only NVIDIA). To do this:
1. Make sure there is enough hardware capacity. Unfortunately, tools to report 1. Make sure there is enough hardware capacity using the available tools to
utilization of the Swarming pool are still being developed, but a report utilization of the Swarming pool.
back-of-the-envelope estimate is that you will need a minimum of 30 1. Deploy Release and Debug testers on the `chromium.gpu` waterfall, following
machines in the Swarming pool to run the current set of GPU tests on the the instructions for the `chromium.gpu.fyi` waterfall above. Make sure
tryservers. We estimate that 90 machines will be needed in order to the flakiness on the new bots is comparable to existing `chromium.gpu` bots
additionally run the WebGL 2.0 conformance tests. Plan for the larger before proceeding.
capacity, as it's desired to run the larger test suite on as many 1. Create a CL in the [`tools/build`][tools/build] workspace, adding the new
configurations as possible. Release tester to `win10_chromium_x64_rel_ng`'s `bot_ids` list
2. Deploy Release and Debug testers on the chromium.gpu waterfall, following
the instructions for the chromium.gpu.fyi waterfall above. You will also
need to temporarily add suppressions to
[`tests/masters_recipes_test.py`][tests/masters_recipes_test.py] for these
new testers since they aren't yet covered by try bots and are going on a
non-FYI waterfall. Make sure these run green for a day or two before
proceeding.
3. Create a CL in the tools/build workspace, adding the new Release tester
to `win_chromium_rel_ng`'s `bot_ids` list
in `scripts/slave/recipe_modules/chromium_tests/trybots.py`. Rerun in `scripts/slave/recipe_modules/chromium_tests/trybots.py`. Rerun
`scripts/slave/recipes.py --use-bootstrap test train`. `scripts/slave/recipes.py test train`.
4. Once the CL in (3) lands, the commit queue will **immediately** start 1. Once the above CL lands, the commit queue will **immediately** start
running tests on the CoolNewGPUType configuration. Be vigilant and make running tests on the CoolNewGPUType configuration. Be vigilant and make
sure that tryjobs are green. If they are red for any reason, revert the CL sure that tryjobs are green. If they are red for any reason, revert the CL
and figure out offline what went wrong. and figure out offline what went wrong.
[tests/masters_recipes_test.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/tests/masters_recipes_test.py
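
As a rough illustration of the `bot_ids` step, the Python sketch below adds a tester to a trybot's mirror list. It is not copied from `trybots.py`: the key names (`mastername`, `buildername`, `tester`), the existing builder and tester names, and the `with_mirrored_tester` helper are assumptions chosen for the example. Only `win10_chromium_x64_rel_ng`, `bot_ids` and CoolNewGPUType come from the text above.

```python
import copy

# Hypothetical excerpt shaped like an entry in trybots.py. The key names and
# the existing builder/tester names are assumptions made for illustration.
TRYBOTS = {
    'win10_chromium_x64_rel_ng': {
        'bot_ids': [
            {
                'mastername': 'chromium.gpu',
                'buildername': 'GPU Win x64 Builder',
                'tester': 'Win10 x64 Release (NVIDIA)',
            },
        ],
    },
}


def with_mirrored_tester(trybots, trybot, mastername, buildername, tester):
    """Returns a copy of trybots in which trybot also mirrors the given tester."""
    updated = copy.deepcopy(trybots)
    bot_id = {
        'mastername': mastername,
        'buildername': buildername,
        'tester': tester,
    }
    bot_ids = updated[trybot]['bot_ids']
    # Adding the same tester twice would be a no-op.
    if bot_id not in bot_ids:
        bot_ids.append(bot_id)
    return updated


UPDATED = with_mirrored_tester(
    TRYBOTS,
    'win10_chromium_x64_rel_ng',
    'chromium.gpu',
    'GPU Win x64 Builder',
    'Win10 x64 Release (CoolNewGPUType)',
)
```

In the real workflow the edit is made by hand in `trybots.py` and followed by retraining the recipe expectations; the helper only shows the shape of the change.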
### How to add a new manually-triggered trybot ### How to add a new manually-triggered trybot
There are a lot of one-off GPU types on the chromium.gpu.fyi waterfall and Manually-triggered trybots are needed for investigating failures on a GPU type
sometimes a failure happens just on one type. It's helpful to just be able to which doesn't have a corresponding CQ trybot (due to lack of GPU resources).
send a tryjob to a particular machine. Doing so requires a specific trybot to be Even for GPU types that have CQ trybots, it is convenient to have
set up because most if not all of the existing trybots trigger tests on more manually-triggered trybots as well, since the CQ trybot often runs on more than
than one type of GPU. one GPU type, or some test suites which run on CI bot can be disabled on CQ
trybot (when the CQ bot mirrors a
[fake bot](https://chromium.googlesource.com/chromium/src/+/master/docs/gpu/gpu_testing_bot_details.md#how-to-add-a-new-try-bot-that-runs-a-subset-of-tests-or-extra-tests)).
Thus, all CI bots in `chromium.gpu` and `chromium.gpu.fyi` have corresponding
manually-triggered trybots, except a few which don't have enough hardware
to support it. A manually-triggered trybot should be added at the same time
a CI bot is added.
Here are the steps to set up a new trybot which runs tests just on one Here are the steps to set up a new trybot which runs tests just on one
particular GPU type. Let's consider that we are adding a manually-triggered particular GPU type. Let's consider that we are adding a manually-triggered
trybot for the Win7 NVIDIA GPUs in Release mode. We will call the new bot trybot for the Win7 NVIDIA GPUs in Release mode. We will call the new bot
`gpu_manual_try_win7_nvidia_rel`. `gpu-fyi-try-win7-nvidia-rel-64`.
1. Allocate new virtual machines for the bots as described in [How to set up 1. If there already exists a manually-triggered trybot which runs tests on
new virtual machine the same group of machines (i.e. same GPU, OS and driver), the new trybot
instances](#How-to-set-up-new-virtual-machine-instances), following the will have to share the VMs with it. Otherwise, create a new pool of VMs for
"trybot" instructions. the new hardware and allocate the VMs as described in
[How to set up new virtual machine instances](#How-to-set-up-new-virtual-machine-instances),
following the "Manually-triggered GPU trybots" instructions.
1. Create a CL in the Chromium workspace which does the following. Here's an 1. Create a CL in the Chromium workspace which does the following. Here's an
[example CL](https://chromium-review.googlesource.com/1044767). [outdated example CL](https://chromium-review.googlesource.com/c/chromium/src/+/1974575)
1. Updates [`cr-buildbucket.cfg`][cr-buildbucket.cfg]: and a [reference CL](https://chromium-review.googlesource.com/c/chromium/src/+/2015548)
* Add the new trybot to the `luci.chromium.try` bucket. This is a exemplifying the new "GCE pool per GPU hardware pool" way.
one-liner, with "name" being "gpu_manual_try_win7_nvidia_rel" and 1. Updates [`gpu.try.star`][gpu.try.star] and its related generated file
"mixins" being the OS-appropriate mixin, in this case [`cr-buildbucket.cfg`][cr-buildbucket.cfg]:
"win-optional-gpu-try". (We're repurposing the existing ACLs for the * Add the new trybot with the right `builder` define and VMs pool.
"optional" GPU trybots for these manually-triggered ones.) For `gpu-fyi-try-win7-nvidia-rel-64` this would be
1. Updates [`luci-milo.cfg`][luci-milo.cfg]: `gpu_win_builder()` and `luci.chromium.gpu.win7.nvidia.try`.
* Add "builders" blocks for the new trybot to the `luci.chromium.try` and 1. Updates the LUCI consoles you want the trybot to show in and their
`tryserver.chromium.win` consoles. related generated file [`luci-milo.cfg`][luci-milo.cfg]:
1. Adds the new trybot to * For `gpu-fyi-try-win7-nvidia-rel-64` these would be
[`src/tools/mb/mb_config.pyl`][mb_config.pyl]. Reuse the same mixin as [`luci.chromium.try.star`][luci.chromium.try.star] and
for the optional GPU trybot; in this case, [`tryserver.chromium.win.star`][tryserver.chromium.win.star]
`gpu_fyi_tests_release_trybot_x86`. consoles. Just add `try/` followed by trybot name to the lists.
1. Run `main.star` in [`src/infra/config`][src/infra/config] to update the
generated files. Double-check your work there.
1. Adds the new trybot to [`src/tools/mb/mb_config.pyl`][mb_config.pyl]
and [`src/tools/mb/mb_config_buckets.pyl`][mb_config_buckets.pyl].
Use the same mixin as does the builder for the CI bot this trybot
mirrors, in case of `gpu-fyi-try-win7-nvidia-rel-64` this is
`GPU FYI Win x64 Builder` and thus `gpu_fyi_tests_release_trybot`.
1. Get this CL reviewed and landed. 1. Get this CL reviewed and landed.
1. Create a CL in the [`tools/build`][tools/build] workspace which does the 1. Create a CL in the [`tools/build`][tools/build] workspace which does the
following. Here's an [example following. Here's an [example
CL](https://chromium-review.googlesource.com/1044761). CL](https://chromium-review.googlesource.com/c/chromium/tools/build/+/1979113).
1. Adds the new trybot to a "Manually-triggered GPU trybots" section in 1. Adds the new trybot to a "Manually-triggered GPU trybots" section in
`scripts/slave/recipe_modules/chromium_tests/trybots.py`. Create this `scripts/slave/recipe_modules/chromium_tests/trybots.py`. Create this
...@@ -500,20 +576,17 @@ trybot for the Win7 NVIDIA GPUs in Release mode. We will call the new bot ...@@ -500,20 +576,17 @@ trybot for the Win7 NVIDIA GPUs in Release mode. We will call the new bot
tryserver (`tryserver.chromium.win`, `tryserver.chromium.mac`, tryserver (`tryserver.chromium.win`, `tryserver.chromium.mac`,
`tryserver.chromium.linux`, `tryserver.chromium.android`). Have the bot `tryserver.chromium.linux`, `tryserver.chromium.android`). Have the bot
mirror the appropriate waterfall bot; in this case, the buildername to mirror the appropriate waterfall bot; in this case, the buildername to
mirror is `GPU FYI Win Builder` and the tester is `Win7 FYI Release mirror is `GPU FYI Win x64 Builder` and the tester is
(NVIDIA)`. `Win7 FYI x64 Release (NVIDIA)`.
1. Adds an exception for your new trybot in `tests/masters_recipes_test.py`,
under `FAKE_BUILDERS`, under the appropriate tryserver waterfall (in
this case, `master.tryserver.chromium.win`). This is because this is a
LUCI-only bot, and this test verifies the old buildbot configurations.
1. Get this reviewed and landed. This step tells the Chromium recipe about 1. Get this reviewed and landed. This step tells the Chromium recipe about
the newly-deployed trybot, so it knows which JSON file to load out of the newly-deployed trybot, so it knows which JSON file to load out of
src/testing/buildbot and which entry to look at to understand which `src/testing/buildbot` and which entry to look at to understand which
tests to run and on what physical hardware. tests to run and on what physical hardware.
1. It used to be necessary to retrain recipe expectations 1. It may be necessary to retrain recipe expectations for
(`scripts/slave/recipes.py --use-bootstrap test train`). This doesn't [`tools/build`][tools/build] workspace CLs
appear to be necessary any more, but it's something to watch out for if (`scripts/slave/recipes.py test train`). This shouldn't be necessary
your CL fails presubmit for some reason. for just adding a manually triggered trybot, but it's something to
watch out for if your CL fails presubmit for some reason.
At this point the new trybot should automatically show up in the At this point the new trybot should automatically show up in the
"Choose tryjobs" pop-up in the Gerrit UI, under the "Choose tryjobs" pop-up in the Gerrit UI, under the
...@@ -524,8 +597,9 @@ should be possible to send a CL to it. ...@@ -524,8 +597,9 @@ should be possible to send a CL to it.
mentioned at the bottom of the "Choose tryjobs" pop-up. Contact the mentioned at the bottom of the "Choose tryjobs" pop-up. Contact the
chrome-infra team if this doesn't work as expected.) chrome-infra team if this doesn't work as expected.)
[chromium/src]: https://chromium-review.googlesource.com/q/project:chromium%252Fsrc+status:open [gpu.try.star]: https://chromium.googlesource.com/chromium/src/+/master/infra/config/buckets/gpu.try.star
[go/chromecals]: http://go/chromecals [luci.chromium.try.star]: https://chromium.googlesource.com/chromium/src/+/master/infra/config/consoles/luci.chromium.try.star
[tryserver.chromium.win.star]: https://chromium.googlesource.com/chromium/src/+/master/infra/config/consoles/tryserver.chromium.win.star
### How to add a new try bot that runs a subset of tests or extra tests ### How to add a new try bot that runs a subset of tests or extra tests
...@@ -540,28 +614,37 @@ these try bots which tests to run. ...@@ -540,28 +614,37 @@ these try bots which tests to run.
Let's say that you intended to add a new such custom try bot on Windows. Call it Let's say that you intended to add a new such custom try bot on Windows. Call it
`win-myproject-rel` for example. You will need to add a "fake" mirror bot for `win-myproject-rel` for example. You will need to add a "fake" mirror bot for
each GPU family the tests you will need to run. For a GPU type of each GPU family on which you want to run the tests. For a GPU type of
"CoolNewGPUType" in this example you could add a "fake" bot named "MyProject GPU "CoolNewGPUType" in this example you could add a "fake" bot named "MyProject GPU
Win10 Release (CoolNewGPUType)". Win10 Release (CoolNewGPUType)".
1. Allocate new virtual machines for the bots as described in [How to set up 1. Allocate new virtual machines for the bots as described in
new virtual machine [How to set up new virtual machine instances](#How-to-set-up-new-virtual-machine-instances).
instances](#How-to-set-up-new-virtual-machine-instances). 1. Make sure there is enough hardware capacity using the available tools to
1. Make sure that you have some swarming capacity for the new GPU type. Since report utilization of the Swarming pool.
it's not running against all Chromium CLs you don't need the recommended 30
minimum bots, though ~10 would be good.
1. Create a CL in the Chromium workspace that does the following. Here's an 1. Create a CL in the Chromium workspace that does the following. Here's an
[example CL](https://crrev.com/c/1554296). outdated [example CL](https://crrev.com/c/1554296).
1. Add your new bot (for example, "MyProject GPU Win10 Release 1. Add your new bot (for example, "MyProject GPU Win10 Release
(CoolNewGPUType)") to the chromium.gpu.fyi waterfall in (CoolNewGPUType)") to the chromium.gpu.fyi waterfall in
[waterfalls.pyl]. [`waterfalls.pyl`][waterfalls.pyl].
1. Re-run [`src/testing/buildbot/generate_buildbot_json.py`][generate_buildbot_json.py] to regenerate the JSON files. 1. Add your new bot to
1. Update [`cr-buildbucket.cfg`][cr-buildbucket.cfg] to add `win-myproject-rel`. [`src/testing/buildbot/generate_buildbot_json.py`][generate_buildbot_json.py]
1. Update [`luci-milo.cfg`][luci-milo.cfg] to include `win-myproject-rel`. in the list of `get_bots_that_do_not_actually_exist` section.
1. Update [`luci-scheduler.cfg`][luci-scheduler.cfg] to include "MyProject GPU Win10 Release 1. Re-run
(CoolNewGPUType)". [`src/testing/buildbot/generate_buildbot_json.py`][generate_buildbot_json.py]
1. Update [`src/tools/mb/mb_config.pyl`][mb_config.pyl] to include `win-myproject-rel`. to regenerate the JSON files.
1. Also add your fake bot to [`src/testing/buildbot/generate_buildbot_json.py`][generate_buildbot_json.py] in the list of `get_bots_that_do_not_actually_exist` section. 1. Update [`scheduler-noop-jobs.star`][scheduler-noop-jobs.star] to
include "MyProject GPU Win10 Release (CoolNewGPUType)".
1. Update [`try.star`][try.star] and desired consoles to include
`win-myproject-rel`.
1. Run `main.star` in [`src/infra/config`][src/infra/config] to update the
generated files: [`luci-milo.cfg`][luci-milo.cfg],
[`luci-scheduler.cfg`][luci-scheduler.cfg],
[`cr-buildbucket.cfg`][cr-buildbucket.cfg]. Double-check your work
there.
1. Update [`src/tools/mb/mb_config.pyl`][mb_config.pyl] and
[`src/tools/mb/mb_config_buckets.pyl`][mb_config_buckets.pyl]
to include `win-myproject-rel`.
1. *After* the Chromium-side CL lands and the bot is on the console, create a CL 1. *After* the Chromium-side CL lands and the bot is on the console, create a CL
in the [`tools/build`][tools/build] workspace which does the in the [`tools/build`][tools/build] workspace which does the
following. Here's an [example CL](https://crrev.com/c/1554272). following. Here's an [example CL](https://crrev.com/c/1554272).
...@@ -574,10 +657,14 @@ Win10 Release (CoolNewGPUType)". ...@@ -574,10 +657,14 @@ Win10 Release (CoolNewGPUType)".
(CoolNewGPUType)" with `win-myproject-rel`. See the sample CL for an example. (CoolNewGPUType)" with `win-myproject-rel`. See the sample CL for an example.
1. Get this reviewed and landed. This step tells the Chromium recipe about 1. Get this reviewed and landed. This step tells the Chromium recipe about
the newly-deployed waterfall bot, so it knows which JSON file to load the newly-deployed waterfall bot, so it knows which JSON file to load
out of src/testing/buildbot and which entry to look at. out of `src/testing/buildbot` and which entry to look at.
1. After your CLs land you should be able to find and run `win-myproject-rel` on CLs 1. After your CLs land you should be able to find and run `win-myproject-rel` on CLs
using Choose Trybots in Gerrit. using Choose Trybots in Gerrit.
[scheduler-noop-jobs.star]: https://chromium.googlesource.com/chromium/src/+/master/infra/config/generators/scheduler-noop-jobs.star
[try.star]: https://chromium.googlesource.com/chromium/src/+/master/infra/config/buckets/try.star
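
One way to picture why a "fake" bot has to be listed in `get_bots_that_do_not_actually_exist` is as an allowlist in a consistency check. The Python sketch below assumes that reading; the `unknown_bots` helper and the sample inputs are hypothetical and are not taken from `generate_buildbot_json.py`. Only the bot name "MyProject GPU Win10 Release (CoolNewGPUType)" comes from the steps above.

```python
def unknown_bots(waterfall_bots, configured_builders, fake_bots):
    """Returns waterfall bots that are neither real builders nor declared fake.

    Under the assumption made here, a bot named in waterfalls.pyl must either
    have a real builder definition or be listed explicitly as a bot that does
    not actually exist.
    """
    return sorted(
        set(waterfall_bots) - set(configured_builders) - set(fake_bots))


# Illustrative inputs: one real builder and one fake mirror bot.
WATERFALL_BOTS = [
    'GPU FYI Win Builder',
    'MyProject GPU Win10 Release (CoolNewGPUType)',
]
CONFIGURED_BUILDERS = ['GPU FYI Win Builder']
FAKE_BOTS = ['MyProject GPU Win10 Release (CoolNewGPUType)']
```

With the fake bot listed, `unknown_bots(WATERFALL_BOTS, CONFIGURED_BUILDERS, FAKE_BOTS)` is empty; leaving it out of `FAKE_BOTS` would report it as unknown.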
### How to test and deploy a driver and/or OS update ### How to test and deploy a driver and/or OS update
Let's say that you want to roll out an update to the graphics drivers or the OS Let's say that you want to roll out an update to the graphics drivers or the OS
...@@ -589,10 +676,11 @@ or OS update. To do this: ...@@ -589,10 +676,11 @@ or OS update. To do this:
1. Make sure that all of the current Swarming jobs for this OS and GPU 1. Make sure that all of the current Swarming jobs for this OS and GPU
configuration are targeted at the "stable" version of the driver and the OS configuration are targeted at the "stable" version of the driver and the OS
in [waterfalls.pyl] and [mixins.pyl]. Make sure that there are "named" in [`waterfalls.pyl`][waterfalls.pyl] and [`mixins.pyl`][mixins.pyl].
stable versions of the driver and the OS there, which target the Make sure that there are "named" stable versions of the driver and the OS
_TARGETED_DRIVER_VERSIONS and _TARGETED_OS_VERSIONS dictionaries there, which target the `_TARGETED_DRIVER_VERSIONS` and
in [bot_config.py] (Google internal). `_TARGETED_OS_VERSIONS` dictionaries in [`bot_config.py`][bot_config.py]
(Google internal).
1. File a `Build Infrastructure` bug, component `Infra>Labs`, to have ~4 of 1. File a `Build Infrastructure` bug, component `Infra>Labs`, to have ~4 of
the physical machines already in the Swarming pool upgraded to the new the physical machines already in the Swarming pool upgraded to the new
version of the driver or the OS. version of the driver or the OS.
...@@ -601,13 +689,15 @@ or OS update. To do this: ...@@ -601,13 +689,15 @@ or OS update. To do this:
waterfall](#How-to-add-a-new-tester-bot-to-the-chromium_gpu_fyi-waterfall) waterfall](#How-to-add-a-new-tester-bot-to-the-chromium_gpu_fyi-waterfall)
to deploy one. to deploy one.
1. Have this experimental bot target the new version of the driver or the OS 1. Have this experimental bot target the new version of the driver or the OS
in [waterfalls.pyl] and [mixins.pyl]. [Sample CL][sample driver cl]. in [`waterfalls.pyl`][waterfalls.pyl] and [`mixins.pyl`][mixins.pyl].
[Sample CL][sample driver cl].
1. Hopefully, the new machine will pass the pixel tests. If it doesn't, then 1. Hopefully, the new machine will pass the pixel tests. If it doesn't, then
it'll be necessary to follow the instructions on it'll be necessary to follow the instructions on
[updating Gold baselines (step #4)][updating gold baselines]. [updating Gold baselines (step #4)][updating gold baselines].
1. Watch the new machine for a day or two to make sure it's stable. 1. Watch the new machine for a day or two to make sure it's stable.
1. When it is, update [bot_config.py] (Google internal) to *add* a mapping 1. When it is, update [`bot_config.py`][bot_config.py] (Google internal)
between the new driver version and the "stable" version. For example: to *add* a mapping between the new driver version and the "stable" version.
For example:
``` ```
_TARGETED_DRIVER_VERSIONS = { _TARGETED_DRIVER_VERSIONS = {
...@@ -641,8 +731,8 @@ or OS update. To do this: ...@@ -641,8 +731,8 @@ or OS update. To do this:
pool. pool.
1. If necessary, update pixel test expectations and remove the suppressions 1. If necessary, update pixel test expectations and remove the suppressions
added above. added above.
1. Remove the old driver or OS version from [bot_config.py], leaving the 1. Remove the old driver or OS version from [`bot_config.py`][bot_config.py],
"stable" driver version pointing at the newly upgraded version. leaving the "stable" driver version pointing at the newly upgraded version.
Note that we leave the experimental bot in place. We could reclaim it, but it Note that we leave the experimental bot in place. We could reclaim it, but it
seems worthwhile to continuously test the "next" version of graphics drivers as seems worthwhile to continuously test the "next" version of graphics drivers as
......