Commit 02325939 authored by behdad, committed by Commit Bot

Updated rep perf test documentation

Updates:
- Guides added for adding the experimental attribute to a story and when to
do so.
- How to run the perf tests locally was added to the document.

Bug: none
Change-Id: Ie2c3f34906d185cd3992e4a614c49f43e4c3ad4f
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2210632
Reviewed-by: Sadrul Chowdhury <sadrul@chromium.org>
Commit-Queue: Behdad Bakhshinategh <behdadb@chromium.org>
Cr-Commit-Position: refs/heads/master@{#771156}
parent aaa46efe
# Representative Performance Tests for Rendering Benchmark
`rendering_representative_perf_tests` runs a subset of stories from the rendering
benchmark on CQ to prevent performance regressions. For each platform there is
a `story_tag` which describes the representative stories used in this test.
These stories will be tested using the [`run_benchmark`](../../tools/perf/run_benchmark) script, and then the recorded values for the target metric (currently `frame_times`) will be
compared with the historical upper limit described in [`src/testing/scripts/representative_perf_test_data/representatives_frame_times_upper_limit.json`][rep_perf_json].
These tests are currently running on CQ on:
* mac-rel
* win10_chromium_x64_rel_ng
[TOC]
Example: `animometer_webgl_attrib_arrays higher average frame_times(21.095) compa…`
This means that the animometer_webgl_attrib_arrays story has an average frame_times of 21 ms, while the recorded upper limit for the story (on the tested platform) is 17 ms.
In these cases the failed story will be run three more times to make sure this was not a flake, and the new result (the average of the three runs) will be reported in the same format.
For deeper investigation of such cases you can find the traces of the runs in the isolated outputs of the test. In the isolated outputs directory, look at `output.json` for the initial run and at `re_run_failures/output.json` for the three traces recorded from the re-runs.
If the failure is the result of an expected regression, please follow the instructions in the "Updating Expectations" section below.
In the `output.json` file, you can find the name of the story and, under the `trace.html` field of the story, a `gs://` link to the trace ([Example](https://isolateserver.appspot.com/browse?namespace=default-gzip&digest=f8961d773cdf0bf121525f29024c0e6c19d42e61&as=output.json)).
To download the trace run: `gsutil cp gs://link_from_output.json trace_name.html`
Also, if tests fail with no specific message in the output, it is useful to check the `{benchmark}/benchmark_log.txt` file in the isolated outputs directory for a more detailed log of the failure.
### Running Representative Perf Tests Locally
You can run the representative perf tests locally for further investigation, but note that the values may differ from those reported on the bots, since these tests produce different results on different hardware.
```
./testing/scripts/run_rendering_benchmark_with_gated_performance.py ./tools/perf/run_benchmark \
--benchmark rendering.desktop --isolated-script-test-output /tmp/temp_dir/ \
--isolated-script-test-perf-output /tmp/temp_dir
```
*Use `rendering.mobile` to run the mobile representatives, as in the sketch below.*
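A hedged sketch of the mobile variant, assuming the rest of the invocation stays the same (an attached Android device is required for `rendering.mobile`; the output paths are placeholders):
```
# Mobile variant of the command above; /tmp/temp_dir is a placeholder output path.
./testing/scripts/run_rendering_benchmark_with_gated_performance.py ./tools/perf/run_benchmark \
  --benchmark rendering.mobile --isolated-script-test-output /tmp/temp_dir/ \
  --isolated-script-test-perf-output /tmp/temp_dir
```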
For investigation of crashes (or when the comparison of the values is not the focus) it might be easier to run the benchmark directly with one of the representative story tags:
* `representative_win_desktop` (benchmark: `rendering.desktop`)
* `representative_mac_desktop` (benchmark: `rendering.desktop`)
* `representative_mobile` (benchmark: `rendering.mobile`)
```
./tools/perf/run_benchmark rendering.desktop --story-tag-filter representative_win_desktop
```
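The same pattern works for the other story tags; a hedged sketch (the `--browser=android-chromium` value is an assumption about a typical local Android setup, not something this document prescribes):
```
# Mac representatives with a local desktop build.
./tools/perf/run_benchmark rendering.desktop --story-tag-filter representative_mac_desktop

# Mobile representatives against an attached Android device; the browser choice may differ.
./tools/perf/run_benchmark rendering.mobile --story-tag-filter representative_mobile --browser=android-chromium
```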
## Updating Expectations
There are several reasons to skip a story in the representative perf tests, such as:
* Test results are flaky and the story needs to be skipped.
* A change with an expected regression is landing, so the test needs to be skipped for the change to pass on CQ, and the upper limit adjusted later.
* We want to add a new story to the representative set and the appropriate upper limit is not known yet.
In these cases the story should not cause a failure, but its values still need to be recorded for later adjustments.
As a result, the preferred way to skip a story in the representative perf tests is to mark that story as experimental in [`src/testing/scripts/representative_perf_test_data/representatives_frame_times_upper_limit.json`][rep_perf_json], along with a bug referring to the cause of the suppression (flakiness, a change with an expected regression, or experimenting with new representatives). This way the story is still run, but its values are not considered when failing the test.
To do so, find the story under the affected platform in [`src/testing/scripts/representative_perf_test_data/representatives_frame_times_upper_limit.json`][rep_perf_json] and add the "experimental" tag to it.
```
"platform": {
"story_name": {
"ci_095": 0.377,
"avg": 31.486,
"experimental": true,
"_comment": "crbug.com/bug_id"
},
}
```
*[Example CL](https://chromium-review.googlesource.com/c/chromium/src/+/2208294)*
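After editing the file, a quick well-formedness check can help catch typos; this is a generic suggestion rather than part of the documented workflow:
```
# Fails with a parse error if the edited file is not valid JSON.
python -m json.tool testing/scripts/representative_perf_test_data/representatives_frame_times_upper_limit.json > /dev/null && echo "OK"
```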
## Maintaining Representative Performance Tests
### Clustering the Benchmark and Choosing Representatives
### Updating the Upper Limits
The upper limits for the averages and confidence interval (CI) ranges of
`frame_times` described in [`src/testing/scripts/representative_perf_test_data/representatives_frame_times_upper_limit.json`][rep_perf_json]
are used to pass or fail a test. These values are the 95th percentile of
the past 30 runs of the test on each platform (for both the average and the CI).
This helps with catching sudden regressions which result in values higher
than the upper limits. But in the case of gradual regressions, the upper limits
may not be useful if not updated frequently. Updating these upper limits also
helps with adapting to improvements.
Updating these values can be done by running [`src/tools/perf/experimental/representative_perf_test_limit_adjuster/adjust_upper_limits.py`](../../tools/perf/experimental/representative_perf_test_limit_adjuster/adjust_upper_limits.py) and committing the changes.
The script will create a new JSON file using the values of recent runs in place
of [`src/testing/scripts/representative_perf_test_data/representatives_frame_times_upper_limit.json`][rep_perf_json].
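A minimal sketch of that workflow, assuming the script can be invoked without extra arguments (it may in fact require flags, so check the script itself before running it):
```
# Regenerate the upper-limit JSON from recent runs (arguments may be required).
./tools/perf/experimental/representative_perf_test_limit_adjuster/adjust_upper_limits.py

# Commit the regenerated file.
git add testing/scripts/representative_perf_test_data/representatives_frame_times_upper_limit.json
git commit -m "Update representative perf test upper limits"
```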
[rep_perf_json]: ../../testing/scripts/representative_perf_test_data/representatives_frame_times_upper_limit.json