Commit c7bc05f5 authored by Dave Tu, committed by Commit Bot

Update speed docs for Pinpoint.

Bug: catapult:4159, catapult:4162
Change-Id: I074cada6d35fa3b0908e8a8d9451e228746db47f
Reviewed-on: https://chromium-review.googlesource.com/922701
Commit-Queue: David Tu <dtu@chromium.org>
Reviewed-by: Annie Sullivan <sullivan@chromium.org>
Reviewed-by: Simon Hatch <simonhatch@chromium.org>
Cr-Commit-Position: refs/heads/master@{#537409}
parent 7dfb9b6d
## Understanding the bisect results
### The bug comment
The bisect service spits out a comment on the bug that looks like this:
> **📍 Found significant differences after each of 2 commits.**<br>
> https://pinpoint-dot-chromeperf.appspot.com/job/148a8d4e840000
>
> **Add smcgruer as an animations OWNER** by flackr@chromium.org<br>
> https://chromium.googlesource.com/chromium/src/+/b091c264862d26ac12d932e84eef7bd5f674e62b
>
> **Roll src/third_party/depot_tools/ 0f7b2007a..fd4ad2416 (1 commit)**
> by depot-tools-roller@chromium.org<br>
> https://chromium.googlesource.com/chromium/src/+/14fc99e3fd3614096caab7c7a8362edde8327a5d
>
> Understanding performance regressions:<br>
> &nbsp;&nbsp;http://g.co/ChromePerformanceRegressions
The bug comment gives a summary of the commits that caused improvements or
regressions. For more details, click the link at the beginning of the comment
to go to the Pinpoint Job details page.
### The Job details page
Clicking the Pinpoint link in the bug comment brings you to the Job details
page.
![Pinpoint Job page](images/pinpoint-job-page.png)
Down the left you can see some details about the bisect configuration, including
the benchmark (`loading.desktop`) and story (`Pantip`) that ran, the bot it ran
on (`chromium-rel-mac11-pro`), and the metric that was measured
(`cpuTimeToFirstMeaningfulPaint`). If you're not familiar with the benchmark or
metric, you can cc the
[benchmark owner](https://docs.google.com/spreadsheets/d/1xaAo0_SU3iDfGdqDJZX_jRV0QtkufwHUKH3kQKF3YQs/edit#gid=0)
to ask for help.
The graph in the middle of the page shows a summary of the commits that were
tested across the x-axis and their results on the y-axis. The dots show the
medians, and the bars show the min and max. These can be used to estimate the
size of the regression. The units are not available on this page, but are on the
performance graphs linked on the bug in comment #1.
Click the `+` button in the bottom-right corner of the page to test a patch with
the current configuration.
### The alerts page
Comment 1 on the bug will have a link to the perf dashboard graphs for the
regression. (`https://chromeperf.appspot.com/group_report?bug_id=XXX`)
![Dashboard Alerts page](images/dashboard-alerts-page.png)
The graphs will give you an idea how widespread the regression is. The `Bot`
column shows all the different bots the regression occurred on, and the
`Test` column shows the metrics it regressed on. Often, the same metric
is gathered on many different web pages. If you see a long list of
pages, it's likely that the regression affects most pages; if it's
short, maybe your regression is an edge case. The size of the regression on
each bot is also shown in the table, in both relative and absolute terms.
## Debugging regressions
### How do I run the test?
It's best to [run a perf tryjob](perf_trybots.md), since the machines in the lab
are set up to match the device and software configs of the perf waterfall,
making the regression more likely to reproduce. From the Pinpoint Job page,
click the `+` button in the bottom-right corner to test a patch with the
current configuration.
You can also run locally:
```
src$ tools/perf/run_benchmark benchmark_name --story-filter story_name
```
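For example, using the benchmark (`loading.desktop`) and story (`Pantip`) from
the Job page above, a sketch (substitute the benchmark and story from your own
bisect; run `tools/perf/run_benchmark list` for the full list of benchmarks):

```
src$ tools/perf/run_benchmark loading.desktop --browser=release --story-filter Pantip
```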
### Can I get a trace?
For most metrics, yes. Here are the steps:
1. Click on the `All graphs for this bug` link in comment #1. It should
look like this:
`https://chromeperf.appspot.com/group_report?bug_id=XXXX`
2. Select a bot/test combo that looks like what the bisect bot originally
caught. You might want to look through various regressions for a really
large increase.
3. On the graph, click on the exclamation point icon at the regression, and
a tooltip comes up. There is a "trace" link in the tooltip; click it to
open the trace that was recorded during the performance test.
4. There is also a "Request Debug Trace" button, which kicks off a tryjob with
all of the debug trace categories enabled.
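Traces are also recorded when you run the benchmark locally; a sketch reusing
the example benchmark and story from above (the exact log wording may vary
between telemetry versions):

```
src$ tools/perf/run_benchmark loading.desktop --story-filter Pantip
# Search the stdout for "View generated trace files" to find links to
# the traces recorded for each story run.
```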
### Wait, what's a trace?
See the
[documentation on tracing](https://www.chromium.org/developers/how-tos/trace-event-profiling-tool)
to learn how to use traces to debug performance issues.
### Are there debugging tips specific to certain benchmarks?
* [Memory](https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md)
* [Android binary size](apk_size_regressions.md)
## If you don't believe your CL could be the cause
> Please remember that our performance tests exist to catch unexpected
> regressions. Often, the tests catch performance problems the CL author was
> not aware of. Please look at the data carefully and understand what the test
> is measuring before concluding that your CL is not related.
There are some clear reasons to believe the bisect bot made a mistake:
# Bisecting Performance Regressions
[TOC]
## What are performance bisects?
The perf tests on chromium's continuous build are very long-running, so we
cannot run them on every revision. Further, separate repositories like v8
and skia sometimes roll multiple performance-sensitive changes into chromium
at once. For these reasons, we need a tool that can bisect the root cause of
performance regressions over a CL range, descending into third_party
repositories as necessary. The service that does this is called
[Pinpoint](https://pinpoint-dot-chromeperf.appspot.com/).
## Starting a perf bisect
Performance bisects are integrated with the
[Chrome Performance Dashboard](https://chromeperf.appspot.com/alerts) and
[monorail](https://bugs.chromium.org/p/chromium/issues/list). Users kick off
perf bisects on the perf dashboard and view results in monorail.
You can kick off a perf bisect from any performance graph on the perf dashboard for
any test that runs on the
[chromium.perf waterfall](https://ci.chromium.org/p/chromium/g/chromium.perf/builders).
### To get to a graph, use one of the following methods:
### To kick off a bisect from the graph:
1. Click on a data point in the graph.
2. In the tooltip that shows up, click the `BISECT` button.
3. Make sure to enter a Bug ID in the dialog that comes up.
4. Click the `CREATE` button.
![Bisecting on a performance graph](images/bisect_graph.png)
![The bisect dialog](images/bisect_dialog.png)
### What are all the boxes in the form?
* **Bisect bot**: The name of the configuration in the perf lab to bisect on.
This has been prefilled to match the bot that generated the graph as
closely as possible.
* **Metric**: The metric of the performance test to bisect. This defaults to
the metric shown on the graph. It shows a list of other related metrics
(for example, if average page load time increased, the drop down will show
a list of individual pages which were measured).
* **Story filter**: This is a flag specific to
[telemetry](https://github.com/catapult-project/catapult/blob/master/telemetry/README.md).
It tells telemetry to only run a specific test case, instead of running all
the test cases in the suite. This dramatically reduces bisect time for
large test suites. The dashboard will prefill this box based on the graph
you clicked on. If you suspect that test cases in the benchmark are not
independent, you can try bisecting with this box cleared.
* **Bug ID**: The bug number in monorail. It's very important to fill in
this field, as this is where bisect results will be posted.
* **Start commit**: The chromium commit pos to start bisecting from. This
is prefilled by the dashboard to the start of the revision range for the
point you clicked on. You can set it to an earlier commit position to
bisect a larger range.
* **End commit**: The chromium commit pos to bisect to. This is prefilled
by the dashboard to the end of the revision range for the point you clicked
on. You can set it to a later commit pos to bisect a larger range.
* **Performance or functional**: use "performance" to bisect on a performance
metric, or "functional" to bisect on a test failure or flake.
## Bisecting test failures
The perf bisect bots can also be used to bisect performance test failures.
See details in [Triaging Data Stoppage Alerts](triaging_data_stoppage_alerts.md).
## Interpreting the results
The bisect bot will output a comment on the bug when the bisection is complete. See
[Understanding the Bisect Results](addressing_performance_regressions.md#Understanding-the-bisect-results)
for details on how to interpret the results.
### Traces and stdout
On the Job result page, there is a line chart. Each dot represents a commit. The bisect culprits are represented by flashing dots. Clicking on a dot reveals some colored bars; each bar represents one benchmark run. Click on one of the runs to see trace links. Click on the `task_id` link to see the stdout.
![Trace links](images/pinpoint-trace-links.png)
# Perf Try Bots
[TOC]
## What is a perf try job?
Chrome has a performance lab with dozens of device and OS configurations. You
can run performance tests on an unsubmitted CL on these devices using Pinpoint. The specified CL will be run against tip-of-tree with and without the CL applied.
## Supported platforms
The platforms available in the lab change over time. To see the currently supported platforms, click the "configuration" dropdown on the dialog.
## Why perf try jobs?
* All of the devices exactly match the hardware and OS versions in the perf
continuous integration suite.
* The devices have the "maintenance mutex" enabled, reducing noise from
background processes.
* The devices are instrumented with BattOrs for power measurements.
* Some regressions take multiple repeats to reproduce, and Pinpoint
automatically runs multiple times and aggregates the results.
* Some regressions reproduce on some devices but not others, and Pinpoint will
run the job on multiple devices.
## Starting a perf try job
Visit [Pinpoint](https://pinpoint-dot-chromeperf.appspot.com) and click the perf try button in the bottom right corner of the screen.
![Pinpoint Perf Try Button](images/pinpoint-perf-try-button.png)
You should see the following dialog popup:
![Perf Try Dialog](images/pinpoint-perf-try-dialog.png)
**Build Arguments**| **Description**
--- | ---
Bug ID | (optional) A bug ID. Pinpoint will post updates on the bug.
Gerrit URL | The patch you want to run the benchmark on. Patches in dependent repos (e.g. v8, skia) are supported.
Bot | The device type to run the test on. All hardware configurations in our perf lab are supported.
<br>
**Test Arguments**| **Description**
--- | ---
Benchmark | A telemetry benchmark. E.g. `system_health.common_desktop`<br><br>All the telemetry benchmarks are supported by the perf trybots. To get a full list, run `tools/perf/run_benchmark list`<br><br>To learn more about the benchmarks, you can read about the [system health benchmarks](https://docs.google.com/document/d/1BM_6lBrPzpMNMtcyi2NFKGIzmzIQ1oH3OlNG27kDGNU/edit?ts=57e92782), which test Chrome's performance at a high level, and the [benchmark harnesses](https://docs.google.com/spreadsheets/d/1ZdQ9OHqEjF5v8dqNjd7lGUjJnK6sgi8MiqO7eZVMgD0/edit#gid=0), which cover more specific areas.
Story | (optional) A specific story from the benchmark to run.
Extra Test Arguments | (optional) Extra arguments for the test. E.g. `--extra-chrome-categories="foo,bar"`<br><br>To see all arguments, run `tools/perf/run_benchmark run --help`
**Values Arguments**| **Description**
--- | ---
Chart | (optional) Please ignore.
TIR Label | (optional) Please ignore.
Trace | (optional) Please ignore.
Statistic | (optional) Please ignore.
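Both discovery commands from the tables above are run from the `src` directory
of a chromium checkout; a quick sketch:

```
src$ tools/perf/run_benchmark list        # full list of supported benchmarks
src$ tools/perf/run_benchmark run --help  # all supported test arguments
```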
## Interpreting the results
### Detailed results
On the Job result page, click the "Analyze benchmark results" link at the top. See the [metrics results UI documentation](https://github.com/catapult-project/catapult/blob/master/docs/metrics-results-ui.md) for more details on reading the results.
### Traces
On the Job result page, there is a chart containing two dots. The left dot represents HEAD and the right dot represents the patch. Clicking on the right dot reveals some colored bars; each bar represents one benchmark run. Click on one of the runs to see trace links.
![Trace links](images/pinpoint-trace-links.png)