Commit c7bc05f5 authored by Dave Tu, committed by Commit Bot

Update speed docs for Pinpoint.

Bug: catapult:4159, catapult:4162
Change-Id: I074cada6d35fa3b0908e8a8d9451e228746db47f
Reviewed-on: https://chromium-review.googlesource.com/922701
Commit-Queue: David Tu <dtu@chromium.org>
Reviewed-by: Annie Sullivan <sullivan@chromium.org>
Reviewed-by: Simon Hatch <simonhatch@chromium.org>
Cr-Commit-Position: refs/heads/master@{#537409}
parent 7dfb9b6d
## Understanding the bisect results
### The bug comment
The bisect service spits out a comment on the bug that looks like this:
> **📍 Found significant differences after each of 2 commits.**<br>
> https://pinpoint-dot-chromeperf.appspot.com/job/148a8d4e840000
>
> **Add smcgruer as an animations OWNER** by flackr@chromium.org<br>
> https://chromium.googlesource.com/chromium/src/+/b091c264862d26ac12d932e84eef7bd5f674e62b
>
> **Roll src/third_party/depot_tools/ 0f7b2007a..fd4ad2416 (1 commit)**
> by depot-tools-roller@chromium.org<br>
> https://chromium.googlesource.com/chromium/src/+/14fc99e3fd3614096caab7c7a8362edde8327a5d
>
> Understanding performance regressions:<br>
> &nbsp;&nbsp;http://g.co/ChromePerformanceRegressions
The bug comment gives a summary of the commits that caused improvements or
regressions. For more details, click the link at the beginning of the comment
to go to the Pinpoint Job details page.
### The Job details page
Clicking the Pinpoint link in the bug comment brings you to the Job details
page.
![Pinpoint Job page](images/pinpoint-job-page.png)
Down the left you can see some details about the bisect configuration, including
the benchmark (`loading.desktop`) and story (`Pantip`) that ran, the bot it ran
on (`chromium-rel-mac11-pro`), and the metric that was measured
(`cpuTimeToFirstMeaningfulPaint`). If you're not familiar with the benchmark or
metric, you can cc the
[benchmark owner](https://docs.google.com/spreadsheets/d/1xaAo0_SU3iDfGdqDJZX_jRV0QtkufwHUKH3kQKF3YQs/edit#gid=0)
to ask for help.
The graph in the middle of the page shows a summary of the commits that were
tested across the x-axis and their results on the y-axis. The dots show the
medians, and the bars show the min and max. These can be used to estimate the
size of the regression. The units are not available on this page, but are on the
performance graphs linked on the bug in comment #1.
Click the `+` button in the bottom-right corner of the page to test a patch with
the current configuration.
### The alerts page
Comment 1 on the bug will have a link to the perf dashboard graphs for the
regression. (`https://chromeperf.appspot.com/group_report?bug_id=XXX`)
![Dashboard Alerts page](images/dashboard-alerts-page.png)
The graphs will give you an idea how widespread the regression is. The `Bot`
column shows all the different bots the regression occurred on, and the
`Test` column shows the metrics it regressed on. Often, the same metric
is gathered on many different web pages. If you see a long list of
pages, it's likely that the regression affects most pages; if it's
short, maybe your regression is an edge case. The size of the regression on
each bot is also shown in the table, in both relative and absolute terms.
## Debugging regressions
### How do I run the test?
It's best to [run a perf tryjob](perf_trybots.md), since the machines in the lab
are set up to match the device and software configs of the perf waterfall,
making the regression more likely to reproduce. From the Pinpoint Job page,
click the `+` button in the bottom-right corner to test a patch with the
current configuration.
You can also run locally:
```
src$ tools/perf/run_benchmark benchmark_name --story-filter story_name
```
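For example, using the benchmark (`loading.desktop`) and story (`Pantip`) from
the Job page above, a sketch (substitute the benchmark and story from your own
bisect; run `tools/perf/run_benchmark list` for the full list of benchmarks):

```
src$ tools/perf/run_benchmark loading.desktop --browser=release --story-filter Pantip
```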
### Can I get a trace?
For most metrics, yes. Here are the steps:
1. Click on the `All graphs for this bug` link in comment #1. It should
look like this:
`https://chromeperf.appspot.com/group_report?bug_id=XXXX`
2. Select a bot/test combo that looks like what the bisect bot originally
caught. You might want to look through various regressions for a really
large increase.
3. On the graph, click on the exclamation point icon at the regression, and
a tooltip comes up. There is a "trace" link in the tooltip; click it to
open the trace that was recorded during the performance test.
4. There is also a "Request Debug Trace" button, which kicks off a tryjob with
all of the debug trace categories enabled.
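Traces are also recorded when you run the benchmark locally; a sketch reusing
the example benchmark and story from above (the exact log wording may vary
between telemetry versions):

```
src$ tools/perf/run_benchmark loading.desktop --story-filter Pantip
# Search the stdout for "View generated trace files" to find links to
# the traces recorded for each story run.
```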
### Wait, what's a trace?
See the
[documentation on tracing](https://www.chromium.org/developers/how-tos/trace-event-profiling-tool)
to learn how to use traces to debug performance issues.
### Are there debugging tips specific to certain benchmarks?
* [Memory](https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md)
* [Android binary size](apk_size_regressions.md)
## If you don't believe your CL could be the cause
> Please remember that our performance tests exist to catch unexpected
> regressions. Often, the tests catch performance problems the CL author was
> not aware of. Please look at the data carefully and understand what the test
> is measuring before concluding that your CL is not related.
There are some clear reasons to believe the bisect bot made a mistake:
# Bisecting Performance Regressions
[TOC]
## What are performance bisects?
The perf tests on chromium's continuous build are very long-running, so we
cannot run them on every revision. Further, separate repositories like v8
and skia sometimes roll multiple performance-sensitive changes into chromium
at once. For these reasons, we need a tool that can bisect the root cause of
performance regressions over a CL range, descending into third_party
repositories as necessary. The service that does this is called
[Pinpoint](https://pinpoint-dot-chromeperf.appspot.com/).
## Starting a perf bisect
Performance bisects are integrated with the
[Chrome Performance Dashboard](https://chromeperf.appspot.com/alerts) and
[monorail](https://bugs.chromium.org/p/chromium/issues/list). Users kick off
perf bisects on the perf dashboard and view results in monorail.
You can kick off a perf bisect from any performance graph on the perf dashboard for
any test that runs on the
[chromium.perf waterfall](https://ci.chromium.org/p/chromium/g/chromium.perf/builders).
### To get to a graph, use one of the following methods:
### To kick off a bisect from the graph:
1. Click on a data point in the graph.
2. In the tooltip that shows up, click the `BISECT` button.
3. Make sure to enter a Bug ID in the dialog that comes up.
4. Click the `CREATE` button.
![Bisecting on a performance graph](images/bisect_graph.png)
![The bisect dialog](images/bisect_dialog.png)
### What are all the boxes in the form?
* **Bisect bot**: The name of the configuration in the perf lab to bisect on.
This has been prefilled to match the bot that generated the graph as
closely as possible.
* **Metric**: The metric of the performance test to bisect. This defaults to
the metric shown on the graph. It shows a list of other related metrics
(for example, if average page load time increased, the drop down will show
a list of individual pages which were measured).
* **Story filter**: This is a flag specific to
[telemetry](https://github.com/catapult-project/catapult/blob/master/telemetry/README.md).
It tells telemetry to only run a specific test case, instead of running all
the test cases in the suite. This dramatically reduces bisect time for
large test suites. The dashboard will prefill this box based on the graph
you clicked on. If you suspect that test cases in the benchmark are not
independent, you can try bisecting with this box cleared.
* **Bug ID**: The bug number in monorail. It's very important to fill in
this field, as this is where bisect results will be posted.
* **Start commit**: The chromium commit pos to start bisecting from. This
is prefilled by the dashboard to the start of the revision range for the
point you clicked on. You can set it to an earlier commit position to
bisect a larger range.
* **End commit**: The chromium commit pos to bisect to. This is prefilled
by the dashboard to the end of the revision range for the point you clicked
on. You can set it to a later commit pos to bisect a larger range.
* **Performance or functional**: use "performance" to bisect on a performance
metric, or "functional" to bisect on a test failure or flake.
## Bisecting test failures
The perf bisect bots can also be used to bisect performance test failures.
See details in [Triaging Data Stoppage Alerts](triaging_data_stoppage_alerts.md).
## Interpreting the results
The bisect bot will output a comment on the bug when the bisection is complete. See
[Understanding the Bisect Results](addressing_performance_regressions.md#Understanding-the-bisect-results)
for details on how to interpret the results.
### Traces and stdout
On the Job result page, there is a line chart. Each dot represents a commit. The bisect culprits are represented by flashing dots. Clicking on a dot reveals some colored bars; each bar represents one benchmark run. Click on one of the runs to see trace links. Click on the `task_id` link to see the stdout.
![Trace links](images/pinpoint-trace-links.png)
# Perf Try Bots
[TOC]
## What is a perf try job?
Chrome has a performance lab with dozens of device and OS configurations. You
can run performance tests on an unsubmitted CL on these devices using Pinpoint. The specified CL will be run against tip-of-tree with and without the CL applied.
## Supported platforms
The platforms available in the lab change over time. To see the currently supported platforms, click the "configuration" dropdown on the dialog.
## Why perf try jobs?
* All of the devices exactly match the hardware and OS versions in the perf
continuous integration suite.
* The devices have the "maintenance mutex" enabled, reducing noise from
background processes.
* The devices are instrumented with BattOrs for power measurements.
* Some regressions take multiple repeats to reproduce, and Pinpoint
automatically runs multiple times and aggregates the results.
* Some regressions reproduce on some devices but not others, and Pinpoint will
run the job on multiple devices.
## Starting a perf try job
Visit [Pinpoint](https://pinpoint-dot-chromeperf.appspot.com) and click the perf try button in the bottom right corner of the screen.
![Pinpoint Perf Try Button](images/pinpoint-perf-try-button.png)
You should see the following dialog popup:
![Perf Try Dialog](images/pinpoint-perf-try-dialog.png)
**Build Arguments**| **Description**
--- | ---
Bug ID | (optional) A bug ID. Pinpoint will post updates on the bug.
Gerrit URL | The patch you want to run the benchmark on. Patches in dependent repos (e.g. v8, skia) are supported.
Bot | The device type to run the test on. All hardware configurations in our perf lab are supported.
<br>
**Test Arguments**| **Description**
--- | ---
Benchmark | A telemetry benchmark. E.g. `system_health.common_desktop`<br><br>All the telemetry benchmarks are supported by the perf trybots. To get a full list, run `tools/perf/run_benchmark list`<br><br>To learn more about the benchmarks, you can read about the [system health benchmarks](https://docs.google.com/document/d/1BM_6lBrPzpMNMtcyi2NFKGIzmzIQ1oH3OlNG27kDGNU/edit?ts=57e92782), which test Chrome's performance at a high level, and the [benchmark harnesses](https://docs.google.com/spreadsheets/d/1ZdQ9OHqEjF5v8dqNjd7lGUjJnK6sgi8MiqO7eZVMgD0/edit#gid=0), which cover more specific areas.
Story | (optional) A specific story from the benchmark to run.
Extra Test Arguments | (optional) Extra arguments for the test. E.g. `--extra-chrome-categories="foo,bar"`<br><br>To see all arguments, run `tools/perf/run_benchmark run --help`
**Values Arguments**| **Description**
--- | ---
Chart | (optional) Please ignore.
TIR Label | (optional) Please ignore.
Trace | (optional) Please ignore.
Statistic | (optional) Please ignore.
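Both discovery commands from the tables above are run from the `src` directory
of a chromium checkout; a quick sketch:

```
src$ tools/perf/run_benchmark list        # full list of supported benchmarks
src$ tools/perf/run_benchmark run --help  # all supported test arguments
```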
## Interpreting the results
### Detailed results
On the Job result page, click the "Analyze benchmark results" link at the top. See the [metrics results UI documentation](https://github.com/catapult-project/catapult/blob/master/docs/metrics-results-ui.md) for more details on reading the results.
### Traces
On the Job result page, there is a chart containing two dots. The left dot represents HEAD and the right dot represents the patch. Clicking on the right dot reveals some colored bars; each bar represents one benchmark run. Click on one of the runs to see trace links.
![Trace links](images/pinpoint-trace-links.png)