Commit 10572410 authored by Dean Michael Berris, committed by Commit Bot

chromeperf: Update sheriff duties w/ automation

This change updates the responsibilities for the performance regression
sheriffs given the new automated process for triage and bisection. This
removes the need to proactively poll the alerts page in the Chromperf
Dashboard, and instead moves the sheriffing rotation to a reactive model
handling Monorail issues.

R=tdresser@chromium.org

Change-Id: Icb6e58052bd1c7480b57c341438a23b560120000
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2378408
Auto-Submit: Dean Berris <dberris@chromium.org>
Commit-Queue: Timothy Dresser <tdresser@chromium.org>
Reviewed-by: Timothy Dresser <tdresser@chromium.org>
Cr-Commit-Position: refs/heads/master@{#802265}
parent 462e3219
# Perf Regression Sheriffing (go/perfregression-sheriff)
The perf regression sheriff tracks performance regressions in Chrome's
continuous integration tests. Note that a [different
rotation](perf_bot_sheriffing.md) has been created to ensure the builds and
tests stay green, so the perf regression sheriff role is now entirely focused
on performance.

**[Rotation calendar](https://calendar.google.com/calendar/embed?src=google.com_2fpmo740pd1unrui9d7cgpbg2k%40group.calendar.google.com)**
## Key Responsibilities
* [Triage Regressions on the Perf Dashboard](#Triage-Regressions-on-the-Perf-Dashboard)
* [Address bugs needing attention](#Address-bugs-needing-attention)
* [Follow up on Performance Regressions](#Follow-up-on-Performance-Regressions)
* [Give Feedback on our Infrastructure](#Give-Feedback-on-our-Infrastructure)

## Triage Regressions on the Perf Dashboard
Open the perf dashboard [alerts page](https://chromeperf.appspot.com/alerts).
In the upper right corner, **sign in with your Chromium account**. Signing in is
important so that you can kick off bisect jobs and see data from internal
waterfalls.
Pick **Chromium Perf Sheriff** from the "Select an item ▼" drop-down menu. A
table of "Performance Alerts" should be shown. If there are no currently pending
alerts, the table won't be shown.
The list can be sorted by clicking on the column header. When you click on the
checkbox next to an alert, all the other alerts that occurred in the same
revision range will be highlighted.
Check the boxes next to the alerts you want to take a look at, and click the
"Graph" button. You'll be taken to a page with a table at the top listing all
the alerts that have an overlapping revision range with the one you chose, and
below it the dashboard shows graphs of all the alerts checked in that table.
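
"Overlapping revision range" here is ordinary interval overlap on the alerts'
revision ranges. As a minimal illustrative sketch (not the dashboard's actual
code), two ranges overlap when neither ends before the other begins:

```python
def ranges_overlap(start_a, end_a, start_b, end_b):
    """Return True if two inclusive revision ranges share at least one revision.

    Ranges are (start_revision, end_revision) pairs, as shown in alert tooltips.
    """
    return start_a <= end_b and start_b <= end_a

# Example: an alert at r100..r120 overlaps one at r115..r130, but not r121..r140.
assert ranges_overlap(100, 120, 115, 130)
assert not ranges_overlap(100, 120, 121, 140)
```
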
1. **For alerts related to `resource_sizes`:**
* Refer to [apk_size_regressions.md](apk_size_regressions.md).
2. **Look at the graph**.
* If the alert appears to be **within the noise**, click on the red
exclamation point icon for it in the graph and hit the "Report Invalid
Alert" button (a rough rule of thumb for "within the noise" is sketched
after this list).
* If the alert appears to be **reverting a recent improvement**, click on
the red exclamation point icon for it in the graph and hit the "Ignore
Valid Alert" button.
* If the alert is **visibly to the left or the right of the
actual regression**, click on it and use the "nudge" menu to move it into
place.
* If there is a line labeled "ref" on the graph, that is the reference build.
It's an older version of Chrome, used to help us sort out whether a change
to the bot or test might have caused the graph to jump, rather than a real
performance regression. If **the ref build moved at the same time as the
alert**, click on the alert and hit the "Report Invalid Alert" button.
3. **Look at the other alerts** in the table to see if any should be grouped together.
Note that the bisect will automatically dupe bugs if it finds they have the
same culprit, so you don't need to be too aggressive about grouping alerts
that might not be related. Some signs alerts should be grouped together:
* If they're all in the same test suite
* If they all regressed the same metric (a lot of commonality in the Test
column)
4. **Triage the group of alerts**. Check all the alerts you believe are related,
and press the triage button.
* If one of the alerts already has a bug id, click "existing bug" and use
that bug id.
* Otherwise click "new bug".
* Only add a description if you have additional context; if you leave it
blank, a default description will be added automatically.
5. **Look at the revision range** for the regression. You can see it in the
tooltip on the graph. If you see any likely culprits, cc the authors on the
bug.
6. **Optionally, kick off more bisects**. The perf dashboard will automatically
kick off a bisect for each bug you file. But if you think the regression is
much clearer on one platform, or a specific page of a page set, or you want
to see a broader revision range, feel free to click on the alert on that graph
and kick off a bisect for it. There should be capacity to kick off as many
bisects as you feel are necessary to investigate; [give feedback](#feedback)
below if you feel that is not the case.
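
For the "within the noise" call in step 2, the judgment is usually made by eye
on the graph. As a rough rule of thumb only (this is not what the dashboard
itself computes), you can compare the size of the step against the normal
point-to-point variation seen before the alert:

```python
import statistics

def looks_like_noise(before, after, threshold=2.0):
    """Rough, illustrative heuristic: treat a step as noise when the shift in
    medians is small relative to the variation seen before the alert.

    `before` and `after` are lists of points on either side of the alert; the
    2x threshold is an arbitrary choice for this sketch, not a dashboard rule.
    """
    jump = abs(statistics.median(after) - statistics.median(before))
    spread = statistics.pstdev(before)
    if spread == 0:
        return jump == 0
    return jump < threshold * spread

# A small shift inside an already-noisy series is probably not actionable...
print(looks_like_noise([100, 104, 98, 103, 99], [101, 105, 100, 102, 104]))  # True
# ...while a clear step well outside the usual variation is.
print(looks_like_noise([100, 101, 99, 100, 100], [120, 121, 119, 120, 121]))  # False
```
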
### Dashboard UI Tips
* Grouping is done client-side today. If you click "Show more" at the bottom
until all of the alerts are visible, more of them will be grouped together.
* You can shift click on the check boxes to select multiple alerts quickly.
## Address bugs needing attention
NOTE: Ensure that you're signed into Monorail.
Use [this Monorail query](https://bugs.chromium.org/p/chromium/issues/list?sort=modified&q=label%3AChromeperf-Sheriff-NeedsAttention%2CChromeperf-Auto-NeedsAttention%20-has%3Aowner&can=2)
to find automatically triaged issues which need attention.
NOTE: If the list of issues that need attention is empty, please jump ahead to
[Follow up on Performance Regressions](#Follow-up-on-Performance-Regressions).
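
For reference, the query behind the link above simply filters open issues
(`can=2`) that carry either NeedsAttention label and have no owner yet, sorted
by last modified. A minimal sketch of building the same URL with Python's
standard library (the parameters just mirror the link; no Monorail API is
involved):

```python
from urllib.parse import urlencode

# Parameters mirror the Monorail query linked above.
params = {
    "q": ("label:Chromeperf-Sheriff-NeedsAttention,"
          "Chromeperf-Auto-NeedsAttention -has:owner"),
    "can": 2,           # 2 = open issues
    "sort": "modified",
}
url = "https://bugs.chromium.org/p/chromium/issues/list?" + urlencode(params)
print(url)
```
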
Issues in the list will include automatically filed and bisected regressions
that are supported by the Chromium Perf Sheriff rotation. For each of the
issues:
1. Determine the cause of the failure (a rough sketch of this decision flow
follows the list):
* If it's Pinpoint failing to find a culprit, consider re-running the
failing Pinpoint job.
* If it's the Chromeperf Dashboard failing to start a Pinpoint bisection,
consider running a bisection from the grouped alerts. The issue
description should have a link to the group of anomalies associated with
the issue.
* If this was a manual escalation (e.g. the author of a suspected culprit
added the `Chromeperf-Sheriff-NeedsAttention` label to seek help), use the
tools at your disposal, for example:
* Retry the most recent Pinpoint job, potentially changing the parameters.
* Inspect the results of the Pinpoint job associated with the issue and
decide whether the regression could just be noise.
* In cases where it's unclear what should be done next, escalate the issue
to the Chrome Speed Tooling team by adding the `Speed>Bisection` component
and leaving the issue `Untriaged` or `Unconfirmed`.
2. Remove the `Chromeperf-Sheriff-NeedsAttention` or
`Chromeperf-Auto-NeedsAttention` label once you've acted on an issue.
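
The flow above boils down to a small decision table. The sketch below is only
an illustration of that flow; the boolean inputs are hypothetical stand-ins for
what you read off the issue, not Monorail fields:

```python
def next_action(pinpoint_found_no_culprit, bisection_never_started,
                manual_escalation):
    """Illustrative summary of the triage flow above (not a real tool)."""
    if pinpoint_found_no_culprit:
        return "Re-run the failing Pinpoint job."
    if bisection_never_started:
        return "Start a bisection from the grouped alerts linked in the issue."
    if manual_escalation:
        return ("Retry the latest Pinpoint job (possibly with new parameters) "
                "or judge whether the result is noise.")
    return ("Escalate: add the Speed>Bisection component and leave the issue "
            "Untriaged or Unconfirmed.")

# Whatever the outcome, remove the *-NeedsAttention label once you've acted.
```
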
**For alerts related to `resource_sizes`:** Refer to
[apk_size_regressions.md](apk_size_regressions.md).
## Follow up on Performance Regressions
During your shift, you should try to follow up on each of the bugs you filed.
Once you've triaged all the alerts, check to see if the bisects have come back,
or if they failed. If the results came back, and a culprit was found, follow up
with the CL author. If the bisects failed to update the bug with results, please
file a bug on it (see [feedback](#feedback) links below).

Also, please spend any spare time driving down bugs from the [regression
backlog](http://go/triage-backlog). Treat these bugs as you would your own --
investigate the regressions, find out what the next step should be, and then
move the bug along. Some possible next steps and questions to answer are:

* Should the bug be closed?
* Are there questions that need to be answered?
* Are there people that should be added to the CC list?
* Is the correct owner assigned?
When a bug does need to be pinged, rather than adding a generic "ping", it's
much more effective to include the username and a specific action item.

## Give Feedback on our Infrastructure

Sheriff feedback is invaluable for keeping the dashboard, bisect, and benchmark
tooling accurate and for improving it. Please file bugs and feature requests as
you see them:
* **Perf Dashboard**: Please use the red "Report Issue" link in the navbar.
* **Perf Bisect/Trybots**: If a bisect is identifying the wrong CL as culprit
or missing a clear culprit, or not reproducing what appears to be a clear
regression, please link the comment the bisect bot posted on the bug at
[go/bad-bisects](https://docs.google.com/spreadsheets/d/13PYIlRGE8eZzsrSocA3SR2LEHdzc8n9ORUoOE2vtO6I/edit#gid=0).
The team triages these regularly. If you spot a really clear bug (bisect
job red, bugs not being updated with bisect results), please file it in
crbug with component `Speed>Bisection`. If a bisect problem is blocking a
perf regression bug triage, **please file a new bug with component
`Speed>Bisection` and block the regression bug on the bisect bug**. This
makes it much easier for the team to triage, dupe, and close bugs on the
infrastructure without affecting the state of the perf regression bugs.
* **Pinpoint**: If Pinpoint is identifying the wrong CL as culprit or missing
a clear culprit, or not reproducing what appears to be a clear regression,
please file an issue in crbug with the `Speed>Bisection` component.
* **Noisy Tests**: Please file a bug in crbug with component `Speed>Benchmarks`
and [cc the owner](http://go/perf-owners).