Commit ddd5dc25, authored by asanka, committed by Commit bot

[net] Convert bug triage documents to Markdown.

R=rdsmith
BUG=none

Review URL: https://codereview.chromium.org/1017743002

Cr-Commit-Position: refs/heads/master@{#321574}
parent b98970ff
# Chrome Network Bug Triage : Labels
## Some network label caveats
**Cr-UI-Browser-Downloads**
: Despite the name, this covers all issues related to downloading a file except
saving entire pages (which is **Cr-Blink-SavePage**), not just UI issues.
Most downloads bugs will have the word "download" or "save as" in the
description. Issues with the HTTP server for the Chrome binaries are not
downloads bugs.
**Cr-UI-Browser-SafeBrowsing**
: Bugs that have to do with the process by which a URL or file is determined to
be dangerous based on our databases, or the resulting interstitials.
Determination of danger based purely on content-type or file extension
belongs in **Cr-UI-Browser-Downloads**, not SafeBrowsing.
**Cr-Internals-Network-SSL**
: This includes issues that should be also tagged as **Cr-Security-UX**
(certificate error pages or other security interstitials, omnibox indicators
that a page is secure), and more general SSL issues. If you see requests
that die in the SSL negotiation phase, in particular, this is often the
correct label.
**Cr-Internals-Network-DataProxy**
: Flywheel / the Data Reduction Proxy. Issues require "Reduce Data Usage" be
turned on. Proxy url is [https://proxy.googlezip.net:443](), with
[http://compress.googlezip.net:80]() as a fallback. Currently Android and
iOS only.
**Cr-Internals-Network-Cache**
: The cache is the layer that handles most range request logic (Though range
requests may also be issued by the PDF plugin, XHRs, or other components).
**Cr-Internals-Network-SPDY**
: Covers HTTP2 as well.
**Cr-Internals-Network-HTTP**
: Typically not used. Unclear what it covers, and there's no specific HTTP
owner.
**Cr-Internals-Network-Logging**
: Covers **about:net-internals** and **about:net-export**, as well as what's
sent to the NetLog.
**Cr-Internals-Network-Connectivity**
: Issues related to switching between networks, ERR_NETWORK_CHANGED, Chrome
thinking it's online when it's not / navigator.onLine inaccuracies, etc.
**Cr-Internals-Network-Filters**
: Covers SDCH and gzip issues. ERR_CONTENT_DECODING_FAILED indicates a problem
at this layer, and bugs here can also cause response body corruption.
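
As a rough illustration of the two error-code hints above, a triager could keep a small lookup from error strings to label suggestions. This is a minimal sketch: the table `ERROR_LABEL_HINTS` and the function `suggest_label` are hypothetical names, and only the two error codes mentioned in this section are included.

```python
# Hypothetical lookup table: only the error codes named in the text above.
ERROR_LABEL_HINTS = {
    "ERR_NETWORK_CHANGED": "Cr-Internals-Network-Connectivity",
    "ERR_CONTENT_DECODING_FAILED": "Cr-Internals-Network-Filters",
}


def suggest_label(report_text, default="Cr-Internals-Network"):
    """Return a label hint for the first known error code found in the text.

    Falls back to the generic network label when no known code appears.
    """
    for code, label in ERROR_LABEL_HINTS.items():
        if code in report_text:
            return label
    return default
```

The result is only a starting point; the report still needs to be read to confirm the label fits.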
## Common non-network labels
Bugs in these areas often receive the **Cr-Internals-Network** label, though
they fall largely outside the purview of the network stack team:
**Cr-Blink-Forms**
: Issues submitting forms, forms having weird data, forms sending the wrong
method, etc.
**Cr-Blink-Loader**
: Cross origin issues are sometimes loader related. Blink also has an
in-memory cache, and when it's used, requests don't appear in
about:net-internals. Requests for the same URL are also often merged there
as well. This does *not* cover issues with content/browser/loader/ files.
**Cr-Blink-ServiceWorker**
**Cr-Blink-Storage-AppCache**
**Cr-Blink-WebSockets**
**Cr-Blink-XHR**
: Generic issues with sync/async XHR requests - missing request or response
headers, multiple headers, etc. These will often run into issues in certain
corner cases (Cross origin / CORS, proxy, whatever). Attach all labels that
seem appropriate.
**Cr-Services-Sync**
: Sharing data/tabs/history/passwords/etc between machines not working.
**Cr-Services-Chromoting**
**Cr-Platform-Extensions**
: Issues with extensions loading / not loading / hanging.
**Cr-Platform-Extensions-API**
: Issues with network related extension APIs should have this label.
chrome.webRequest is the big one, I believe, but there are others.
**Cr-Internals-Plugins-Pepper[-SDK]**
**Cr-UI-Browser-Omnibox**
: Basically any issue with the omnibox. URLs being treated as search queries
rather than navigations, dropdown results being weird, not handling certain
unicode characters, etc. If the issue is new TLDs not being recognized by
the omnibox, that's due to Chrome's TLD list being out of date, and not an
omnibox issue. Such TLD issues should be duped against
http://crbug.com/37436.
**Cr-Internals-Media-Network**
: Issues related to media. These often run into the 6 requests per hostname
issue, and also have fun interactions with the cache, particularly in the
range request case.
**Cr-Internals-Plugins-PDF**
: Issues loading pdf files. These are often related to range requests, which
also have some logic at the Internals-Network-Cache layer.
**Cr-UI-Browser-Navigation**
**Cr-UI-Browser-History**
: Issues which only appear with forward/back navigation.
**Cr-OS-Systems-Network** / **Cr-OS-Systems-Mobile** / **Cr-OS-Systems-Bluetooth**
: These should be used for issues with ChromeOS's platform network code, and
not net/ issues on ChromeOS.
**Cr-Blink-SecurityFeature**
: CORS / Cross origin issues. Main frame cross-origin navigation issues are
often actually **Cr-UI-Browser-Navigation** issues.
**Cr-Privacy**
: Privacy related bug (History, cookies discoverable by an entity that
shouldn't be able to do so, incognito state being saved in memory or on disk
beyond the lifetime of incognito tabs, etc). Generally used in conjunction
with other labels.
**Type-Bug-Security**
: Security related bug (Allows for code execution from remote site, allows
crossing security boundaries, unchecked array bounds,
etc).
# Chrome Network Bug Triage : Suggested Workflow
[TOC]
## Looking for new crashers
1. Go to [go/chromecrash](https://goto.google.com/chromecrash).
2. For each platform, look through the releases for which releases to
investigate. As per bug-triage.txt, this should be the most recent canary,
the previous canary (if the most recent is less than a day old), and any of
dev/beta/stable that were released in the last couple of days.
3. For each release, in the "Process Type" frame, click on "browser".
4. At the bottom of the "Magic Signature" frame, click "limit 1000". Reported
crashers are sorted in decreasing order of the number of reports for that
crash signature.
5. Search the page for *"net::"*.
6. For each found signature:
    * If there is a bug already filed, make sure it is correctly describing the
      current bug (e.g. not closed, or not describing a long-past issue), and
      make sure that if it is a *net* bug, that it is labeled as such.
    * Ignore signatures that only occur once, as memory corruption can easily
      cause one-off failures when the sample size is large enough.
    * Ignore signatures that only come from a single client ID, as individual
      machine malware and breakage can also easily cause one-off failures.
    * Click on the number of reports field to see details of crash. Ignore it
      if it doesn't appear to be a network bug.
    * Otherwise, file a new bug directly from chromecrash. Note that this may
      result in filing bugs for low- and very-low- frequency crashes. That's
      ok; the bug tracker is a better tool to figure out whether or not we put
      resources into those crashes than a snap judgement when filing bugs.
    * For each bug you file, include the following information:
        * The backtrace. Note that the backtrace should not be added to the
          bug if Restrict-View-Google isn't set on the bug as it may contain
          PII. Filing the bug from the crash reporter should do this
          automatically, but check.
        * The channel in which the bug is seen (canary/dev/beta/stable), its
          frequency in that channel, and its rank among crashers in the
          channel.
        * The frequency of this signature in recent releases. This information
          is available by:
            1. Clicking on the signature in the "Magic Signature" list
            2. Clicking "Edit" on the dremel query at the top of the page
            3. Removing the "product.version='X.Y.Z.W' AND" string and clicking
               "Update".
            4. Clicking "Limit 1000" in the Product Version list in the
               resulting page (without this, the listing will be restricted to
               the releases in which the signature is most common, which will
               often not include the canary/dev release being investigated).
            5. Choose some subset of that list, or all of it, to include in the
               bug. Make sure to indicate if there is a defined point in the
               past before which the signature is not present.
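
The signature filtering rules above can be sketched in code. This is only an illustration under stated assumptions: the `CrashSignature` record and its fields are hypothetical and do not correspond to any real chromecrash export format.

```python
from dataclasses import dataclass


@dataclass
class CrashSignature:
    """Hypothetical record for one row of the "Magic Signature" list."""
    signature: str
    report_count: int
    client_ids: frozenset


def signatures_to_review(signatures):
    """Keep net:: signatures that are neither one-offs nor single-client."""
    keep = []
    for sig in signatures:
        if "net::" not in sig.signature:
            continue  # Not found by the page search for "net::".
        if sig.report_count <= 1:
            continue  # One-off: could easily be memory corruption.
        if len(sig.client_ids) <= 1:
            continue  # Single client: could be malware or a broken machine.
        keep.append(sig)
    # Most-reported first, matching the order used on the crash page.
    return sorted(keep, key=lambda s: s.report_count, reverse=True)
```

Signatures that survive the filter still need a manual look at the crash details to decide whether they are really network bugs.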
## Identifying unlabeled network bugs on the tracker
* Look at new unconfirmed bugs since noon PST on the last triager's rotation.
[Use this issue tracker
query](https://code.google.com/p/chromium/issues/list?can=2&q=status%3Aunconfirmed&sort=-id&num=1000).
* Press **h** to bring up a preview of the bug text.
* Use **j** and **k** to advance through bugs.
* If a bug looks like it might be network/download/safe-browsing related,
middle click (or command-click on OSX) to open in new tab.
* If a user provides a crash ID for a crasher for a bug that could be
net-related, look at the crash stack at
[go/crash](https://goto.google.com/crash), and see if it looks to be network
related. Be sure to check if other bug reports have that stack trace, and
mark as a dupe if so. Even if the bug isn't network related, paste the stack
trace in the bug, so no one else has to look up the crash stack from the ID.
* If there's no other information than the crash ID, ask for more details
and add the Needs-Feedback label.
* If network causes are possible, ask for a net-internals log (If it's not a
browser crash) and attach the most specific internals-network label that's
applicable. If there isn't an applicable narrower label, a clear owner for
the issue, or there are multiple possibilities, attach the internals-network
label and proceed with further investigation.
* If non-network causes also seem possible, attach those labels as well.
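
The tracker link used in this section is an ordinary URL with a percent-encoded search query. As a small sketch, the helper below rebuilds that link from its parts; the function name `tracker_query_url` is hypothetical, and the parameter names (`can`, `q`, `sort`, `num`) are simply copied from the link above.

```python
from urllib.parse import urlencode

# Base URL copied from the tracker link in this section.
BASE = "https://code.google.com/p/chromium/issues/list"


def tracker_query_url(query, sort="-id", num=None):
    """Build an issue-list URL; urlencode percent-encodes ':' as %3A."""
    params = {"can": "2", "q": query, "sort": sort}
    if num is not None:
        params["num"] = str(num)
    return BASE + "?" + urlencode(params)


print(tracker_query_url("status:unconfirmed", num=1000))
```

Changing the `query` argument is an easy way to try variations of the search without hand-encoding the URL.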
## Investigating Cr-Internals-Network bugs
* It's recommended that while on triage duty, you subscribe to the
Cr-Internals-Network label. To do this, go to
https://code.google.com/p/chromium/issues/ and click on "Subscriptions".
Enter "Cr-Internals-Network" and click submit.
* Look through unconfirmed and untriaged Cr-Internals-Network bugs,
prioritizing those updated within the last week. [Use this issue tracker
query](https://code.google.com/p/chromium/issues/list?can=2&q=Cr%3DInternals-Network+-status%3AAssigned+-status%3AStarted+-status%3AAvailable+&sort=-modified).
* If more information is needed from the reporter, ask for it and add the
Needs-Feedback label. If the reporter has answered an earlier request for
information, remove that label.
* While investigating a new issue, change the status to Untriaged.
* If a bug is a potential security issue (Allows for code execution from remote
site, allows crossing security boundaries, unchecked array bounds, etc) mark
it Type-Bug-Security. If it has privacy implication (History, cookies
discoverable by an entity that shouldn't be able to do so, incognito state
being saved in memory or on disk beyond the lifetime of incognito tabs, etc),
mark it Cr-Privacy.
* For bugs that already have a more specific network label, go ahead and remove
the Cr-Internals-Network label and move on.
* Try to figure out if it's really a network bug. See common non-network
labels section for description of common labels needed for issues incorrectly
tagged as Cr-Internals-Network.
* If it's not, attach appropriate labels and go no further.
* If it may be a network bug, attach additional possibly relevant labels if
any, and continue investigating. Once you either determine it's a
non-network bug, or figure out accurate more specific network labels, your
job is done, though you should still ask for a net-internals dump if it seems
likely to be useful.
* Note that ChromeOS-specific network-related code (Captive portal detection,
connectivity detection, login, etc) may not all have appropriate more
specific labels, but are not in areas handled by the network stack team.
Just make sure those have the OS-Chrome label, and any more specific labels
if applicable, and then move on.
* Gather data and investigate.
* Remember to add the Needs-Feedback label whenever waiting for the user to
respond with more information, and remove it when not waiting on the
user.
* Try to reproduce locally. If you can, and it's a regression, use
src/tools/bisect-builds.py to figure out when it regressed.
* Ask for more data from the user as needed (net-internals dumps, repro case,
crash ID from about:crashes, run tests, etc).
* If asking for an about:net-internals dump, provide this link:
https://sites.google.com/a/chromium.org/dev/for-testers/providing-network-details.
You can grab the link from about:net-internals as needed.
* Try to figure out what's going on, and which more specific network label is
most appropriate.
* If it's a regression, browse through the git history of relevant files to try
and figure out when it regressed. CC authors / primary reviewers of any
strongly suspect CLs.
* If you are having trouble with an issue, particularly for help understanding
net-internals logs, email the public net-dev@chromium.org list for help
debugging. If it's a crasher, or for some other reason discussion needs to
be done in private, use chrome-network-debugging@google.com. TODO(mmenke):
Write up a net-internals tips and tricks docs.
* If it appears to be a bug in the unowned core of the network stack (i.e. no
sublabel applies, or only the Cr-Internals-Network-HTTP sublabel applies, and
there's no clear owner), try to figure out the exact cause.
## Monitoring UMA histograms and gasper alerts
For each Gasper alert that fires, determine if it's a real alert and file a bug
if so.
* Don't file if the alert is coincident with a major volume change. The volume
at a particular date can be determined by hovering the mouse over the
appropriate location on the alert line.
* Don't file if the alert is on a graph with very low volume (< ~200 data
points); it's probably noise, and we probably don't care even if it isn't.
* Don't file if the graph is really noisy (but eyeball it to decide if there is
an underlying important shift under the noise).
* Don't file if the alert is in the "Known Ignorable" list:
    * SimpleCache on Windows
    * DiskCache on Android.
For each Gasper alert, respond to chrome-network-debugging@google.com with a
summary of the action you've taken and why, including issue link if an issue
was filed.
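
The "don't file" rules in this section amount to a short checklist, sketched below. All names here are hypothetical, the inputs are judgments the triager makes by looking at the graph, and the threshold of 200 is the approximate figure from the text rather than a hard limit.

```python
# The "Known Ignorable" list from the text, as (component, platform) pairs.
KNOWN_IGNORABLE = {("SimpleCache", "Windows"), ("DiskCache", "Android")}

# Approximate cut-off from the text ("< ~200 data points").
LOW_VOLUME_THRESHOLD = 200


def should_file_alert(component, platform, data_points,
                      major_volume_change, very_noisy,
                      underlying_shift=False):
    """Return True if none of the "don't file" rules apply to an alert."""
    if major_volume_change:
        return False  # Alert coincides with a major volume change.
    if data_points < LOW_VOLUME_THRESHOLD:
        return False  # Very low volume: probably noise.
    if very_noisy and not underlying_shift:
        return False  # Noisy graph with no important shift underneath.
    if (component, platform) in KNOWN_IGNORABLE:
        return False
    return True
```

A `True` result only means no exclusion rule matched; deciding whether the alert is real is still up to the triager.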
## Investigating crashers
* Only investigate crashers that are still occurring, as identified by the
section above. If a search on go/crash indicates a crasher is no longer
occurring, mark it as WontFix.
* Particularly for Windows, look for weird dlls associated with the crashes.
If there are some, it may be caused by malware. You can often figure out if
a dll is malware by a search, though it's harder to figure out if a dll is
definitively not malware.
* See if the same users are repeatedly running into the same issue. This can
be accomplished by searching for (or clicking on) the client ID associated
with a crash report, and seeing if there are multiple reports for the same
crash. If this is the case, it may also be malware, or an issue with an
unusual system/chrome/network config.
* Dig through crash reports to figure out when the crash first appeared, and
dig through revision history in related files to try and locate a suspect CL.
TODO(mmenke): Add more detail here.
* Load crash dumps, try to figure out a cause. See
http://www.chromium.org/developers/crash-reports for more information.
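
The repeated-client check described above is a simple count of reports per client ID. This is a minimal sketch assuming the client IDs for one crash signature have already been collected into a list; `repeat_clients` is a hypothetical helper, not part of any crash tool.

```python
from collections import Counter


def repeat_clients(report_client_ids, min_reports=2):
    """Map each client ID with at least min_reports reports to its count."""
    counts = Counter(report_client_ids)
    return {cid: n for cid, n in counts.items() if n >= min_reports}
```

If most reports come from a handful of repeat clients, malware or an unusual configuration becomes a more likely explanation than a general bug.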
## Dealing with old bugs
* For all network issues (even those with owners, or more specific labels):
    * If the issue has had the Needs-Feedback label for over a month, verify it
      is waiting on feedback from the user. If not, remove the label.
      Otherwise, go ahead and mark the issue WontFix due to lack of response
      and suggest the user file a new bug if the issue is still present. [Use
      this issue tracker query for old Needs-Feedback
      issues](https://code.google.com/p/chromium/issues/list?can=2&q=Cr%3AInternals-Network%20Needs=Feedback+modified-before%3Atoday-30&sort=-modified).
    * If a bug is over 2 months old, and the underlying problem was never
      reproduced or really understood:
        * If it's over a year old, go ahead and mark the issue as Archived.
        * Otherwise, ask reporters if the issue is still present, and attach
          the Needs-Feedback label.
* Old unconfirmed or untriaged Cr-Internals-Network issues can be investigated
just like newer ones. Crashers should generally be given higher priority,
since we can verify if they still occur, and then newer issues, as they're
more likely to still be present, and more likely to have a still responsive
bug reporter.
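
The age-based rules for old bugs can be written as a small decision function. This is a sketch only: `old_bug_action` and its parameters are hypothetical, and the cut-offs of 30, 60, and 365 days are assumed stand-ins for "a month", "2 months", and "a year".

```python
def old_bug_action(age_days, reproduced_or_understood,
                   needs_feedback_days=None, waiting_on_user=False):
    """Suggest a next step for an old network issue.

    needs_feedback_days is how long the Needs-Feedback label has been set,
    or None if the label is absent.
    """
    if needs_feedback_days is not None and needs_feedback_days > 30:
        if not waiting_on_user:
            return "remove Needs-Feedback"
        return "mark WontFix (no response); suggest filing a new bug"
    if age_days > 60 and not reproduced_or_understood:
        if age_days > 365:
            return "mark Archived"
        return "ask reporter if still present; add Needs-Feedback"
    return "no action"
```

For example, a 90-day-old bug that was never reproduced and has no Needs-Feedback label yields the "ask reporter" step.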
Look for new crashers:
* Go to go/chromecrash.
* For each platform, look through the releases for which releases to
investigate. As per bug-triage.txt, this should be the
most recent canary, the previous canary (if the most recent is less
than a day old), and any of dev/beta/stable that were released in the
last couple of days.
* For each release, in the "Process Type" frame, click on "browser".
* At the bottom of the "Magic Signature" frame, click "limit 1000".
Reported crashers are sorted in decreasing order of the number of reports for
that crash signature.
* Search the page for "net::".
* For each found signature:
* If there is a bug already filed, make sure it is correctly
describing the current bug (e.g. not closed, or not describing a
long-past issue), and make sure that if it is a net:: bug, that
it is labeled as such.
* Ignore signatures that only occur once, as memory corruption can
easily cause one-off failures when the sample size is large
enough.
* Ignore signatures that only come from a single client ID, as
individual machine malware and breakage can also easily cause
one-off failures.
* Click on the number of reports field to see details of
crash. Ignore it if it doesn't appear to be a network bug.
* Otherwise, file a new bug directly from chromecrash. Note that
this may result in filing bugs for low- and very-low- frequency
crashes. That's ok; the bug tracker is a better tool to figure
out whether or not we put resources into those crashes than a snap
judgement when filing bugs.
* For each bug you file, include the following information:
* The backtrace. Note that the backtrace should not be added to the
bug if Restrict-View-Google isn't set on the bug as it may contain
PII. Filing the bug from the crash reporter should do this
automatically, but check.
* The channel in which the bug is seen (canary/dev/beta/stable),
its frequency in that channel, and its rank among crashers in the channel.
* The frequency of this signature in recent releases. This
information is available by:
* Clicking on the signature in the "Magic Signature" list
* Clicking "Edit" on the dremel query at the top of the page
* Removing the "product.version='X.Y.Z.W' AND" string and clicking
"Update".
* Clicking "Limit 1000" in the Product Version list in the
resulting page (without this, the listing will be restricted to
the releases in which the signature is most common, which will
often not include the canary/dev release being investigated).
* Choose some subset of that list, or all of it, to include in the
bug. Make sure to indicate if there is a defined point in the
past before which the signature is not present.
Identifying unlabeled network bugs on the tracker:
* Look at new uncomfirmed bugs since noon PST on the last triager's rotation:
https://code.google.com/p/chromium/issues/list?can=2&q=status%3Aunconfirmed&sort=-id&num=1000
* Press "h" to bring up a preview of the bug text.
* Use "j" and "k" to advance through bugs.
* If a bug looks like it might be network/download/safe-browsing related, middle
click [or command-click on OSX] to open in new tab.
* If a user provides a crash ID for a crasher for a bug that could be
net-related, look at the crash stack at go/crash, and see if it looks to be
network related. Be sure to check if other bug reports have that stack
trace, and mark as a dupe if so. Even if the bug isn't network related,
paste the stack trace in the bug, so no one else has to look up the crash
stack from the ID.
* If there's no other information than the crash ID, ask for more details and
add the Needs-Feedback label.
* If network causes are possible, ask for a net-internals log (If it's not a
browser crash) and attach the most specific internals-network label that's
applicable. If there isn't an applicable narrower label, a clear owner for
the issue, or there are multiple possibilities, attach the internals-network
label and proceed with further investigation.
* If non-network causes also seem possible, attach those labels as well.
Investigating Cr-Internals-Network bugs:
* It's recommended that while on triage duty, you subscribe to the
Cr-Internals-Network label. To do this, go to
https://code.google.com/p/chromium/issues/ and click on "Subscriptions".
Enter Cr-Internals-Network and click submit.
* Look through uncomfirmed and untriaged Cr-Internals-Network bugs, prioritizing
those updated within the last week:
https://code.google.com/p/chromium/issues/list?can=2&q=Cr%3DInternals-Network+-status%3AAssigned+-status%3AStarted+-status%3AAvailable+&sort=-modified
* If more information is needed from the reporter, ask for it and
add the 'Needs-Feedback' label. If the reporter has answered an
earlier request for information, remove that label.
* While investigating a new issue, change the status to Untriaged.
* If a bug is a potential security issue (Allows for code execution from remote
site, allows crossing security boundaries, unchecked array bounds, etc) mark
it Type-Bug-Security. If it has privacy implication (History, cookies
discoverable by an entity that shouldn't be able to do so, incognito state
being saved in memory or on disk beyond the lifetime of incognito tabs,
etc), mark it Cr-Privacy.
* For bugs that already have a more specific network label, go ahead and remove
the Cr-Internals-Network label and move on.
* Try to figure out if it's really a network bug. See common non-network labels
section for description of common labels needed for issues incorrectly
tagged as Cr-Internals-Network.
* If it's not, attach appropriate labels and go no further.
* If it may be a network bug, attach additional possibly relevant labels if any,
and continue investigating. Once you either determine it's a non-network
bug, or figure out accurate more specific network labels, your job is done,
though you should still ask for a net-internals dump if it seems likely to
be useful.
* Note that ChromeOS-specific network-related code (Captive portal detection,
connectivity detection, login, etc) may not all have appropriate more
specific labels, but are not in areas handled by the network stack team.
Just make sure those have the OS-Chrome label, and any more specific labels
if applicable, and then move on.
* Gather data and investigate.
* Remember to add the Needs-Feedback label whenever waiting for the user to
respond with more information, and remove it when not waiting on the user.
* Try to reproduce locally. If you can, and it's a regression, use
src/tools/bisect-builds.py to figure out when it regressed.
* Ask more data from the user as needed (net-internals dumps, repro case,
crash ID from about:crashes, run tests, etc).
* If asking for an about:net-internals dump, provide this link:
https://sites.google.com/a/chromium.org/dev/for-testers/providing-network-details.
Can just grab the link from about:net-internals, as needed.
* Try to figure out what's going on, and which more specific network label is
most appropriate.
* If it's a regression, browse through the git history of relevant files to try
and figure out when it regressed. CC authors / primary reviewers of any
strongly suspect CLs.
* If you are having trouble with an issue, particularly for help understanding
net-internals logs, email the public net-dev@chromium.org list for help
debugging. If it's a crasher, or for some other reason discussion needs to
be done in private, use chrome-network-debugging@google.com.
TODO(mmenke): Write up a net-internals tips and tricks docs.
* If it appears to be a bug in the unowned core of the network stack (i.e. no
sublabel applies, or only the Cr-Internals-Network-HTTP sublabel applies,
and there's no clear owner), try to figure out the exact cause.
Monitor UMA histograms and gasper alerts. For each Gasper alert that
fires, determine if it's a real alert and file a bug if so.
* Don't file if the alert is coincident with a major volume change.
The volume at a particular date can be determined by hovering the
mouse over the appropriate location on the alert line.
* Don't file if the alert is on a graph with very low volume (< ~200
data points); it's probably noise, and we probably don't care even
if it isn't.
* Don't file if the graph is really noisy (but eyeball it to decide if
there is an underlying important shift under the noise).
* Don't file if the alert is in the "Known Ignorable" list:
* SimpleCache on Windows
* DiskCache on Android.
For each Gasper alert, respond to chrome-network-debugging@ with a
summary of the action you've taken and why, including issue link if an
issue was filed.
Investigating crashers:
* Only investigate crashers that are still occurring, as identified by above
section. If a search on go/crash indicates a crasher is no longer
occurring, mark it as WontFix.
* Particularly for Windows, look for weird dlls associated with the crashes.
If there are some, it may be caused by malware. You can often figure out if
a dll is malware by a search, though it's harder to figure out if a dll is
definitively not malware.
* See if the same users are repeatedly running into the same issue. This can be
accomplished by search for (Or clicking on) the client ID associated with a
crash report, and seeing if there are multiple reports for the same crash.
If this is the case, it may be also be malware, or an issue with an unusual
system/chrome/network config.
* Dig through crash reports to figure out when the crash first appeared, and dig
through revision history in related files to try and locate a suspect CL.
TODO(mmenke): Add more detail here.
* Load crash dumps, try to figure out a cause.
See http://www.chromium.org/developers/crash-reports for more information
Dealing with old bugs:
* For all network issues (Even those with owners, or a more specific labels):
* If the issue has had the Needs-Feedback label for over a month, verify it
is waiting on feedback from the user. If not, remove the label.
Otherwise, go ahead and mark the issue WontFix due to lack of response and
suggest the user file a new bug if the issue is still present.
Old Needs-Feedback issues: https://code.google.com/p/chromium/issues/list?can=2&q=Cr%3AInternals-Network%20Needs=Feedback+modified-before%3Atoday-30&sort=-modified
* If a bug is over 2 months old, and the underlying problem was never
reproduced or really understood:
* If it's over a year old, go ahead and mark the issue as Archived.
* Otherwise, ask reporters if the issue is still present, and attach the
Needs-Feedback label.
* Old unconfirmed or untriaged Cr-Internals-Network issues can be investigated
just like newer ones. Crashers should generally be given higher priority,
since we can verify if they still occur, and then newer issues, as they're
more likely to still be present, and more likely to have a still responsive
bug reporter.
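
The rules for old bugs above amount to a small decision procedure. The following Python sketch is one possible reading of them, not a tool the team uses: the function name, its parameters, and the day counts chosen for "a month", "2 months" and "a year" are all assumptions.

```python
from datetime import date, timedelta

# Assumed day counts; the text only says "a month", "2 months", "a year".
MONTH = timedelta(days=30)
TWO_MONTHS = timedelta(days=60)
YEAR = timedelta(days=365)


def stale_bug_action(today, opened, feedback_label_since=None,
                     waiting_on_reporter=False, understood=False):
    """Return the first action the old-bug rules suggest for one issue.

    All parameters are hypothetical: `feedback_label_since` is the date the
    Needs-Feedback label was added (None if absent), `waiting_on_reporter`
    says whether the issue really is waiting on the user, and `understood`
    says whether the problem was ever reproduced or understood.
    """
    if feedback_label_since is not None:
        if today - feedback_label_since > MONTH:
            if waiting_on_reporter:
                return "mark WontFix"
            return "remove Needs-Feedback"
        # Assumption: a fresh Needs-Feedback label means we keep waiting.
        return "no action"

    age = today - opened
    if age > TWO_MONTHS and not understood:
        if age > YEAR:
            return "mark Archived"
        return "ask reporter and add Needs-Feedback"
    return "no action"


today = date(2015, 3, 20)
action = stale_bug_action(today, opened=date(2015, 1, 1))
```

Here `action` is the suggestion for a bug that is a little over two months old and was never understood; the triager still makes the final call.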
# Chrome Network Bug Triage

The Chrome network team uses a two day bug triage rotation. The main goals are
to identify and label new network bugs, and investigate network bugs when no
label seems suitable.

## Responsibilities

### Required:

* Identify new crashers
* Identify new network issues.
* Request data about recent Cr-Internals-Network issue.
* Investigate each recent Cr-Internals-Network issue.
* Monitor UMA histograms and gasper alerts.

### Best effort:

* Investigate unowned and owned-but-forgotten net/ crashers
* Investigate old bugs
* Close obsolete bugs.

All of the above is to be done on each rotation. These responsibilities should
be tracked, and anything left undone at the end of a rotation should be handed
off to the next triager. The downside to passing along bug investigations like
this is that each new triager has to get back up to speed on bugs the previous
triager was investigating. The upside is that triagers don't get stuck
investigating issues after their rotation, and it results in a uniform,
predictable two day commitment for all triagers.

## Details

### Required:

* Identify new crashers that are potentially network related. You should check
  the most recent canary, the previous canary (if the most recent is less than a
  day old), and any of dev/beta/stable that were released in the last couple of
  days, for each platform. File Cr-Internals-Network bugs on the tracker when
  new crashers are found.

* Identify new network bugs, both on the bug tracker and on the crash server.
  All Unconfirmed issues filed during your triage rotation should be scanned,
  and, for suspected network bugs, a network label assigned. A triager is
  responsible for looking at bugs reported from noon PST / 3:00 pm EST of the
  last day of the previous triager's rotation until the same time on the last
  day of their rotation.

* Investigate each recent (new comment within the past week or so)
  Cr-Internals-Network issue, getting information from reporters as
  needed, until you can do one of the following:

    * Mark it as *WontFix* (working as intended, obsolete issue) or a
      duplicate.

    * Mark it as a feature request.

    * Remove the Cr-Internals-Network label, replacing it with at least one
      more specific network label or non-network label. Promptly adding
      non-network labels when appropriate is important to get new bugs in front
      of someone familiar with the relevant code, and to remove them from the
      next triager's radar. Because of the way the bug report wizard works, a
      lot of bugs incorrectly end up with the network label.

    * The issue is assigned to an appropriate owner.

    * If there is no more specific label for a bug, it should be investigated
      until we have a good understanding of the cause of the problem, and some
      idea how it should be fixed, at which point its status should be set to
      Available. Future triagers should ignore bugs with this status, unless
      investigating stale bugs.

* Monitor UMA histograms and Gasper alerts.

    * For each Gasper alert that fires, the triager should determine if the
      alert is real (not due to noise), and file a bug with the appropriate
      label if so. Note that if no label more specific than
      Cr-Internals-Network is appropriate, the responsibility remains with the
      triager to continue investigating the bug, as above.
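
The reporting window described above (noon PST on the last day of the previous rotation until noon PST on the last day of your own) can be computed as in the sketch below. It is a rough illustration under stated assumptions: "PST" is taken literally as a fixed UTC-8 offset with no daylight saving adjustment, the window is treated as half-open, and the function names are made up for this example.

```python
from datetime import date, datetime, time, timedelta, timezone

# Assumption: "PST" read literally as a fixed UTC-8 offset.
PST = timezone(timedelta(hours=-8), "PST")


def triage_window(previous_last_day, own_last_day):
    """Return (start, end) of the window of bug reports a triager covers."""
    noon = time(12, 0)
    start = datetime.combine(previous_last_day, noon, tzinfo=PST)
    end = datetime.combine(own_last_day, noon, tzinfo=PST)
    return start, end


def in_window(reported_at, window):
    """True if a timezone-aware report time falls in [start, end)."""
    start, end = window
    return start <= reported_at < end


# Example dates are arbitrary: previous rotation ended on the 16th,
# this two day rotation ends on the 18th.
window = triage_window(date(2015, 3, 16), date(2015, 3, 18))
report_time = datetime(2015, 3, 16, 20, 30, tzinfo=timezone.utc)  # 12:30 PST
covered = in_window(report_time, window)
```

Because the datetimes are timezone-aware, report times recorded in UTC or EST compare correctly against the PST boundaries.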

### Best Effort (As you have time):

* Investigate unowned and owned but forgotten net/ crashers that are still
  occurring (As indicated by
  [go/chromecrash](https://goto.google.com/chromecrash)), prioritizing frequent
  and long standing crashers.

* Investigate old bugs, prioritizing the most recent.

* Close obsolete bugs.

If you've investigated an issue (in code you don't normally work on) to an
extent that you know how to fix it, and the fix is simple, feel free to take
ownership of the issue and create a patch while on triage duty, but other tasks
should take priority.

See [bug-triage-suggested-workflow.md](bug-triage-suggested-workflow.md) for
suggested workflows.

See [bug-triage-labels.md](bug-triage-labels.md) for labeling tips for network
and non-network bugs.