Commit e0525355 authored by Erik Chen, committed by Commit Bot

Add documentation for debugging memory issues and additional terminology.

Bug: 801006
Change-Id: Ic75538c44814c912ab623af4959f74cd081000f7
Reviewed-on: https://chromium-review.googlesource.com/939668
Commit-Queue: Erik Chen <erikchen@chromium.org>
Reviewed-by: Primiano Tucci <primiano@chromium.org>
Cr-Commit-Position: refs/heads/master@{#540529}
parent 869faf0b
@@ -7,48 +7,38 @@ click of a button you can understand where memory is being used in your system.
 [TOC]
-## Getting Started
+## Taking a memory-infra trace
-1. Get a bleeding-edge or tip-of-tree build of Chrome.
-2. [Record a trace as usual][record-trace]: open [chrome://tracing][tracing]
+1. [Record a trace as usual][record-trace]: open [chrome://tracing][tracing]
 on Desktop Chrome or [chrome://inspect?tracing][inspect-tracing] to trace
 Chrome for Android.
-3. Make sure to enable the **memory-infra** category on the right.
+2. Make sure to enable the **memory-infra** category on the right.
 ![Tick the memory-infra checkbox when recording a trace.][memory-infra-box]
-4. For now, some subsystems only work if Chrome is started with the
-`--no-sandbox` flag.
-<!-- TODO(primiano) TODO(ssid): https://crbug.com/461788 -->
 [record-trace]: https://sites.google.com/a/chromium.org/dev/developers/how-tos/trace-event-profiling-tool/recording-tracing-runs
 [tracing]: chrome://tracing
 [inspect-tracing]: chrome://inspect?tracing
 [memory-infra-box]: https://storage.googleapis.com/chromium-docs.appspot.com/1c6d1886584e7cc6ffed0d377f32023f8da53e02
-![Timeline View and Analysis View][tracing-views]
-After recording a trace, you will see the **timeline view**. Timeline view
-shows:
-* Total resident memory grouped by process (at the top).
-* Total resident memory grouped by subsystem (at the top).
-* Allocated memory per subsystem for every process.
-Click one of the ![M][m-blue] dots to bring up the **analysis view**. Click
-on a cell in analysis view to reveal more information about its subsystem.
-PartitionAlloc for instance, has more details about its partitions.
+## Navigating a memory-infra trace
+![Timeline View and Analysis View][tracing-views]
+After recording a trace, you will see the **timeline view**. The **timeline
+view** is primarily used for other tracing features. Click one of the
+![M][m-purple] dots to bring up the **analysis view**. Click on a cell in
+analysis view to reveal more information about its subsystem. PartitionAlloc for
+instance, has more details about its partitions.
 ![Component details for PartitionAlloc][partalloc-details]
-The purple ![M][m-purple] dots represent heavy dumps. In these dumps, components
-can provide more details than in the regular dumps. The full details of the
-MemoryInfra UI are explained in its [design doc][mi-ui-doc].
+The full details of the MemoryInfra UI are explained in its [design
+doc][mi-ui-doc].
 [tracing-views]: https://storage.googleapis.com/chromium-docs.appspot.com/db12015bd262385f0f8bd69133330978a99da1ca
-[m-blue]: https://storage.googleapis.com/chromium-docs.appspot.com/b60f342e38ff3a3767bbe4c8640d96a2d8bc864b
 [partalloc-details]: https://storage.googleapis.com/chromium-docs.appspot.com/02eade61d57c83f8ef8227965513456555fc3324
 [m-purple]: https://storage.googleapis.com/chromium-docs.appspot.com/d7bdf4d16204c293688be2e5a0bcb2bf463dbbc3
 [mi-ui-doc]: https://docs.google.com/document/d/1b5BSBEd1oB-3zj_CBAQWiQZ0cmI0HmjmXG-5iNveLqw/edit
......
@@ -39,17 +39,11 @@ instead.
 Follow [these instructions](/docs/memory/filing_memory_bugs.md) to file a high
 quality bug.
-## I have a reproducible memory problem, what do I do?
-Yay! Please file a [memory
-bug](https://bugs.chromium.org/p/chromium/issues/entry?template=Memory%20usage).
-If you are willing to do a bit more, please grab a memory infra trace and upload
-that. Here are [instructions for MacOS](https://docs.google.com/document/d/15mBOu_uZbgP5bpdHZJXEnF9csSRq7phUWXnZcteVr0o/edit).
-(TODO: Add instructions for easily grabbing a trace for all platforms.)
-## I'm a dev and I want to help. How do I get started?
+## I'm a developer trying to investigate a memory issues, what do I do?
+See [this page](/docs/memory/debugging_memory_issues.md) for further instructions.
+## I'm a developer looking for more information. How do I get started?
 Great! First, sign up for the mailing lists above and check out the slack channel.
@@ -60,7 +54,6 @@ Second, familiarize yourself with the following:
 | [Key Concepts in Chrome Memory](/docs/memory/key_concepts.md) | Primer for memory terminology in Chrome. |
 | [memory-infra](/docs/memory-infra/README.md) | The primary tool used for inspecting allocations. |
 ## What are people actively working on?
 | Project | Description |
 |---------|-------------|
......
# Debugging Memory Issues
This page is designed to help Chromium developers debug memory issues.
When in doubt, reach out to memory-dev@chromium.org.
[TOC]
## Investigating Reproducible Memory Issues
Let's say that there's a CL or feature that reproducibly increases memory usage
when it's landed/enabled, given a particular set of repro steps.
* Take a look at [the documentation](/docs/memory/README.md) for both
taking and navigating memory-infra traces.
* Take two memory-infra traces: one with the reproducible memory regression and
one without.
* Load the memory-infra traces into two tabs.
* Compare the memory dump providers and look for the one that shows the
regression. Follow the relevant link.
* [The regression is in the Malloc MemoryDumpProvider.](#Regression-in-Malloc-MemoryDumpProvider)
* [The regression is in a non-Malloc
MemoryDumpProvider.](#Regression-in-Non-Malloc-MemoryDumpProvider)
* [The regression is only observed in **private
footprint**.](#Regression-only-in-Private-Footprint)
* [No regression is observed.](#No-observed-regression)
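The branching above can be sketched as a small triage helper. This is only an illustration: the summary dictionaries, the `triage` function, the 1 MB threshold, and the provider names in the usage example are assumptions made for this sketch, not a format that memory-infra exports.

```python
MALLOC = "malloc"  # assumed name of the Malloc dump provider in the summary


def triage(before, after, threshold_bytes=1_000_000):
    """Pick the follow-up section for a pair of per-process summaries.

    Each summary is a hypothetical dict of the form
    {"private_footprint": int, "providers": {provider_name: bytes}}.
    """
    names = set(before["providers"]) | set(after["providers"])
    regressed = {
        name
        for name in names
        if after["providers"].get(name, 0) - before["providers"].get(name, 0)
        >= threshold_bytes
    }
    if MALLOC in regressed:
        return "malloc"
    if regressed:
        return "non-malloc"
    footprint_delta = after["private_footprint"] - before["private_footprint"]
    if footprint_delta >= threshold_bytes:
        return "private-footprint-only"
    return "none"
```

For example, `triage({"private_footprint": 200_000_000, "providers": {"malloc": 50_000_000}}, {"private_footprint": 230_000_000, "providers": {"malloc": 80_000_000}})` selects the Malloc branch, because the Malloc provider grew by more than the threshold.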
### Regression in Malloc MemoryDumpProvider
Repeat the above steps, but this time also [take a heap
dump](#Taking-a-Heap-Dump). Confirm that the regression is also visible in the
heap dump, and then compare the two heap dumps to find the difference. You can
also use
[diff_heap_profiler.py](https://cs.chromium.org/chromium/src/third_party/catapult/experimental/tracing/bin/diff_heap_profiler.py)
to perform the diff.
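Conceptually, comparing two heap dumps means finding the allocating stacks whose live bytes grew. The sketch below shows that idea on plain dictionaries; it is not how `diff_heap_profiler.py` works internally, and the input shape (a mapping from a stack, written as a tuple of frame names, to live bytes) is an assumption for illustration.

```python
def diff_heap_dumps(before, after, top=5):
    """Return (growth_in_bytes, stack) pairs for stacks that grew, largest first.

    `before` and `after` are hypothetical dicts mapping a stack
    (tuple of frame names) to the live bytes attributed to it.
    """
    deltas = []
    for stack in set(before) | set(after):
        delta = after.get(stack, 0) - before.get(stack, 0)
        if delta > 0:
            deltas.append((delta, stack))
    # Largest growth first; ties are broken by stack for a stable order.
    deltas.sort(key=lambda item: (-item[0], item[1]))
    return deltas[:top]
```

Stacks that shrank or disappeared are ignored, since the goal here is to locate the regression rather than to produce a full two-way diff.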
### Regression in Non-Malloc MemoryDumpProvider
Hopefully the MemoryDumpProvider has sufficient information to help diagnose the
leak. If the leaked object is allocated via malloc or new, which it usually
should be, you can also use the steps for debugging a Malloc
MemoryDumpProvider regression.
### Regression only in Private Footprint
* Repeat the repro steps, but instead of taking a memory-infra trace, use
the following tools to map the process's virtual space:
* On macOS, use vmmap
* On Windows, use SysInternals VMMap
* On other OSes, use /proc/<pid\>/smaps.
* The results should help diagnose what's happening. Contact the
memory-dev@chromium.org mailing list for more help.
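On Linux-derived systems, the smaps output can be summarized with a short script. The following is a minimal sketch that sums the `Private_Clean`, `Private_Dirty`, and `Swap` fields per mapping; the function name and the choice of fields are assumptions for illustration, and the result is not claimed to match how Chrome itself computes private footprint.

```python
import re
from collections import defaultdict

# A mapping header looks like:
# "00400000-0048a000 r-xp 00000000 fd:03 960637   /bin/bash"
_HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s+\S+\s+\S+\s+\S+\s+\S+\s*(.*)$")
# A size field looks like: "Private_Dirty:        64 kB"
_FIELD = re.compile(r"^(\w+):\s+(\d+) kB$")
_PRIVATE_FIELDS = ("Private_Clean", "Private_Dirty", "Swap")


def private_kb_by_mapping(smaps_text):
    """Sum private resident and swapped kB per mapping name."""
    totals = defaultdict(int)
    current = None
    for raw_line in smaps_text.splitlines():
        line = raw_line.rstrip()
        header = _HEADER.match(line)
        if header:
            current = header.group(1) or "[anonymous]"
            continue
        field = _FIELD.match(line)
        if field and current is not None and field.group(1) in _PRIVATE_FIELDS:
            totals[current] += int(field.group(2))
    return dict(totals)
```

Running it on the contents of `/proc/<pid>/smaps` captured before and after the repro steps, and comparing the two results, shows which mappings account for the growth.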
### No observed regression
* If there isn't a regression in PrivateMemoryFootprint, then this might become
a question of semantics for what constitutes a memory regression. Common
problems include:
* Shared Memory, which is hard to attribute, but is mostly accounted for in
the memory-infra trace.
* Binary size, which is currently not accounted for anywhere.
## Investigating Heap Dumps From the Wild
For a small set of Chrome users in the wild, Chrome will record and upload
anonymized heap dumps. This has the benefit of wider coverage for real code
paths, at the expense of reproducibility.
These heap dumps can take some time to grok, but frequently yield valuable
insight. At the time of this writing, heap dumps from the wild have resulted in
real, high impact bugs being found in Chrome code ~90% of the time.
* The first thing to do upon receiving a heap dump is to open it in the [trace
viewer](/docs/memory-infra/README.md). This will tell us the counts, sizes, and
allocating stack traces of the potentially leaked objects. Look for stacks
that result in >100 MB of live memory. Frequently, sets of objects will be
leaked with similar counts. This can provide insight into the nature of the
leak.
* Important note: Heap profiling in the field uses
[Poisson process sampling](https://bugs.chromium.org/p/chromium/issues/detail?id=810748)
with a rate parameter of 10000. This means that for large/frequent allocations
[e.g. >100 MB], the noise will be quite small [much less than 1%]. But
there is noise, so counts will not be exact.
* The stack trace is almost always sufficient to tell the type of object being
leaked as well, since most functions in Chrome have a limited number of calls
to new and malloc.
* The next thing to do is to determine whether the memory usage is intentional.
Very rarely, components in Chrome legitimately need to use many hundreds of MB of
memory. In this case, it's important to create a
[MemoryDumpProvider](https://cs.chromium.org/chromium/src/base/trace_event/memory_dump_provider.h)
to report this memory usage, so that we have a better understanding of which
components are using a lot of memory. For an example, see
[Issue 813046](https://bugs.chromium.org/p/chromium/issues/detail?id=813046).
* Assuming the memory usage is not intentional, the next thing to do is to
figure out what is causing the memory leak.
* The most common cause is adding elements to a container with no limit.
Usually the code makes assumptions about how frequently it will be called
in the wild, and something breaks those assumptions. Or sometimes the code
to clear the container is not called as frequently as expected [or at
all]. [Example
1](https://bugs.chromium.org/p/chromium/issues/detail?id=798012). [Example
2](https://bugs.chromium.org/p/chromium/issues/detail?id=804440).
* Retain cycles for ref-counted objects.
[Example](https://bugs.chromium.org/p/chromium/issues/detail?id=814334#c23)
* Straight up leaks resulting from incorrect use of APIs. [Example
1](https://bugs.chromium.org/p/chromium/issues/detail?id=801702#c31).
[Example
2](https://bugs.chromium.org/p/chromium/issues/detail?id=814444#c17).
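The most common cause listed above, a container that grows without limit, is easy to illustrate in any language. Chrome code is C++, so the sketch below is only a language-agnostic illustration in Python; the `EventLog` class and its `max_events` parameter are hypothetical.

```python
from collections import deque


class EventLog:
    """Hypothetical component that keeps only the most recent events."""

    def __init__(self, max_events=1000):
        # A deque with maxlen silently drops the oldest entry when full,
        # so memory stays bounded however often record() is called.
        self._events = deque(maxlen=max_events)

    def record(self, event):
        self._events.append(event)

    def recent(self):
        return list(self._events)
```

With a plain list in place of the bounded deque, every call to `record` would retain memory until something explicitly cleared the container, which is the failure mode described above.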
## Taking a Heap Dump
Navigate to chrome://flags and search for **memlog**. There are several options
that can be used to configure heap dumps. All of these options are also
available as command line flags, for automated test runs [e.g. telemetry].
* `#memlog` controls which processes are profiled. It's also possible to
manually specify the process via the interface at `chrome://memory-internals`.
* `#memlog-sampling` will greatly reduce the overhead of the heap profiler, at
the cost of accuracy for small or infrequent allocations. Unless
performance is a concern, leave it disabled.
* `#memlog-stack-mode` describes the type of metadata recorded for each
allocation. `native` stacks provide the most utility. The only time the other
options should be considered is for Android official builds, most of which do
not support `native` stacks.
* `#memlog-keep-small-allocations` should be enabled, as it prevents the heap
dump exporter from pruning small allocations. Pruning yields smaller traces,
which is desirable when heap profiling is enabled in the wild but not needed
when debugging locally.
Once the flags have been set appropriately, restart Chrome and take a
memory-infra trace. The results will have a heap dump.
@@ -89,16 +89,80 @@ systems.
 ## Terms and definitions
-TODO(awong): To through Erik's Consistent Memory Metrics doc and pull out bits
-that reconcile with this.
-### Commited Memory
-### Discardable memory
-### Proportional Set Size
-### Image memory
-### Shared Memory.
-TODO(awong): Write overview of our platform diversity, windows vs \*nix memory models (eg,
-"committed" memory), what "discardable" memory is, GPU memory, zram, overcommit,
-the various Chrome heaps (pageheap, partitionalloc, oilpan, v8, malloc...per
-platform), etc.
+Each platform exposes a different memory model. This section describes a
+consistent set of terminology that will be used by this document. This
+terminology is intentionally Linux-biased, since that is the platform most
+readers are expected to be familiar with.
+### Supported platforms
+* Linux
+* Android
+* ChromeOS
+* Windows [kernel: Windows NT]
+* macOS/iOS [kernel: Darwin/XNU/Mach]
+### Terminology
Warning: This terminology is neither complete, nor precise, when compared to the
terminology used by any specific platform. Any in-depth discussion should occur
on a per-platform basis, and use terminology specific to that platform.
* **Virtual memory** - A per-process abstraction layer exposed by the kernel. A
contiguous region divided into 4 KiB **virtual pages**.
* **Physical memory** - A per-machine abstraction layer internal to the kernel.
A contiguous region divided into 4 KiB **physical pages**. Each **physical
page** represents 4 KiB of physical memory.
* **Resident** - A virtual page whose contents are backed by a physical
page.
* **Swapped/Compressed** - A virtual page whose contents are backed by
something other than a physical page.
* **Swapping/Compression** - [verb] The process of taking Resident pages and
making them Swapped/Compressed pages. This frees up physical pages.
* **Unlocked Discardable/Reusable** - Android [Ashmem] and Darwin specific. A virtual
page whose contents are backed by a physical page, but the kernel is free
to reuse the physical page at any point in time.
* **Private** - A virtual page whose contents will only be modifiable by the
current process.
* **Copy on Write** - A private virtual page owned by the parent process.
When either the parent or child process attempts to make a modification, the
child is given a private copy of the page.
* **Shared** - A virtual page whose contents could be shared with other
processes.
* **File-backed** - A virtual page whose contents reflect those of a
file.
* **Anonymous** - A virtual page that is not file-backed.
## Platform Specific Sources of Truth
Memory is a complex topic, fraught with potential miscommunications. In an
attempt to forestall disagreement over semantics, these are the sources of truth
used to determine memory usage for a given process.
* Windows: [SysInternals
VMMap](https://docs.microsoft.com/en-us/sysinternals/downloads/vmmap)
* Darwin:
[vmmap](https://developer.apple.com/legacy/library/documentation/Darwin/Reference/ManPages/man1/vmmap.1.html)
* Linux/Derivatives:
[/proc/<pid\>/smaps](http://man7.org/linux/man-pages/man5/proc.5.html)
## Shared Memory
Accounting for shared memory is poorly defined. If a memory region is mapped
into multiple processes [possibly multiple times], which ones should it count
towards?
On Linux, one common solution is to use proportional set size [PSS], which counts
1/Nth of the resident size, where N is the number of processes that have
page faulted the region. This has the nice property of being additive across
processes. The downside is that it is context dependent. For example, if a user
opens more tabs, thus causing a system library to be mapped into more processes,
the PSS for previous tabs will go down.
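The context dependence is easy to see with a small worked example. The sketch below assumes that N counts every process that has faulted in the region, including the one being measured; the function name and the input shape are illustrative only.

```python
def proportional_set_size(regions):
    """Compute PSS from (resident_size, sharer_count) pairs.

    Each region contributes resident_size / sharer_count, where
    sharer_count includes the process being measured.
    """
    return sum(resident / sharers for resident, sharers in regions)


# A tab with 10 MB of private memory and an 8 MB library shared by 2 processes.
two_tabs = proportional_set_size([(10, 1), (8, 2)])
# The same tab after the library is mapped into 4 processes.
four_tabs = proportional_set_size([(10, 1), (8, 4)])
```

Here `two_tabs` is 14 MB and `four_tabs` is 12 MB: the tab's PSS drops by 2 MB even though nothing about the tab itself changed.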
File-backed shared memory regions are typically not interesting to report, since
they usually represent shared system resources, libraries, and the browser
binary itself, all of which are outside of the control of developers. This is
particularly problematic across different versions of the OS, where the set of
base libraries linked into a process by default varies widely and is outside
Chrome's control.
In Chrome, we have implemented ownership tracking for anonymous shared memory
regions - each shared memory region counts towards exactly one process, which is
determined by the type and usage of the shared memory region.