Commit 527badd1 authored by Erik Chen's avatar Erik Chen Committed by Commit Bot

Add documentation for investigating an example heap dump.

Change-Id: I9f35828ea104b0c0901a65845796a96bc28cbbe3
Reviewed-on: https://chromium-review.googlesource.com/1017766
Commit-Queue: Erik Chen <erikchen@chromium.org>
Reviewed-by: default avatarAlbert J. Wong <ajwong@chromium.org>
Cr-Commit-Position: refs/heads/master@{#551833}
parent 3d9b5b94
......@@ -70,20 +70,23 @@ These heap dumps can take some time to grok, but frequently yield valuable
insight. At the time of this writing, heap dumps from the wild have resulted in
real, high impact bugs being found in Chrome code ~90% of the time.
* The first thing to do upon receiving a heap dump is to open it in the [trace
viewer](/docs/memory-infra/heap_profiler.md#how-to-manually-browse-a-heap-dump).
This will tell us the counts, sizes, and allocating stack traces of the
potentially leaked objects. Look for stacks that result in >100 MB of live
memory. Frequently, sets of objects will be leaked with similar counts. This
can provide insight into the nature of the leak.
For an example investigation of a real heap dump, see [this
link](/docs/memory/investigating_heap_dump_example.md).
* Raw heap dumps can be viewed in the trace viewer. [See detailed
instructions.](/docs/memory-infra/heap_profiler.md#how-to-manually-browse-a-heap-dump).
This interface surfaces all available information, but can be overwhelming and
is usually unnecessary for investigating heap dumps.
* Important note: Heap profiling in the field uses
[poison process sampling](https://bugs.chromium.org/p/chromium/issues/detail?id=810748)
with a rate parameter of 10000. This means that for large/frequent allocations
[e.g. >100 MB], the noise will be quite small [much less than 1%]. But
there is noise so counts will not be exact.
* The stack trace is almost always sufficient to tell the type of object being
leaked as well, since most functions in Chrome have a limited number of calls
to new and malloc.
* The heap dump summary typically contains all information necessary to diagnose
a memory issue.
* The stack trace of the potential memory leak is almost always sufficient to
tell the type of object being leaked, since most functions in Chrome
have a limited number of calls to new and malloc.
* The next thing to do is to determine whether the memory usage is intentional.
Very rarely, components in Chrome legitimately need to use many 100s of MBs of
memory. In this case, it's important to create a
......
# Example Investigation of a Heap Dump
This document describes the steps taken to investigate a real memory leak
discovered by heap profiling in the wild. For investigators less familiar with
the code base, `Navigating the Stack Trace` should be enough information to
determine the relevant component, and to forward the bug to a component OWNER.
## Understanding the heap dump summary
The opening comment of [Issue
834033](https://bugs.chromium.org/p/chromium/issues/detail?id=834033) contains a
heap dump summary. The highlights are:
* 315723 calls to malloc without corresponding call to free.
* 806MB of memory.
* The common stacktrace for all 315723 allocations.
Usually, anything that uses over 10MB of memory is a red flag. With the
exception of large image resources, most code in Chrome should use much less
than 10MB. Anything that has over 100k allocations is also a red flag.
### Navigating the Stack Trace - Detailed Breakdown
Let's take a look at the stack trace:
```
profiling::(anonymous namespace)::HookAlloc(base::allocator::AllocatorDispatch const*, unsigned long, void*)
base::allocator::MallocZoneFunctionsToReplaceDefault()::$_1::__invoke(_malloc_zone_t*, unsigned long)
<???>
<???>
base::allocator::UncheckedMallocMac(unsigned long, void**)
sk_malloc_flags(unsigned long, unsigned int)
SkMallocPixelRef::MakeAllocate(SkImageInfo const&, unsigned long)
SkBitmap::tryAllocPixels(SkImageInfo const&, unsigned long)
IPC::ParamTraits<SkBitmap>::Read(base::Pickle const*, base::PickleIterator*, SkBitmap*)
ExtensionAction::ParseIconFromCanvasDictionary(base::DictionaryValue const&, gfx::ImageSkia*)
extensions::ExtensionActionSetIconFunction::RunExtensionAction()
extensions::ExtensionActionFunction::Run()
ExtensionFunction::RunWithValidation()
extensions::ExtensionFunctionDispatcher::DispatchWithCallbackInternal(ExtensionHostMsg_Request_Params const&, content::RenderFrameHost*, int, base::RepeatingCallback<void (ExtensionFunction::ResponseType, base::ListValue const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, extensions::functions::HistogramValue)> const&)
extensions::ExtensionFunctionDispatcher::Dispatch(ExtensionHostMsg_Request_Params const&, content::RenderFrameHost*, int)
bool IPC::MessageT<ExtensionHostMsg_Request_Meta, std::__1::tuple<ExtensionHostMsg_Request_Params>, void>::Dispatch<extensions::ExtensionWebContentsObserver, extensions::ExtensionWebContentsObserver, content::RenderFrameHost, void (extensions::ExtensionWebContentsObserver::*)(content::RenderFrameHost*, ExtensionHostMsg_Request_Params const&)>(IPC::Message const*, extensions::ExtensionWebContentsObserver*, extensions::ExtensionWebContentsObserver*, content::RenderFrameHost*, void (extensions::ExtensionWebContentsObserver::*)(content::RenderFrameHost*, ExtensionHostMsg_Request_Params const&))
extensions::ExtensionWebContentsObserver::OnMessageReceived(IPC::Message const&, content::RenderFrameHost*)
extensions::ChromeExtensionWebContentsObserver::OnMessageReceived(IPC::Message const&, content::RenderFrameHost*)
content::WebContentsImpl::OnMessageReceived(content::RenderFrameHostImpl*, IPC::Message const&)
content::RenderFrameHostImpl::OnMessageReceived(IPC::Message const&)
IPC::ChannelProxy::Context::OnDispatchMessage(IPC::Message const&)
base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*)
base::MessageLoop::RunTask(base::PendingTask*)
base::MessageLoop::DoWork()
base::MessagePumpCFRunLoopBase::RunWork()
base::mac::CallWithEHFrame(void () block_pointer)
base::MessagePumpCFRunLoopBase::RunWorkSource(void*)
<???>
<???>
<???>
<???>
<???>
<???>
<???>
<???>
<???>
__71-[BrowserCrApplication nextEventMatchingMask:untilDate:inMode:dequeue:]_block_invoke
base::mac::CallWithEHFrame(void () block_pointer)
-[BrowserCrApplication nextEventMatchingMask:untilDate:inMode:dequeue:]
<???>
base::MessagePumpNSApplication::DoRun(base::MessagePump::Delegate*)
base::MessagePumpCFRunLoopBase::Run(base::MessagePump::Delegate*)
<name omitted>
ChromeBrowserMainParts::MainMessageLoopRun(int*)
content::BrowserMainLoop::RunMainMessageLoopParts()
content::BrowserMainRunnerImpl::Run()
content::BrowserMain(content::MainFunctionParams const&)
content::ContentMainRunnerImpl::Run()
service_manager::Main(service_manager::MainParams const&)
content::ContentMain(content::ContentMainParams const&)
ChromeMain
main
<???>
```
The first step is to divide the stack trace into smaller segments to get a
better understanding of what's happening at the time of allocations. The best
way to do this is to segment by name space and/or function prefixes.
```
profiling::(anonymous namespace)::HookAlloc(base::allocator::AllocatorDispatch const*, unsigned long, void*)
base::allocator::MallocZoneFunctionsToReplaceDefault()::$_1::__invoke(_malloc_zone_t*, unsigned long)
<???>
<???>
base::allocator::UncheckedMallocMac(unsigned long, void**)
```
The top of each stack will always contain some `base` and/or `profiling`
code. This is the code responsible for allocating and recording the memory.
```
sk_malloc_flags(unsigned long, unsigned int)
SkMallocPixelRef::MakeAllocate(SkImageInfo const&, unsigned long)
SkBitmap::tryAllocPixels(SkImageInfo const&, unsigned long)
```
Next, we three 3 frames with the prefix `sk`. Searching for
`sk_malloc_flags` on
[codesearch](https://cs.chromium.org/search/?q=sk_malloc_flags&sq=package:chromium&type=cs)
reveals that the component is `third_party/skia`. Looking at the
[README](https://cs.chromium.org/chromium/src/third_party/skia/README) reveals
that Skia is a 2D graphics library.
```
IPC::ParamTraits<SkBitmap>::Read(base::Pickle const*, base::PickleIterator*, SkBitmap*)
```
Next we see a templated function called `Read` in the namespace `IPC`.
`IPC` stands for inter-process communication. This suggests that the
function is responsible for reading an IPC Message, perhaps concerning an
`SkBitmap`.
```
ExtensionAction::ParseIconFromCanvasDictionary(base::DictionaryValue const&, gfx::ImageSkia*)
extensions::ExtensionActionSetIconFunction::RunExtensionAction()
extensions::ExtensionActionFunction::Run()
ExtensionFunction::RunWithValidation()
extensions::ExtensionFunctionDispatcher::DispatchWithCallbackInternal(ExtensionHostMsg_Request_Params const&, content::RenderFrameHost*, int, base::RepeatingCallback<void (ExtensionFunction::ResponseType, base::ListValue const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, extensions::functions::HistogramValue)> const&)
extensions::ExtensionFunctionDispatcher::Dispatch(ExtensionHostMsg_Request_Params const&, content::RenderFrameHost*, int)
bool IPC::MessageT<ExtensionHostMsg_Request_Meta, std::__1::tuple<ExtensionHostMsg_Request_Params>, void>::Dispatch<extensions::ExtensionWebContentsObserver, extensions::ExtensionWebContentsObserver, content::RenderFrameHost, void (extensions::ExtensionWebContentsObserver::*)(content::RenderFrameHost*, ExtensionHostMsg_Request_Params const&)>(IPC::Message const*, extensions::ExtensionWebContentsObserver*, extensions::ExtensionWebContentsObserver*, content::RenderFrameHost*, void (extensions::ExtensionWebContentsObserver::*)(content::RenderFrameHost*, ExtensionHostMsg_Request_Params const&))
extensions::ExtensionWebContentsObserver::OnMessageReceived(IPC::Message const&, content::RenderFrameHost*)
extensions::ChromeExtensionWebContentsObserver::OnMessageReceived(IPC::Message const&, content::RenderFrameHost*)
```
Next, we see many frames with the `extension` prefix. Extensions are exactly
what they sound like - Chrome extensions like AdBlock are used to modify the
behavior of the browser.
```
content::WebContentsImpl::OnMessageReceived(content::RenderFrameHostImpl*, IPC::Message const&)
content::RenderFrameHostImpl::OnMessageReceived(IPC::Message const&)
```
`content` is the name of code that glues together web code [like extensions] and
the rest of Chrome.
```
IPC::ChannelProxy::Context::OnDispatchMessage(IPC::Message const&)
```
More `IPC` code.
```
base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*)
base::MessageLoop::RunTask(base::PendingTask*)
base::MessageLoop::DoWork()
base::MessagePumpCFRunLoopBase::RunWork()
base::mac::CallWithEHFrame(void () block_pointer)
base::MessagePumpCFRunLoopBase::RunWorkSource(void*)
```
More `base` code. The bottom of most stack traces should go back to
`MessageLoop`, a primitive Chrome construct used to run tasks.
### Navigating the Stack Trace - Summary
* The top and bottom of the stack should generally be the same and are not very
interesting.
* The prefixes of frames can be used to get a rough idea of the components
involved.
* Function names can be used to get a rough idea of what's going on.
In this case, extension code is calling `ParseIconFromCanvasDictionary` - so
it's probably trying to parse an icon. This calls into Skia code. Given that
Skia is a 2D drawing library, and the function is `tryAllocPixels`, Skia is
allocating some pixels for the icon. This process is being repeated 315 thousand
times, and the icon is being leaked every time.
## Diving into the code
Now that we have a rough idea of what's happening, let's look at the code for
ParseIconFromCanvasDictionary.
```cpp
bool ExtensionAction::ParseIconFromCanvasDictionary(
const base::DictionaryValue& dict,
gfx::ImageSkia* icon) {
for (base::DictionaryValue::Iterator iter(dict); !iter.IsAtEnd();
iter.Advance()) {
std::string binary_string64;
IPC::Message pickle;
if (iter.value().is_blob()) {
pickle = IPC::Message(iter.value().GetBlob().data(),
iter.value().GetBlob().size());
} else if (iter.value().GetAsString(&binary_string64)) {
std::string binary_string;
if (!base::Base64Decode(binary_string64, &binary_string))
return false;
pickle = IPC::Message(binary_string.c_str(), binary_string.length());
} else {
continue;
}
base::PickleIterator pickle_iter(pickle);
SkBitmap bitmap;
if (!IPC::ReadParam(&pickle, &pickle_iter, &bitmap))
return false;
CHECK(!bitmap.isNull());
// Chrome helpfully scales the provided icon(s), but let's not go overboard.
const int kActionIconMaxSize = 10 * ActionIconSize();
if (bitmap.drawsNothing() || bitmap.width() > kActionIconMaxSize)
continue;
float scale = static_cast<float>(bitmap.width()) / ActionIconSize();
icon->AddRepresentation(gfx::ImageSkiaRep(bitmap, scale));
}
return true;
}
```
There's a lot going on here, but we can use the information we have to focus.
The leak happens in IPC::ReadParam, so the relevant lines are:
```
SkBitmap bitmap;
if (!IPC::ReadParam(&pickle, &pickle_iter, &bitmap))
return false;
```
The `IPC` message is being decoded into `bitmap`.
```
icon->AddRepresentation(gfx::ImageSkiaRep(bitmap, scale));
```
Looking at subsequent consumers of `bitmap`, we see that it is being added as a
representation to `icon`. `icon` is an output parameter of this function, so we
have to look at the calling frame,
`ExtensionActionSetIconFunction::RunExtensionAction`.
```
ExtensionFunction::ResponseAction
ExtensionActionSetIconFunction::RunExtensionAction() {
...
EXTENSION_FUNCTION_VALIDATE(
ExtensionAction::ParseIconFromCanvasDictionary(*canvas_set, &icon));
if (icon.isNull())
return RespondNow(Error("Icon invalid."));
extension_action_->SetIcon(tab_id_, gfx::Image(icon));
...
}
```
In this case, I've already focused on the code that calls
`ParseIconFromCanvasDictionary`. Let's look at `SetIcon`.
```
void ExtensionAction::SetIcon(int tab_id, const gfx::Image& image) {
SetValue(&icon_, tab_id, image);
}
```
```
template<class T>
void SetValue(std::map<int, T>* map, int tab_id, const T& val) {
(*map)[tab_id] = val;
}
```
This adds the image to a map, with `tab_id` as the key. Next, we use codesearch
to look at all places that touch `icon_`. The only place that performs removal
is
```
void ExtensionAction::ClearAllValuesForTab(int tab_id) {
...
icon_.erase(tab_id);
...
}
```
This is called by `ExtensionActionAPI::ClearAllValuesForTab`, which is called by
`TabHelper::DidFinishNavigation`. Now we've reached the root of the problem.
The icon is set for [potentially] all tabs, but is only removed when a tab
"finishes navigation". If instead a tab is closed, then the icon is leaked
forever.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment