Commit 7c1d03f5 authored by erikchen, committed by Commit Bot

Correct macOS private memory footprint calculations.

On macOS, private memory footprint measurements overcount by the faulted
pages in anonymous shared memory. To correct for this, this CL touches all
the resident pages in anonymous shared memory, thus making them faulted as
well (a short sketch of the resulting accounting follows the notes below).
This relies on two assumptions:

1) Consumers use shared memory from front to back. Thus, if there are
(N) resident pages, those pages represent the first N * PAGE_SIZE bytes in
the shared memory region.

2) The faulting logic is run shortly before the logic that calculates
phys_footprint, thus ensuring that the discrepancy between faulted and
resident pages is minimal.

The performance penalty is expected to be small.

* Most of the time, we expect the pages to already be resident and faulted,
so the cost is only a read from each resident page [usually a cache hit].

* Rarely, we expect the pages to be resident but not faulted, resulting in
soft faults + cache penalty.

* If assumption (1) is invalid, this will potentially fault some previously
non-resident pages, increasing memory usage without fixing the accounting.
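
For illustration only (not part of this CL; the helper name is hypothetical):
assuming the reported footprint is derived elsewhere as phys_footprint minus
the resident bytes returned by CountResidentBytesInSharedMemory, faulting
every resident page just before phys_footprint is read makes the faulted and
resident counts agree, so the subtraction removes exactly the anonymous
shared memory contribution. Roughly:

  #include <cstdint>

  // Hypothetical sketch of the discount arithmetic; not Chromium code.
  // phys_footprint_bytes counts pages this process has faulted, including
  // pages of anonymous shared memory. Subtracting the resident bytes of that
  // shared memory cancels its contribution only if every resident page has
  // also been faulted, which is what the touch loop in this CL ensures.
  uint64_t EstimatePrivateFootprint(uint64_t phys_footprint_bytes,
                                    uint64_t shared_resident_bytes) {
    return phys_footprint_bytes - shared_resident_bytes;
  }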

Bug: 812346
Change-Id: I04e91bc09bf6bdf2f9179dd8d678c92425e98fed
Reviewed-on: https://chromium-review.googlesource.com/973883
Reviewed-by: Primiano Tucci <primiano@chromium.org>
Reviewed-by: Albert J. Wong <ajwong@chromium.org>
Commit-Queue: Erik Chen <erikchen@chromium.org>
Cr-Commit-Position: refs/heads/master@{#545086}
parent 92aed294
@@ -84,7 +84,7 @@ size_t ProcessMemoryDump::CountResidentBytes(void* start_address,
   DCHECK_EQ(0u, start_pointer % page_size);
 
   size_t offset = 0;
-  size_t total_resident_size = 0;
+  size_t total_resident_pages = 0;
   bool failure = false;
 
   // An array as large as number of pages in memory segment needs to be passed
@@ -150,16 +150,16 @@ size_t ProcessMemoryDump::CountResidentBytes(void* start_address,
     if (failure)
       break;
 
-    total_resident_size += resident_page_count * page_size;
+    total_resident_pages += resident_page_count * page_size;
     offset += kMaxChunkSize;
   }
 
   DCHECK(!failure);
   if (failure) {
-    total_resident_size = 0;
+    total_resident_pages = 0;
     LOG(ERROR) << "CountResidentBytes failed. The resident size is invalid";
   }
-  return total_resident_size;
+  return total_resident_pages;
 }
 
 // static
@@ -180,9 +180,49 @@ base::Optional<size_t> ProcessMemoryDump::CountResidentBytesInSharedMemory(
     return base::Optional<size_t>();
   }
 
-  size_t resident_size =
+  size_t resident_pages =
       info.private_pages_resident + info.shared_pages_resident;
-  return resident_size * PAGE_SIZE;
+
+  // On macOS, measurements for private memory footprint overcount by
+  // faulted pages in anonymous shared memory. To discount for this, we touch
+  // all the resident pages in anonymous shared memory here, thus making them
+  // faulted as well. This relies on two assumptions:
+  //
+  // 1) Consumers use shared memory from front to back. Thus, if there are
+  // (N) resident pages, those pages represent the first N * PAGE_SIZE bytes in
+  // the shared memory region.
+  //
+  // 2) This logic is run shortly before the logic that calculates
+  // phys_footprint, thus ensuring that the discrepancy between faulted and
+  // resident pages is minimal.
+  //
+  // The performance penalty is expected to be small.
+  //
+  // * Most of the time, we expect the pages to already be resident and faulted,
+  // thus incurring a cache penalty read hit [since we read from each resident
+  // page].
+  //
+  // * Rarely, we expect the pages to be resident but not faulted, resulting in
+  // soft faults + cache penalty.
+  //
+  // * If assumption (1) is invalid, this will potentially fault some
+  // previously non-resident pages, thus increasing memory usage, without fixing
+  // the accounting.
+  //
+  // Sanity check in case the mapped size is less than the total size of the
+  // region.
+  size_t pages_to_fault =
+      std::min(resident_pages,
+               (shared_memory.mapped_size() + PAGE_SIZE - 1) / PAGE_SIZE);
+
+  volatile char* base_address = static_cast<char*>(shared_memory.memory());
+  for (size_t i = 0; i < pages_to_fault; ++i) {
+    // Reading from a volatile is a visible side-effect for the purposes of
+    // optimization. This guarantees that the optimizer will not kill this line.
+    base_address[i * PAGE_SIZE];
+  }
+
+  return resident_pages * PAGE_SIZE;
 #else
   return CountResidentBytes(shared_memory.memory(),
                             shared_memory.mapped_size());
......
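
A standalone illustration of the touch-to-fault mechanism used above
(hypothetical, not part of this CL): it counts resident pages with mincore(2)
and reads each resident page through a volatile pointer so the page becomes
faulted in the calling process. It assumes the macOS mincore signature (a
char* residency vector), a page-aligned mapping, and the page size from
getpagesize(); TouchResidentPages is an invented name.

  // Hypothetical sketch; not Chromium code.
  #include <sys/mman.h>
  #include <unistd.h>

  #include <cstddef>
  #include <vector>

  // Returns the number of resident pages in [start, start + length), touching
  // each one so that a resident-but-not-yet-faulted page is soft-faulted.
  // Assumes |start| is page-aligned (true for mmap'd regions).
  size_t TouchResidentPages(void* start, size_t length) {
    const size_t page_size = static_cast<size_t>(getpagesize());
    const size_t num_pages = (length + page_size - 1) / page_size;
    std::vector<char> residency(num_pages);
    if (mincore(start, length, residency.data()) != 0)
      return 0;  // Residency is unknown; skip the touch pass.

    size_t resident_pages = 0;
    volatile char* base = static_cast<volatile char*>(start);
    for (size_t i = 0; i < num_pages; ++i) {
      if (!(residency[i] & 1))  // Bit 0 (MINCORE_INCORE): page is resident.
        continue;
      ++resident_pages;
      // A volatile read is an observable side effect, so the compiler cannot
      // drop it; the read faults the page if it was resident but not faulted.
      base[i * page_size];
    }
    return resident_pages;
  }

Unlike the code above, which only has a resident page count and therefore
relies on the front-to-back assumption (1), mincore reports per-page
residency, so this sketch can touch exactly the resident pages.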
@@ -254,6 +254,12 @@ void QueuedRequestDispatcher::SetUpAndDispatch(
     // Don't request a chrome memory dump at all if the client wants only the
     // processes' vm regions, which are retrieved via RequestOSMemoryDump().
+    //
+    // This must occur before the call to RequestOSMemoryDump, as
+    // ClientProcessImpl will [for macOS], delay the calculations for the
+    // OSMemoryDump until the Chrome memory dump is finished. See
+    // https://bugs.chromium.org/p/chromium/issues/detail?id=812346#c16 for more
+    // details.
     if (request->wants_chrome_dumps()) {
       request->pending_responses.insert({client, ResponseType::kChromeDump});
       client->RequestChromeMemoryDump(request->GetRequestArgs(),
......
@@ -10,6 +10,7 @@
 #include "base/strings/string_piece.h"
 #include "base/synchronization/lock.h"
 #include "base/trace_event/memory_dump_request_args.h"
+#include "build/build_config.h"
 #include "mojo/public/cpp/bindings/interface_request.h"
 #include "services/resource_coordinator/public/cpp/memory_instrumentation/coordinator.h"
 #include "services/resource_coordinator/public/cpp/memory_instrumentation/memory_instrumentation.h"
@@ -75,6 +76,7 @@ void ClientProcessImpl::RequestChromeMemoryDump(
     const base::trace_event::MemoryDumpRequestArgs& args,
     const RequestChromeMemoryDumpCallback& callback) {
   DCHECK(!callback.is_null());
+  most_recent_chrome_memory_dump_guid_ = args.dump_guid;
   auto it_and_inserted =
       pending_chrome_callbacks_.emplace(args.dump_guid, std::move(callback));
   DCHECK(it_and_inserted.second) << "Duplicated request id " << args.dump_guid;
@@ -95,6 +97,14 @@ void ClientProcessImpl::OnChromeMemoryDumpDone(
   auto callback = std::move(callback_it->second);
   pending_chrome_callbacks_.erase(callback_it);
 
+  auto it = delayed_os_memory_dump_callbacks_.find(dump_guid);
+  if (it != delayed_os_memory_dump_callbacks_.end()) {
+    for (auto& args : it->second) {
+      PerformOSMemoryDump(std::move(args));
+    }
+    delayed_os_memory_dump_callbacks_.erase(it);
+  }
+
   if (!process_memory_dump) {
     callback.Run(false, dump_guid, nullptr);
     return;
@@ -129,19 +139,40 @@ void ClientProcessImpl::RequestOSMemoryDump(
     bool want_mmaps,
     const std::vector<base::ProcessId>& pids,
     const RequestOSMemoryDumpCallback& callback) {
+  OSMemoryDumpArgs args;
+  args.want_mmaps = want_mmaps;
+  args.pids = pids;
+  args.callback = callback;
+
+#if defined(OS_MACOSX)
+  // If the most recent chrome memory dump hasn't finished, wait for that to
+  // finish.
+  if (most_recent_chrome_memory_dump_guid_.has_value()) {
+    uint64_t guid = most_recent_chrome_memory_dump_guid_.value();
+    auto it = pending_chrome_callbacks_.find(guid);
+    if (it != pending_chrome_callbacks_.end()) {
+      delayed_os_memory_dump_callbacks_[guid].push_back(std::move(args));
+      return;
+    }
+  }
+#endif
+
+  PerformOSMemoryDump(std::move(args));
+}
+
+void ClientProcessImpl::PerformOSMemoryDump(OSMemoryDumpArgs args) {
   bool global_success = true;
   std::unordered_map<base::ProcessId, mojom::RawOSMemDumpPtr> results;
-  for (const base::ProcessId& pid : pids) {
+  for (const base::ProcessId& pid : args.pids) {
     mojom::RawOSMemDumpPtr result = mojom::RawOSMemDump::New();
     result->platform_private_footprint = mojom::PlatformPrivateFootprint::New();
     bool success = OSMetrics::FillOSMemoryDump(pid, result.get());
-    if (want_mmaps)
+    if (args.want_mmaps)
       success = success && OSMetrics::FillProcessMemoryMaps(pid, result.get());
     if (success)
       results[pid] = std::move(result);
     global_success = global_success && success;
   }
-  callback.Run(global_success, std::move(results));
+  args.callback.Run(global_success, std::move(results));
 }
 
 ClientProcessImpl::Config::Config(service_manager::Connector* connector,
@@ -153,4 +184,9 @@ ClientProcessImpl::Config::Config(service_manager::Connector* connector,
 
 ClientProcessImpl::Config::~Config() {}
 
+ClientProcessImpl::OSMemoryDumpArgs::OSMemoryDumpArgs() = default;
+ClientProcessImpl::OSMemoryDumpArgs::OSMemoryDumpArgs(const OSMemoryDumpArgs&) =
+    default;
+ClientProcessImpl::OSMemoryDumpArgs::~OSMemoryDumpArgs() = default;
+
 }  // namespace memory_instrumentation
@@ -83,10 +83,29 @@ class SERVICES_RESOURCE_COORDINATOR_PUBLIC_CPP_EXPORT ClientProcessImpl
       const std::vector<base::ProcessId>& ids,
       const RequestOSMemoryDumpCallback& callback) override;
 
+  struct OSMemoryDumpArgs {
+    OSMemoryDumpArgs();
+    OSMemoryDumpArgs(const OSMemoryDumpArgs&);
+    ~OSMemoryDumpArgs();
+    bool want_mmaps = false;
+    std::vector<base::ProcessId> pids;
+    RequestOSMemoryDumpCallback callback;
+  };
+
+  void PerformOSMemoryDump(OSMemoryDumpArgs args);
+
   // Map containing pending chrome memory callbacks indexed by dump guid.
   // This must be destroyed after |binding_|.
   std::map<uint64_t, RequestChromeMemoryDumpCallback> pending_chrome_callbacks_;
 
+  // On macOS, we must wait for the most recent RequestChromeMemoryDumpCallback
+  // to complete before running the OS calculations. The key to this map is the
+  // dump_guid of that RequestChromeMemoryDumpCallback, the value a vector of
+  // callbacks to calculate and run. For more details, see
+  // https://bugs.chromium.org/p/chromium/issues/detail?id=812346#c16.
+  std::map<uint64_t, std::vector<OSMemoryDumpArgs>>
+      delayed_os_memory_dump_callbacks_;
+
+  base::Optional<uint64_t> most_recent_chrome_memory_dump_guid_;
+
   mojom::CoordinatorPtr coordinator_;
   mojo::Binding<mojom::ClientProcess> binding_;
   const mojom::ProcessType process_type_;
......