Commit 7c1d03f5 authored by erikchen, committed by Commit Bot

Correct macOS private memory footprint calculations.

On macOS, private memory footprint measurements overcount by the faulted
pages in anonymous shared memory. To correct for this, this CL touches all
the resident pages in anonymous shared memory, thus making them faulted as
well (a short sketch of the resulting accounting follows the notes below).
This relies on two assumptions:

1) Consumers use shared memory from front to back. Thus, if there are
(N) resident pages, those pages represent the first N * PAGE_SIZE bytes in
the shared memory region.

2) The faulting logic is run shortly before the logic that calculates
phys_footprint, thus ensuring that the discrepancy between faulted and
resident pages is minimal.

The performance penalty is expected to be small.

* Most of the time, we expect the pages to already be resident and faulted,
so the cost is only a read from each resident page [usually a cache hit].

* Rarely, we expect the pages to be resident but not faulted, resulting in
soft faults + cache penalty.

* If assumption (1) is invalid, this will potentially fault some previously
non-resident pages, increasing memory usage without fixing the accounting.
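
For illustration only (not part of this CL; the helper name is hypothetical):
assuming the reported footprint is derived elsewhere as phys_footprint minus
the resident bytes returned by CountResidentBytesInSharedMemory, faulting
every resident page just before phys_footprint is read makes the faulted and
resident counts agree, so the subtraction removes exactly the anonymous
shared memory contribution. Roughly:

  #include <cstdint>

  // Hypothetical sketch of the discount arithmetic; not Chromium code.
  // phys_footprint_bytes counts pages this process has faulted, including
  // pages of anonymous shared memory. Subtracting the resident bytes of that
  // shared memory cancels its contribution only if every resident page has
  // also been faulted, which is what the touch loop in this CL ensures.
  uint64_t EstimatePrivateFootprint(uint64_t phys_footprint_bytes,
                                    uint64_t shared_resident_bytes) {
    return phys_footprint_bytes - shared_resident_bytes;
  }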

Bug: 812346
Change-Id: I04e91bc09bf6bdf2f9179dd8d678c92425e98fed
Reviewed-on: https://chromium-review.googlesource.com/973883
Reviewed-by: Primiano Tucci <primiano@chromium.org>
Reviewed-by: Albert J. Wong <ajwong@chromium.org>
Commit-Queue: Erik Chen <erikchen@chromium.org>
Cr-Commit-Position: refs/heads/master@{#545086}
parent 92aed294
@@ -84,7 +84,7 @@ size_t ProcessMemoryDump::CountResidentBytes(void* start_address,
   DCHECK_EQ(0u, start_pointer % page_size);
 
   size_t offset = 0;
-  size_t total_resident_size = 0;
+  size_t total_resident_pages = 0;
   bool failure = false;
 
   // An array as large as number of pages in memory segment needs to be passed
@@ -150,16 +150,16 @@ size_t ProcessMemoryDump::CountResidentBytes(void* start_address,
     if (failure)
       break;
 
-    total_resident_size += resident_page_count * page_size;
+    total_resident_pages += resident_page_count * page_size;
     offset += kMaxChunkSize;
   }
 
   DCHECK(!failure);
   if (failure) {
-    total_resident_size = 0;
+    total_resident_pages = 0;
     LOG(ERROR) << "CountResidentBytes failed. The resident size is invalid";
   }
-  return total_resident_size;
+  return total_resident_pages;
 }
 
 // static
@@ -180,9 +180,49 @@ base::Optional<size_t> ProcessMemoryDump::CountResidentBytesInSharedMemory(
     return base::Optional<size_t>();
   }
 
-  size_t resident_size =
+  size_t resident_pages =
       info.private_pages_resident + info.shared_pages_resident;
-  return resident_size * PAGE_SIZE;
+
+  // On macOS, measurements for private memory footprint overcount by
+  // faulted pages in anonymous shared memory. To discount for this, we touch
+  // all the resident pages in anonymous shared memory here, thus making them
+  // faulted as well. This relies on two assumptions:
+  //
+  // 1) Consumers use shared memory from front to back. Thus, if there are
+  // (N) resident pages, those pages represent the first N * PAGE_SIZE bytes in
+  // the shared memory region.
+  //
+  // 2) This logic is run shortly before the logic that calculates
+  // phys_footprint, thus ensuring that the discrepancy between faulted and
+  // resident pages is minimal.
+  //
+  // The performance penalty is expected to be small.
+  //
+  // * Most of the time, we expect the pages to already be resident and faulted,
+  // thus incurring a cache penalty read hit [since we read from each resident
+  // page].
+  //
+  // * Rarely, we expect the pages to be resident but not faulted, resulting in
+  // soft faults + cache penalty.
+  //
+  // * If assumption (1) is invalid, this will potentially fault some
+  // previously non-resident pages, thus increasing memory usage, without fixing
+  // the accounting.
+  //
+  // Sanity check in case the mapped size is less than the total size of the
+  // region.
+  size_t pages_to_fault =
+      std::min(resident_pages,
+               (shared_memory.mapped_size() + PAGE_SIZE - 1) / PAGE_SIZE);
+
+  volatile char* base_address = static_cast<char*>(shared_memory.memory());
+  for (size_t i = 0; i < pages_to_fault; ++i) {
+    // Reading from a volatile is a visible side-effect for the purposes of
+    // optimization. This guarantees that the optimizer will not kill this line.
+    base_address[i * PAGE_SIZE];
+  }
+
+  return resident_pages * PAGE_SIZE;
 #else
   return CountResidentBytes(shared_memory.memory(),
                             shared_memory.mapped_size());
......
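
A standalone illustration of the touch-to-fault mechanism used above
(hypothetical, not part of this CL): it counts resident pages with mincore(2)
and reads each resident page through a volatile pointer so the page becomes
faulted in the calling process. It assumes the macOS mincore signature (a
char* residency vector), a page-aligned mapping, and the page size from
getpagesize(); TouchResidentPages is an invented name.

  // Hypothetical sketch; not Chromium code.
  #include <sys/mman.h>
  #include <unistd.h>

  #include <cstddef>
  #include <vector>

  // Returns the number of resident pages in [start, start + length), touching
  // each one so that a resident-but-not-yet-faulted page is soft-faulted.
  // Assumes |start| is page-aligned (true for mmap'd regions).
  size_t TouchResidentPages(void* start, size_t length) {
    const size_t page_size = static_cast<size_t>(getpagesize());
    const size_t num_pages = (length + page_size - 1) / page_size;
    std::vector<char> residency(num_pages);
    if (mincore(start, length, residency.data()) != 0)
      return 0;  // Residency is unknown; skip the touch pass.

    size_t resident_pages = 0;
    volatile char* base = static_cast<volatile char*>(start);
    for (size_t i = 0; i < num_pages; ++i) {
      if (!(residency[i] & 1))  // Bit 0 (MINCORE_INCORE): page is resident.
        continue;
      ++resident_pages;
      // A volatile read is an observable side effect, so the compiler cannot
      // drop it; the read faults the page if it was resident but not faulted.
      base[i * page_size];
    }
    return resident_pages;
  }

Unlike the code above, which only has a resident page count and therefore
relies on the front-to-back assumption (1), mincore reports per-page
residency, so this sketch can touch exactly the resident pages.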
@@ -254,6 +254,12 @@ void QueuedRequestDispatcher::SetUpAndDispatch(
     // Don't request a chrome memory dump at all if the client wants only the
     // processes' vm regions, which are retrieved via RequestOSMemoryDump().
+    //
+    // This must occur before the call to RequestOSMemoryDump, as
+    // ClientProcessImpl will [for macOS], delay the calculations for the
+    // OSMemoryDump until the Chrome memory dump is finished. See
+    // https://bugs.chromium.org/p/chromium/issues/detail?id=812346#c16 for more
+    // details.
     if (request->wants_chrome_dumps()) {
       request->pending_responses.insert({client, ResponseType::kChromeDump});
       client->RequestChromeMemoryDump(request->GetRequestArgs(),
......
@@ -10,6 +10,7 @@
 #include "base/strings/string_piece.h"
 #include "base/synchronization/lock.h"
 #include "base/trace_event/memory_dump_request_args.h"
+#include "build/build_config.h"
 #include "mojo/public/cpp/bindings/interface_request.h"
 #include "services/resource_coordinator/public/cpp/memory_instrumentation/coordinator.h"
 #include "services/resource_coordinator/public/cpp/memory_instrumentation/memory_instrumentation.h"
@@ -75,6 +76,7 @@ void ClientProcessImpl::RequestChromeMemoryDump(
     const base::trace_event::MemoryDumpRequestArgs& args,
     const RequestChromeMemoryDumpCallback& callback) {
   DCHECK(!callback.is_null());
+  most_recent_chrome_memory_dump_guid_ = args.dump_guid;
   auto it_and_inserted =
       pending_chrome_callbacks_.emplace(args.dump_guid, std::move(callback));
   DCHECK(it_and_inserted.second) << "Duplicated request id " << args.dump_guid;
@@ -95,6 +97,14 @@ void ClientProcessImpl::OnChromeMemoryDumpDone(
   auto callback = std::move(callback_it->second);
   pending_chrome_callbacks_.erase(callback_it);
 
+  auto it = delayed_os_memory_dump_callbacks_.find(dump_guid);
+  if (it != delayed_os_memory_dump_callbacks_.end()) {
+    for (auto& args : it->second) {
+      PerformOSMemoryDump(std::move(args));
+    }
+    delayed_os_memory_dump_callbacks_.erase(it);
+  }
+
   if (!process_memory_dump) {
     callback.Run(false, dump_guid, nullptr);
     return;
@@ -129,19 +139,40 @@ void ClientProcessImpl::RequestOSMemoryDump(
     bool want_mmaps,
     const std::vector<base::ProcessId>& pids,
     const RequestOSMemoryDumpCallback& callback) {
+  OSMemoryDumpArgs args;
+  args.want_mmaps = want_mmaps;
+  args.pids = pids;
+  args.callback = callback;
+
+#if defined(OS_MACOSX)
+  // If the most recent chrome memory dump hasn't finished, wait for that to
+  // finish.
+  if (most_recent_chrome_memory_dump_guid_.has_value()) {
+    uint64_t guid = most_recent_chrome_memory_dump_guid_.value();
+    auto it = pending_chrome_callbacks_.find(guid);
+    if (it != pending_chrome_callbacks_.end()) {
+      delayed_os_memory_dump_callbacks_[guid].push_back(std::move(args));
+      return;
+    }
+  }
+#endif
+
+  PerformOSMemoryDump(std::move(args));
+}
+
+void ClientProcessImpl::PerformOSMemoryDump(OSMemoryDumpArgs args) {
   bool global_success = true;
   std::unordered_map<base::ProcessId, mojom::RawOSMemDumpPtr> results;
-  for (const base::ProcessId& pid : pids) {
+  for (const base::ProcessId& pid : args.pids) {
     mojom::RawOSMemDumpPtr result = mojom::RawOSMemDump::New();
     result->platform_private_footprint = mojom::PlatformPrivateFootprint::New();
     bool success = OSMetrics::FillOSMemoryDump(pid, result.get());
-    if (want_mmaps)
+    if (args.want_mmaps)
       success = success && OSMetrics::FillProcessMemoryMaps(pid, result.get());
     if (success)
       results[pid] = std::move(result);
     global_success = global_success && success;
   }
-  callback.Run(global_success, std::move(results));
+  args.callback.Run(global_success, std::move(results));
 }
 
 ClientProcessImpl::Config::Config(service_manager::Connector* connector,
@@ -153,4 +184,9 @@ ClientProcessImpl::Config::Config(service_manager::Connector* connector,
 
 ClientProcessImpl::Config::~Config() {}
 
+ClientProcessImpl::OSMemoryDumpArgs::OSMemoryDumpArgs() = default;
+ClientProcessImpl::OSMemoryDumpArgs::OSMemoryDumpArgs(const OSMemoryDumpArgs&) =
+    default;
+ClientProcessImpl::OSMemoryDumpArgs::~OSMemoryDumpArgs() = default;
+
 }  // namespace memory_instrumentation
@@ -83,10 +83,29 @@ class SERVICES_RESOURCE_COORDINATOR_PUBLIC_CPP_EXPORT ClientProcessImpl
       const std::vector<base::ProcessId>& ids,
       const RequestOSMemoryDumpCallback& callback) override;
 
+  struct OSMemoryDumpArgs {
+    OSMemoryDumpArgs();
+    OSMemoryDumpArgs(const OSMemoryDumpArgs&);
+    ~OSMemoryDumpArgs();
+    bool want_mmaps = false;
+    std::vector<base::ProcessId> pids;
+    RequestOSMemoryDumpCallback callback;
+  };
+
+  void PerformOSMemoryDump(OSMemoryDumpArgs args);
+
   // Map containing pending chrome memory callbacks indexed by dump guid.
   // This must be destroyed after |binding_|.
   std::map<uint64_t, RequestChromeMemoryDumpCallback> pending_chrome_callbacks_;
 
+  // On macOS, we must wait for the most recent RequestChromeMemoryDumpCallback
+  // to complete before running the OS calculations. The key to this map is the
+  // dump_guid of that RequestChromeMemoryDumpCallback, the value a vector of
+  // callbacks to calculate and run. For more details, see
+  // https://bugs.chromium.org/p/chromium/issues/detail?id=812346#c16.
+  std::map<uint64_t, std::vector<OSMemoryDumpArgs>>
+      delayed_os_memory_dump_callbacks_;
+
+  base::Optional<uint64_t> most_recent_chrome_memory_dump_guid_;
+
   mojom::CoordinatorPtr coordinator_;
   mojo::Binding<mojom::ClientProcess> binding_;
   const mojom::ProcessType process_type_;
......