[base] Always kill test batch process groups under POSIX platforms.

This is a re-land with sanity-checks of the kill() outcome removed, so we can address the effects of process leaks separately from introducing leak-checking. Under Windows and Fuchsia the TestLauncher runs test batch processes in their own jobs, allowing any leaked sub-processes leaked by tests to be cleaned up. Under POSIX platforms each batch is run in its own process group, but the group was only being kill()ed if the test batch failed or crashed. TestLauncher now proactively kill()s each test batch's process group. Note that this will still miss processes launched into new process groups by tests. Bug: 1094369 Change-Id: I0880f08bd120fc5dd057a54efc3db63173feeca9 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2359136 Commit-Queue: Wez <wez@chromium.org> Reviewed-by: Takuto Ikuta <tikuta@chromium.org> Cr-Commit-Position: refs/heads/master@{#798636}

[base] Always kill test batch process groups under POSIX platforms.
This is a re-land with sanity-checks of the kill() outcome removed, so we can address the effects of process leaks separately from introducing leak-checking. Under Windows and Fuchsia the TestLauncher runs test batch processes in their own jobs, allowing any leaked sub-processes leaked by tests to be cleaned up. Under POSIX platforms each batch is run in its own process group, but the group was only being kill()ed if the test batch failed or crashed. TestLauncher now proactively kill()s each test batch's process group. Note that this will still miss processes launched into new process groups by tests. Bug: 1094369 Change-Id: I0880f08bd120fc5dd057a54efc3db63173feeca9 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2359136 Commit-Queue: Wez <wez@chromium.org> Reviewed-by: Takuto Ikuta <tikuta@chromium.org> Cr-Commit-Position: refs/heads/master@{#798636}
84ff2283 · Wez · Commit Bot · bd31d8e5 · 84ff2283 · 84ff2283
Commit 84ff2283 authored Aug 17, 2020 by Wez Committed by Commit Bot Aug 17, 2020
5 changed files
--- a/base/process/kill.h
+++ b/base/process/kill.h
@@ -84,12 +84,6 @@ BASE_EXPORT bool KillProcesses(const FilePath::StringType& executable_name,
                               int exit_code,
                               const ProcessFilter* filter);

-#if defined(OS_POSIX)
-// Attempts to kill the process group identified by |process_group_id|. Returns
-// true on success.
-BASE_EXPORT bool KillProcessGroup(ProcessHandle process_group_id);
-#endif  // defined(OS_POSIX)
-
 // Get the termination status of the process by interpreting the
 // circumstances of the child process' death. |exit_code| is set to
 // the status returned by waitpid() on POSIX, and from GetExitCodeProcess() on

--- a/base/process/kill_fuchsia.cc
+++ b/base/process/kill_fuchsia.cc
@@ -13,14 +13,6 @@

 namespace base {

-bool KillProcessGroup(ProcessHandle process_group_id) {
-  // |process_group_id| is really a job on Fuchsia.
-  zx_status_t status = zx_task_kill(process_group_id);
-  DLOG_IF(ERROR, status != ZX_OK)
-      << "unable to terminate job " << process_group_id;
-  return status == ZX_OK;
-}
-
 TerminationStatus GetTerminationStatus(ProcessHandle handle, int* exit_code) {
  DCHECK(exit_code);


--- a/base/process/kill_posix.cc
+++ b/base/process/kill_posix.cc
@@ -78,15 +78,6 @@ TerminationStatus GetTerminationStatusImpl(ProcessHandle handle,

 }  // namespace

-#if !defined(OS_NACL_NONSFI)
-bool KillProcessGroup(ProcessHandle process_group_id) {
-  bool result = kill(-1 * process_group_id, SIGKILL) == 0;
-  if (!result)
-    DPLOG(ERROR) << "Unable to terminate process group " << process_group_id;
-  return result;
-}
-#endif  // !defined(OS_NACL_NONSFI)
-
 TerminationStatus GetTerminationStatus(ProcessHandle handle, int* exit_code) {
  return GetTerminationStatusImpl(handle, false /* can_block */, exit_code);
 }

--- a/base/test/launcher/test_launcher.cc
+++ b/base/test/launcher/test_launcher.cc
@@ -438,38 +438,17 @@ int LaunchChildTestProcessWithOptions(const CommandLine& command_line,
    // Cleanup the data directory.
    CHECK(DeletePathRecursively(nested_data_path));
 #elif defined(OS_POSIX)
-
-#if BUILDFLAG(CLANG_PROFILING)
-    // TODO(crbug.com/1094369): Remove this condition once the child process
-    // leaking bug is fixed.
-    //
-    // TODO(crbug.com/1095075): Make test launcher treat lingering child
-    // processes hard failures so that they can be detected and surfaced
-    // gracefully.
-    //
-    // When profiling is enabled, browser child processes take extra time to
-    // dump profiles, which means that lingering processes are much more likely
-    // to happen than non-profiling build. Therefore, on POSIX, in order to
-    // avoid polluting the machine state, ensure any child processes that the
-    // test might have created are cleaned up to avoid potential leaking even
-    // when tests have passed. On Windows, child processes are automatically
-    // cleaned up using JobObjects.
-    //
-    // On non-profiling build, when tests have passed, we don't clean up the
-    // lingering processes even when there are any, and the reason is that they
-    // usually indicate prod issues, letting them slip to the following test
-    // tasks and cause failures increses the chance of them being surfaced.
+    // It is not possible to waitpid() on any leaked sub-processes of the test
+    // batch process, since those are not direct children of this process.
+    // kill()ing the process-group will return a result indicating whether the
+    // group was found (i.e. processes were still running in it) or not (i.e.
+    // sub-processes had exited already). Unfortunately many tests (e.g. browser
+    // tests) have processes exit asynchronously, so checking the kill() result
+    // will report false failures.
+    // Unconditionally kill the process group, regardless of the batch exit-code
+    // until a better solution is available.
    kill(-1 * process.Handle(), SIGKILL);
-#else
-    if (exit_code != 0) {
-      // On POSIX, in case the test does not exit cleanly, either due to a crash
-      // or due to it timing out, we need to clean up any child processes that
-      // it might have created. On Windows, child processes are automatically
-      // cleaned up using JobObjects.
-      KillProcessGroup(process.Handle());
-    }
-#endif
-#endif
+#endif  // defined(OS_POSIX)

    GetLiveProcesses()->erase(process.Handle());
  }

--- a/base/test/launcher/test_launcher_unittest.cc
+++ b/base/test/launcher/test_launcher_unittest.cc
@@ -708,12 +708,12 @@ TEST(MockUnitTests, DISABLED_FailTest) {
 TEST(MockUnitTests, DISABLED_CrashTest) {
  IMMEDIATE_CRASH();
 }
-// Basic test will not be reached with default batch size.
+// Basic test will not be reached, due to the preceding crash in the same batch.
 TEST(MockUnitTests, DISABLED_NoRunTest) {
  ASSERT_TRUE(true);
 }

-// Using TestLauncher to launch 3 simple unitests
+// Using TestLauncher to launch 3 basic unitests
 // and validate the resulting json file.
 TEST_F(UnitTestLauncherDelegateTester, RunMockTests) {
  CommandLine command_line(CommandLine::ForCurrentProcess()->GetProgram());