Commit 3565fc3c authored by Bruce Dawson's avatar Bruce Dawson Committed by Chromium LUCI CQ

Retry ReplaceFile to protect against anti-virus

Anti-virus programs and other scanners may briefly lock new files which
can lead to frequent problems with saving bookmarks and other files
that use the ImportantFileWriter. This attempts to deal with this by
retrying the racy ReplaceFile step a few times.

This change also adds instrumentation to record how many retries are
needed, for future tuning. It also moves the SetLastError call as
close as possible to the function that will consume the last error code.

This is only done on Windows because that is hoped to be the only
place where it happens.

Bug: 1099284
Change-Id: I7aa9a27e137d3df2c402de5f13bbe911521a711d
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2559156Reviewed-by: default avatarSteven Holte <holte@chromium.org>
Reviewed-by: default avatarDaniel Cheng <dcheng@chromium.org>
Reviewed-by: default avatarGreg Thompson <grt@chromium.org>
Commit-Queue: Bruce Dawson <brucedawson@chromium.org>
Cr-Commit-Position: refs/heads/master@{#839668}
parent 783a57bc
......@@ -42,6 +42,15 @@ namespace base {
namespace {
constexpr auto kDefaultCommitInterval = TimeDelta::FromSeconds(10);
#if defined(OS_WIN)
// This is how many times we will retry ReplaceFile on Windows.
constexpr int kReplaceRetries = 5;
// This is the result code recorded if ReplaceFile still fails.
// It should stay constant even if we change kReplaceRetries.
constexpr int kReplaceRetryFailure = 10;
static_assert(kReplaceRetryFailure > kReplaceRetries, "No overlap allowed");
constexpr auto kReplacePauseInterval = TimeDelta::FromMilliseconds(100);
#endif
// These values are persisted to logs. Entries should not be renumbered and
// numeric values should never be reused.
......@@ -285,21 +294,41 @@ bool ImportantFileWriter::WriteFileAtomicallyImpl(const FilePath& path,
PlatformThread::SetCurrentThreadPriority(ThreadPriority::DISPLAY);
#endif // defined(OS_WIN)
tmp_file.Close();
const bool result = ReplaceFile(tmp_file_path, path, &replace_file_error);
bool result = ReplaceFile(tmp_file_path, path, &replace_file_error);
#if defined(OS_WIN)
// Save and restore the last error code so that it's not polluted by the
// thread priority change.
const auto last_error = ::GetLastError();
auto last_error = ::GetLastError();
int retry_count = 0;
for (/**/; !result && retry_count < kReplaceRetries; ++retry_count) {
// The race condition between closing the temporary file and moving it gets
// hit on a regular basis on some systems (https://crbug.com/1099284), so
// we retry a few times before giving up.
PlatformThread::Sleep(kReplacePauseInterval);
result = ReplaceFile(tmp_file_path, path, &replace_file_error);
last_error = ::GetLastError();
}
if (reset_priority)
PlatformThread::SetCurrentThreadPriority(previous_priority);
// Log how many times we had to retry the ReplaceFile operation before it
// succeeded. If we never succeeded then return a special value.
if (!result)
::SetLastError(last_error);
retry_count = kReplaceRetryFailure;
UmaHistogramExactLinear("ImportantFile.FileReplaceRetryCount", retry_count,
kReplaceRetryFailure);
#endif // defined(OS_WIN)
if (!result) {
UmaHistogramExactLinearWithSuffix("ImportantFile.FileRenameError",
histogram_suffix, -replace_file_error,
-File::FILE_ERROR_MAX);
#if defined(OS_WIN)
// Restore the error code from ReplaceFile so that it will be available for
// LogFailure, otherwise failures in SetCurrrentThreadPriority may be
// reported instead.
::SetLastError(last_error);
#endif
LogFailure(path, histogram_suffix, FAILED_RENAMING,
"could not rename temporary file");
DeleteTmpFileWithRetry(File(), tmp_file_path, histogram_suffix);
......
......@@ -6785,6 +6785,17 @@ reviews. Googlers can read more about this at go/gwsq-gerrit.
</summary>
</histogram>
<histogram name="ImportantFile.FileReplaceRetryCount" units="attempt count"
expires_after="2021-06-30">
<owner>brucedawson@chromium.org</owner>
<owner>grt@chromium.org</owner>
<summary>
The number of retries needed to successfully move the temporary file to its
final location. Zero means that ReplaceFile worked the first time. Ten means
that it never succeeded.
</summary>
</histogram>
<histogram name="ImportantFile.FileWriteError" enum="PlatformFileError"
expires_after="2021-05-23">
<owner>grt@chromium.org</owner>
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment