Fixed downloaded filenames containing percent-encoded characters.

Servers set the download filename in a URL syntax (either through the Content-Disposition header, or in the filename of the requested URL itself). Previously, characters would remain percent-encoded when saved to disk, if they are deemed unsuitable for display. This criteria makes no sense for setting a filename. Cases included: - Control characters. - Path separators. - Special URL characters such as '%', '+', '&' and '#'. - Spoofing characters, such as invisible characters. Now, all characters are unescaped. Characters that are illegal in filenames (such as control characters, path separators, colons, etc) are converted into underscores. Other characters are left bare. Fixed and added tests. Bug: 849794 Change-Id: I797ee3d6aa8b803d9a7227e821fd8b4d55d0c58d Reviewed-on: https://chromium-review.googlesource.com/1209003Reviewed-by: Matt Menke <mmenke@chromium.org> Commit-Queue: Matt Giuca <mgiuca@chromium.org> Cr-Commit-Position: refs/heads/master@{#589386}

Fixed downloaded filenames containing percent-encoded characters.
Servers set the download filename in a URL syntax (either through the Content-Disposition header, or in the filename of the requested URL itself). Previously, characters would remain percent-encoded when saved to disk, if they are deemed unsuitable for display. This criteria makes no sense for setting a filename. Cases included: - Control characters. - Path separators. - Special URL characters such as '%', '+', '&' and '#'. - Spoofing characters, such as invisible characters. Now, all characters are unescaped. Characters that are illegal in filenames (such as control characters, path separators, colons, etc) are converted into underscores. Other characters are left bare. Fixed and added tests. Bug: 849794 Change-Id: I797ee3d6aa8b803d9a7227e821fd8b4d55d0c58d Reviewed-on: https://chromium-review.googlesource.com/1209003Reviewed-by: Matt Menke <mmenke@chromium.org> Commit-Queue: Matt Giuca <mgiuca@chromium.org> Cr-Commit-Position: refs/heads/master@{#589386}
25c6a34f · Matt Giuca · Commit Bot · 64a9d00b · 25c6a34f · 25c6a34f
Commit 25c6a34f authored Sep 07, 2018 by Matt Giuca Committed by Commit Bot Sep 07, 2018
3 changed files
--- a/net/base/filename_util_internal.cc
+++ b/net/base/filename_util_internal.cc
@@ -125,10 +125,9 @@ std::string GetFileNameFromURL(const GURL& url,
  if (!url.is_valid() || url.SchemeIs("about") || url.SchemeIs("data"))
    return std::string();

-  const std::string unescaped_url_filename = UnescapeURLComponent(
-      url.ExtractFileName(),
-      UnescapeRule::SPACES |
-          UnescapeRule::URL_SPECIAL_CHARS_EXCEPT_PATH_SEPARATORS);
+  std::string unescaped_url_filename;
+  UnescapeBinaryURLComponent(url.ExtractFileName(), UnescapeRule::NORMAL,
+                             &unescaped_url_filename);

  // The URL's path should be escaped UTF-8, but may not be.
  std::string decoded_filename = unescaped_url_filename;

--- a/net/base/filename_util_unittest.cc
+++ b/net/base/filename_util_unittest.cc
@@ -573,9 +573,14 @@ TEST(FilenameUtilTest, GenerateFileName) {
    {// A normal avi should get .avi and not .avi.avi
     __LINE__, "https://example.com/misc/2.avi", "", "", "", "video/x-msvideo",
     L"download", L"2.avi"},
-    {// Shouldn't unescape slashes.
+    {// Slashes are illegal, and should be replaced with underscores.
     __LINE__, "http://example.com/foo%2f..%2fbar.jpg", "", "", "",
-     "text/plain", L"download", L"foo%2f..%2fbar.jpg"},
+     "text/plain", L"download", L"foo_.._bar.jpg"},
+    {// "%00" decodes to the NUL byte, which is illegal and should be replaced
+     // with an underscore. (Note: This can't be tested with a URL, since "%00"
+     // is illegal in a URL. Only applies to Content-Disposition.)
+     __LINE__, "http://example.com/download.py", "filename=foo%00bar.jpg", "",
+     "", "text/plain", L"download", L"foo_bar.jpg"},
    {// Extension generation for C-D derived filenames.
     __LINE__, "", "filename=my-cat", "", "", "image/jpeg", L"download",
     L"my-cat"},
@@ -744,6 +749,15 @@ TEST(FilenameUtilTest, GenerateFileName) {
     __LINE__, "http://www.example.com/fooa%cc%88.txt", "", "", "",
     "image/jpeg", L"foo\xe4", L"foo\xe4.txt"},
 #endif
+
+    // U+3000 IDEOGRAPHIC SPACE (http://crbug.com/849794): In URL file name.
+    {__LINE__, "http://www.example.com/%E5%B2%A1%E3%80%80%E5%B2%A1.txt", "", "",
+     "", "text/plain", L"", L"\u5ca1\u3000\u5ca1.txt"},
+    // U+3000 IDEOGRAPHIC SPACE (http://crbug.com/849794): In
+    // Content-Disposition filename.
+    {__LINE__, "http://www.example.com/download.py",
+     "filename=%E5%B2%A1%E3%80%80%E5%B2%A1.txt", "utf-8", "", "text/plain", L"",
+     L"\u5ca1\u3000\u5ca1.txt"},
  };

  for (const auto& selection_test : selection_tests)

--- a/net/http/http_content_disposition.cc
+++ b/net/http/http_content_disposition.cc
@@ -189,7 +189,7 @@ bool DecodeWord(const std::string& encoded_word,
  // web browser.

  // What IE6/7 does: %-escaped UTF-8.
-  decoded_word = UnescapeURLComponent(encoded_word, UnescapeRule::SPACES);
+  UnescapeBinaryURLComponent(encoded_word, UnescapeRule::NORMAL, &decoded_word);
  if (decoded_word != encoded_word)
    *parse_result_flags |= HttpContentDisposition::HAS_PERCENT_ENCODED_STRINGS;
  if (base::IsStringUTF8(decoded_word)) {