Commit 25c6a34f authored by Matt Giuca's avatar Matt Giuca Committed by Commit Bot

Fixed downloaded filenames containing percent-encoded characters.

Servers set the download filename in a URL syntax (either through the
Content-Disposition header, or in the filename of the requested URL
itself). Previously, characters would remain percent-encoded when saved
to disk, if they are deemed unsuitable for display. This criteria makes
no sense for setting a filename.

Cases included:
- Control characters.
- Path separators.
- Special URL characters such as '%', '+', '&' and '#'.
- Spoofing characters, such as invisible characters.

Now, all characters are unescaped. Characters that are illegal in
filenames (such as control characters, path separators, colons, etc) are
converted into underscores. Other characters are left bare.

Fixed and added tests.

Bug: 849794
Change-Id: I797ee3d6aa8b803d9a7227e821fd8b4d55d0c58d
Reviewed-on: https://chromium-review.googlesource.com/1209003Reviewed-by: default avatarMatt Menke <mmenke@chromium.org>
Commit-Queue: Matt Giuca <mgiuca@chromium.org>
Cr-Commit-Position: refs/heads/master@{#589386}
parent 64a9d00b
......@@ -125,10 +125,9 @@ std::string GetFileNameFromURL(const GURL& url,
if (!url.is_valid() || url.SchemeIs("about") || url.SchemeIs("data"))
return std::string();
const std::string unescaped_url_filename = UnescapeURLComponent(
url.ExtractFileName(),
UnescapeRule::SPACES |
UnescapeRule::URL_SPECIAL_CHARS_EXCEPT_PATH_SEPARATORS);
std::string unescaped_url_filename;
UnescapeBinaryURLComponent(url.ExtractFileName(), UnescapeRule::NORMAL,
&unescaped_url_filename);
// The URL's path should be escaped UTF-8, but may not be.
std::string decoded_filename = unescaped_url_filename;
......
......@@ -573,9 +573,14 @@ TEST(FilenameUtilTest, GenerateFileName) {
{// A normal avi should get .avi and not .avi.avi
__LINE__, "https://example.com/misc/2.avi", "", "", "", "video/x-msvideo",
L"download", L"2.avi"},
{// Shouldn't unescape slashes.
{// Slashes are illegal, and should be replaced with underscores.
__LINE__, "http://example.com/foo%2f..%2fbar.jpg", "", "", "",
"text/plain", L"download", L"foo%2f..%2fbar.jpg"},
"text/plain", L"download", L"foo_.._bar.jpg"},
{// "%00" decodes to the NUL byte, which is illegal and should be replaced
// with an underscore. (Note: This can't be tested with a URL, since "%00"
// is illegal in a URL. Only applies to Content-Disposition.)
__LINE__, "http://example.com/download.py", "filename=foo%00bar.jpg", "",
"", "text/plain", L"download", L"foo_bar.jpg"},
{// Extension generation for C-D derived filenames.
__LINE__, "", "filename=my-cat", "", "", "image/jpeg", L"download",
L"my-cat"},
......@@ -744,6 +749,15 @@ TEST(FilenameUtilTest, GenerateFileName) {
__LINE__, "http://www.example.com/fooa%cc%88.txt", "", "", "",
"image/jpeg", L"foo\xe4", L"foo\xe4.txt"},
#endif
// U+3000 IDEOGRAPHIC SPACE (http://crbug.com/849794): In URL file name.
{__LINE__, "http://www.example.com/%E5%B2%A1%E3%80%80%E5%B2%A1.txt", "", "",
"", "text/plain", L"", L"\u5ca1\u3000\u5ca1.txt"},
// U+3000 IDEOGRAPHIC SPACE (http://crbug.com/849794): In
// Content-Disposition filename.
{__LINE__, "http://www.example.com/download.py",
"filename=%E5%B2%A1%E3%80%80%E5%B2%A1.txt", "utf-8", "", "text/plain", L"",
L"\u5ca1\u3000\u5ca1.txt"},
};
for (const auto& selection_test : selection_tests)
......
......@@ -189,7 +189,7 @@ bool DecodeWord(const std::string& encoded_word,
// web browser.
// What IE6/7 does: %-escaped UTF-8.
decoded_word = UnescapeURLComponent(encoded_word, UnescapeRule::SPACES);
UnescapeBinaryURLComponent(encoded_word, UnescapeRule::NORMAL, &decoded_word);
if (decoded_word != encoded_word)
*parse_result_flags |= HttpContentDisposition::HAS_PERCENT_ENCODED_STRINGS;
if (base::IsStringUTF8(decoded_word)) {
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment