Restructure UnescapeURLWithAdjustmentsImpl().

In particular, unescape entire unicode characters at once, and then compare against unescape blacklists, rather than the other way around, to simplify code and avoid the tree structure of the old code. This will also allow the method to use icu's code point classification logic, at some point in the future.

Also separate out comparing against the character blacklist and UTF-8 character decoding into separate methods, and add a few more test cases to unittest.

The method itself should behave exactly the same as before.

Bug: 824715
Change-Id: I5311f25bfda4132b122ec4a079740adf093099a3
Reviewed-on: https://chromium-review.googlesource.com/998014
Commit-Queue: Matt Menke <mmenke@chromium.org>
Reviewed-by: Matt Giuca <mgiuca@chromium.org>
Reviewed-by: Helen Li <xunjieli@chromium.org>
Cr-Commit-Position: refs/heads/master@{#551029}
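
The "decode first, then check the blacklist" ordering described in the message can be illustrated with a minimal sketch. This is not the code from net/base/escape.cc: every name below (HexValue, ReadEscapedByte, DecodeEscapedCodePoint, IsBlacklisted, UnescapeSketch) is hypothetical, the blacklist is a tiny illustrative subset, and rule flags and offset adjustments are left out entirely.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>

// Value of one hex digit. The caller must pass a valid hex digit.
int HexValue(char c) {
  assert(std::isxdigit(static_cast<unsigned char>(c)));
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return c - 'A' + 10;
}

// Reads a "%XX" escape at |pos|. Returns false if there is none.
bool ReadEscapedByte(const std::string& s, size_t pos, unsigned char* out) {
  if (pos + 3 > s.size() || s[pos] != '%' ||
      !std::isxdigit(static_cast<unsigned char>(s[pos + 1])) ||
      !std::isxdigit(static_cast<unsigned char>(s[pos + 2]))) {
    return false;
  }
  *out = static_cast<unsigned char>(HexValue(s[pos + 1]) * 16 +
                                    HexValue(s[pos + 2]));
  return true;
}

// Decodes one whole percent-encoded UTF-8 character starting at |pos|.
// On success, stores the code point and the number of input chars consumed.
bool DecodeEscapedCodePoint(const std::string& s, size_t pos,
                            uint32_t* code_point, size_t* escaped_len) {
  unsigned char lead;
  if (!ReadEscapedByte(s, pos, &lead)) return false;
  int trail_count;
  uint32_t cp, min_cp;
  if (lead < 0x80) { trail_count = 0; cp = lead; min_cp = 0; }
  else if ((lead & 0xE0) == 0xC0) { trail_count = 1; cp = lead & 0x1F; min_cp = 0x80; }
  else if ((lead & 0xF0) == 0xE0) { trail_count = 2; cp = lead & 0x0F; min_cp = 0x800; }
  else if ((lead & 0xF8) == 0xF0) { trail_count = 3; cp = lead & 0x07; min_cp = 0x10000; }
  else return false;  // Stray trail byte or invalid lead byte.
  for (int i = 1; i <= trail_count; ++i) {
    unsigned char trail;
    if (!ReadEscapedByte(s, pos + 3 * i, &trail) || (trail & 0xC0) != 0x80)
      return false;
    cp = (cp << 6) | (trail & 0x3F);
  }
  // Reject overlong forms, surrogates, and values past U+10FFFF.
  if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return false;
  *code_point = cp;
  *escaped_len = 3 * static_cast<size_t>(trail_count + 1);
  return true;
}

// Illustrative blacklist only (assumption): ASCII controls, space, '%', and
// a few bidi control characters such as U+061C.
bool IsBlacklisted(uint32_t cp) {
  if (cp < 0x20 || cp == 0x7F || cp == ' ' || cp == '%') return true;
  return cp == 0x061C || cp == 0x200E || cp == 0x200F ||
         (cp >= 0x202A && cp <= 0x202E);
}

std::string UnescapeSketch(const std::string& in) {
  std::string out;
  size_t i = 0;
  while (i < in.size()) {
    uint32_t cp;
    size_t len;
    unsigned char byte;
    if (DecodeEscapedCodePoint(in, i, &cp, &len)) {
      if (IsBlacklisted(cp)) {
        out.append(in, i, len);  // Keep the whole character escaped.
      } else {
        for (size_t j = i; j < i + len; j += 3) {
          ReadEscapedByte(in, j, &byte);
          out.push_back(static_cast<char>(byte));
        }
      }
      i += len;
    } else if (ReadEscapedByte(in, i, &byte)) {
      out.push_back(static_cast<char>(byte));  // Not valid UTF-8: raw byte.
      i += 3;
    } else {
      out.push_back(in[i++]);  // Ordinary, unescaped input character.
    }
  }
  return out;
}
```

Because the blacklist check sees a full code point, the loop needs no nested per-byte branching: a character is either kept escaped as a unit or emitted as a unit. Under these assumptions, `UnescapeSketch("%D8%9C%C2%A1")` leaves `%D8%9C` (U+061C) escaped and emits the two raw bytes of U+00A1.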

Commit 6e8dbd1d, authored Apr 16, 2018 by Matt Menke, committed Apr 16, 2018 by Commit Bot

Showing with 208 additions and 186 deletions

net/base/escape.cc +151 -146

net/base/escape.h +4 -2

net/base/escape_unittest.cc +53 -38

--- a/net/base/escape.cc
+++ b/net/base/escape.cc
--- a/net/base/escape.h
+++ b/net/base/escape.h
@@ -81,7 +81,8 @@ class UnescapeRule {
    // Convert %20 to spaces. In some places where we're showing URLs, we may
    // want this. In places where the URL may be copied and pasted out, then
    // you wouldn't want this since it might not be interpreted in one piece
-    // by other applications.
+    // by other applications.  Other unicode spaces will not be unescaped unless
+    // SPOOFING_AND_CONTROL_CHARS is used.
    SPACES = 1 << 1,
    // Unescapes '/' and '\\'. If these characters were unescaped, the resulting
@@ -116,7 +117,8 @@ class UnescapeRule {
 // Unescapes |escaped_text| and returns the result.
 // Unescaping consists of looking for the exact pattern "%XX", where each X is
 // a hex digit, and converting to the character with the numerical value of
-// those digits. Thus "i%20=%203%3b" unescapes to "i = 3;".
+// those digits. Thus "i%20=%203%3b" unescapes to "i = 3;", if
+// "UnescapeRule::SPACES" is used.
 //
 // This method does not ensure that the output is a valid string using any
 // character encoding. However, unless SPOOFING_AND_CONTROL_CHARS is set, it
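
The effect of the SPACES rule on the header comment's example can be shown with a small, ASCII-only sketch. It is an illustration under stated assumptions, not the net/base implementation: SketchRule, HexDigitValue and UnescapeAsciiSketch are hypothetical names, only the `1 << 1` value for SPACES comes from the diff above (the value used for the normal rule is assumed), and escapes of non-ASCII bytes are simply left untouched here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for UnescapeRule. Only SPACES = 1 << 1 appears in the
// header diff; the kNormal value is an assumption.
enum SketchRule : unsigned {
  kNormal = 1u << 0,
  kSpaces = 1u << 1,
};

// Returns the value of a hex digit, or -1 if |c| is not a hex digit.
int HexDigitValue(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Unescapes "%XX" for printable ASCII other than '%'. "%20" is converted to
// a space only when kSpaces is set in |rules|.
std::string UnescapeAsciiSketch(const std::string& in, unsigned rules) {
  std::string out;
  for (size_t i = 0; i < in.size(); ++i) {
    if (in[i] == '%' && i + 2 < in.size()) {
      int hi = HexDigitValue(in[i + 1]);
      int lo = HexDigitValue(in[i + 2]);
      if (hi >= 0 && lo >= 0) {
        int value = hi * 16 + lo;
        bool printable_ascii = value > 0x20 && value < 0x7F && value != '%';
        bool allowed_space = value == ' ' && (rules & kSpaces) != 0;
        if (printable_ascii || allowed_space) {
          out.push_back(static_cast<char>(value));
          i += 2;  // Skip the two hex digits just consumed.
          continue;
        }
      }
    }
    out.push_back(in[i]);  // Copy anything that is not unescaped verbatim.
  }
  return out;
}
```

With these assumptions, `UnescapeAsciiSketch("i%20=%203%3b", kNormal | kSpaces)` yields `i = 3;`, while passing only `kNormal` yields `i%20=%203;`.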

--- a/net/base/escape_unittest.cc
+++ b/net/base/escape_unittest.cc
@@ -236,6 +236,20 @@ TEST(EscapeTest, UnescapeURLComponent) {
       UnescapeRule::NORMAL | UnescapeRule::SPOOFING_AND_CONTROL_CHARS,
       "Some%20random text %25\xF0\x9F\x94\x93OK"},
+      // Two spoofing characters in a row should not be unescaped.
+      {"%D8%9C%D8%9C", UnescapeRule::NORMAL, "%D8%9C%D8%9C"},
+      // Non-spoofing characters surrounded by spoofing characters should be
+      // unescaped.
+      {"%D8%9C%C2%A1%D8%9C%C2%A1", UnescapeRule::NORMAL,
+       "%D8%9C\xC2\xA1%D8%9C\xC2\xA1"},
+      // Invalid UTF-8 characters surrounded by spoofing characters should be
+      // unescaped.
+      {"%D8%9C%85%D8%9C%85", UnescapeRule::NORMAL, "%D8%9C\x85%D8%9C\x85"},
+      // Test with enough trail bytes to overflow the CBU8_MAX_LENGTH-byte
+      // buffer. The first two bytes are a spoofing character as well.
+      {"%D8%9C%9C%9C%9C%9C%9C%9C%9C%9C%9C", UnescapeRule::NORMAL,
+       "%D8%9C\x9C\x9C\x9C\x9C\x9C\x9C\x9C\x9C\x9C"},
      {"Some%20random text %25%2dOK", UnescapeRule::SPACES,
       "Some random text %25-OK"},
      {"Some%20random text %25%2dOK", UnescapeRule::PATH_SEPARATORS,
@@ -381,6 +395,7 @@ TEST(EscapeTest, AdjustOffset) {
      {"%2dtest", 1, std::string::npos},
      {"%2dtest", 0, 0},
      {"test%2d", 2, 2},
+      {"test%2e", 2, 2},
      {"%E4%BD%A0+%E5%A5%BD", 9, 1},
      {"%E4%BD%A0+%E5%A5%BD", 6, std::string::npos},
      {"%E4%BD%A0+%E5%A5%BD", 0, 0},