Commit 6e8dbd1d authored by Matt Menke, committed by Commit Bot

Restructure UnescapeURLWithAdjustmentsImpl().

In particular, unescape entire Unicode characters at once and then
compare them against the unescape blacklists, rather than the other way
around. This simplifies the code and avoids the tree structure of the
old code. It will also allow the method to use ICU's code point
classification logic at some point in the future.

Also split the character blacklist comparison and the UTF-8 character
decoding into separate methods, and add a few more test cases to the
unit tests.

The method itself should behave exactly the same as before.

Bug: 824715
Change-Id: I5311f25bfda4132b122ec4a079740adf093099a3
Reviewed-on: https://chromium-review.googlesource.com/998014
Commit-Queue: Matt Menke <mmenke@chromium.org>
Reviewed-by: Matt Giuca <mgiuca@chromium.org>
Reviewed-by: Helen Li <xunjieli@chromium.org>
Cr-Commit-Position: refs/heads/master@{#551029}
parent dd942765
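The restructuring described in the commit message (decode a whole escaped UTF-8 character first, then consult the blacklist) can be sketched roughly as follows. This is not the code from the commit: every name here (HexValue, EscapedByteAt, DecodeEscapedCodePoint, IsBlockedSketch, UnescapeSketch) is hypothetical, and the blacklist is a small illustrative subset assumed for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Returns the numeric value of a hex digit, or -1 if |c| is not one.
int HexValue(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Returns the byte encoded by a "%XX" escape at |pos|, or -1 if there is none.
int EscapedByteAt(std::string_view s, size_t pos) {
  if (pos + 3 > s.size() || s[pos] != '%') return -1;
  int hi = HexValue(s[pos + 1]);
  int lo = HexValue(s[pos + 2]);
  return (hi < 0 || lo < 0) ? -1 : hi * 16 + lo;
}

// Tries to decode one complete percent-escaped UTF-8 code point at |pos|.
// Returns the number of escaped bytes consumed (1 to 4), or 0 if the escapes
// there do not form valid UTF-8.
size_t DecodeEscapedCodePoint(std::string_view s, size_t pos,
                              uint32_t* code_point) {
  int lead = EscapedByteAt(s, pos);
  if (lead < 0) return 0;
  size_t len = 0;
  uint32_t cp = 0;
  uint32_t min = 0;  // Smallest value allowed for this length (no overlongs).
  if (lead < 0x80) {
    *code_point = static_cast<uint32_t>(lead);
    return 1;
  } else if ((lead & 0xE0) == 0xC0) {
    len = 2; cp = static_cast<uint32_t>(lead & 0x1F); min = 0x80;
  } else if ((lead & 0xF0) == 0xE0) {
    len = 3; cp = static_cast<uint32_t>(lead & 0x0F); min = 0x800;
  } else if ((lead & 0xF8) == 0xF0) {
    len = 4; cp = static_cast<uint32_t>(lead & 0x07); min = 0x10000;
  } else {
    return 0;  // Stray trail byte or invalid lead byte.
  }
  for (size_t i = 1; i < len; ++i) {
    int trail = EscapedByteAt(s, pos + 3 * i);
    if (trail < 0 || (trail & 0xC0) != 0x80) return 0;
    cp = (cp << 6) | static_cast<uint32_t>(trail & 0x3F);
  }
  if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
  *code_point = cp;
  return len;
}

// Illustrative blacklist only (an assumption, not Chromium's real list):
// ASCII controls, space, '%', and a few bidirectional control characters.
bool IsBlockedSketch(uint32_t cp) {
  if (cp < 0x20 || cp == 0x7F || cp == ' ' || cp == '%') return true;
  return cp == 0x061C || cp == 0x200E || cp == 0x200F ||
         (cp >= 0x202A && cp <= 0x202E) || (cp >= 0x2066 && cp <= 0x2069);
}

std::string UnescapeSketch(std::string_view s) {
  std::string out;
  size_t i = 0;
  while (i < s.size()) {
    int byte = EscapedByteAt(s, i);
    if (byte < 0) {  // Not an escape: copy the character through.
      out.push_back(s[i++]);
      continue;
    }
    uint32_t cp = 0;
    size_t n = DecodeEscapedCodePoint(s, i, &cp);
    if (n == 0) {
      // Not valid UTF-8: unescape just this one byte.
      out.push_back(static_cast<char>(byte));
      i += 3;
    } else if (IsBlockedSketch(cp)) {
      out.append(s.data() + i, 3 * n);  // Keep the whole character escaped.
      i += 3 * n;
    } else {
      for (size_t k = 0; k < n; ++k)
        out.push_back(static_cast<char>(EscapedByteAt(s, i + 3 * k)));
      i += 3 * n;
    }
  }
  return out;
}
```

With this sketch, UnescapeSketch("%D8%9C%C2%A1") keeps the escaped U+061C and emits the two raw bytes of U+00A1. The blacklist check runs once per decoded character rather than once per byte, which is the ordering the commit message describes.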
@@ -81,7 +81,8 @@ class UnescapeRule {
  // Convert %20 to spaces. In some places where we're showing URLs, we may
  // want this. In places where the URL may be copied and pasted out, then
  // you wouldn't want this since it might not be interpreted in one piece
- // by other applications.
+ // by other applications. Other unicode spaces will not be unescaped unless
+ // SPOOFING_AND_CONTROL_CHARS is used.
  SPACES = 1 << 1,
  // Unescapes '/' and '\\'. If these characters were unescaped, the resulting
@@ -116,7 +117,8 @@ class UnescapeRule {
  // Unescapes |escaped_text| and returns the result.
  // Unescaping consists of looking for the exact pattern "%XX", where each X is
  // a hex digit, and converting to the character with the numerical value of
- // those digits. Thus "i%20=%203%3b" unescapes to "i = 3;".
+ // those digits. Thus "i%20=%203%3b" unescapes to "i = 3;", if the
+ // "UnescapeRule::SPACES" used.
  //
  // This method does not ensure that the output is a valid string using any
  // character encoding. However, unless SPOOFING_AND_CONTROL_CHARS is set, it
...
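The "%XX" rule and the effect of the SPACES bit in the header comments above can be illustrated with a small ASCII-only sketch. The names RuleSketch, HexDigit and UnescapeAsciiSketch are hypothetical; only the value 1 << 1 for the spaces bit is taken from the diff, and the value of kNormal is an assumption made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical rule bits. kSpaces mirrors "SPACES = 1 << 1" from the diff;
// kNormal is assumed for illustration.
enum RuleSketch : uint32_t { kNormal = 1 << 0, kSpaces = 1 << 1 };

// Returns the numeric value of a hex digit, or -1 if |c| is not one.
int HexDigit(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Replaces each well-formed "%XX" with the byte it encodes. An escaped space
// is only converted when the kSpaces bit is set in |rules|.
std::string UnescapeAsciiSketch(std::string_view text, uint32_t rules) {
  std::string out;
  for (size_t i = 0; i < text.size(); ++i) {
    if (text[i] == '%' && i + 2 < text.size()) {
      int hi = HexDigit(text[i + 1]);
      int lo = HexDigit(text[i + 2]);
      if (hi >= 0 && lo >= 0) {
        char decoded = static_cast<char>(hi * 16 + lo);
        bool allowed = decoded != ' ' || (rules & kSpaces) != 0;
        if (allowed) {
          out.push_back(decoded);
          i += 2;  // Skip the two hex digits that were consumed.
          continue;
        }
      }
    }
    out.push_back(text[i]);  // Plain character, or an escape left as-is.
  }
  return out;
}
```

Calling UnescapeAsciiSketch("i%20=%203%3b", kNormal | kSpaces) yields "i = 3;", while passing only kNormal leaves both "%20" sequences in place and still converts "%3b".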
@@ -236,6 +236,20 @@ TEST(EscapeTest, UnescapeURLComponent) {
  UnescapeRule::NORMAL | UnescapeRule::SPOOFING_AND_CONTROL_CHARS,
  "Some%20random text %25\xF0\x9F\x94\x93OK"},
+ // Two spoofing characters in a row should not be unescaped.
+ {"%D8%9C%D8%9C", UnescapeRule::NORMAL, "%D8%9C%D8%9C"},
+ // Non-spoofing characters surrounded by spoofing characters should be
+ // unescaped.
+ {"%D8%9C%C2%A1%D8%9C%C2%A1", UnescapeRule::NORMAL,
+ "%D8%9C\xC2\xA1%D8%9C\xC2\xA1"},
+ // Invalid UTF-8 characters surrounded by spoofing characters should be
+ // unescaped.
+ {"%D8%9C%85%D8%9C%85", UnescapeRule::NORMAL, "%D8%9C\x85%D8%9C\x85"},
+ // Test with enough trail bytes to overflow the CBU8_MAX_LENGTH-byte
+ // buffer. The first two bytes are a spoofing character as well.
+ {"%D8%9C%9C%9C%9C%9C%9C%9C%9C%9C%9C", UnescapeRule::NORMAL,
+ "%D8%9C\x9C\x9C\x9C\x9C\x9C\x9C\x9C\x9C\x9C"},
  {"Some%20random text %25%2dOK", UnescapeRule::SPACES,
  "Some random text %25-OK"},
  {"Some%20random text %25%2dOK", UnescapeRule::PATH_SEPARATORS,
@@ -381,6 +395,7 @@ TEST(EscapeTest, AdjustOffset) {
  {"%2dtest", 1, std::string::npos},
  {"%2dtest", 0, 0},
  {"test%2d", 2, 2},
+ {"test%2e", 2, 2},
  {"%E4%BD%A0+%E5%A5%BD", 9, 1},
  {"%E4%BD%A0+%E5%A5%BD", 6, std::string::npos},
  {"%E4%BD%A0+%E5%A5%BD", 0, 0},
...
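The AdjustOffset cases above map an offset in the escaped input to an offset in the unescaped output, returning std::string::npos when the offset falls strictly inside an escape sequence that was collapsed. A minimal sketch of that mapping follows. It assumes the list of collapsed ranges is already known and sorted, and that output never grows; AdjustmentSketch and AdjustOffsetSketch are hypothetical names, not the helpers the test exercises.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One collapsed range: |original_length| units starting at |original_offset|
// in the escaped input became |output_length| units in the output.
struct AdjustmentSketch {
  size_t original_offset;
  size_t original_length;
  size_t output_length;
};

// Maps |offset| in the escaped input to the matching output offset.
// |adjustments| must be sorted by original_offset, and each output_length
// must not exceed its original_length.
size_t AdjustOffsetSketch(const std::vector<AdjustmentSketch>& adjustments,
                          size_t offset) {
  size_t removed = 0;  // Units dropped by ranges that end at or before |offset|.
  for (const AdjustmentSketch& a : adjustments) {
    if (offset <= a.original_offset) break;  // Range starts at or after offset.
    if (offset < a.original_offset + a.original_length)
      return std::string::npos;  // Offset points into the middle of an escape.
    removed += a.original_length - a.output_length;
  }
  return offset - removed;
}
```

For "%2dtest" the single range is {0, 3, 1}, so offset 1 maps to npos and offset 0 stays 0. For "%E4%BD%A0+%E5%A5%BD", treating each nine-character escaped character as collapsing to one output unit gives the ranges {0, 9, 1} and {10, 9, 1}, so offset 9 maps to 1 and offset 6 maps to npos, matching the expectations listed in the test.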