Commit 68b90f72 authored by Etienne Bergeron, committed by Commit Bot

Replace generic control characters

This CL replaces the generic control characters with the replacement codepoint (U+FFFD).

The control codepoints are:

  Unicode Character Category 'Other, Control'
  a) U+0000 -> U+001F
  b) U+007F   (DELETE)
  c) U+0080 -> U+009F
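The sketch below checks the list above. Together, the three ranges make up the whole Unicode general category Cc: 32 + 1 + 32 = 65 codepoints. Range c) is the part this CL targets. The helper names `IsUnicodeControl` and `IsGenericControl` are hypothetical and do not come from the Chromium source.

```cpp
// Hypothetical helpers, not taken from the Chromium source.

// True when `c` is in range a), b) or c). Together the three ranges form the
// Unicode general category Cc ("Other, Control"): 65 codepoints in total.
constexpr bool IsUnicodeControl(char32_t c) {
  return c <= 0x1F ||               // a) U+0000 -> U+001F (C0 controls)
         c == 0x7F ||               // b) U+007F (DELETE)
         (c >= 0x80 && c <= 0x9F);  // c) U+0080 -> U+009F (C1 controls)
}

// True only for range c), the "generic" control characters this CL targets.
constexpr bool IsGenericControl(char32_t c) {
  return c >= 0x80 && c <= 0x9F;
}

static_assert(IsUnicodeControl(0x00) && IsUnicodeControl(0x7F), "a) and b)");
static_assert(IsGenericControl(0x85), "U+0085 (NEL) is in range c)");
static_assert(!IsUnicodeControl(0xA0), "U+00A0 is not a control character");
```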

The codepoints from a) and b) are already handled properly.

Codepoints from c) commonly have no glyph to render (for example, on Windows 7), so they are now replaced as well.
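The handling of the three ranges can be pictured as a single per-codepoint mapping. This is a hypothetical illustration, not the RenderText implementation, and it rests on two assumptions:

- Range a) is shown with the Unicode Control Pictures block (U+2400 + c). This matches the symbols in the existing test expectation (␈ for U+0008, ␍ for U+000D, and so on).
- DELETE is assumed to map to U+2421 (␡).

Range c) now becomes U+FFFD, the value the new test expects. The constant reuses the name `kReplacementCodepoint` from the diff, but this definition is local to the sketch. Multiline newline handling is left out.

```cpp
// Hypothetical sketch, not the RenderText implementation: returns the
// codepoint that would be displayed in place of `c`.
constexpr char32_t kReplacementCodepoint = 0xFFFD;  // U+FFFD REPLACEMENT CHARACTER

constexpr char32_t DisplayCodepoint(char32_t c) {
  if (c <= 0x1F)
    return 0x2400 + c;  // a) Control Pictures block, e.g. U+0008 -> U+2408 (␈).
  if (c == 0x7F)
    return 0x2421;      // b) U+2421 SYMBOL FOR DELETE (assumed mapping).
  if (c >= 0x80 && c <= 0x9F)
    return kReplacementCodepoint;  // c) Often no glyph (e.g. on Windows 7).
  return c;             // All other codepoints are displayed unchanged.
}
```

In this sketch, U+0085 (NEL) falls in range c), so it is replaced too.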

"""
  Most of these characters play no explicit role in Unicode text handling.
  The characters U+0000 <control-0000> (NUL), U+0009 <control-0009> tab key (HT),
  U+000A <control-000A> newline (LF), U+000D <control-000D> (CR), and
  U+0085 <control-0085> (NEL) are commonly used in text processing as formatting
  characters.
"""


see: https://en.wikipedia.org/wiki/Unicode_control_characters
Bug: 1011818
Change-Id: I23010faf9130806db3884720e4a90c0723802893
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/1894041
Commit-Queue: Etienne Bergeron <etienneb@chromium.org>
Reviewed-by: Alexei Svitkine <asvitkine@chromium.org>
Cr-Commit-Position: refs/heads/master@{#711465}
parent 5be8506d
@@ -226,7 +226,8 @@ void ReplaceControlCharactersWithSymbols(bool multiline, base::string16* text) {
 // Private use codepoints are working with a pair of font and codepoint,
 // but they are not used in Chrome.
 const int8_t codepoint_category = u_charType(codepoint);
-if (codepoint_category == U_PRIVATE_USE_CHAR) {
+if (codepoint_category == U_PRIVATE_USE_CHAR ||
+    codepoint_category == U_CONTROL_CHAR) {
   (*text)[offset] = kReplacementCodepoint;
   // We may need to replace the surrogate pair.
   if (next_offset != offset + 1)
@@ -4528,6 +4528,11 @@ TEST_F(RenderTextTest, ControlCharacterReplacement) {
   // Setting multiline, the newline character will be back to the original text.
   render_text->SetMultiline(true);
   EXPECT_EQ(WideToUTF16(L"␈␍␇␉\n␋␌"), render_text->GetDisplayText());
+
+  // The generic control characters should have been replaced by the replacement
+  // codepoints.
+  render_text->SetText(WideToUTF16(L"\u008f\u0080"));
+  EXPECT_EQ(WideToUTF16(L"\ufffd\ufffd"), render_text->GetDisplayText());
 }
 
 TEST_F(RenderTextTest, PrivateUseCharacterReplacement) {
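The sketch below reproduces the changed condition outside Chromium. It uses `std::u16string` instead of `base::string16` and does not use ICU.

- The hypothetical helpers `IsControlChar` and `IsPrivateUseChar` stand in for `u_charType()`. They recognize only the Cc ranges and the private-use ranges (U+E000..U+F8FF, U+F0000..U+FFFFD, U+100000..U+10FFFD).
- As in the hunk, the first code unit is overwritten with U+FFFD. For a surrogate pair, erasing the trailing unit is an assumed way to finish the replacement; the elided Chromium code may differ.
- The earlier symbol mapping for ranges a) and b) is omitted. In this sketch they would become U+FFFD too.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-ins for the two u_charType() categories the hunk tests.
constexpr bool IsControlChar(char32_t c) {  // General category Cc.
  return c <= 0x1F || (c >= 0x7F && c <= 0x9F);
}

constexpr bool IsPrivateUseChar(char32_t c) {  // General category Co.
  return (c >= 0xE000 && c <= 0xF8FF) ||
         (c >= 0xF0000 && c <= 0xFFFFD) ||
         (c >= 0x100000 && c <= 0x10FFFD);
}

// Replaces every control or private-use codepoint in `text` with one U+FFFD.
void ReplaceWithReplacementCodepoint(std::u16string& text) {
  constexpr char16_t kReplacementCodepoint = 0xFFFD;
  std::size_t offset = 0;
  while (offset < text.size()) {
    // Decode the codepoint at `offset`, combining a valid surrogate pair.
    char32_t codepoint = text[offset];
    std::size_t next_offset = offset + 1;
    if (codepoint >= 0xD800 && codepoint <= 0xDBFF &&
        next_offset < text.size() &&
        text[next_offset] >= 0xDC00 && text[next_offset] <= 0xDFFF) {
      codepoint = 0x10000 + ((codepoint - 0xD800) << 10) +
                  (text[next_offset] - 0xDC00);
      next_offset = offset + 2;
    }
    if (IsPrivateUseChar(codepoint) || IsControlChar(codepoint)) {
      text[offset] = kReplacementCodepoint;
      // Drop the trailing surrogate so a pair collapses into one U+FFFD.
      if (next_offset != offset + 1)
        text.erase(offset + 1, next_offset - offset - 1);
      next_offset = offset + 1;
    }
    offset = next_offset;
  }
}
```

The new test uses the input `u"\u008f\u0080"`. For that input, this sketch produces `u"\ufffd\ufffd"`, the same result the test expects from `GetDisplayText()`.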