Speedup ItemizeTextToRuns for really slow cases

The code for ItemizeTextToRuns is really slow on some corner cases. This CL is fixing the performance issue (see http://crbug.com/924915). The main issue is that each iteration of the ItemizeText loop will move forward by the smallest amount of characters (size of the run) and then the whole logic to find the breaking position is applied again. As an example, the following text: "\n!\n!\n!\n!\n!\n!\n!\n!\n!\n!\n!\n!" * Computing the breaking point of "bidi direction" is O(n) * Computing the breaking point of "script" is O(n) Unfortunately, most runs will be split at the \n since it's a special character. This means the complexity of this code will be O(n^2). The proposed code in this CL is: 1) Splitting in block of same bidi direction 2) Splitting in sequence of characters with compatible script 3) Splitting for special characters / style This way, the 1) and 2) are only executed once (e.g. O(n) ) by character. NOTE: This bad pattern was happening often with emoji since they are often isolated in a run as a special characters. Bug: 924915, 1014119 Change-Id: I8d950e166ff01285ad0ce9796e38321ec502f368 Fixed: 2 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/1865673Reviewed-by: Robert Liao <robliao@chromium.org> Reviewed-by: Alexei Svitkine <asvitkine@chromium.org> Commit-Queue: Etienne Bergeron <etienneb@chromium.org> Cr-Commit-Position: refs/heads/master@{#708014}

Speedup ItemizeTextToRuns for really slow cases
The code for ItemizeTextToRuns is really slow on some corner cases. This CL is fixing the performance issue (see http://crbug.com/924915). The main issue is that each iteration of the ItemizeText loop will move forward by the smallest amount of characters (size of the run) and then the whole logic to find the breaking position is applied again. As an example, the following text: "\n!\n!\n!\n!\n!\n!\n!\n!\n!\n!\n!\n!" * Computing the breaking point of "bidi direction" is O(n) * Computing the breaking point of "script" is O(n) Unfortunately, most runs will be split at the \n since it's a special character. This means the complexity of this code will be O(n^2). The proposed code in this CL is: 1) Splitting in block of same bidi direction 2) Splitting in sequence of characters with compatible script 3) Splitting for special characters / style This way, the 1) and 2) are only executed once (e.g. O(n) ) by character. NOTE: This bad pattern was happening often with emoji since they are often isolated in a run as a special characters. Bug: 924915, 1014119 Change-Id: I8d950e166ff01285ad0ce9796e38321ec502f368 Fixed: 2 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/1865673Reviewed-by: Robert Liao <robliao@chromium.org> Reviewed-by: Alexei Svitkine <asvitkine@chromium.org> Commit-Queue: Etienne Bergeron <etienneb@chromium.org> Cr-Commit-Position: refs/heads/master@{#708014}
1e94b256 · Etienne Bergeron · Commit Bot · a9515913 · 1e94b256 · 1e94b256
Commit 1e94b256 authored Oct 21, 2019 by Etienne Bergeron Committed by Commit Bot Oct 21, 2019
Hide whitespace changes
Inline Side-by-side

Showing with 122 additions and 47 deletions

ui/gfx/render_text_harfbuzz.cc ui/gfx/render_text_harfbuzz.cc +87 -47

ui/gfx/render_text_unittest.cc ui/gfx/render_text_unittest.cc +35 -0

No files found.
--- a/ui/gfx/render_text_harfbuzz.cc
+++ b/ui/gfx/render_text_harfbuzz.cc
@@ -704,6 +704,26 @@ void ApplyForcedDirection(UBiDiLevel* level) {
  }
 }

+internal::TextRunHarfBuzz::FontParams CreateFontParams(
+    const gfx::Font& primary_font,
+    UBiDiLevel bidi_level,
+    UScriptCode script,
+    const internal::StyleIterator& style) {
+  internal::TextRunHarfBuzz::FontParams font_params(primary_font);
+  font_params.italic = style.style(TEXT_STYLE_ITALIC);
+  font_params.baseline_type = style.baseline();
+  font_params.font_size = style.font_size_override();
+  font_params.strike = style.style(TEXT_STYLE_STRIKE);
+  font_params.underline = style.style(TEXT_STYLE_UNDERLINE);
+  font_params.heavy_underline = style.style(TEXT_STYLE_HEAVY_UNDERLINE);
+  font_params.weight = style.weight();
+  font_params.level = bidi_level;
+  font_params.script = script;
+  // Odd BiDi embedding levels correspond to RTL runs.
+  font_params.is_rtl = (font_params.level % 2) == 1;
+  return font_params;
+}
+
 }  // namespace

 namespace internal {
@@ -1910,54 +1930,74 @@ void RenderTextHarfBuzz::ItemizeTextToRuns(
  internal::StyleIterator style(empty_colors, baselines(),
                                font_size_overrides(), weights(), styles());

-  for (size_t run_break = 0; run_break < text.length();) {
-    Range run_range;
-    internal::TextRunHarfBuzz::FontParams font_params(primary_font);
-    run_range.set_start(run_break);
-    font_params.italic = style.style(TEXT_STYLE_ITALIC);
-    font_params.baseline_type = style.baseline();
-    font_params.font_size = style.font_size_override();
-    font_params.strike = style.style(TEXT_STYLE_STRIKE);
-    font_params.underline = style.style(TEXT_STYLE_UNDERLINE);
-    font_params.heavy_underline = style.style(TEXT_STYLE_HEAVY_UNDERLINE);
-    font_params.weight = style.weight();
-    int32_t script_item_break = 0;
-    bidi_iterator.GetLogicalRun(run_break, &script_item_break,
-                                &font_params.level);
-    CHECK_GT(static_cast<size_t>(script_item_break), run_break);
-    ApplyForcedDirection(&font_params.level);
-    // Odd BiDi embedding levels correspond to RTL runs.
-    font_params.is_rtl = (font_params.level % 2) == 1;
-    // Find the length and script of this script run.
-    script_item_break =
-        ScriptInterval(text, run_break, script_item_break - run_break,
-                       &font_params.script) +
-        run_break;
-
-    // Find the next break and advance the iterators as needed.
-    const size_t new_run_break = std::min(
-        static_cast<size_t>(script_item_break),
-        TextIndexToGivenTextIndex(text, style.GetRange().end()));
-    CHECK_GT(new_run_break, run_break)
-        << "It must proceed! " << text << " " << run_break;
-    run_break = new_run_break;
-
-    // Break runs at certain characters that need to be rendered separately to
-    // prevent either an unusual character from forcing a fallback font on the
-    // entire run, or brackets from being affected by a fallback font.
-    // http://crbug.com/278913, http://crbug.com/396776
-    if (run_break > run_range.start())
-      run_break = FindRunBreakingCharacter(text, run_range.start(), run_break);
-
-    DCHECK(IsValidCodePointIndex(text, run_break));
-    style.UpdatePosition(DisplayIndexToTextIndex(run_break));
-    run_range.set_end(run_break);
+  // Split the original text by logical runs, then each logical run by common
+  // script and each sequence at special characters and style boundaries. This
+  // invariant holds: bidi_run_start <= script_run_start <= breaking_run_start
+  // <= breaking_run_end <= script_run_end <= bidi_run_end
+  for (size_t bidi_run_start = 0; bidi_run_start < text.length();) {
+    // Determine the longest logical run (e.g. same bidi direction) from this
+    // point.
+    int32_t bidi_run_break = 0;
+    UBiDiLevel bidi_level = 0;
+    bidi_iterator.GetLogicalRun(bidi_run_start, &bidi_run_break, &bidi_level);
+    size_t bidi_run_end = static_cast<size_t>(bidi_run_break);
+    DCHECK_LT(bidi_run_start, bidi_run_end);
+
+    ApplyForcedDirection(&bidi_level);
+
+    for (size_t script_run_start = bidi_run_start;
+         script_run_start < bidi_run_end;) {
+      // Find the longest sequence of characters that have at least one common
+      // UScriptCode value.
+      UScriptCode script = USCRIPT_INVALID_CODE;
+      size_t script_run_end =
+          ScriptInterval(text, script_run_start,
+                         bidi_run_end - script_run_start, &script) +
+          script_run_start;
+      DCHECK_LT(script_run_start, script_run_end);
+
+      for (size_t breaking_run_start = script_run_start;
+           breaking_run_start < script_run_end;) {
+        // Break runs at certain characters that need to be rendered separately
+        // to prevent either an unusual character from forcing a fallback font
+        // on the entire run, or brackets from being affected by a fallback
+        // font. http://crbug.com/278913, http://crbug.com/396776
+        size_t breaking_char_end =
+            FindRunBreakingCharacter(text, breaking_run_start, script_run_end);
+
+        // Break runs at style boundaries.
+        style.UpdatePosition(DisplayIndexToTextIndex(breaking_run_start));
+        size_t text_style_end =
+            TextIndexToGivenTextIndex(text, style.GetRange().end());
+
+        // The next break is the nearest break position.
+        const size_t breaking_run_end =
+            std::min(breaking_char_end, text_style_end);
+        DCHECK_LT(breaking_run_start, breaking_run_end);
+        DCHECK(IsValidCodePointIndex(text, breaking_run_end));
+
+        // Set the font params for the current run for the current run break.
+        internal::TextRunHarfBuzz::FontParams font_params =
+            CreateFontParams(primary_font, bidi_level, script, style);
+
+        // Create the current run from [breaking_run_start, breaking_run_end[.
+        auto run = std::make_unique<internal::TextRunHarfBuzz>(primary_font);
+        run->range = Range(breaking_run_start, breaking_run_end);
+
+        // Add the created run to the set of runs.
+        (*out_commonized_run_map)[font_params].push_back(run.get());
+        out_run_list->Add(std::move(run));
+
+        // Move to the next run.
+        breaking_run_start = breaking_run_end;
+      }

-    auto run = std::make_unique<internal::TextRunHarfBuzz>(
-        font_list().GetPrimaryFont());
-    (*out_commonized_run_map)[font_params].push_back(run.get());
-    run->range = run_range;
-    out_run_list->Add(std::move(run));
+      // Move to the next script sequence.
+      script_run_start = script_run_end;
+    }
+
+    // Move to the next direction sequence.
+    bidi_run_start = bidi_run_end;
  }

  // Add trace event to track incorrect usage of fallback fonts.

--- a/ui/gfx/render_text_unittest.cc
+++ b/ui/gfx/render_text_unittest.cc
@@ -906,6 +906,41 @@ TEST_F(RenderTextTest, ObscuredEmoji) {
  render_text->Draw(canvas());
 }

+TEST_F(RenderTextTest, ItemizeTextToRuns) {
+  struct {
+    const wchar_t* text;
+    const char* expected_structure;
+  } cases[] = {
+      {L"abc", "[0->2]"},
+      {L"ښڛڜ", "[2<-0]"},
+      {L"abcښڛڜdef", "[0->2][5<-3][6->8]"},
+      {L"abcऔकखdefڜ", "[0->2][3->5][6->8][9]"},
+      {L"1-(800)-xxx-xxxx", "[0->1][2][3->5][6][7][8->10][11][12->15]"},
+      {L"क\u200Bख", "[0][1][2]"},
+      {L"1 2 3 4", "[0->6]"},
+      {L"1\u200C2\u200C3\u200C4", "[0][1][2][3][4][5][6]"},
+      {L"a\u0300e\u0301", "[0->3]"},
+      {L"\u0065\u0308\u0435\u0308", "[0->1][2->3]"},
+      {L"☞☛test☚☜", "[0->1][2->5][6->7]"},
+      {L"☺☺☺!", "[0->2][3]"},
+      {L"☺☺☺ښ", "[3][2<-0]"},
+      {L"(☾☹☽)", "[0][1->3][4]"},
+      {L"\U0001F6281234", "[0->1][2->5]"},      // http://crbug.com/530021
+      {L"▶Feel goods", "[0][1->4][5][6->10]"},  // http://crbug.com/278913
+      {L"ぬ「シ」ほ", "[0][1][2][3][4]"},       // http://crbug.com/396776
+      {L"國哲(c)1", "[0->1][2][3][4][5]"},      // http://crbug.com/125792
+  };
+
+  for (const auto& test : cases) {
+    SCOPED_TRACE(base::StringPrintf("Testing cases '%ls' -> '%s'", test.text,
+                                    test.expected_structure));
+    RenderTextHarfBuzz* render_text = GetRenderText();
+    render_text->SetText(WideToUTF16(test.text));
+    test_api()->EnsureLayout();
+    EXPECT_EQ(test.expected_structure, GetRunListStructureString());
+  }
+}
+
 TEST_F(RenderTextTest, ElidedText) {
  // TODO(skanuj) : Add more test cases for following
  // - RenderText styles.