Change TranslateHelper to use a textContent-like text dump

Currently, TranslateHelper captures page innerText to decide the page
language. The text capture is done in RenderFrame::DidMeaningfulLayout(),
where we normally have a clean layout, which is necessary for computing
innerText. However, due to a layout bug [1], we don't always have a clean
layout at the call site.

As fixing the layout bug is hard, this patch changes the text capture
algorithm to return a string similar to textContent [2] instead, which
doesn't require a clean layout.

The difference between the new and old text capture is subtle:

- textContent may include invisible text nodes, while innerText never
  does. However, this patch uses a slightly modified textContent that
  doesn't include text nodes in STYLE or SCRIPT elements. Other invisible
  text may still be included.

- textContent is a simple concatenation of all text nodes. innerText does
  some "formatting" by inserting/deleting some whitespace characters,
  including:
  - Insertion of '\n' between blocks of text (e.g., between <div>)
  - Insertion of '\t' between table cells
  - This patch uses a custom dump algorithm that still collapses
    consecutive whitespace; however, the collapsing happens regardless
    of style

[1] crbug.com/803403 and crbug.com/585164. The crash happens in the wild,
    but we haven't found a stable repro case yet.

Bug: 803403
Change-Id: I7e108d368cbcaccbbb60582323a9e9e041d95269
Reviewed-on: https://chromium-review.googlesource.com/891582
Reviewed-by: Takashi Toyoshima <toyoshim@chromium.org>
Reviewed-by: Kent Tamura <tkent@chromium.org>
Reviewed-by: Rachel Blum <groby@chromium.org>
Commit-Queue: Xiaocheng Hu <xiaochengh@chromium.org>
Cr-Commit-Position: refs/heads/master@{#536969}
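
The dump described in the commit message can be illustrated with a minimal sketch. This is not the code from the patch: ToyNode, AppendText, CollapseWhitespace and DumpText are hypothetical names over a toy tree, not Blink classes, and collapsing each whitespace run to a single space is an assumption about what "collapses consecutive whitespace" means here.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical toy DOM node (an assumption for illustration, not Blink's Node).
struct ToyNode {
  std::string tag;                // Upper-case element name; empty for text nodes.
  std::string text;               // Character data; only used by text nodes.
  std::vector<ToyNode> children;  // Child nodes in document order.
};

// Appends the text of `node` and its descendants to `out`, textContent-style,
// but skips STYLE and SCRIPT subtrees. No layout or style lookup is needed.
void AppendText(const ToyNode& node, std::string& out) {
  if (node.tag == "STYLE" || node.tag == "SCRIPT") return;
  if (node.tag.empty()) out += node.text;
  for (const ToyNode& child : node.children) AppendText(child, out);
}

// Collapses every run of whitespace into one space, regardless of style
// (assumed behaviour; the real patch may differ in details).
std::string CollapseWhitespace(const std::string& in) {
  std::string out;
  bool in_space = false;
  for (unsigned char c : in) {
    if (std::isspace(c)) {
      if (!in_space) out += ' ';
      in_space = true;
    } else {
      out += static_cast<char>(c);
      in_space = false;
    }
  }
  return out;
}

// Full dump: concatenate text nodes, then collapse whitespace.
std::string DumpText(const ToyNode& root) {
  std::string raw;
  AppendText(root, raw);
  return CollapseWhitespace(raw);
}
```

Because nothing is inserted between blocks, two adjacent <div> elements containing "Hello" and "again" would dump as "Helloagain", whereas innerText would separate them with '\n'.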