Commit a7a26d22 authored by Ryan Harrison, committed by Commit Bot

Strip Zero Width Whitespace from PDFium text strings

When returning text, PDFium does not filter out zero-width whitespace
(ZWW, U+200B ZERO WIDTH SPACE), since it is a valid non-control
character. It is ignorable, though, so the embedder (Chrome) can choose
whether or not to display it. Because the character has no visual
representation, keeping it in the displayed text leads to odd UI
behavior: the text length is longer than the number of characters
displayed, and moving the cursor past the ZWW takes multiple key
presses.

BUG=chromium:743522

Change-Id: I5312a3aad4a752659fb4455853cd1030f0660bd9
Reviewed-on: https://chromium-review.googlesource.com/1210966
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Cr-Commit-Position: refs/heads/master@{#589271}
parent cbd64a18
@@ -12,6 +12,8 @@ namespace chrome_pdf {
 namespace {
+constexpr base::char16 kZeroWidthWhitespace = 0x200B;
 void AdjustForBackwardsRange(int* index, int* count) {
   int& char_index = *index;
   int& char_count = *count;
@@ -105,6 +107,9 @@ base::string16 PDFiumRange::GetText() const {
     api_string_adapter.Close(written);
   }
+  // Strip ignorable non-displaying whitespace
+  rv.erase(std::remove(rv.begin(), rv.end(), kZeroWidthWhitespace), rv.end());
   return rv;
 }
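
The stripping step in the diff above is the standard erase-remove idiom. A minimal standalone sketch follows, assuming std::u16string and char16_t as stand-ins for Chromium's base::string16 and base::char16. The helper name StripZeroWidthWhitespace is hypothetical and does not appear in the commit.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Same value as the constant in the diff; char16_t stands in for
// base::char16 (an assumption made for this standalone sketch).
constexpr char16_t kZeroWidthWhitespace = 0x200B;

// Hypothetical helper: returns a copy of `text` with every U+200B removed.
// std::remove shifts the kept characters to the front and returns the new
// logical end; erase() then drops the leftover tail.
std::u16string StripZeroWidthWhitespace(std::u16string text) {
  text.erase(std::remove(text.begin(), text.end(), kZeroWidthWhitespace),
             text.end());
  return text;
}
```

For example, the five-code-unit input u"ab\u200Bcd" shows only four visible characters; after StripZeroWidthWhitespace the string is u"abcd", so its length matches what is displayed.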