Fix RenderText itemization for complex emoji
This CL refactors how RenderText performs ItemizeTextToRuns for the corner
cases where a run should be split into grapheme clusters.

The previous code named these cases:
  * Unusual characters
  * Special characters
  * Non-regular characters
These names should be more specific about the purpose of splitting the runs.

Also, the algorithm used for splitting a sequence of codepoints was based on
the code blocks of the corresponding codepoints, plus a few cases that tried
to merge adjacent codepoints into a grapheme. The algorithm was incorrect in
multiple cases (e.g. emoji).

Unicode provides a way to split a sequence of codepoints into graphemes and
grapheme clusters. It defines a state machine that uses the codepoint
properties to decide whether the current location is a grapheme boundary.

The ICU library provides an API over the character properties to help
iterate over graphemes:
  * ubrk_open
  * ubrk_first
  * ubrk_next
  * ubrk_close

The class base::i18n::BreakIterator(..., BREAK_CHARACTER) provides an
easy-to-use wrapper over that API.

This CL replaces the previous character-based splitting algorithm with the
grapheme-based version.

See emoji sequences:
  http://www.unicode.org/reports/tr51/
  1.4.5 Emoji Sequences

The full emoji list:
  * http://unicode.org/emoji/charts/full-emoji-list.html

Emoji data, used to make our unittests:
  * http://www.unicode.org/Public/emoji/12.0/emoji-data.txt

see: UNICODE TEXT SEGMENTATION (http://unicode.org/reports/tr29/)
see: https://cs.chromium.org/chromium/src/third_party/icu/source/common/unicode/ubrk.h
see: https://cs.chromium.org/chromium/src/base/i18n/break_iterator.h

Change-Id: I6b9a9c79021f2ce0e2db7cdefdd0838b5911f445
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/1788804
Commit-Queue: Etienne Bergeron <etienneb@chromium.org>
Reviewed-by: Alexei Svitkine <asvitkine@chromium.org>
Reviewed-by: Robert Liao <robliao@chromium.org>
Cr-Commit-Position: refs/heads/master@{#712704}
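
To illustrate why splitting on grapheme boundaries differs from splitting on individual codepoints, here is a minimal, self-contained sketch. It is not the code from this CL and it does not call ICU: the helper names (IsExtendLike, IsRegionalIndicator, SplitIntoClusters) are hypothetical, and the rules are a small hand-picked subset of the UAX #29 boundary rules, chosen only to cover a few emoji cases. The real state machine handles many more character classes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true for codepoints that attach to the previous one.
// This is only a tiny subset of the Unicode "Extend" property plus ZWJ.
bool IsExtendLike(char32_t c) {
  return (c >= 0x0300 && c <= 0x036F) ||    // combining diacritical marks
         (c >= 0xFE00 && c <= 0xFE0F) ||    // variation selectors
         (c >= 0x1F3FB && c <= 0x1F3FF) ||  // emoji skin tone modifiers
         c == 0x200D;                       // zero width joiner (ZWJ)
}

// Regional indicator symbols; flags are encoded as pairs of these.
bool IsRegionalIndicator(char32_t c) {
  return c >= 0x1F1E6 && c <= 0x1F1FF;
}

// Splits `text` into approximate grapheme clusters (simplified rules).
std::vector<std::u32string> SplitIntoClusters(const std::u32string& text) {
  std::vector<std::u32string> clusters;
  std::size_t ri_run = 0;  // consecutive regional indicators seen so far
  for (std::size_t i = 0; i < text.size(); ++i) {
    const char32_t c = text[i];
    bool join = false;
    if (i > 0) {
      const char32_t prev = text[i - 1];
      if (IsExtendLike(c)) {
        join = true;  // never break before an extending codepoint or ZWJ
      } else if (prev == 0x200D) {
        // Simplification: the real rule only joins pictographic codepoints.
        join = true;
      } else if (IsRegionalIndicator(c) && IsRegionalIndicator(prev) &&
                 ri_run % 2 == 1) {
        join = true;  // second half of a flag pair
      }
    }
    if (join) {
      clusters.back().push_back(c);
    } else {
      clusters.push_back(std::u32string(1, c));
    }
    ri_run = IsRegionalIndicator(c) ? ri_run + 1 : 0;
  }
  return clusters;
}
```

With these simplified rules, a ZWJ family sequence such as U+1F468 U+200D U+1F469 U+200D U+1F467 stays in one cluster, whereas a split based purely on code blocks could separate the ZWJ from the surrounding emoji. In Chromium itself, the commit message points to base::i18n::BreakIterator with BREAK_CHARACTER as the wrapper to use instead of hand-written rules like these.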