    UTF-16 Decoder: Convert unpaired surrogates to replacement characters · 9158f6d5
    jsbell authored
    The decoder blithely passed any old 16-bit code unit through, in
    violation of the Encoding standard. Surrogate pairs should go through
    unscathed:
    
      [ ... 0xD800 0xDC00 ... ] => [ ... U+D800 U+DC00 ... ]
    
    But cases like these should result in replacement characters:
    
      [ ... 0xD800 ... ] => [ ... U+FFFD ... ]
      [ ... 0xDC00 ... ] => [ ... U+FFFD ... ]
      [ ... 0xDC00 0xD800 ... ] => [ ... U+FFFD U+FFFD ... ]
    
    This aligns Chrome's behavior with Firefox and Edge.
    
    Note that flushing at the end of a stream remains a special case.
    Streams terminating in the above sequences still do not get
    replacements emitted (existing behavior). In addition, a lead surrogate
    at the end of a stream is no longer emitted, matching other browsers.
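
    The mid-stream cases above can be sketched as follows. This is an
    illustrative model only, not the Chromium decoder: the names
    (replace_unpaired_surrogates, pending_lead) are hypothetical, the input
    is assumed to be a list of already-assembled 16-bit code units, and a
    trailing lead surrogate is simply handed back to the caller unemitted,
    which is one reading of the end-of-stream note.

```python
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def is_lead(unit):
    """True for a lead (high) surrogate code unit."""
    return 0xD800 <= unit <= 0xDBFF


def is_trail(unit):
    """True for a trail (low) surrogate code unit."""
    return 0xDC00 <= unit <= 0xDFFF


def replace_unpaired_surrogates(units):
    """Return (output_units, pending_lead).

    Well-formed lead+trail pairs pass through unchanged; any other
    surrogate seen mid-stream becomes U+FFFD. A lead surrogate at the
    very end of the input is not emitted; it is returned as
    pending_lead so the caller can decide how to flush (assumption).
    """
    out = []
    pending_lead = None
    for unit in units:
        if pending_lead is not None:
            if is_trail(unit):
                # Valid pair: emit both code units untouched.
                out.extend([pending_lead, unit])
                pending_lead = None
                continue
            # The held lead was not followed by a trail: replace it,
            # then fall through to classify the current unit.
            out.append(REPLACEMENT)
            pending_lead = None
        if is_lead(unit):
            pending_lead = unit          # wait for a possible trail
        elif is_trail(unit):
            out.append(REPLACEMENT)      # trail with no preceding lead
        else:
            out.append(unit)             # ordinary BMP code unit
    return out, pending_lead
```

    For example, replace_unpaired_surrogates([0x41, 0xDC00, 0xD800, 0x42])
    yields two replacement characters between the 0x41 and 0x42 code units,
    matching the third case listed above.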
    
    BUG=368904
    R=jshin@chromium.org,foolip@chromium.org
    
    Review-Url: https://codereview.chromium.org/2379333003
    Cr-Commit-Position: refs/heads/master@{#422929}
char-decoding-truncated.html 1.11 KB