    Fix Mojo backcompat check to handle UTF-8 encoded files (take 2). · e555e608
    Daniel Cheng authored
    https://crrev.com/c/2223973 attempted to fix this check to handle UTF-8
    encoded files. However, it appears that Python 2's open() will sometimes
    return a bytestring for a file containing UTF-8 encoded text. encode()
    will then implicitly decode the bytestring using the default 'ascii'
    codec, which explodes upon seeing any byte with the high bit set
    (i.e. any character that's not representable in 7-bit ASCII).
    
    All mojom files in Chrome should be encoded as UTF-8. Since Python 2.7,
    io.open() has identical functionality to Python 3's open(), so by using
    that and specifying the encoding argument, the UTF-8 assumption can be
    explicitly encoded. This makes it safe to unconditionally call encode()
    to generate a bytestring to pass into the Mojo parser.
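
    A minimal sketch of the pattern described above, not the actual code in
    this CL. The helper name read_mojom_as_utf8_bytes() and the sample file
    contents are hypothetical; only io.open() and its encoding argument come
    from the description.

```python
import io
import os
import tempfile


def read_mojom_as_utf8_bytes(path):
    """Hypothetical helper: read a UTF-8 text file and return UTF-8 bytes."""
    # io.open() with an explicit encoding returns decoded text on both
    # Python 2.7 and Python 3, so read() never yields a raw bytestring.
    with io.open(path, encoding='utf-8') as f:
        text = f.read()
    # Because `text` is always decoded text, encode() is safe to call
    # unconditionally, even when the file contains non-ASCII characters.
    return text.encode('utf-8')


# Write a sample file containing a non-ASCII character, then read it back.
sample_text = u'// caf\u00e9\ninterface Foo {};\n'
fd, sample_path = tempfile.mkstemp(suffix='.mojom')
with os.fdopen(fd, 'wb') as f:
    f.write(sample_text.encode('utf-8'))

sample_bytes = read_mojom_as_utf8_bytes(sample_path)
os.remove(sample_path)
```

    With plain open() on Python 2, the read() result would already be a
    bytestring, and calling encode() on it is what triggered the implicit
    'ascii' decode.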
    
    Bonus: this CL adds some additional gymnastics for exceptions, using
    six.reraise() to make it easier to find the parse failure.
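
    A rough sketch of how six.reraise() can attach context while keeping the
    original traceback. This is an illustration under assumptions, not the
    code in this CL: fake_parse() and parse_with_context() are hypothetical
    names, the choice of ValueError is arbitrary, and the fallback reraise()
    is a Python 3-only stand-in used when six is not installed.

```python
import sys

try:
    from six import reraise  # six.reraise(tp, value, tb=None)
except ImportError:
    def reraise(tp, value, tb=None):
        # Python 3-only stand-in for six.reraise(), used if six is missing.
        raise value.with_traceback(tb)


def fake_parse(source):
    """Hypothetical parser that always fails, standing in for the real one."""
    raise SyntaxError('unexpected token')


def parse_with_context(filename, source):
    """Hypothetical wrapper: name the failing file, keep the traceback."""
    try:
        return fake_parse(source)
    except Exception as e:
        # Capture the traceback of the original failure so the re-raised
        # exception still points at the frame where parsing went wrong.
        tb = sys.exc_info()[2]
        message = 'error while parsing %s: %s' % (filename, e)
        reraise(ValueError, ValueError(message), tb)
```

    The re-raised exception carries the file name in its message, and its
    traceback still includes the frame inside the parser.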
    
    Note: the original issue discussed in the CL description was fixed in a
    different way by https://crrev.com/c/2225327, which uses codecs.open().
    However, Python 3's open() is an alias for io.open(), which has somewhat
    different behavior from codecs.open().
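
    One documented difference, shown as a small sketch: codecs.open() always
    opens the underlying file in binary mode, so it does no newline
    translation, while io.open() in text mode applies universal newlines by
    default. The sample contents are made up, and this may not be the
    specific difference the note above has in mind.

```python
import codecs
import io
import os
import tempfile

# Write a sample file with Windows-style line endings.
fd, path = tempfile.mkstemp(suffix='.mojom')
with os.fdopen(fd, 'wb') as f:
    f.write(b'module a;\r\ninterface B {};\r\n')

# io.open() in text mode translates '\r\n' to '\n' while reading.
with io.open(path, encoding='utf-8') as f:
    via_io = f.read()

# codecs.open() decodes the bytes but leaves '\r\n' untouched.
with codecs.open(path, encoding='utf-8') as f:
    via_codecs = f.read()

os.remove(path)
```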
    
    Change-Id: I86be3c5bc97c78078fe212162eb88e519180c3b0
    Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2226093
    Reviewed-by: Ken Rockot <rockot@google.com>
    Reviewed-by: Fergal Daly <fergal@chromium.org>
    Commit-Queue: Daniel Cheng <dcheng@chromium.org>
    Cr-Commit-Position: refs/heads/master@{#774937}
check_stable_mojom_compatibility.py 5.93 KB