Fix Mojo backcompat check to handle UTF-8 encoded files (take 2).
https://crrev.com/c/2223973 attempted to fix this check to handle UTF-8 encoded files. However, it appears that Python 2's open() will sometimes return a bytestring for a file containing UTF-8 encoded text. encode() will then assume the bytestring is encoded using the default 'ascii' encoding, which then explodes upon seeing any byte with the high bit set (i.e. any character that's not representable in 7-bit ASCII).

All mojom files in Chrome should be encoded as UTF-8. Since Python 2.7, io.open() has identical functionality to Python 3's open(), so by using that and specifying the encoding argument, the UTF-8 assumption can be made explicit. This makes it safe to unconditionally call encode() to generate a bytestring to pass into the Mojo parser.

Bonus: this CL adds some additional gymnastics for exceptions to make it easier to find the parse failure by using six.reraise().

Note: the original issue discussed in the CL description was fixed in a different way by https://crrev.com/c/2225327, which uses codecs.open(). However, Python 3's open() is an alias for io.open(), which has somewhat different behavior from codecs.open().

Change-Id: I86be3c5bc97c78078fe212162eb88e519180c3b0
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2226093
Reviewed-by: Ken Rockot <rockot@google.com>
Reviewed-by: Fergal Daly <fergal@chromium.org>
Commit-Queue: Daniel Cheng <dcheng@chromium.org>
Cr-Commit-Position: refs/heads/master@{#774937}
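
For illustration only, the io.open() plus encode() pattern described in this message could look roughly like the sketch below. This is not the code from the CL: the helper name read_mojom_as_utf8_bytes and the demo file contents are hypothetical, and the sketch assumes the input file really is UTF-8.

```python
import io
import os
import tempfile


def read_mojom_as_utf8_bytes(path):
    """Hypothetical helper (not from the CL): read UTF-8 text, return bytes."""
    # io.open() with an explicit encoding decodes the file contents, so
    # `text` is a unicode string on both Python 2.7 and Python 3.
    with io.open(path, encoding='utf-8') as f:
        text = f.read()
    # Because `text` is unicode, encode() needs no implicit ASCII decode
    # and is safe to call unconditionally.
    return text.encode('utf-8')


# Demo: round-trip a small file containing a non-ASCII character.
fd, demo_path = tempfile.mkstemp(suffix='.mojom')
os.close(fd)
with io.open(demo_path, 'w', encoding='utf-8') as f:
    f.write(u'// caf\u00e9\nmodule demo;\n')
demo_bytes = read_mojom_as_utf8_bytes(demo_path)
os.remove(demo_path)
```

Passing the encoding explicitly means the result does not depend on the interpreter's default encoding or on the platform locale.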