Fix Mojo backcompat check to handle UTF-8 encoded files (take 2).

https://crrev.com/c/2223973 attempted to fix this check to handle UTF-8 encoded files. However, it appears that Python 2's open() will sometimes return a bytestring for a file containing UTF-8 encoded text. encode() will then assume the bytestring is encoded using the default 'ascii' encoding, which then explodes upon seeing any byte with the high bit set (i.e. any character that's not representable in 7-bit ASCII).

All mojom files in Chrome should be encoded as UTF-8. Since Python 2.7, io.open() has identical functionality to Python 3's open(), so by using that and specifying the encoding argument, the UTF-8 assumption can be made explicit. This makes it safe to unconditionally call encode() to generate a bytestring to pass into the Mojo parser.

Bonus: this CL adds some additional gymnastics for exceptions, using six.reraise(), to make it easier to find the parse failure.

Note: the original issue discussed in the CL description was fixed in a different way by https://crrev.com/c/2225327, which uses codecs.open(). However, Python 3's open() is an alias for io.open(), which has somewhat different behavior from codecs.open().

Change-Id: I86be3c5bc97c78078fe212162eb88e519180c3b0
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2226093
Reviewed-by: Ken Rockot <rockot@google.com>
Reviewed-by: Fergal Daly <fergal@chromium.org>
Commit-Queue: Daniel Cheng <dcheng@chromium.org>
Cr-Commit-Position: refs/heads/master@{#774937}
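
The encoding behavior described in the message above can be illustrated with a minimal sketch. It is not the CL's code: the helper names (`read_utf8_as_bytes`, `ascii_decode_fails`, `demo`) and the sample mojom text are hypothetical. It assumes Python 3, where the implicit ASCII decode that Python 2 performs inside `str.encode()` has to be written out explicitly to reproduce the failure.

```python
import io
import os
import tempfile


def read_utf8_as_bytes(path):
    """Read a file as UTF-8 text, then re-encode it to a UTF-8 bytestring."""
    # io.open() with an explicit encoding always yields text (unicode),
    # so the encode() call below never needs an implicit ASCII decode.
    with io.open(path, encoding='utf-8') as f:
        text = f.read()
    return text.encode('utf-8')


def ascii_decode_fails(raw):
    """Return True if decoding the bytestring as ASCII raises an error.

    This mirrors the implicit step Python 2 takes when encode() is called
    on a bytestring: it first decodes with the default 'ascii' codec.
    """
    try:
        raw.decode('ascii')
    except UnicodeDecodeError:
        return True
    return False


def demo():
    # Hypothetical mojom text with a non-ASCII character in a comment.
    source = u'// caf\u00e9\ninterface Foo {};\n'
    fd, path = tempfile.mkstemp(suffix='.mojom')
    try:
        # Write raw UTF-8 bytes so the file on disk is exactly what we expect.
        with os.fdopen(fd, 'wb') as f:
            f.write(source.encode('utf-8'))
        return read_utf8_as_bytes(path)
    finally:
        os.remove(path)
```

Calling `demo()` returns the original text as UTF-8 bytes, while `ascii_decode_fails()` on those same bytes reports the failure that any byte with the high bit set triggers.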

Commit e555e608 authored Jun 04, 2020 by Daniel Cheng Committed by Commit Bot Jun 04, 2020

Showing with 16 additions and 5 deletions

mojo/public/tools/mojom/check_stable_mojom_compatibility.py +16 -5

--- a/mojo/public/tools/mojom/check_stable_mojom_compatibility.py
+++ b/mojo/public/tools/mojom/check_stable_mojom_compatibility.py
@@ -12,12 +12,13 @@ This can be used e.g. by a presubmit check to prevent developers from making
 breaking changes to stable mojoms."""
 import argparse
-import codecs
 import errno
+import io
 import json
 import os
 import os.path
 import shutil
+import six
 import sys
 import tempfile
@@ -26,6 +27,10 @@ from mojom.generate import translate
 from mojom.parse import parser
+class ParseError(Exception):
+  pass
 def _ValidateDelta(root, delta):
  """Parses all modified mojoms (including all transitive mojom dependencies,
  even if unmodified) to perform backward-compatibility checks on any types
@@ -61,10 +66,16 @@ def _ValidateDelta(root, delta):
      modules = override_modules
    else:
      modules = unmodified_modules
-      with codecs.open(os.path.join(root, mojom), encoding='utf-8') as f:
+      with io.open(os.path.join(root, mojom), encoding='utf-8') as f:
-        contents = ''.join(f.readlines())
+        contents = f.read()
-    ast = parser.Parse(contents, mojom)
+    try:
+      ast = parser.Parse(contents, mojom)
+    except Exception as e:
+      six.reraise(
+          ParseError,
+          'encountered exception {0} while parsing {1}'.format(e, mojom),
+          sys.exc_info()[2])
    for imp in ast.import_list:
      parseMojom(imp.import_filename, file_overrides, override_modules)
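
The exception handling added around parser.Parse() in the diff above can be sketched in isolation as follows. This is a Python 3 only illustration under stated assumptions, not the CL's implementation: `fake_parse` is a hypothetical stand-in for `parser.Parse`, `parse_with_context` is a hypothetical wrapper name, and `with_traceback()` is swapped in for `six.reraise()` so the example needs only the standard library.

```python
import sys


class ParseError(Exception):
    """Raised when a mojom file fails to parse; names the offending file."""


def fake_parse(contents, filename):
    # Hypothetical stand-in for parser.Parse(): rejects input that does
    # not mention an interface, just to have something that can fail.
    if 'interface' not in contents:
        raise ValueError('unexpected end of input')
    return {'filename': filename}


def parse_with_context(contents, filename):
    """Parse contents, re-raising any failure as a ParseError.

    The new exception carries the file name in its message and keeps the
    original traceback, so the real failure site is still visible.
    """
    try:
        return fake_parse(contents, filename)
    except Exception as e:
        tb = sys.exc_info()[2]
        message = 'encountered exception {0} while parsing {1}'.format(
            e, filename)
        # Attach the original traceback to an exception *instance*.
        raise ParseError(message).with_traceback(tb)
```

A caller that catches `ParseError` sees both the original error text and the name of the file being parsed, which is the point of the extra wrapping.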