Commit 24c982a7 authored by Torne (Richard Coles), committed by Commit Bot

Work around bad XML output by dexdump.

The version of dexdump in build-tools 30.0.1 includes more information
than previous versions in its output, which has revealed that it doesn't
encode its output as valid XML in all cases; Java strings with control
characters or nulls end up emitted literally in the XML output, which
breaks ElementTree's parser. b/161925303 has been filed internally to
track fixing this in dexdump itself.
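A minimal reproduction of the failure mode. It uses a hypothetical dexdump-style fragment: the `<api>`/`<field>` shape and the `value` attribute are illustrative, not copied from real dexdump output, and `try_parse` is a hypothetical helper. ElementTree's expat-based parser rejects the raw U+0001 with `ParseError`.

```python
from xml.etree import ElementTree

# Hypothetical dexdump-style fragment: a Java string constant containing a raw
# U+0001 control character, written into an attribute without escaping.
RAW = b'<api><field name="SEP" value="a\x01b"/></api>'


def try_parse(xml_bytes):
    """Return the parsed root element, or None if ElementTree rejects the input."""
    try:
        return ElementTree.fromstring(xml_bytes)
    except ElementTree.ParseError as err:
        print('ParseError:', err)
        return None


if __name__ == '__main__':
    try_parse(RAW)  # expat reports the control character as an invalid token
```

Escaping would not rescue this input either. XML 1.0 has no way to represent U+0001, and a character reference such as `&#1;` is itself a well-formedness error. That is why replacing the character is a reasonable workaround.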

Since we only need to be able to extract a few specific things from the
dexdump output in our tooling, just replace invalid characters in the
output with the Unicode replacement character U+FFFD (as Python does when
decoding with the 'replace' error handler) before parsing it.

Bug: 1106471
Change-Id: Id1c3a40e5ce91125dbee9fdf1923383d02314f55
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2314986
Reviewed-by: Andrew Grieve <agrieve@chromium.org>
Auto-Submit: Richard Coles <torne@chromium.org>
Commit-Queue: Richard Coles <torne@chromium.org>
Cr-Commit-Position: refs/heads/master@{#791232}
parent 59d5c5ba
@@ -3,6 +3,7 @@
 # found in the LICENSE file.
 
 import os
+import re
 import shutil
 import tempfile
 from xml.etree import ElementTree
@@ -37,7 +38,16 @@ def Dump(apk_path):
     cmd_helper.RunCmd(['unzip', apk_path, 'classes.dex'], cwd=dexfile_dir)
     dexfile = os.path.join(dexfile_dir, 'classes.dex')
     output_xml = cmd_helper.GetCmdOutput([DEXDUMP_PATH, '-l', 'xml', dexfile])
-    return _ParseRootNode(ElementTree.fromstring(output_xml))
+    # Dexdump doesn't escape its XML output very well; decode it as utf-8 with
+    # invalid sequences replaced, then remove forbidden characters and
+    # re-encode it (as etree expects a byte string as input so it can figure
+    # out the encoding itself from the XML declaration)
+    BAD_XML_CHARS = re.compile(
+        u'[\x00-\x08\x0b-\x0c\x0e-\x1f\x7f-\x84\x86-\x9f' +
+        u'\ud800-\udfff\ufdd0-\ufddf\ufffe-\uffff]')
+    decoded_xml = output_xml.decode('utf-8', 'replace')
+    clean_xml = BAD_XML_CHARS.sub(u'\ufffd', decoded_xml)
+    return _ParseRootNode(ElementTree.fromstring(clean_xml.encode('utf-8')))
   finally:
     shutil.rmtree(dexfile_dir)
......
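For running the same cleanup outside the Chromium tree, here is a minimal Python 3 sketch of the change. It assumes the dexdump output is already available as `bytes`, because the diff calls `.decode` on the result of `cmd_helper.GetCmdOutput`, which only works on a byte string. `parse_dexdump_xml` is a hypothetical name, and it returns the root `Element` directly instead of calling `_ParseRootNode`. The character class is copied from `BAD_XML_CHARS` in the diff.

```python
import re
from xml.etree import ElementTree

# Same character class as BAD_XML_CHARS in the diff: C0 controls except
# tab/LF/CR, DEL and C1 controls except NEL (U+0085), UTF-16 surrogates,
# and the noncharacters U+FDD0..U+FDDF and U+FFFE..U+FFFF.
BAD_XML_CHARS = re.compile(
    '[\x00-\x08\x0b-\x0c\x0e-\x1f\x7f-\x84\x86-\x9f'
    '\ud800-\udfff\ufdd0-\ufddf\ufffe-\uffff]')


def parse_dexdump_xml(output_xml: bytes) -> ElementTree.Element:
    """Parse dexdump XML after replacing characters XML cannot carry (hypothetical helper)."""
    # Step 1: undecodable byte sequences become U+FFFD.
    decoded_xml = output_xml.decode('utf-8', 'replace')
    # Step 2: decodable but disallowed code points also become U+FFFD.
    clean_xml = BAD_XML_CHARS.sub('\ufffd', decoded_xml)
    # Step 3: hand bytes back so ElementTree reads the encoding from the
    # XML declaration itself.
    return ElementTree.fromstring(clean_xml.encode('utf-8'))


if __name__ == '__main__':
    sample = b'<api><field name="SEP" value="a\x01b\xffc"/></api>'
    print(ascii(parse_dexdump_xml(sample).find('field').get('value')))
```

The substitution runs on decoded text rather than on the raw bytes for a reason: the class includes 0x80-0x9F, and applied to bytes it would also match UTF-8 continuation bytes and corrupt valid multi-byte characters.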