Commit 027f3bbb authored by Mike Frysinger, committed by Commit Bot

grit: c_format: rework how we build the escaped string

Python 3 doesn't support the string_escape codec on bytes, and it
currently has no officially supported replacement.  It could be done
ad hoc (iterate over every byte and build up the string ourselves),
but the codecs module has an escape_encode helper that does what we
need with a bit of effort.
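
A minimal sketch of the encode / escape / decode round trip described
above. The function name escape_non_ascii is hypothetical, and
codecs.escape_encode is assumed to behave as it does in CPython 3 (it
is not part of the documented codecs API, so verify it on your Python
version):

```python
import codecs


def escape_non_ascii(message):
    """Return message with non-ASCII bytes written as \\xHH escapes."""
    raw = message.encode("utf-8")                   # str -> UTF-8 bytes
    escaped, _consumed = codecs.escape_encode(raw)  # bytes -> escaped bytes
    return escaped.decode("utf-8")                  # pure ASCII now, back to str


print(escape_non_ascii(u"\u00f5"))  # prints \xc3\xb5
```

ASCII letters pass through unchanged, while a literal backslash in
the input comes back doubled.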

Also clean up the associated unittest while we're here.  It was mixing
bytes & string data inline and expecting it to be interpreted as
UTF-8.  Python 3 rejects this.  Make it a proper unicode string and
use \u sequences to get the codepoints we want.
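
As an illustrative check (not part of the change itself), the \u
sequences used in the updated test encode to the same UTF-8 bytes the
old test spelled out inline. The names PAIRS and same_utf8 are
hypothetical:

```python
# Each pair is (codepoint written with \u, the UTF-8 bytes from the old test).
PAIRS = [
    (u"\u00f5", b"\xc3\xb5"),
    (u"\u00a4", b"\xc2\xa4"),
    (u"\u4924", b"\xe4\xa4\xa4"),
]


def same_utf8(text, raw):
    """True if text encodes to exactly the given UTF-8 bytes."""
    return text.encode("utf-8") == raw


for text, raw in PAIRS:
    assert same_utf8(text, raw)
```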

Bug: 983071
Test: `./grit/test_suite_all.py` passes
Change-Id: I1c4de0e3ab5c129d4999d6685887a2c54bd7ab9c
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/1755388
Commit-Queue: Mike Frysinger <vapier@chromium.org>
Reviewed-by: Robert Flack <flackr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#687323}
parent 81e83978
@@ -7,6 +7,7 @@
 from __future__ import print_function
+import codecs
 import os
 import re
@@ -65,10 +66,16 @@ def _FormatMessage(item, lang):
   """Format a single <message> element."""
   message = item.ws_at_start + item.Translate(lang) + item.ws_at_end
-  # output message with non-ascii chars escaped as octal numbers
-  # C's grammar allows escaped hexadecimal numbers to be infinite,
-  # but octal is always of the form \OOO
-  message = message.encode('utf-8').encode('string_escape')
+  # Output message with non-ascii chars escaped as octal numbers C's grammar
+  # allows escaped hexadecimal numbers to be infinite, but octal is always of
+  # the form \OOO. Python 3 doesn't support string-escape, so we have to jump
+  # through some hoops here via codecs.escape_encode.
+  # This basically does:
+  # - message - the starting string
+  # - message.encode(...) - convert to bytes
+  # - codecs.escape_encode(...) - convert non-ASCII bytes to \x## escapes
+  # - (...).decode() - convert bytes back to a string
+  message = codecs.escape_encode(message.encode('utf-8'))[0].decode('utf-8')
   # an escaped char is (\xHH)+ but only if the initial
   # backslash is not escaped.
   not_a_backslash = r"(^|[^\\])" # beginning of line or a non-backslash char
......
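
The rest of _FormatMessage is elided above. As a rough sketch of the
idea in its comments (turn \xHH escapes into fixed-width \OOO octal
escapes, but leave alone any \x whose backslash is itself escaped),
something like the following would work. This is not the code from
c_format.py: it swaps in a single left-to-right re.sub instead of the
not_a_backslash pattern, and the names hex_escapes_to_octal and
c_escape are hypothetical.

```python
import codecs
import re

# Match an escaped backslash first so it is consumed as a pair; only a
# backslash that starts a real \xHH escape reaches the second alternative.
_ESCAPE = re.compile(r"(\\\\)|\\x([0-9a-f]{2})")


def hex_escapes_to_octal(escaped):
    """Rewrite \\xHH escapes as three-digit octal escapes."""
    def repl(match):
        if match.group(1):          # an escaped backslash: keep as-is
            return match.group(1)
        return "\\%03o" % int(match.group(2), 16)
    return _ESCAPE.sub(repl, escaped)


def c_escape(message):
    """Escape a unicode string for use inside a C string literal (sketch)."""
    escaped = codecs.escape_encode(message.encode("utf-8"))[0].decode("utf-8")
    return hex_escapes_to_octal(escaped)


print(c_escape(u"\u00f5"))  # prints \303\265
```

Octal is used because a C hex escape consumes every hex digit that
follows it, whereas an octal escape stops after three digits.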
@@ -24,7 +24,7 @@ from grit.tool import build
 class CFormatUnittest(unittest.TestCase):
   def testMessages(self):
-    root = util.ParseGrdForUnittest("""
+    root = util.ParseGrdForUnittest(u"""
 <messages>
 <message name="IDS_QUESTIONS">Do you want to play questions?</message>
 <message name="IDS_QUOTES">
@@ -36,7 +36,7 @@ No.
 Statement. Two all. Game point.
 </message>
 <message name="IDS_NON_ASCII">
-\xc3\xb5\\xc2\\xa4\\\xc2\xa4\\\\xc3\\xb5\xe4\xa4\xa4
+\u00f5\\xc2\\xa4\\\u00a4\\\\xc3\\xb5\u4924
 </message>
 </messages>
 """)
......