IDN spoof checks: Refactor skeleton generation to support IDN in top domain list

Skeleton strings are used to detect confusable hostnames. They are generated in two places:

1. At build time, in make_top_domain_skeletons: This binary takes a list of hostnames and generates their skeletons, which are embedded statically into the Chrome binary.
2. At runtime, in idn_spoof_checker.cc: This class generates skeletons of a hostname to compare against a list of known skeleton strings. Before generating a skeleton string, this class applies a few additional transformations, such as diacritic removal, to the hostname so that it can detect more confusable hostnames.

This CL extracts the skeleton generation in the IDN spoof checker code into a separate file so that make_top_domain_skeletons can apply the same transformations to input hostnames when building the static top domain list.

This CL also modifies the top_domain_generator binary, which generates the actual trie to be embedded into the binary. The trie currently doesn't allow non-ASCII characters in its fields. This CL stores Unicode hostnames in punycode to overcome this restriction. Unicode hostnames may still have non-ASCII skeleton strings, and top_domain_generator still doesn't support that. However, the current top domain list doesn't have any IDN, so this isn't a blocking issue.

Bug: 1040607
Change-Id: I40c654152025d910cbeb8ba32bff5b7835f00104
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/1992011
Reviewed-by: Christopher Thompson <cthomp@chromium.org>
Reviewed-by: Mustafa Emre Acer <meacer@chromium.org>
Commit-Queue: Mustafa Emre Acer <meacer@chromium.org>
Cr-Commit-Position: refs/heads/master@{#729987}
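
The idea of sharing one transformation between the build-time list and the runtime check can be sketched roughly as follows. This is an illustrative Python sketch, not the Chromium code: the names TOY_CONFUSABLES, toy_skeleton, build_entries and lookup are hypothetical, the confusables table is a tiny stand-in for the real Unicode confusables data, and Python's built-in "idna" codec stands in for whatever punycode conversion the generator uses.

```python
import unicodedata

# Hypothetical toy table for illustration only; real confusable detection
# uses the full Unicode confusables data, not a handful of entries.
TOY_CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}


def remove_diacritics(hostname):
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", hostname)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def toy_skeleton(hostname):
    """Shared transformation: lowercase, strip diacritics, map confusables.

    Both the build step and the runtime check call this same function, so
    the stored skeletons and the looked-up skeletons stay comparable.
    """
    stripped = remove_diacritics(hostname.lower())
    return "".join(TOY_CONFUSABLES.get(ch, ch) for ch in stripped)


def to_punycode(hostname):
    """Return an ASCII-only (punycode) form of a hostname."""
    return hostname.encode("idna").decode("ascii")


def build_entries(hostnames):
    """'Build time': map each skeleton to its hostname, stored as ASCII."""
    return {toy_skeleton(h): to_punycode(h) for h in hostnames}


def lookup(entries, hostname):
    """'Runtime': return the stored top domain that hostname resembles."""
    return entries.get(toy_skeleton(hostname))


if __name__ == "__main__":
    entries = build_entries(["example.com", "r\u00e9sum\u00e9.com"])
    # A hostname using Cyrillic 'а' matches the ASCII top domain.
    print(lookup(entries, "ex\u0430mple.com"))
    # The IDN entry is stored in its ASCII-only punycode form.
    print(lookup(entries, "resume.com"))
```

In this toy version every skeleton happens to be ASCII. As noted above, real skeletons of Unicode hostnames can be non-ASCII, which is the case top_domain_generator does not yet handle.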