Use a huffman trie for top domain storage in url formatter

UrlFormatter uses ICU's spoof checker to detect lookalike domains that
contain Unicode confusables. It does this by extracting a skeleton string
from the given domain that represents its visual appearance. For example,
google.com and googlé[.]com have the same skeleton string (google.corn).

In addition, we want to display a "Did you mean to go to..." UI for
navigations involving IDN if the domain name matches a top 10K domain. To
do that, we need to store the domains associated with ICU skeletons.

UrlFormatter currently uses a DAFSA to store the list of the skeletons of
the top 10K domains. It doesn't and cannot store the actual domain in this
list. To support this, this CL changes the underlying storage from DAFSA to
the Huffman Trie used by net's preload list code. It:
- Generates the Huffman trie from the top domain list at compile time.
- Decodes the Huffman trie at runtime in
  IDNSpoofChecker::SimilarToTopDomains.

The design doc for the preload list migration is here:
https://docs.google.com/document/d/11rqIozUDaK6DvNeu436SL3Coj65J5vhD9HVftOi-RrA/edit

As mentioned in the doc, micro benchmarks indicate that binary size and
speed are minimally impacted by this change (51KB additional size, 4
microseconds of additional time for each lookup).

Bug: 843361
Change-Id: If98b8161bf836fec6ba74e68587bd2159f4eb3d5
Reviewed-on: https://chromium-review.googlesource.com/1106539
Commit-Queue: Mustafa Emre Acer <meacer@chromium.org>
Reviewed-by: Nick Harper <nharper@chromium.org>
Reviewed-by: Peter Kasting <pkasting@chromium.org>
Cr-Commit-Position: refs/heads/master@{#572055}
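
Illustrative sketch (not part of the CL): the point of the storage change is
that a skeleton lookup can now return the matching top domain instead of only
reporting that a match exists. The snippet below shows that lookup shape under
loud assumptions: SkeletonTable, MakeExampleTable and LookupTopDomain are
hypothetical names, and a std::unordered_map stands in for the generated
Huffman trie, so none of this reflects the actual encoding or the real
IDNSpoofChecker code.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the generated top-domain data. The CL stores
// this mapping in a Huffman trie; a hash map is used here only to show
// what a skeleton -> domain lookup returns.
using SkeletonTable = std::unordered_map<std::string, std::string>;

// Builds a one-entry example table using the skeleton/domain pair from
// the commit description.
SkeletonTable MakeExampleTable() {
  return {{"google.corn", "google.com"}};
}

// Returns the top domain whose skeleton equals `skeleton`, or an empty
// string when no top domain has that skeleton.
std::string LookupTopDomain(const SkeletonTable& table,
                            const std::string& skeleton) {
  auto it = table.find(skeleton);
  if (it == table.end())
    return std::string();
  return it->second;
}
```

With a skeleton-only set such as the previous DAFSA, the same query could
answer "is this skeleton in the list?" but could not name the domain to
suggest in the UI.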