    Allow whole-script confusable Cyrillic domains only on Cyrillic TLDs · b52ac904
    meacer authored
    A whole-script confusable Cyrillic domain consists entirely of Cyrillic
    characters that look identical to Latin characters (e.g. xn--80ak6aa92e[.]com
    decodes to аррӏе[.]com, where аррӏе is in fact U+0430 U+0440 U+0440 U+04CF
    U+0435).
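
To make the definition concrete, here is a minimal C++17 sketch of a whole-script confusable check over an already-decoded label. The function name and the lookalike set are assumptions made for illustration; they are not taken from url_formatter.cc, and Chromium's real list may differ.

```cpp
#include <algorithm>
#include <string_view>

// Illustrative subset of Cyrillic letters that resemble Latin letters.
// This is NOT Chromium's actual list; it is an assumption for this example.
constexpr std::u32string_view kLatinLookalikeCyrillic =
    U"\u0430\u0441\u0435\u043E\u0440\u0445\u0443\u0455\u0456\u0458\u04CF";

// Returns true if `label` is non-empty and every code point is one of the
// Latin-lookalike Cyrillic letters listed above.
bool IsWholeScriptConfusableCyrillic(std::u32string_view label) {
  if (label.empty()) return false;
  return std::all_of(label.begin(), label.end(), [](char32_t c) {
    return kLatinLookalikeCyrillic.find(c) != std::u32string_view::npos;
  });
}
```

With this set, the decoded label from the example above (U+0430 U+0440 U+0440 U+04CF U+0435) is flagged, while the Latin string "apple" is not.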
    
    A previous change allowed whole-script confusable Cyrillic characters on
    non-ASCII top-level domains only. This means that xn--80ak6aa92e[.]com stays
    in punycode form (TLD is .com) but xn--80ak6aa92e[.]xn--p1ai is decoded as
    аррӏе[.]рф (TLD is Cyrillic). However, this also allows spoofs on other
    non-ASCII TLDs such as аррӏе[.]中国, so it is not a sufficient measure.
    
    This change further limits allowable whole-script confusable Cyrillic domains
    to Cyrillic TLDs (instead of all non-ASCII TLDs) and a small list of
    additional TLDs containing a large number of Cyrillic domains (bg, by, kz,
    pyc, ru, su, ua, uz). The idea is that users familiar with Cyrillic are more
    likely to encounter these TLDs and notice any discrepancies in the displayed
    domain name.
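
The TLD rule described above can be sketched as follows. This is a hedged C++17 illustration, not the code in url_formatter.cc: the function names are hypothetical, the Cyrillic test is simplified to the basic Cyrillic block (U+0400..U+04FF), and the extra TLD list is copied verbatim from this commit message.

```cpp
#include <algorithm>
#include <array>
#include <string_view>

// Simplifying assumption: "Cyrillic" means every code point lies in the
// basic Cyrillic block, U+0400..U+04FF.
bool IsAllCyrillic(std::u32string_view s) {
  return !s.empty() && std::all_of(s.begin(), s.end(), [](char32_t c) {
    return c >= 0x0400 && c <= 0x04FF;
  });
}

// Returns true if a whole-script confusable Cyrillic label may be shown in
// Unicode under `tld` (the TLD in decoded form, without the leading dot).
bool TldAllowsCyrillicWholeScriptConfusables(std::u32string_view tld) {
  // Additional TLDs named in the commit message, copied as written there.
  static constexpr std::array<std::u32string_view, 8> kExtraTlds = {
      U"bg", U"by", U"kz", U"pyc", U"ru", U"su", U"ua", U"uz"};
  if (IsAllCyrillic(tld)) return true;
  return std::find(kExtraTlds.begin(), kExtraTlds.end(), tld) !=
         kExtraTlds.end();
}
```

Under this sketch, рф (U+0440 U+0444) and ru are accepted, while com and 中国 are rejected, so a confusable label under those TLDs would stay in punycode.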
    
    Bug: 968505
    Change-Id: Ib7462c9776f3640a5f60e5c79ac1a0c5d7b2028c
    Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/1881887
    Commit-Queue: Mustafa Emre Acer <meacer@chromium.org>
    Reviewed-by: Christopher Thompson <cthomp@chromium.org>
    Reviewed-by: Peter Kasting <pkasting@chromium.org>
    Cr-Commit-Position: refs/heads/master@{#712764}
url_formatter.cc 29.9 KB