Commit 17cbe8c8 authored by Mustafa Emre Acer's avatar Mustafa Emre Acer Committed by Commit Bot

Lookalike URLs: Do not trigger when the eTLD+1 of the navigated domain is not IDN

The top domain heuristic incorrectly marks certain domains as lookalikes, despite the fact that the domain itself is a top domain.
As a result of this bug, tést[.]blogspot[.]com displays a "Did you mean to go to blogspot.com".

This happens because:
- We currently use INCLUDE_PRIVATE_REGISTRIES flag when computing eTLD+1s via net::registry_controlled_domains::GetDomainAndRegistry.
- This results in the eTLD+1 of "tést[.]blogspot[.]com" being computed as "tést[.]blogspot[.]com" instead of just "blogspot[.]com".
- Since the eTLD+1 contains IDN, we perform the top domain skeleton check.
- This check compares "tést[.]blogspot[.]com" against top domains and finds "blogspot[.]com" as a match.
- This in turn records metrics and shows the UI.

This CL excludes private registries in eTLD+1 computation. As a result, eTLD+1 of private registries will exclude the subdomain, e.g. eTLD+1 of "tést[.]blogspot[.]com" will now be computed as "blogspot[.]com". Since the new eTLD+1 does not have IDN, it will not trigger any further checks and the navigation will not be treated as a lookalike.

Bug: 843361
Change-Id: Iaec1a0af61fa7aa9a2091bff3212171ba36034ad
Reviewed-on: https://chromium-review.googlesource.com/c/1387670Reviewed-by: default avatarTommy Li <tommycli@chromium.org>
Commit-Queue: Mustafa Emre Acer <meacer@chromium.org>
Cr-Commit-Position: refs/heads/master@{#618367}
parent 370bd1fb
......@@ -48,6 +48,16 @@ bool SkeletonsMatch(const url_formatter::Skeletons& skeletons1,
return false;
}
// Returns the eTLD+1 of |hostname|. This excludes private registries so that
// GetETLDPlusOne("test.blogspot.com") returns "blogspot.com" (blogspot.com is
// listed as a private registry). We do this to be consistent with
// url_formatter's top domain list which doesn't have a notion of private
// registries.
std::string GetETLDPlusOne(const GURL& url) {
return net::registry_controlled_domains::GetDomainAndRegistry(
url, net::registry_controlled_domains::EXCLUDE_PRIVATE_REGISTRIES);
}
// Returns a site that the user has used before that the eTLD+1 in
// |domain_and_registry| may be attempting to spoof, based on skeleton
// comparison.
......@@ -78,10 +88,8 @@ std::string GetMatchingSiteEngagementDomain(
// If the user has engaged with eTLD+1 of this site, don't show any
// lookalike navigation suggestions.
const std::string engaged_domain_and_registry =
net::registry_controlled_domains::GetDomainAndRegistry(
detail.origin,
net::registry_controlled_domains::INCLUDE_PRIVATE_REGISTRIES);
// eTLD+1 can be empty for private domains.
GetETLDPlusOne(detail.origin);
// eTLD+1 can be empty for private domains (e.g. http://test).
if (engaged_domain_and_registry.empty())
continue;
......@@ -182,9 +190,7 @@ bool LookalikeUrlNavigationObserver::GetMatchingDomain(
std::string* matched_domain,
MatchType* match_type) {
// Perform all computations on eTLD+1.
const std::string domain_and_registry =
net::registry_controlled_domains::GetDomainAndRegistry(
url, net::registry_controlled_domains::INCLUDE_PRIVATE_REGISTRIES);
const std::string domain_and_registry = GetETLDPlusOne(url);
url_formatter::IDNConversionResult result =
url_formatter::IDNToUnicodeWithDetails(domain_and_registry);
......@@ -192,6 +198,14 @@ bool LookalikeUrlNavigationObserver::GetMatchingDomain(
// If the navigated domain is IDN, check its skeleton against top domains
// and engaged sites.
if (!result.matching_top_domain.empty()) {
// In practice, this is not possible since the top domain list does not
// contain IDNs, so domain_and_registry can't both have IDN and be a top
// domain. Still, sanity check in case the top domain list changes in the
// future.
if (domain_and_registry == result.matching_top_domain) {
// Navigated domain is a top domain, do nothing.
return false;
}
RecordEvent(NavigationSuggestionEvent::kMatchTopSite);
*matched_domain = result.matching_top_domain;
*match_type = MatchType::kTopSite;
......
......@@ -258,6 +258,24 @@ IN_PROC_BROWSER_TEST_P(LookalikeUrlNavigationObserverBrowserTest,
LookalikeUrlNavigationObserver::MatchType::kTopSite);
}
// The navigated domain itself is a top domain or a subdomain of a top domain.
// Should not record metrics. The top domain list doesn't contain any IDN, so
// this only tests the case where the subdomains are IDNs.
IN_PROC_BROWSER_TEST_P(LookalikeUrlNavigationObserverBrowserTest,
TopDomainIdnSubdomain_NoInfobar) {
TestInfobarNotShown(
embedded_test_server()->GetURL("tést.google.com", "/title1.html"));
CheckNoUkm();
// blogspot.com is a private registry, so the eTLD+1 of "tést.blogspot.com" is
// itself, instead of just "blogspot.com". This is different than
// tést.google.com whose eTLD+1 is google.com, and it should be handled
// correctly.
TestInfobarNotShown(
embedded_test_server()->GetURL("tést.blogspot.com", "/title1.html"));
CheckNoUkm();
}
// Navigate to a domain within an edit distance of 1 to a top domain.
// This should record metrics. It should also show a "Did you mean to go to ..."
// infobar if configured via a feature param.
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment