Commit 3d9c4ea1 authored by manuk's avatar manuk Committed by Commit Bot

[omnibox]: Dedupe bookmark provider classification.

This is the 15th refactoring CL aimed at reducing duplication and
inconsistency for classifying omnibox results.

Bookmark providers compare the user and suggest texts and find the
corresponding matches during construction in
'TitledUrlIndex::AddMatchToResults'. These are then used for both
scoring and classification.

With this CL, bookmark classification uses the 'FindTermMatches' and
'ClassifyTermMatches' methods that other providers use. This improves
consistency with other providers at the cost of inconsistency with the
bookmark provider's scoring. The most prominent distinction is prefix
matching; where if the user input is an exact prefix of the suggestion
text, subsequent matching words in the suggestion text are not bolded.
E.g., before this CL, the user text 'the' would match the bookmark
suggestion '[the] cake ate [the] moon'; whereas with this CL, it will
match only the first occurrence of 'the', '[the] cake ate the moon'.

Re consistency with other providers: A user may see suggestions from
different providers with similar texts. E.g., the user input 'the' could
provide both search and bookmark suggestions with texts 'the cake ate
the moon'. It would be surprising if such suggestions with the same text
were bolded differently; e.g. the search suggestion were bolded '[the]
cake ate the moon', whereas the bookmark suggestion was bolded '[the]
cake ate [the] moon'.

Re inconsistency with bookmark provider's scoring: Bookmark scoring does
not consider prefixes specially and all occurrences of the user input
contribute to the score. Not bolding subsequent occurrences suggests
otherwise. E.g., the user input 'the' could display two bookmark
suggestions '[the] cake ate the moon' and '[the] cake ate a moon', and
the first would be scored higher because it contains a second occurrence
of 'the'.

In addition to a change in prefix bolding this CL has some other minor
side affects on bolding regarding word-separators and matching input
words shorter than 3 characters incompletely with suggestion words.
E.g., the user input 'the' (3 characters) previously matched '[the]' and
'[the]re', but 'th' (shorter than 3 characters) only matched '[th]'
(complete match) but not 'the' or 'there' (incomplete matches).

Bug: 366623
Change-Id: Ifc08aae00359be2669b05189e249d531ed3dabf1
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/1588554
Commit-Queue: manuk hovanesian <manukh@chromium.org>
Reviewed-by: default avatarTommy Li <tommycli@chromium.org>
Cr-Commit-Position: refs/heads/master@{#664441}
parent 1cf899e9
......@@ -232,43 +232,37 @@ TEST_F(BookmarkProviderTest, Positions) {
// elements in the following |positions| array.
const size_t positions[99][9][2];
} query_data[] = {
// This first set is primarily for position detection validation.
{"abc", 3, {{{0, 3}, {0, 0}},
{{0, 3}, {0, 0}},
{{0, 3}, {0, 0}}}},
{"abcde", 2, {{{0, 5}, {0, 0}},
{{0, 5}, {0, 0}}}},
{"foo bar", 0, {{{0, 0}}}},
{"fooey bark", 0, {{{0, 0}}}},
{"def", 2, {{{2, 5}, {0, 0}},
{{4, 7}, {0, 0}}}},
{"ghi jkl", 2, {{{0, 3}, {4, 7}, {0, 0}},
{{0, 3}, {4, 7}, {0, 0}}}},
// NB: GetBookmarksMatching(...) uses exact match for "a" in title or URL.
{"a", 2, {{{0, 1}, {0, 0}},
{{0, 0}}}},
{"a d", 0, {{{0, 0}}}},
{"carry carbon", 1, {{{0, 5}, {6, 12}, {0, 0}}}},
// NB: GetBookmarksMatching(...) sorts the match positions.
{"carbon carry", 1, {{{0, 5}, {6, 12}, {0, 0}}}},
{"arbon", 0, {{{0, 0}}}},
{"ar", 0, {{{0, 0}}}},
{"arry", 0, {{{0, 0}}}},
// Quoted terms are single terms.
{"\"carry carbon\"", 1, {{{0, 12}, {0, 0}}}},
{"\"carry carbon\" care", 1, {{{0, 12}, {13, 17}, {0, 0}}}},
// Quoted terms require complete word matches.
{"\"carry carbo\"", 0, {{{0, 0}}}},
// This set uses duplicated and/or overlaps search terms in the title.
{"frank", 1, {{{0, 5}, {8, 13}, {16, 21}, {0, 0}}}},
{"frankly", 1, {{{0, 7}, {8, 15}, {0, 0}}}},
{"frankly frankly", 1, {{{0, 7}, {8, 15}, {0, 0}}}},
{"foobar foo", 1, {{{0, 6}, {7, 13}, {0, 0}}}},
{"foo foobar", 1, {{{0, 6}, {7, 13}, {0, 0}}}},
// This ensures that leading whitespace in the title is correctly offset.
{"hello", 1, {{{0, 5}, {8, 13}, {0, 0}}}},
// This ensures that empty titles yield empty classifications.
{"emptytitle", 1, {}},
// This first set is primarily for position detection validation.
{"abc", 3, {{{0, 3}, {0, 0}}, {{0, 3}, {0, 0}}, {{0, 3}, {0, 0}}}},
{"abcde", 2, {{{0, 5}, {0, 0}}, {{0, 5}, {0, 0}}}},
{"foo bar", 0, {{{0, 0}}}},
{"fooey bark", 0, {{{0, 0}}}},
{"def", 2, {{{2, 5}, {0, 0}}, {{4, 7}, {0, 0}}}},
{"ghi jkl", 2, {{{0, 7}, {0, 0}}, {{0, 3}, {4, 7}, {0, 0}}}},
// NB: GetBookmarksMatching(...) uses exact match for "a" in title or URL.
{"a", 2, {{{0, 1}, {0, 0}}, {{0, 0}}}},
{"a d", 0, {{{0, 0}}}},
{"carry carbon", 1, {{{0, 12}, {0, 0}}}},
// NB: GetBookmarksMatching(...) sorts the match positions.
{"carbon carry", 1, {{{0, 5}, {6, 12}, {0, 0}}}},
{"arbon", 0, {{{0, 0}}}},
{"ar", 0, {{{0, 0}}}},
{"arry", 0, {{{0, 0}}}},
// Quoted terms are single terms.
{"\"carry carbon\"", 1, {{{0, 5}, {6, 12}, {0, 0}}}},
{"\"carry carbon\" care", 1, {{{0, 5}, {6, 12}, {13, 17}, {0, 0}}}},
// Quoted terms require complete word matches.
{"\"carry carbo\"", 0, {{{0, 0}}}},
// This set uses duplicated and/or overlaps search terms in the title.
{"frank", 1, {{{0, 5}, {0, 0}}}},
{"frankly", 1, {{{0, 7}, {0, 0}}}},
{"frankly frankly", 1, {{{0, 15}, {0, 0}}}},
{"foobar foo", 1, {{{0, 10}, {0, 0}}}},
{"foo foobar", 1, {{{0, 6}, {7, 13}, {0, 0}}}},
// This ensures that leading whitespace in the title is correctly offset.
{"hello", 1, {{{0, 5}, {0, 0}}}},
// This ensures that empty titles yield empty classifications.
{"emptytitle", 1, {}},
};
for (size_t i = 0; i < base::size(query_data); ++i) {
......@@ -443,7 +437,7 @@ TEST_F(BookmarkProviderTest, StripHttpAndAdjustOffsets) {
{ "http blah", "http://blah.com", "0:3,4:1,7:3,11:1" },
{ "dom", "domain.com/http/", "0:3,3:1" },
{ "dom http", "http://domain.com/http/", "0:3,4:1,7:3,10:1,18:3,22:1" },
{ "rep", "repeat.com/1/repeat/2/", "0:3,3:1,13:3,16:1" },
{ "rep", "repeat.com/1/repeat/2/", "0:3,3:1" },
{ "versi", "chrome://version", "0:1,9:3,14:1" },
// clang-format on
};
......@@ -457,6 +451,15 @@ TEST_F(BookmarkProviderTest, StripHttpAndAdjustOffsets) {
const ACMatches& matches(provider_->matches());
ASSERT_EQ(1U, matches.size()) << description;
const AutocompleteMatch& match = matches[0];
description +=
"\n EXPECTED classes: " + query_data[i].expected_contents_class;
description += "\n ACTUAL classes: ";
for (auto x : match.contents_class) {
description += base::NumberToString(x.offset) + ":" +
base::NumberToString(x.style) + ",";
}
EXPECT_EQ(base::ASCIIToUTF16(query_data[i].expected_contents),
match.contents) << description;
std::vector<std::string> class_strings = base::SplitString(
......
......@@ -10,28 +10,12 @@
#include "base/strings/utf_string_conversions.h"
#include "components/bookmarks/browser/titled_url_node.h"
#include "components/omnibox/browser/autocomplete_match.h"
#include "components/omnibox/browser/autocomplete_match_classification.h"
#include "components/omnibox/browser/history_provider.h"
#include "components/omnibox/browser/in_memory_url_index_types.h"
#include "components/omnibox/browser/url_prefix.h"
#include "components/url_formatter/url_formatter.h"
namespace {
TermMatches OffsetsToTermMatches(const std::vector<size_t> offsets) {
TermMatches term_matches;
for (auto offset_iter = offsets.begin(); offset_iter != offsets.end();
++offset_iter) {
const size_t begin = *offset_iter;
++offset_iter;
const size_t end = *offset_iter;
if ((begin != base::string16::npos) && (end != base::string16::npos))
term_matches.emplace_back(0, begin, end - begin);
}
return term_matches;
}
} // namespace
namespace bookmarks {
AutocompleteMatch TitledUrlMatchToAutocompleteMatch(
......@@ -43,16 +27,12 @@ AutocompleteMatch TitledUrlMatchToAutocompleteMatch(
const AutocompleteInput& input,
const base::string16& fixed_up_input_text) {
const GURL& url = titled_url_match.node->GetTitledUrlNodeUrl();
base::string16 title = titled_url_match.node->GetTitledUrlNodeTitle();
// The AutocompleteMatch we construct is non-deletable because the only way to
// support this would be to delete the underlying object that created the
// titled_url_match. E.g., for the bookmark provider this would mean deleting
// the underlying bookmark, which is unlikely to be what the user intends.
AutocompleteMatch match(provider, relevance, false, type);
TitledUrlMatch::MatchPositions new_title_match_positions =
titled_url_match.title_match_positions;
CorrectTitleAndMatchPositions(&title, &new_title_match_positions);
const base::string16& url_utf16 = base::UTF8ToUTF16(url.spec());
match.destination_url = url;
......@@ -64,13 +44,24 @@ AutocompleteMatch TitledUrlMatchToAutocompleteMatch(
auto format_types = AutocompleteMatch::GetFormatTypes(
input.parts().scheme.len > 0 || match_in_scheme, match_in_subdomain);
std::vector<size_t> offsets = TitledUrlMatch::OffsetsFromMatchPositions(
titled_url_match.url_match_positions);
match.contents = url_formatter::FormatUrlWithOffsets(
url, format_types, net::UnescapeRule::SPACES, nullptr, nullptr, &offsets);
TermMatches url_term_matches = OffsetsToTermMatches(offsets);
match.contents_class = HistoryProvider::SpansFromTermMatch(
url_term_matches, match.contents.size(), true);
match.contents = url_formatter::FormatUrl(
url, format_types, net::UnescapeRule::SPACES, nullptr, nullptr, nullptr);
// Bookmark classification diverges from relevance scoring. Specifically,
// 1) All occurrences of the input contribute to relevance; e.g. for the input
// 'pre', the bookmark 'pre prefix' will be scored higher than 'pre suffix'.
// For classification though, if the input is a prefix of the suggestion text,
// only the prefix will be bolded; e.g. the 1st bookmark will display '[pre]
// prefix' as opposed to '[pre] [pre]fix'. This divergence allows consistency
// with other providers' and google.com's bolding.
// 2) Non-complete-word matches less than 3 characters long do not contribute
// to relevance; e.g. for the input 'a pr', the bookmark 'a pr prefix' will be
// scored the same as 'a pr suffix'. For classification though, both
// occurrences will be bolded, 'a [pr] [pr]efix'.
auto contents_terms = FindTermMatches(input.text(), match.contents);
match.contents_class = ClassifyTermMatches(
contents_terms, match.contents.length(),
ACMatchClassification::MATCH | ACMatchClassification::URL,
ACMatchClassification::URL);
// The inline_autocomplete_offset should be adjusted based on the formatting
// applied to |fill_into_edit|.
......@@ -93,29 +84,15 @@ AutocompleteMatch TitledUrlMatchToAutocompleteMatch(
match.inline_autocompletion.empty() ||
!HistoryProvider::PreventInlineAutocomplete(input);
}
match.description = title;
offsets =
TitledUrlMatch::OffsetsFromMatchPositions(new_title_match_positions);
TermMatches title_term_matches = OffsetsToTermMatches(offsets);
match.description_class = HistoryProvider::SpansFromTermMatch(
title_term_matches, match.description.size(), false);
match.description = titled_url_match.node->GetTitledUrlNodeTitle();
base::TrimWhitespace(match.description, base::TRIM_LEADING,
&match.description);
auto description_terms = FindTermMatches(input.text(), match.description);
match.description_class = ClassifyTermMatches(
description_terms, match.description.length(),
ACMatchClassification::MATCH, ACMatchClassification::NONE);
return match;
}
void CorrectTitleAndMatchPositions(
base::string16* title,
TitledUrlMatch::MatchPositions* title_match_positions) {
size_t leading_whitespace_chars = title->length();
base::TrimWhitespace(*title, base::TRIM_LEADING, title);
leading_whitespace_chars -= title->length();
if (leading_whitespace_chars == 0)
return;
for (auto it = title_match_positions->begin();
it != title_match_positions->end(); ++it) {
it->first -= leading_whitespace_chars;
it->second -= leading_whitespace_chars;
}
}
} // namespace bookmarks
......@@ -30,13 +30,6 @@ AutocompleteMatch TitledUrlMatchToAutocompleteMatch(
const AutocompleteInput& input,
const base::string16& fixed_up_input_text);
// Removes leading spaces from |title| before displaying, otherwise it looks
// funny. In the process, corrects |title_match_positions| so the correct
// characters are highlighted.
void CorrectTitleAndMatchPositions(
base::string16* title,
TitledUrlMatch::MatchPositions* title_match_positions);
} // namespace bookmarks
#endif // COMPONENTS_OMNIBOX_BROWSER_TITLED_URL_MATCH_UTILS_H_
......@@ -6,6 +6,7 @@
#include <memory>
#include "base/strings/string_number_conversions.h"
#include "base/strings/utf_string_conversions.h"
#include "base/test/scoped_feature_list.h"
#include "components/bookmarks/browser/titled_url_match.h"
......@@ -20,7 +21,6 @@
#include "url/gurl.h"
using bookmarks::TitledUrlMatchToAutocompleteMatch;
using bookmarks::CorrectTitleAndMatchPositions;
namespace {
......@@ -51,6 +51,18 @@ class MockTitledUrlNode : public bookmarks::TitledUrlNode {
GURL url_;
};
std::string ACMatchClassificationsAsString(
const ACMatchClassifications& clasifications) {
std::string position_string("{");
for (auto classification : clasifications) {
position_string +=
"{offset " + base::NumberToString(classification.offset) + ", style " +
base::NumberToString(classification.style) + "}, ";
}
position_string += "}\n";
return position_string;
}
} // namespace
TEST(TitledUrlMatchUtilsTest, TitledUrlMatchToAutocompleteMatch) {
......@@ -93,7 +105,10 @@ TEST(TitledUrlMatchUtilsTest, TitledUrlMatchToAutocompleteMatch) {
EXPECT_EQ(base::ASCIIToUTF16("google.com"), autocomplete_match.contents);
EXPECT_TRUE(std::equal(expected_contents_class.begin(),
expected_contents_class.end(),
autocomplete_match.contents_class.begin()));
autocomplete_match.contents_class.begin()))
<< "EXPECTED: " << ACMatchClassificationsAsString(expected_contents_class)
<< "ACTUAL: "
<< ACMatchClassificationsAsString(autocomplete_match.contents_class);
EXPECT_EQ(match_title, autocomplete_match.description);
EXPECT_TRUE(std::equal(expected_description_class.begin(),
expected_description_class.end(),
......@@ -148,7 +163,10 @@ TEST(TitledUrlMatchUtilsTest, DoTrimHttpScheme) {
EXPECT_EQ(expected_contents, autocomplete_match.contents);
EXPECT_TRUE(std::equal(expected_contents_class.begin(),
expected_contents_class.end(),
autocomplete_match.contents_class.begin()));
autocomplete_match.contents_class.begin()))
<< "EXPECTED: " << ACMatchClassificationsAsString(expected_contents_class)
<< "ACTUAL: "
<< ACMatchClassificationsAsString(autocomplete_match.contents_class);
EXPECT_TRUE(autocomplete_match.allowed_to_be_default_match);
}
......@@ -158,8 +176,7 @@ TEST(TitledUrlMatchUtilsTest, DontTrimHttpSchemeIfInputHasScheme) {
BuildTestAutocompleteMatch("http://face", match_url, {{11, 15}});
ACMatchClassifications expected_contents_class = {
{0, ACMatchClassification::URL},
{7, ACMatchClassification::URL | ACMatchClassification::MATCH},
{0, ACMatchClassification::URL | ACMatchClassification::MATCH},
{11, ACMatchClassification::URL},
};
base::string16 expected_contents(base::ASCIIToUTF16("http://facebook.com"));
......@@ -168,7 +185,10 @@ TEST(TitledUrlMatchUtilsTest, DontTrimHttpSchemeIfInputHasScheme) {
EXPECT_EQ(expected_contents, autocomplete_match.contents);
EXPECT_TRUE(std::equal(expected_contents_class.begin(),
expected_contents_class.end(),
autocomplete_match.contents_class.begin()));
autocomplete_match.contents_class.begin()))
<< "EXPECTED: " << ACMatchClassificationsAsString(expected_contents_class)
<< "ACTUAL: "
<< ACMatchClassificationsAsString(autocomplete_match.contents_class);
EXPECT_FALSE(autocomplete_match.allowed_to_be_default_match);
}
......@@ -187,7 +207,10 @@ TEST(TitledUrlMatchUtilsTest, DoTrimHttpsScheme) {
EXPECT_EQ(expected_contents, autocomplete_match.contents);
EXPECT_TRUE(std::equal(expected_contents_class.begin(),
expected_contents_class.end(),
autocomplete_match.contents_class.begin()));
autocomplete_match.contents_class.begin()))
<< "EXPECTED: " << ACMatchClassificationsAsString(expected_contents_class)
<< "ACTUAL: "
<< ACMatchClassificationsAsString(autocomplete_match.contents_class);
EXPECT_TRUE(autocomplete_match.allowed_to_be_default_match);
}
......@@ -197,8 +220,7 @@ TEST(TitledUrlMatchUtilsTest, DontTrimHttpsSchemeIfInputHasScheme) {
BuildTestAutocompleteMatch("https://face", match_url, {{12, 16}});
ACMatchClassifications expected_contents_class = {
{0, ACMatchClassification::URL},
{8, ACMatchClassification::URL | ACMatchClassification::MATCH},
{0, ACMatchClassification::URL | ACMatchClassification::MATCH},
{12, ACMatchClassification::URL},
};
base::string16 expected_contents(base::ASCIIToUTF16("https://facebook.com"));
......@@ -207,7 +229,10 @@ TEST(TitledUrlMatchUtilsTest, DontTrimHttpsSchemeIfInputHasScheme) {
EXPECT_EQ(expected_contents, autocomplete_match.contents);
EXPECT_TRUE(std::equal(expected_contents_class.begin(),
expected_contents_class.end(),
autocomplete_match.contents_class.begin()));
autocomplete_match.contents_class.begin()))
<< "EXPECTED: " << ACMatchClassificationsAsString(expected_contents_class)
<< "ACTUAL: "
<< ACMatchClassificationsAsString(autocomplete_match.contents_class);
EXPECT_FALSE(autocomplete_match.allowed_to_be_default_match);
}
......@@ -253,7 +278,10 @@ TEST(TitledUrlMatchUtilsTest, EmptyInlineAutocompletion) {
EXPECT_EQ(base::ASCIIToUTF16("gmail.com"), autocomplete_match.contents);
EXPECT_TRUE(std::equal(expected_contents_class.begin(),
expected_contents_class.end(),
autocomplete_match.contents_class.begin()));
autocomplete_match.contents_class.begin()))
<< "EXPECTED: " << ACMatchClassificationsAsString(expected_contents_class)
<< "ACTUAL: "
<< ACMatchClassificationsAsString(autocomplete_match.contents_class);
EXPECT_EQ(match_title, autocomplete_match.description);
EXPECT_TRUE(std::equal(expected_description_class.begin(),
expected_description_class.end(),
......@@ -263,16 +291,3 @@ TEST(TitledUrlMatchUtilsTest, EmptyInlineAutocompletion) {
EXPECT_FALSE(autocomplete_match.allowed_to_be_default_match);
EXPECT_TRUE(autocomplete_match.inline_autocompletion.empty());
}
TEST(TitledUrlMatchUtilsTest, CorrectTitleAndMatchPositions) {
bookmarks::TitledUrlMatch::MatchPositions match_positions = {{2, 6},
{10, 15}};
base::string16 title = base::ASCIIToUTF16(" Leading whitespace");
bookmarks::TitledUrlMatch::MatchPositions expected_match_positions = {
{0, 4}, {8, 13}};
base::string16 expected_title = base::ASCIIToUTF16("Leading whitespace");
CorrectTitleAndMatchPositions(&title, &match_positions);
EXPECT_EQ(expected_title, title);
EXPECT_TRUE(std::equal(match_positions.begin(), match_positions.end(),
expected_match_positions.begin()));
}
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment