[omnibox]: Dedupe historyQuick provider classification.

This is the 16th refactoring CL aimed at reducing duplication and inconsistency for classifying omnibox results.

The historyQuick provider compares the user input and the suggestion text to find the corresponding matches during construction of the ScoredHistoryMatch precursor to the suggestion's AutocompleteMatch. The term matches are then used for both scoring and classifying. With this CL, historyQuick classification uses the 'FindTermMatches' and 'ClassifyTermMatches' methods that other providers use. This improves consistency with other providers at the cost of inconsistency with the historyQuick provider's scoring.

The differences between the current classification and this CL's classification:

1) Prefix matching. With this CL, if the input is an exact prefix of the suggestion text, subsequent matching words in the suggestion text are not bolded. E.g., for input 'x' and suggestion 'x x', before this CL, both occurrences would be bolded; with this CL, only the first will be.

2) Midword matching. Before this CL, midword matches were allowed for the URL host domain. E.g., for input 'x' and suggestion 'zxx.xx/xx', before this CL, the first 5 occurrences would be bolded, 'z[xx].[xx]/[x]x'; with this CL, only the 3rd and 5th will be bolded, 'zxx.[x]x/[x]x'.

3) Input word-break separators. Before this CL, the user input was not broken by symbols, though the suggestion text was. E.g., for input 'x%x y' and suggestion 'x%x x y%y y', before this CL, it would bold as '[x%x] x [y]%[y] [y]'; with this CL, it will bold as '[x]%[x] [x] [y]%[y] [y]'.

All 3 changes apply to suggestion classification (bolding) only; determining which suggestions to display and their ordering is unaffected.

Re consistency with other providers: A user may see suggestions from different providers with similar texts. E.g., the user input 'the' could provide both search and historyQuick suggestions with the text 'the cake ate the moon'. It would be surprising if such suggestions with the same text were bolded differently; e.g., if the search suggestion were bolded '[the] cake ate the moon', whereas the historyQuick suggestion were bolded '[the] cake ate [the] moon'.

Re inconsistency with historyQuick's scoring: Scoring is consistent with the previous bolding; all previously bolded terms would either contribute (positively or negatively) to the suggestion's score or disqualify the suggestion (e.g., 'm' disqualifies 'yahoo.com'). E.g., 'y hoo' would bold both terms in the suggestion URL '[y]a[hoo].com', and both would contribute to the score. With the classification changes, only 'y' would be bolded, suggesting the 'hoo' term did not contribute to scoring.

Bug: 366623
Change-Id: I478a9d4fcc63abe7aa55dba4274e8896d6bdc388
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/1593717
Commit-Queue: manuk hovanesian <manukh@chromium.org>
Reviewed-by: Tommy Li <tommycli@chromium.org>
Cr-Commit-Position: refs/heads/master@{#664505}
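
The three bolding rules described in the commit message can be illustrated with a small standalone sketch. This is not the Chromium implementation of 'FindTermMatches'; the names SketchFindTermMatches, RenderBold, and TermSpan are hypothetical, the code handles ASCII only, and it assumes a "word" is a run of alphanumeric characters.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a term match: a bolded range in the text.
struct TermSpan {
  size_t offset;
  size_t length;
};

static bool IsWordChar(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) != 0;
}

static std::string ToLowerAscii(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// Splits |s| into alphanumeric runs; optionally records where each run starts.
static std::vector<std::string> SplitIntoWords(const std::string& s,
                                               std::vector<size_t>* starts) {
  std::vector<std::string> words;
  size_t i = 0;
  while (i < s.size()) {
    if (!IsWordChar(s[i])) {
      ++i;
      continue;
    }
    const size_t begin = i;
    while (i < s.size() && IsWordChar(s[i]))
      ++i;
    words.push_back(s.substr(begin, i - begin));
    if (starts)
      starts->push_back(begin);
  }
  return words;
}

// Sketch of the post-CL rules (an assumption-laden simplification):
// 1) If the whole input is a prefix of the text, bold only that prefix.
// 2) Otherwise match input terms at word starts only (no mid-word matches).
// 3) The input is broken on symbols, just like the text.
std::vector<TermSpan> SketchFindTermMatches(const std::string& input,
                                            const std::string& text) {
  const std::string in = ToLowerAscii(input);
  const std::string tx = ToLowerAscii(text);
  if (!in.empty() && tx.compare(0, in.size(), in) == 0)
    return {TermSpan{0, in.size()}};

  std::vector<size_t> word_starts;
  SplitIntoWords(tx, &word_starts);
  const std::vector<std::string> terms = SplitIntoWords(in, nullptr);

  std::vector<TermSpan> spans;
  for (size_t start : word_starts) {
    size_t best = 0;  // Longest term that is a prefix of the word at |start|.
    for (const std::string& term : terms) {
      if (term.size() > best && tx.compare(start, term.size(), term) == 0)
        best = term.size();
    }
    if (best > 0)
      spans.push_back(TermSpan{start, best});
  }
  return spans;
}

// Renders spans using the commit message's '[bold]' notation.
std::string RenderBold(const std::string& text,
                       const std::vector<TermSpan>& spans) {
  std::string out;
  size_t pos = 0;
  for (const TermSpan& s : spans) {
    out.append(text, pos, s.offset - pos);
    out += '[';
    out.append(text, s.offset, s.length);
    out += ']';
    pos = s.offset + s.length;
  }
  out.append(text, pos, std::string::npos);
  return out;
}
```

Under these assumptions, RenderBold("x%x x y%y y", SketchFindTermMatches("x%x y", "x%x x y%y y")) reproduces the post-CL bolding from example 3, and the input 'y hoo' against 'yahoo.com' bolds only the leading 'y'.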

Commit 37414e4b authored May 29, 2019 by manuk; committed by Commit Bot May 29, 2019

Showing with 32 additions and 15 deletions

components/omnibox/browser/history_quick_provider.cc +32 -15

--- a/components/omnibox/browser/history_quick_provider.cc
+++ b/components/omnibox/browser/history_quick_provider.cc
@@ -20,12 +20,12 @@
 #include "components/bookmarks/browser/bookmark_model.h"
 #include "components/history/core/browser/history_database.h"
 #include "components/history/core/browser/history_service.h"
+#include "components/omnibox/browser/autocomplete_match_classification.h"
 #include "components/omnibox/browser/autocomplete_match_type.h"
 #include "components/omnibox/browser/autocomplete_provider_client.h"
 #include "components/omnibox/browser/autocomplete_result.h"
 #include "components/omnibox/browser/history_url_provider.h"
 #include "components/omnibox/browser/in_memory_url_index.h"
-#include "components/omnibox/browser/in_memory_url_index_types.h"
 #include "components/omnibox/browser/omnibox_field_trial.h"
 #include "components/omnibox/browser/url_prefix.h"
 #include "components/prefs/pref_service.h"
@@ -237,27 +237,44 @@ AutocompleteMatch HistoryQuickProvider::QuickMatchToACMatch(
        !PreventInlineAutocomplete(autocomplete_input_);
  }

-  // The term match offsets should be adjusted based on the formatting
-  // applied to the suggestion contents displayed in the dropdown.
-  std::vector<size_t> offsets =
-      OffsetsFromTermMatches(history_match.url_matches);
-  match.contents = url_formatter::FormatUrlWithOffsets(
+  // HistoryQuick classification diverges from relevance scoring. Specifically,
+  // 1) All occurrences of the input contribute to relevance; e.g. for the input
+  // 'pre', the suggestion 'pre prefix' will be scored higher than 'pre suffix'.
+  // For classification though, if the input is a prefix of the suggestion text,
+  // only the prefix will be bolded; e.g. the 1st suggestion will display '[pre]
+  // prefix' as opposed to '[pre] [pre]fix'. This divergence allows consistency
+  // with other providers' and google.com's bolding.
+  // 2) Mid-word occurrences of the input within the suggestion URL contribute
+  // to relevance; e.g. for the input 'mail', the suggestion 'mail - gmail.com'
+  // will be scored higher than 'mail - outlook.live.com'. Mid-word matches only
+  // in the domain affect scoring. For classification though, mid-word matches
+  // are not bolded; e.g. the 1st suggestion will display '[mail] - gmail.com'.
+  // 3) User input is not broken on symbols for relevance calculations; e.g. for
+  // the input '#yolo', the suggestion 'how-to-yolo - yolo.com/#yolo' would be
+  // scored the same as 'how-to-tie-a-tie - yolo.com/#yolo/tie'. For
+  // classification though, user input is broken on symbols; e.g. the 1st
+  // suggestion will display 'how-to-[yolo] - [yolo].com/#[yolo]'.
+
+  match.contents = url_formatter::FormatUrl(
      info.url(),
      AutocompleteMatch::GetFormatTypes(
          autocomplete_input_.parts().scheme.len > 0 ||
              history_match.match_in_scheme,
          history_match.match_in_subdomain),
-      net::UnescapeRule::SPACES, nullptr, nullptr, &offsets);
-
-  TermMatches new_matches =
-      ReplaceOffsetsInTermMatches(history_match.url_matches, offsets);
-  match.contents_class =
-      SpansFromTermMatch(new_matches, match.contents.length(), true);
+      net::UnescapeRule::SPACES, nullptr, nullptr, nullptr);
+  auto contents_terms =
+      FindTermMatches(autocomplete_input_.text(), match.contents);
+  match.contents_class = ClassifyTermMatches(
+      contents_terms, match.contents.size(),
+      ACMatchClassification::MATCH | ACMatchClassification::URL,
+      ACMatchClassification::URL);

-  // Format the description autocomplete presentation.
  match.description = info.title();
-  match.description_class = SpansFromTermMatch(
-      history_match.title_matches, match.description.length(), false);
+  auto description_terms =
+      FindTermMatches(autocomplete_input_.text(), match.description);
+  match.description_class = ClassifyTermMatches(
+      description_terms, match.description.size(), ACMatchClassification::MATCH,
+      ACMatchClassification::NONE);

  match.RecordAdditionalInfo("typed count", info.typed_count());
  match.RecordAdditionalInfo("visit count", info.visit_count());
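
The hunk above passes a match style and a non-match style to 'ClassifyTermMatches'. As a rough illustration of what turning term matches into style boundaries could look like, here is a standalone sketch. It is not the Chromium implementation: SketchClassify, TermSpan, StyleBoundary, and the Style flag values are hypothetical, and the input spans are assumed to be sorted, non-overlapping, and non-empty.

```cpp
#include <cstddef>
#include <vector>

// Illustrative style flags; the values are assumptions for this sketch.
enum Style : int { kNone = 0, kUrl = 1 << 0, kMatch = 1 << 1 };

// Hypothetical term match: a range of the text that matched the input.
struct TermSpan {
  size_t offset;
  size_t length;
};

// A style that applies from |offset| until the next boundary.
struct StyleBoundary {
  size_t offset;
  int style;
};

// Converts sorted, non-overlapping spans into style boundaries: matched
// ranges get |match_style|, everything else gets |non_match_style|.
std::vector<StyleBoundary> SketchClassify(const std::vector<TermSpan>& spans,
                                          size_t text_length,
                                          int match_style,
                                          int non_match_style) {
  std::vector<StyleBoundary> out;
  // Skips a boundary that would repeat the style already in effect.
  auto push = [&out](size_t offset, int style) {
    if (out.empty() || out.back().style != style)
      out.push_back(StyleBoundary{offset, style});
  };

  size_t pos = 0;  // First offset not yet covered by a boundary.
  for (const TermSpan& s : spans) {
    if (s.offset >= text_length)
      break;
    if (s.offset > pos)
      push(pos, non_match_style);
    push(s.offset, match_style);
    pos = s.offset + s.length;
  }
  if (pos < text_length)
    push(pos, non_match_style);
  return out;
}
```

For the contents 'zxx.[x]x/[x]x' (length 9, spans at offsets 4 and 7), calling SketchClassify with kMatch | kUrl and kUrl yields five boundaries at offsets 0, 4, 5, 7, and 8, alternating between the URL-only style and the URL-plus-match style. The description would use kMatch and kNone in the same way.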