Several changes to speed up the ShortcutsProvider:

(1) Bail immediately when BEST_MATCH is requested (as is the case for the AutocompleteClassifier) as the ShortcutsProvider does not currently allow matches to score highly enough to be "best". (2) Check for input text that's a prefix of the shortcut in question, and mark the whole prefix as a match at once. This reduces the number of cases where we have both lots of input words AND a long unclassified chunk of output string (the case where things are slow). (3) Rewrite the general-case matching algorithm to be noticeably faster. (Not sure of the complexity analysis but I think the worst case is more like O(n*n) than the previous O(m*n*n) and the average case should be much faster than that. With the test profile and disabling the above two optimizations, typing a letter in my debug build resulted in a hang of about 2 seconds as opposed to several minutes.) We could probably still do better; in particular, one optimization we could make for all the providers would be to provide them a conservative estimate of how many total characters would be visible in the omnibox dropdown and then have them trim the contents and description fields accordingly before matching. However, a true conservative estimate -- which would assume that we had e.g. a string of 'i's -- would in the worst case still be several thousand characters wide, I don't want to do the plumbing work, and this optimization seems unnecessary at the moment anyway. BUG=138242 TEST=none Review URL: https://chromiumcodereview.appspot.com/10831004 git-svn-id: svn://svn.chromium.org/chrome/trunk/src@148444 0039d316-1c4b-4281-b951-d872f2087c98

Several changes to speed up the ShortcutsProvider:
(1) Bail immediately when BEST_MATCH is requested (as is the case for the AutocompleteClassifier) as the ShortcutsProvider does not currently allow matches to score highly enough to be "best". (2) Check for input text that's a prefix of the shortcut in question, and mark the whole prefix as a match at once. This reduces the number of cases where we have both lots of input words AND a long unclassified chunk of output string (the case where things are slow). (3) Rewrite the general-case matching algorithm to be noticeably faster. (Not sure of the complexity analysis but I think the worst case is more like O(n*n) than the previous O(m*n*n) and the average case should be much faster than that. With the test profile and disabling the above two optimizations, typing a letter in my debug build resulted in a hang of about 2 seconds as opposed to several minutes.) We could probably still do better; in particular, one optimization we could make for all the providers would be to provide them a conservative estimate of how many total characters would be visible in the omnibox dropdown and then have them trim the contents and description fields accordingly before matching. However, a true conservative estimate -- which would assume that we had e.g. a string of 'i's -- would in the worst case still be several thousand characters wide, I don't want to do the plumbing work, and this optimization seems unnecessary at the moment anyway. BUG=138242 TEST=none Review URL: https://chromiumcodereview.appspot.com/10831004 git-svn-id: svn://svn.chromium.org/chrome/trunk/src@148444 0039d316-1c4b-4281-b951-d872f2087c98
645b4485 · pkasting@chromium.org · 7660d568 · 645b4485 · 645b4485 · 645b4485
Commit 645b4485 authored Jul 25, 2012 by pkasting@chromium.org
3 changed files
--- a/chrome/browser/autocomplete/shortcuts_provider.cc
+++ b/chrome/browser/autocomplete/shortcuts_provider.cc
@@ -70,6 +70,10 @@ void ShortcutsProvider::Start(const AutocompleteInput& input,
      (input.type() == AutocompleteInput::FORCED_QUERY))
    return;
+  // None of our results are applicable for best match.
+  if (input.matches_requested() == AutocompleteInput::BEST_MATCH)
+    return;
  if (input.text().empty())
    return;
@@ -150,8 +154,12 @@ void ShortcutsProvider::GetMatches(const AutocompleteInput& input) {
  for (history::ShortcutsBackend::ShortcutMap::const_iterator it =
           FindFirstMatch(term_string, backend.get());
       it != backend->shortcuts_map().end() &&
-           StartsWith(it->first, term_string, true); ++it)
+           StartsWith(it->first, term_string, true); ++it) {
-    matches_.push_back(ShortcutToACMatch(input, term_string, it->second));
+    // Don't return shortcuts with zero relevance.
+    int relevance = CalculateScore(term_string, it->second);
+    if (relevance)
+      matches_.push_back(ShortcutToACMatch(relevance, term_string, it->second));
+  }
  std::partial_sort(matches_.begin(),
      matches_.begin() +
          std::min(AutocompleteProvider::kMaxMatches, matches_.size()),
@@ -163,91 +171,133 @@ void ShortcutsProvider::GetMatches(const AutocompleteInput& input) {
 }
 AutocompleteMatch ShortcutsProvider::ShortcutToACMatch(
-    const AutocompleteInput& input,
+    int relevance,
    const string16& term_string,
    const history::ShortcutsBackend::Shortcut& shortcut) {
-  AutocompleteMatch match(this, CalculateScore(term_string, shortcut),
+  DCHECK(!term_string.empty());
-                          true, AutocompleteMatch::HISTORY_TITLE);
+  AutocompleteMatch match(this, relevance, true,
+                          AutocompleteMatch::HISTORY_TITLE);
  match.destination_url = shortcut.url;
  DCHECK(match.destination_url.is_valid());
  match.fill_into_edit = UTF8ToUTF16(shortcut.url.spec());
  match.contents = shortcut.contents;
-  match.contents_class = ClassifyAllMatchesInString(term_string,
+  match.contents_class = shortcut.contents_class;
-                                                    match.contents,
-                                                    shortcut.contents_class);
  match.description = shortcut.description;
-  match.description_class = ClassifyAllMatchesInString(
+  match.description_class = shortcut.description_class;
-      term_string, match.description, shortcut.description_class);
+  // Try to mark pieces of the contents and description as matches if they
+  // appear in |term_string|.
+  WordMap terms_map(CreateWordMapForString(term_string));
+  if (!terms_map.empty()) {
+    match.contents_class = ClassifyAllMatchesInString(term_string, terms_map,
+        match.contents, match.contents_class);
+    match.description_class = ClassifyAllMatchesInString(term_string, terms_map,
+        match.description, match.description_class);
+  }
  return match;
 }
+// static
+ShortcutsProvider::WordMap ShortcutsProvider::CreateWordMapForString(
+    const string16& text) {
+  // First, convert |text| to a vector of the unique words in it.
+  WordMap word_map;
+  base::i18n::BreakIterator word_iter(text,
+                                      base::i18n::BreakIterator::BREAK_WORD);
+  if (!word_iter.Init())
+    return word_map;
+  std::vector<string16> words;
+  while (word_iter.Advance()) {
+    if (word_iter.IsWord())
+      words.push_back(word_iter.GetString());
+  }
+  if (words.empty())
+    return word_map;
+  std::sort(words.begin(), words.end());
+  words.erase(std::unique(words.begin(), words.end()), words.end());
+  // Now create a map from (first character) to (words beginning with that
+  // character).  We insert in reverse lexicographical order and rely on the
+  // multimap preserving insertion order for values with the same key.  (This
+  // is mandated in C++11, and part of that decision was based on a survey of
+  // existing implementations that found that it was already true everywhere.)
+  std::reverse(words.begin(), words.end());
+  for (std::vector<string16>::const_iterator i(words.begin()); i != words.end();
+       ++i)
+    word_map.insert(std::make_pair((*i)[0], *i));
+  return word_map;
+}
 // static
 ACMatchClassifications ShortcutsProvider::ClassifyAllMatchesInString(
    const string16& find_text,
+    const WordMap& find_words,
    const string16& text,
    const ACMatchClassifications& original_class) {
  DCHECK(!find_text.empty());
+  DCHECK(!find_words.empty());
-  base::i18n::BreakIterator term_iter(find_text,
+  // First check whether |text| begins with |find_text| and mark that whole
-                                      base::i18n::BreakIterator::BREAK_WORD);
+  // section as a match if so.
-  if (!term_iter.Init())
-    return original_class;
-  std::vector<string16> terms;
-  while (term_iter.Advance()) {
-    if (term_iter.IsWord())
-      terms.push_back(term_iter.GetString());
-  }
-  // Sort the strings so that longer strings appear after their prefixes, if
-  // any.
-  std::sort(terms.begin(), terms.end());
-  // Find the earliest match of any word in |find_text| in the |text|. Add to
-  // |match_class|. Move pointer after match. Repeat until all matches are
-  // found.
  string16 text_lowercase(base::i18n::ToLower(text));
  ACMatchClassifications match_class;
-  // |match_class| should start at the position 0, if the matched text start
+  size_t last_position = 0;
-  // from the position 0, this will be popped from the vector further down.
+  if (StartsWith(text_lowercase, find_text, true)) {
-  match_class.push_back(ACMatchClassification(0, ACMatchClassification::NONE));
+    match_class.push_back(
-  for (size_t last_position = 0; last_position < text_lowercase.length();) {
+        ACMatchClassification(0, ACMatchClassification::MATCH));
-    size_t match_start = text_lowercase.length();
+    last_position = find_text.length();
-    size_t match_end = last_position;
+    // If |text_lowercase| is actually equal to |find_text|, we don't need to
+    // (and in fact shouldn't) put a trailing NONE classification after the end
-    for (size_t i = 0; i < terms.size(); ++i) {
+    // of the string.
-      size_t match = text_lowercase.find(terms[i], last_position);
+    if (last_position < text_lowercase.length()) {
-      // Use <= in conjunction with the sort() call above so that longer strings
+      match_class.push_back(
-      // are matched in preference to their prefixes.
+          ACMatchClassification(last_position, ACMatchClassification::NONE));
-      if (match != string16::npos && match <= match_start) {
-        match_start = match;
-        match_end = match + terms[i].length();
-      }
    }
+  } else {
+    // |match_class| should start at position 0.  If the first matching word is
+    // found at position 0, this will be popped from the vector further down.
+    match_class.push_back(
+        ACMatchClassification(0, ACMatchClassification::NONE));
+  }
-    if (match_start >= match_end)
+  // Now, starting with |last_position|, check each character in
-      break;
+  // |text_lowercase| to see if we have words starting with that character in
+  // |find_words|.  If so, check each of them to see if they match the portion
-    // Collapse adjacent ranges into one.
+  // of |text_lowercase| beginning with |last_position|.  Accept the first
-    if (!match_class.empty() && match_class.back().offset == match_start)
+  // matching word found (which should be the longest possible match at this
-      match_class.pop_back();
+  // location, given the construction of |find_words|) and add a MATCH region to
+  // |match_class|, moving |last_position| to be after the matching word.  If we
-    AutocompleteMatch::AddLastClassificationIfNecessary(&match_class,
+  // found no matching words, move to the next character and repeat.
-        match_start, ACMatchClassification::MATCH);
+  while (last_position < text_lowercase.length()) {
-    if (match_end < text_lowercase.length()) {
+    std::pair<WordMap::const_iterator, WordMap::const_iterator> range(
-      AutocompleteMatch::AddLastClassificationIfNecessary(&match_class,
+        find_words.equal_range(text_lowercase[last_position]));
-          match_end, ACMatchClassification::NONE);
+    size_t next_character = last_position + 1;
+    for (WordMap::const_iterator i(range.first); i != range.second; ++i) {
+      const string16& word = i->second;
+      size_t word_end = last_position + word.length();
+      if ((word_end <= text_lowercase.length()) &&
+          !text_lowercase.compare(last_position, word.length(), word)) {
+        // Collapse adjacent ranges into one.
+        if (match_class.back().offset == last_position)
+          match_class.pop_back();
+        AutocompleteMatch::AddLastClassificationIfNecessary(&match_class,
+            last_position, ACMatchClassification::MATCH);
+        if (word_end < text_lowercase.length()) {
+          match_class.push_back(
+              ACMatchClassification(word_end, ACMatchClassification::NONE));
+        }
+        last_position = word_end;
+        break;
+      }
    }
+    last_position = std::max(last_position, next_character);
-    last_position = match_end;
  }
  // Merge match-marking data with original classifications.
-  if (match_class.empty())
+  if ((match_class.size() == 1) &&
+      (match_class.back().style == ACMatchClassification::NONE))
    return original_class;
  ACMatchClassifications output;
  for (ACMatchClassifications::const_iterator i = original_class.begin(),
       j = match_class.begin(); i != original_class.end();) {
@@ -262,7 +312,6 @@ ACMatchClassifications ShortcutsProvider::ClassifyAllMatchesInString(
    if (next_j_offset >= next_i_offset)
      ++i;
  }
  return output;
 }

--- a/chrome/browser/autocomplete/shortcuts_provider.h
+++ b/chrome/browser/autocomplete/shortcuts_provider.h
@@ -5,6 +5,7 @@
 #ifndef CHROME_BROWSER_AUTOCOMPLETE_SHORTCUTS_PROVIDER_H_
 #define CHROME_BROWSER_AUTOCOMPLETE_SHORTCUTS_PROVIDER_H_
+#include <map>
 #include <set>
 #include <string>
@@ -32,11 +33,13 @@ class ShortcutsProvider
  virtual void DeleteMatch(const AutocompleteMatch& match) OVERRIDE;
 private:
+  friend class ClassifyTest;
  friend class ShortcutsProviderTest;
-  FRIEND_TEST_ALL_PREFIXES(ShortcutsProviderTest, ClassifyAllMatchesInString);
  FRIEND_TEST_ALL_PREFIXES(ShortcutsProviderTest, CalculateScore);
  FRIEND_TEST_ALL_PREFIXES(ShortcutsProviderTest, DeleteMatch);
+  typedef std::multimap<char16, string16> WordMap;
  virtual ~ShortcutsProvider();
  // ShortcutsBackendObserver:
@@ -49,23 +52,39 @@ class ShortcutsProvider
  void GetMatches(const AutocompleteInput& input);
  AutocompleteMatch ShortcutToACMatch(
-      const AutocompleteInput& input,
+      int relevance,
      const string16& terms,
      const history::ShortcutsBackend::Shortcut& shortcut);
+  // Returns a map mapping characters to groups of words from |text| that start
+  // with those characters, ordered lexicographically descending so that longer
+  // words appear before their prefixes (if any) within a particular
+  // equal_range().
+  static WordMap CreateWordMapForString(const string16& text);
  // Given |text| and a corresponding base set of classifications
  // |original_class|, adds ACMatchClassification::MATCH markers for all
-  // instances of the words from |find_text| within |text| and returns the
+  // instances of the words from |find_words| within |text| and returns the
-  // resulting classifications.
+  // resulting classifications.  (|find_text| is provided as the original string
+  // used to create |find_words|.  This is supplied because it's common for this
+  // to be a prefix of |text|, so we can quickly check for that and mark that
+  // entire substring as a match before proceeding with the more generic
+  // algorithm.)
  //
  // For example, given the |text|
  // "Sports and News at sports.somesite.com - visit us!" and |original_class|
  // {{0, NONE}, {18, URL}, {37, NONE}} (marking "sports.somesite.com" as a
  // URL), calling with |find_text| set to "sp ew" would return
  // {{0, MATCH}, {2, NONE}, {12, MATCH}, {14, NONE}, {18, URL|MATCH},
-  // {20, URL}, {37, NONE}}
+  // {20, URL}, {37, NONE}}.
+  //
+  // |find_words| should be as constructed by CreateWordMapForString(find_text).
+  //
+  // |find_text| (and thus |find_words|) are expected to be lowercase.  |text|
+  // will be lowercased in this function.
  static ACMatchClassifications ClassifyAllMatchesInString(
      const string16& find_text,
+      const WordMap& find_words,
      const string16& text,
      const ACMatchClassifications& original_class);

--- a/chrome/browser/autocomplete/shortcuts_provider_unittest.cc
+++ b/chrome/browser/autocomplete/shortcuts_provider_unittest.cc