Commit 645b4485 authored by pkasting@chromium.org's avatar pkasting@chromium.org

Several changes to speed up the ShortcutsProvider:

(1) Bail immediately when BEST_MATCH is requested (as is the case for the AutocompleteClassifier) as the ShortcutsProvider does not currently allow matches to score highly enough to be "best".
(2) Check for input text that's a prefix of the shortcut in question, and mark the whole prefix as a match at once.  This reduces the number of cases where we have both lots of input words AND a long unclassified chunk of output string (the case where things are slow).
(3) Rewrite the general-case matching algorithm to be noticeably faster.  (Not sure of the complexity analysis but I think the worst case is more like O(n*n) than the previous O(m*n*n) and the average case should be much faster than that.  With the test profile and disabling the above two optimizations, typing a letter in my debug build resulted in a hang of about 2 seconds as opposed to several minutes.)

We could probably still do better; in particular, one optimization we could make for all the providers would be to provide them a conservative estimate of how many total characters would be visible in the omnibox dropdown and then have them trim the contents and description fields accordingly before matching.  However, a true conservative estimate -- which would assume that we had e.g. a string of 'i's -- would in the worst case still be several thousand characters wide, I don't want to do the plumbing work, and this optimization seems unnecessary at the moment anyway.

BUG=138242
TEST=none
Review URL: https://chromiumcodereview.appspot.com/10831004

git-svn-id: svn://svn.chromium.org/chrome/trunk/src@148444 0039d316-1c4b-4281-b951-d872f2087c98
parent 7660d568
...@@ -70,6 +70,10 @@ void ShortcutsProvider::Start(const AutocompleteInput& input, ...@@ -70,6 +70,10 @@ void ShortcutsProvider::Start(const AutocompleteInput& input,
(input.type() == AutocompleteInput::FORCED_QUERY)) (input.type() == AutocompleteInput::FORCED_QUERY))
return; return;
// None of our results are applicable for best match.
if (input.matches_requested() == AutocompleteInput::BEST_MATCH)
return;
if (input.text().empty()) if (input.text().empty())
return; return;
...@@ -150,8 +154,12 @@ void ShortcutsProvider::GetMatches(const AutocompleteInput& input) { ...@@ -150,8 +154,12 @@ void ShortcutsProvider::GetMatches(const AutocompleteInput& input) {
for (history::ShortcutsBackend::ShortcutMap::const_iterator it = for (history::ShortcutsBackend::ShortcutMap::const_iterator it =
FindFirstMatch(term_string, backend.get()); FindFirstMatch(term_string, backend.get());
it != backend->shortcuts_map().end() && it != backend->shortcuts_map().end() &&
StartsWith(it->first, term_string, true); ++it) StartsWith(it->first, term_string, true); ++it) {
matches_.push_back(ShortcutToACMatch(input, term_string, it->second)); // Don't return shortcuts with zero relevance.
int relevance = CalculateScore(term_string, it->second);
if (relevance)
matches_.push_back(ShortcutToACMatch(relevance, term_string, it->second));
}
std::partial_sort(matches_.begin(), std::partial_sort(matches_.begin(),
matches_.begin() + matches_.begin() +
std::min(AutocompleteProvider::kMaxMatches, matches_.size()), std::min(AutocompleteProvider::kMaxMatches, matches_.size()),
...@@ -163,91 +171,133 @@ void ShortcutsProvider::GetMatches(const AutocompleteInput& input) { ...@@ -163,91 +171,133 @@ void ShortcutsProvider::GetMatches(const AutocompleteInput& input) {
} }
AutocompleteMatch ShortcutsProvider::ShortcutToACMatch( AutocompleteMatch ShortcutsProvider::ShortcutToACMatch(
const AutocompleteInput& input, int relevance,
const string16& term_string, const string16& term_string,
const history::ShortcutsBackend::Shortcut& shortcut) { const history::ShortcutsBackend::Shortcut& shortcut) {
AutocompleteMatch match(this, CalculateScore(term_string, shortcut), DCHECK(!term_string.empty());
true, AutocompleteMatch::HISTORY_TITLE); AutocompleteMatch match(this, relevance, true,
AutocompleteMatch::HISTORY_TITLE);
match.destination_url = shortcut.url; match.destination_url = shortcut.url;
DCHECK(match.destination_url.is_valid()); DCHECK(match.destination_url.is_valid());
match.fill_into_edit = UTF8ToUTF16(shortcut.url.spec()); match.fill_into_edit = UTF8ToUTF16(shortcut.url.spec());
match.contents = shortcut.contents; match.contents = shortcut.contents;
match.contents_class = ClassifyAllMatchesInString(term_string, match.contents_class = shortcut.contents_class;
match.contents,
shortcut.contents_class);
match.description = shortcut.description; match.description = shortcut.description;
match.description_class = ClassifyAllMatchesInString( match.description_class = shortcut.description_class;
term_string, match.description, shortcut.description_class);
// Try to mark pieces of the contents and description as matches if they
// appear in |term_string|.
WordMap terms_map(CreateWordMapForString(term_string));
if (!terms_map.empty()) {
match.contents_class = ClassifyAllMatchesInString(term_string, terms_map,
match.contents, match.contents_class);
match.description_class = ClassifyAllMatchesInString(term_string, terms_map,
match.description, match.description_class);
}
return match; return match;
} }
// static
ShortcutsProvider::WordMap ShortcutsProvider::CreateWordMapForString(
const string16& text) {
// First, convert |text| to a vector of the unique words in it.
WordMap word_map;
base::i18n::BreakIterator word_iter(text,
base::i18n::BreakIterator::BREAK_WORD);
if (!word_iter.Init())
return word_map;
std::vector<string16> words;
while (word_iter.Advance()) {
if (word_iter.IsWord())
words.push_back(word_iter.GetString());
}
if (words.empty())
return word_map;
std::sort(words.begin(), words.end());
words.erase(std::unique(words.begin(), words.end()), words.end());
// Now create a map from (first character) to (words beginning with that
// character). We insert in reverse lexicographical order and rely on the
// multimap preserving insertion order for values with the same key. (This
// is mandated in C++11, and part of that decision was based on a survey of
// existing implementations that found that it was already true everywhere.)
std::reverse(words.begin(), words.end());
for (std::vector<string16>::const_iterator i(words.begin()); i != words.end();
++i)
word_map.insert(std::make_pair((*i)[0], *i));
return word_map;
}
// static // static
ACMatchClassifications ShortcutsProvider::ClassifyAllMatchesInString( ACMatchClassifications ShortcutsProvider::ClassifyAllMatchesInString(
const string16& find_text, const string16& find_text,
const WordMap& find_words,
const string16& text, const string16& text,
const ACMatchClassifications& original_class) { const ACMatchClassifications& original_class) {
DCHECK(!find_text.empty()); DCHECK(!find_text.empty());
DCHECK(!find_words.empty());
base::i18n::BreakIterator term_iter(find_text, // First check whether |text| begins with |find_text| and mark that whole
base::i18n::BreakIterator::BREAK_WORD); // section as a match if so.
if (!term_iter.Init())
return original_class;
std::vector<string16> terms;
while (term_iter.Advance()) {
if (term_iter.IsWord())
terms.push_back(term_iter.GetString());
}
// Sort the strings so that longer strings appear after their prefixes, if
// any.
std::sort(terms.begin(), terms.end());
// Find the earliest match of any word in |find_text| in the |text|. Add to
// |match_class|. Move pointer after match. Repeat until all matches are
// found.
string16 text_lowercase(base::i18n::ToLower(text)); string16 text_lowercase(base::i18n::ToLower(text));
ACMatchClassifications match_class; ACMatchClassifications match_class;
// |match_class| should start at the position 0, if the matched text start size_t last_position = 0;
// from the position 0, this will be popped from the vector further down. if (StartsWith(text_lowercase, find_text, true)) {
match_class.push_back(ACMatchClassification(0, ACMatchClassification::NONE)); match_class.push_back(
for (size_t last_position = 0; last_position < text_lowercase.length();) { ACMatchClassification(0, ACMatchClassification::MATCH));
size_t match_start = text_lowercase.length(); last_position = find_text.length();
size_t match_end = last_position; // If |text_lowercase| is actually equal to |find_text|, we don't need to
// (and in fact shouldn't) put a trailing NONE classification after the end
for (size_t i = 0; i < terms.size(); ++i) { // of the string.
size_t match = text_lowercase.find(terms[i], last_position); if (last_position < text_lowercase.length()) {
// Use <= in conjunction with the sort() call above so that longer strings match_class.push_back(
// are matched in preference to their prefixes. ACMatchClassification(last_position, ACMatchClassification::NONE));
if (match != string16::npos && match <= match_start) {
match_start = match;
match_end = match + terms[i].length();
}
} }
} else {
// |match_class| should start at position 0. If the first matching word is
// found at position 0, this will be popped from the vector further down.
match_class.push_back(
ACMatchClassification(0, ACMatchClassification::NONE));
}
if (match_start >= match_end) // Now, starting with |last_position|, check each character in
break; // |text_lowercase| to see if we have words starting with that character in
// |find_words|. If so, check each of them to see if they match the portion
// Collapse adjacent ranges into one. // of |text_lowercase| beginning with |last_position|. Accept the first
if (!match_class.empty() && match_class.back().offset == match_start) // matching word found (which should be the longest possible match at this
match_class.pop_back(); // location, given the construction of |find_words|) and add a MATCH region to
// |match_class|, moving |last_position| to be after the matching word. If we
AutocompleteMatch::AddLastClassificationIfNecessary(&match_class, // found no matching words, move to the next character and repeat.
match_start, ACMatchClassification::MATCH); while (last_position < text_lowercase.length()) {
if (match_end < text_lowercase.length()) { std::pair<WordMap::const_iterator, WordMap::const_iterator> range(
AutocompleteMatch::AddLastClassificationIfNecessary(&match_class, find_words.equal_range(text_lowercase[last_position]));
match_end, ACMatchClassification::NONE); size_t next_character = last_position + 1;
for (WordMap::const_iterator i(range.first); i != range.second; ++i) {
const string16& word = i->second;
size_t word_end = last_position + word.length();
if ((word_end <= text_lowercase.length()) &&
!text_lowercase.compare(last_position, word.length(), word)) {
// Collapse adjacent ranges into one.
if (match_class.back().offset == last_position)
match_class.pop_back();
AutocompleteMatch::AddLastClassificationIfNecessary(&match_class,
last_position, ACMatchClassification::MATCH);
if (word_end < text_lowercase.length()) {
match_class.push_back(
ACMatchClassification(word_end, ACMatchClassification::NONE));
}
last_position = word_end;
break;
}
} }
last_position = std::max(last_position, next_character);
last_position = match_end;
} }
// Merge match-marking data with original classifications. // Merge match-marking data with original classifications.
if (match_class.empty()) if ((match_class.size() == 1) &&
(match_class.back().style == ACMatchClassification::NONE))
return original_class; return original_class;
ACMatchClassifications output; ACMatchClassifications output;
for (ACMatchClassifications::const_iterator i = original_class.begin(), for (ACMatchClassifications::const_iterator i = original_class.begin(),
j = match_class.begin(); i != original_class.end();) { j = match_class.begin(); i != original_class.end();) {
...@@ -262,7 +312,6 @@ ACMatchClassifications ShortcutsProvider::ClassifyAllMatchesInString( ...@@ -262,7 +312,6 @@ ACMatchClassifications ShortcutsProvider::ClassifyAllMatchesInString(
if (next_j_offset >= next_i_offset) if (next_j_offset >= next_i_offset)
++i; ++i;
} }
return output; return output;
} }
......
...@@ -5,6 +5,7 @@ ...@@ -5,6 +5,7 @@
#ifndef CHROME_BROWSER_AUTOCOMPLETE_SHORTCUTS_PROVIDER_H_ #ifndef CHROME_BROWSER_AUTOCOMPLETE_SHORTCUTS_PROVIDER_H_
#define CHROME_BROWSER_AUTOCOMPLETE_SHORTCUTS_PROVIDER_H_ #define CHROME_BROWSER_AUTOCOMPLETE_SHORTCUTS_PROVIDER_H_
#include <map>
#include <set> #include <set>
#include <string> #include <string>
...@@ -32,11 +33,13 @@ class ShortcutsProvider ...@@ -32,11 +33,13 @@ class ShortcutsProvider
virtual void DeleteMatch(const AutocompleteMatch& match) OVERRIDE; virtual void DeleteMatch(const AutocompleteMatch& match) OVERRIDE;
private: private:
friend class ClassifyTest;
friend class ShortcutsProviderTest; friend class ShortcutsProviderTest;
FRIEND_TEST_ALL_PREFIXES(ShortcutsProviderTest, ClassifyAllMatchesInString);
FRIEND_TEST_ALL_PREFIXES(ShortcutsProviderTest, CalculateScore); FRIEND_TEST_ALL_PREFIXES(ShortcutsProviderTest, CalculateScore);
FRIEND_TEST_ALL_PREFIXES(ShortcutsProviderTest, DeleteMatch); FRIEND_TEST_ALL_PREFIXES(ShortcutsProviderTest, DeleteMatch);
typedef std::multimap<char16, string16> WordMap;
virtual ~ShortcutsProvider(); virtual ~ShortcutsProvider();
// ShortcutsBackendObserver: // ShortcutsBackendObserver:
...@@ -49,23 +52,39 @@ class ShortcutsProvider ...@@ -49,23 +52,39 @@ class ShortcutsProvider
void GetMatches(const AutocompleteInput& input); void GetMatches(const AutocompleteInput& input);
AutocompleteMatch ShortcutToACMatch( AutocompleteMatch ShortcutToACMatch(
const AutocompleteInput& input, int relevance,
const string16& terms, const string16& terms,
const history::ShortcutsBackend::Shortcut& shortcut); const history::ShortcutsBackend::Shortcut& shortcut);
// Returns a map mapping characters to groups of words from |text| that start
// with those characters, ordered lexicographically descending so that longer
// words appear before their prefixes (if any) within a particular
// equal_range().
static WordMap CreateWordMapForString(const string16& text);
// Given |text| and a corresponding base set of classifications // Given |text| and a corresponding base set of classifications
// |original_class|, adds ACMatchClassification::MATCH markers for all // |original_class|, adds ACMatchClassification::MATCH markers for all
// instances of the words from |find_text| within |text| and returns the // instances of the words from |find_words| within |text| and returns the
// resulting classifications. // resulting classifications. (|find_text| is provided as the original string
// used to create |find_words|. This is supplied because it's common for this
// to be a prefix of |text|, so we can quickly check for that and mark that
// entire substring as a match before proceeding with the more generic
// algorithm.)
// //
// For example, given the |text| // For example, given the |text|
// "Sports and News at sports.somesite.com - visit us!" and |original_class| // "Sports and News at sports.somesite.com - visit us!" and |original_class|
// {{0, NONE}, {18, URL}, {37, NONE}} (marking "sports.somesite.com" as a // {{0, NONE}, {18, URL}, {37, NONE}} (marking "sports.somesite.com" as a
// URL), calling with |find_text| set to "sp ew" would return // URL), calling with |find_text| set to "sp ew" would return
// {{0, MATCH}, {2, NONE}, {12, MATCH}, {14, NONE}, {18, URL|MATCH}, // {{0, MATCH}, {2, NONE}, {12, MATCH}, {14, NONE}, {18, URL|MATCH},
// {20, URL}, {37, NONE}} // {20, URL}, {37, NONE}}.
//
// |find_words| should be as constructed by CreateWordMapForString(find_text).
//
// |find_text| (and thus |find_words|) are expected to be lowercase. |text|
// will be lowercased in this function.
static ACMatchClassifications ClassifyAllMatchesInString( static ACMatchClassifications ClassifyAllMatchesInString(
const string16& find_text, const string16& find_text,
const WordMap& find_words,
const string16& text, const string16& text,
const ACMatchClassifications& original_class); const ACMatchClassifications& original_class);
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment