Changed the app list string matching formula.

This is used to rank apps and webstore results. This should not (really) affect the relative ranking of any results, only the absolute scores that they are assigned internally. However, this will be relevant in the future when we start comparing scores of different types of results against each other.

The algorithm (used to score app and webstore results) now has a different tapering formula, designed to reach a higher score with fewer keystrokes. Previously, it was based on the percentage of the full title you had typed (which unfairly de-prioritized apps with long titles, such as "Google Keep - notes and lists"). Now, it has an exponential curve, so you get a reasonably high score with just a few letters matched, and then it tapers off, approaching 1.0 as you type more letters.

BUG=422610

Review URL: https://codereview.chromium.org/1138193002
Cr-Commit-Position: refs/heads/master@{#329574}
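To make the difference between the two formulas concrete, here is a minimal standalone sketch. It is not code from this commit: the helper names `NormalizeByLength` and `TaperRelevance` are hypothetical, and it assumes the raw relevance grows by roughly one point per matched keystroke, which is what the expected scores in the new unit test suggest.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper mirroring the old behaviour: divide the raw relevance
// by the length of the full title, so long titles score lower.
double NormalizeByLength(double raw_relevance, std::size_t text_length) {
  return text_length ? raw_relevance / text_length : raw_relevance;
}

// Hypothetical helper mirroring the new behaviour: an exponential curve in
// which each extra point of raw relevance halves the remaining distance
// to 1.0. The result does not depend on the title length.
double TaperRelevance(double raw_relevance) {
  return 1.0 - std::pow(0.5, raw_relevance);
}
```

Under the one-point-per-keystroke assumption, a three-letter prefix of the 13-character title "Google Chrome" would score 3/13 (about 0.23) with the old formula and 0.875 with the new one, which is just above the 0.87 Omnibox threshold mentioned in the test comments.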

Commit 4eb24b29 authored May 12, 2015 by mgiuca Committed by Commit bot May 13, 2015
Showing with 42 additions and 5 deletions

ui/app_list/search/tokenized_string_match.cc +10 -4

ui/app_list/search/tokenized_string_match_unittest.cc +32 -1

--- a/ui/app_list/search/tokenized_string_match.cc
+++ b/ui/app_list/search/tokenized_string_match.cc
@@ -4,6 +4,8 @@
 #include "ui/app_list/search/tokenized_string_match.h"
+#include <cmath>
 #include "base/i18n/string_search.h"
 #include "base/logging.h"
 #include "ui/app_list/search/tokenized_string_char_iterator.h"
@@ -218,10 +220,14 @@ bool TokenizedStringMatch::Calculate(const TokenizedString& query,
    }
  }
-  // Using length() for normalizing is not 100% correct but should be good
-  // enough compared with using real char count of the text.
-  if (text.text().length())
-    relevance_ /= text.text().length();
+  // Temper the relevance score with an exponential curve. Each point of
+  // relevance (roughly, each keystroke) is worth less than the last. This means
+  // that typing a few characters of a word is enough to promote matches very
+  // high, with any subsequent characters being worth comparatively less.
+  // TODO(mgiuca): This doesn't really play well with Omnibox results, since as
+  // you type more characters, the app/omnibox results tend to jump over each
+  // other.
+  relevance_ = 1.0 - std::pow(0.5, relevance_);
  return relevance_ > kNoMatchScore;
 }

--- a/ui/app_list/search/tokenized_string_match_unittest.cc
+++ b/ui/app_list/search/tokenized_string_match_unittest.cc
@@ -14,7 +14,7 @@ namespace app_list {
 namespace test {
 // Returns a string of |text| marked the hits in |match| using block bracket.
-// e.g. text= "Text", hits = [{0,1}], returns "[T]ext".
+// e.g. text= "Text", match.hits = [{0,1}], returns "[T]ext".
 std::string MatchHit(const base::string16& text,
                     const TokenizedStringMatch& match) {
  base::string16 marked = text;
@@ -119,5 +119,36 @@ TEST(TokenizedStringMatchTest, Relevance) {
  }
 }
+// More specialized tests of the absolute relevance scores. (These tests are
+// minimal, because they are so brittle. Changing the scoring algorithm will
+// require updating this test.)
+TEST(TokenizedStringMatchTest, AbsoluteRelevance) {
+  const double kEpsilon = 0.006;
+  struct {
+    const char* text;
+    const char* query;
+    double expected_score;
+  } kTestCases[] = {
+      // The first few chars should increase the score extremely high. After
+      // that, they should count less.
+      // NOTE: 0.87 is a magic number, as it is the Omnibox score for a "pretty
+      // good" match. We want a 3-letter prefix match to be slightly above 0.87.
+      {"Google Chrome", "g", 0.5},
+      {"Google Chrome", "go", 0.75},
+      {"Google Chrome", "goo", 0.88},
+      {"Google Chrome", "goog", 0.94},
+  };
+  TokenizedStringMatch match;
+  for (size_t i = 0; i < arraysize(kTestCases); ++i) {
+    const base::string16 text(base::UTF8ToUTF16(kTestCases[i].text));
+    EXPECT_TRUE(match.Calculate(base::UTF8ToUTF16(kTestCases[i].query), text));
+    EXPECT_NEAR(match.relevance(), kTestCases[i].expected_score, kEpsilon)
+        << "Test case " << i << " : text=" << kTestCases[i].text
+        << ", query=" << kTestCases[i].query
+        << ", expected_score=" << kTestCases[i].expected_score;
+  }
+}
 }  // namespace test
 }  // namespace app_list