Commit dc9422b1 authored by Jia, committed by Commit Bot

[local-search-service] Disable partial match for token set ratio

TokenSetRatio calculates pairwise string match ratios among the
following three strings and returns the maximum as the final result.
(i). intersection string (of query & text)
(ii). query rewritten (intersection + query_diff_text)
(iii). text rewritten (intersection + text_diff_query)
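
A minimal sketch of how the three strings above could be built. This is an illustration, not the code in this CL: the names Tokens, Join, TokenSetStrings and BuildTokenSetStrings are hypothetical, and whitespace tokenization with sorted, de-duplicated tokens is an assumption.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits on whitespace into a sorted set of unique tokens.
std::set<std::string> Tokens(const std::string& s) {
  std::istringstream in(s);
  std::set<std::string> out;
  for (std::string token; in >> token;) out.insert(token);
  return out;
}

// Hypothetical helper: joins tokens with single spaces.
std::string Join(const std::vector<std::string>& parts) {
  std::string out;
  for (const std::string& part : parts) {
    if (!out.empty()) out += ' ';
    out += part;
  }
  return out;
}

// The three strings compared pairwise by a token set ratio.
struct TokenSetStrings {
  std::string intersection;     // (i)
  std::string query_rewritten;  // (ii) intersection + query_diff_text
  std::string text_rewritten;   // (iii) intersection + text_diff_query
};

TokenSetStrings BuildTokenSetStrings(const std::string& query,
                                     const std::string& text) {
  const std::set<std::string> q = Tokens(query);
  const std::set<std::string> t = Tokens(text);

  std::vector<std::string> common, q_only, t_only;
  std::set_intersection(q.begin(), q.end(), t.begin(), t.end(),
                        std::back_inserter(common));
  std::set_difference(q.begin(), q.end(), t.begin(), t.end(),
                      std::back_inserter(q_only));
  std::set_difference(t.begin(), t.end(), q.begin(), q.end(),
                      std::back_inserter(t_only));

  // Each rewrite starts with the shared tokens, followed by the tokens
  // unique to that side.
  std::vector<std::string> query_parts = common;
  query_parts.insert(query_parts.end(), q_only.begin(), q_only.end());
  std::vector<std::string> text_parts = common;
  text_parts.insert(text_parts.end(), t_only.begin(), t_only.end());

  TokenSetStrings result;
  result.intersection = Join(common);
  result.query_rewritten = Join(query_parts);
  result.text_rewritten = Join(text_parts);
  return result;
}
```

For query "chrome web store" and text "web app", this sketch yields "web", "web chrome store" and "web app" for (i), (ii) and (iii) respectively.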

With partial match enabled, comparing (i) with (ii), or (i) with (iii),
returns an extremely high ratio (close to 1) whenever the intersection
is non-empty, because the intersection string is contained in both
rewrites. As a result, any single word shared by query and text
inflates the final ratio.
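
The effect can be reproduced with a simplified stand-in for the matcher. The sketch below assumes an edit-distance based ratio and a sliding-window partial ratio; EditDistance, Ratio and PartialRatio here are hypothetical simplifications, not the functions in this CL, and the partial match penalty is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Levenshtein distance computed with two rolling rows.
std::size_t EditDistance(const std::string& a, const std::string& b) {
  std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = i;
    for (std::size_t j = 1; j <= b.size(); ++j) {
      const std::size_t substitute =
          prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
      cur[j] = std::min({substitute, prev[j] + 1, cur[j - 1] + 1});
    }
    std::swap(prev, cur);
  }
  return prev[b.size()];
}

// Full-string similarity in [0, 1]; 1 means identical.
double Ratio(const std::string& a, const std::string& b) {
  const std::size_t longest = std::max(a.size(), b.size());
  if (longest == 0) return 1.0;
  return 1.0 - static_cast<double>(EditDistance(a, b)) / longest;
}

// Simplified partial match: best Ratio of the shorter string against any
// same-length window of the longer string.
double PartialRatio(const std::string& a, const std::string& b) {
  const std::string& shorter = a.size() <= b.size() ? a : b;
  const std::string& longer = a.size() <= b.size() ? b : a;
  if (shorter.empty()) return longer.empty() ? 1.0 : 0.0;
  double best = 0.0;
  for (std::size_t i = 0; i + shorter.size() <= longer.size(); ++i) {
    best = std::max(best, Ratio(shorter, longer.substr(i, shorter.size())));
  }
  return best;
}
```

With intersection "web" and query rewrite "web chrome store", Ratio gives 1 - 13/16 = 0.1875, while PartialRatio gives 1.0 because the intersection matches a window of the rewrite exactly. Taking the maximum over the pairwise comparisons then reports a near-perfect score from a single shared word.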

This CL disables partial match in TokenSetRatio.

Bug: 1081584
Change-Id: If4062c367f74be62a733d6b0e3d54353bfba1365
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2206057
Reviewed-by: Thanh Nguyen <thanhdng@chromium.org>
Commit-Queue: Jia Meng <jiameng@chromium.org>
Cr-Commit-Position: refs/heads/master@{#769626}
parent 8f780000
@@ -253,9 +253,13 @@ double FuzzyTokenizedStringMatch::WeightedRatio(
                TokenSortRatio(query, text, use_partial /*partial*/,
                               partial_match_penalty_rate, use_edit_distance) *
                    unbase_scale * partial_scale);
+  // Do not use partial match for token set because the match between the
+  // intersection string and query/text rewrites will always return an extremely
+  // high value.
   weighted_ratio =
       std::max(weighted_ratio,
-               TokenSetRatio(query, text, use_partial /*partial*/,
+               TokenSetRatio(query, text, false /*partial*/,
                              partial_match_penalty_rate, use_edit_distance) *
                    unbase_scale * partial_scale);
   return weighted_ratio;

@@ -161,7 +161,7 @@ TEST_F(FuzzyTokenizedStringMatchTest, WeightedRatio) {
     EXPECT_NEAR(
         match.WeightedRatio(TokenizedString(query), TokenizedString(text),
                             kPartialMatchPenaltyRate, false),
-        0.85, 0.01);
+        0.49, 0.01);
   }
 }