Commit 0fe0c4ea authored by John Delaney, committed by Commit Bot

Combine filter list rules which only differ by domain during generation

Currently, whitelist rules in EasyList are not always deduplicated by
rule: two separate lines can contain the same rule but list different
domains.

This CL adds an awk command during filter list generation which combines
these lines.

This shrinks the filter list size by 1.7% and reduces the number of
overall rules by 4.7% on the current list.

Bug: 1048224
Change-Id: Ib3e71e9b371605380cfc5600fe8a92f5898f5b6d
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2033987
Commit-Queue: John Delaney <johnidel@chromium.org>
Reviewed-by: Josh Karlin <jkarlin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#737886}
parent 5edf3130
...
@@ -88,9 +88,11 @@ An example using [EasyList](https://easylist.to/easylist/easylist.txt) follows:
 ```
 ## 4. Append all of the whitelist rules to be safe
+Appends whitelist rules and also deduplicates rules which only differ by their set of affected domains.
 ```sh
 1. grep ^@@ easylist.txt >> smaller_list.txt
-2. sort smaller_list.txt | uniq > final_list.txt
+2. awk -F,domain= '{ if(!length($2)) table[$1] = ""; else table[$1 FS] = length(table[$1 FS]) ? table[$1 FS] "|" $2 : $2; } END{ for (key in table) print key table[key] }' smaller_list.txt > smaller_list.txt
+3. sort smaller_list.txt | uniq > final_list.txt
 ```
 ## 5. Turn the final list into a form usable by Chromium tools
...
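
To illustrate what the added awk step does, here is a minimal sketch run against a tiny made-up list. The file names `sample_list.txt` and `merged_list.txt` and the three sample rules are assumptions for illustration, not part of the change. The awk program is the one from the diff, only reflowed across lines. One deliberate difference: the sketch writes to a separate output file, because redirecting output to the same file that is being read (`smaller_list.txt > smaller_list.txt`) makes a typical shell truncate that file before awk opens it.

```sh
#!/bin/sh
# Hypothetical sample input: two whitelist rules that differ only by their
# domain option, plus one rule with no domain option at all.
cat > sample_list.txt <<'EOF'
@@||example.com/ads.js$script,domain=a.test
@@||example.com/ads.js$script,domain=b.test
@@||example.org/banner.png
EOF

# Split each line on the literal string ",domain=".
#  - No domain part: remember the rule as-is.
#  - Domain part present: key on "<rule>,domain=" and join domains with "|".
# Output goes to merged_list.txt (a hypothetical name) rather than back to
# the input file, so the input is not truncated before awk reads it.
awk -F,domain= '{
  if (!length($2)) table[$1] = "";
  else table[$1 FS] = length(table[$1 FS]) ? table[$1 FS] "|" $2 : $2;
} END { for (key in table) print key table[key] }' sample_list.txt > merged_list.txt

# awk does not guarantee iteration order for "for (key in table)",
# so sort afterwards, as the final step in the diff does.
sort merged_list.txt | uniq > final_list.txt
cat final_list.txt
```

With this input, `final_list.txt` should contain two lines: the two `example.com` rules collapsed into a single rule ending in `domain=a.test|b.test`, and the `example.org` rule unchanged.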