Second try at using LazilyDeallocatedDeque instead of circular_deque in TaskQueue
Previous patch: https://chromium-review.googlesource.com/c/chromium/src/+/1080792 Our usage pattern is unfortunate for existing queues such as base::circular_deque. We tend to fill up an empty queue and then drain all those tasks until it's empty. This means the queue yo-yos in size which confuses the memory reclamation schemes of most queues. As an optimisation we introduce a deque specialised for TaskQueueManager's usage patterns. For performance (memory allocation isn't free) we don't automatically reclaim memory when the queue becomes empty. Instead we rely on the surrounding code periodically calling MaybeShrinkQueue, ideally when the queue is empty. We keep track of the maximum recent queue size and rate limit how often MaybeShrinkQueue actually shrinks the buffer to avoid unnecessary churn. This yields a nice win on our microbenchmark: Patch: us/run for 10000 delayed tasks with N queues 1 queue 4 queues 8 queues 32 queues 33448.166666666664 33215.75496688742 33484.34 34018.37414965987 33972.18243243243 33846.91891891892 34489.737931034484 34727.90277777778 33367.90666666667 33167.54304635762 33392.96 33906.89864864865 33392.13333333333 33107.17763157895 33340.18 33718.73825503356 37921.01515151515 39379.06299212598 38851.27906976744 39366.03125 38171.564885496184 37401.72388059701 37640.32330827068 37800.51127819549 34691.2275862069 34359.61643835616 34993.468531468534 35366.795774647886 35981.20863309353 35089.18881118881 38530.230769230766 39280.3515625 39262.8671875 36411.384057971016 33576.10067114094 33939.69594594595 37913.59848484849 38324.076335877864 38061.59848484849 39921.00793650794 Average 35812.1871 35430.24471 35636.02188 36204.63076 ToT: us/run for 10000 delayed tasks with N queues 1 queue 4 queues 8 queues 32 queues 40459.540322580644 40536.04838709677 38994.573643410855 38696.2 39422.149606299216 39299.5 37888.18939393939 37874.74436090225 38419.70229007633 38025.742424242424 37844.41353383459 38020.469696969696 35052.72027972028 38147.80303030303 35504.89361702128 34138.02721088436 37096.77777777778 34942.541666666664 37003.529411764706 37579.60447761194 38818.67441860465 38233.068702290075 37978.628787878784 37867.57142857143 38455.49618320611 37903.05303030303 38106.143939393936 38129.5 40609.33064516129 37721.75187969925 34656.441379310345 34294.33561643836 35273.704225352114 34646.324137931035 34335.643835616436 34311.82876712329 35821.41428571429 35362.035211267605 37522.27611940299 35429.281690140844 Average 37942.951 37481.78685 36983.47337 36634.15632 Percentage improvement 5.61570422 5.473437399 3.643388159 1.172472933 NB the reason the improvement goes down with the number of queues is because we're saving malloc overhead in the queue, but a constant number of tasks are posted across N queues. This means the more queues we have in this test, the less loaded the queues are individually. Change-Id: I975d7f864dc55715fb9f949ef65321da93e4cef4 Reviewed-on: https://chromium-review.googlesource.com/1169043Reviewed-by:Sami Kyöstilä <skyostil@chromium.org> Commit-Queue: Alex Clarke <alexclarke@chromium.org> Cr-Commit-Position: refs/heads/master@{#582586}
Showing
Please register or sign in to comment