Commit 65707310 authored by dalecurtis, committed by Commit bot

Only use custom SSE FMUL and FMAC with non-clang compilers.

clang's auto-vectorized C version performs better according to the
Chrome Performance Dashboard.  Searching back through the logs, the C
version pulled ahead when we switched over to clang by default.

We could try to microoptimize further, but it's less of a maintenance
burden to just let the compiler do its thing!

The main reason the clang version is faster is that it does 2x 128-bit
operations per loop. Simply copying this optimization yields ~97%
similar performance, but makes the SIMD code a bit gnarlier. As such I
chose to simply use the C variant when clang is present.
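
As a rough illustration of the "2x 128-bit operations per loop" point, the
sketch below shows a scalar multiply-accumulate next to a hand-unrolled SSE
loop that handles two 128-bit vectors (eight floats) per iteration. The names
FmacScalar and FmacSseUnrolled are hypothetical and are not the functions in
this change; unaligned loads are assumed for simplicity, and the SSE path is
guarded so the code still builds on targets without SSE.

```cpp
#include <cstddef>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#endif

// Scalar multiply-accumulate: dest[i] += src[i] * scale.
// A vectorizing compiler is free to turn this loop into SIMD code.
void FmacScalar(const float* src, float scale, std::size_t len, float* dest) {
  for (std::size_t i = 0; i < len; ++i)
    dest[i] += src[i] * scale;
}

// Hypothetical hand-written variant that processes two 128-bit vectors
// (2 x 4 floats) per loop iteration, then finishes the tail in scalar code.
void FmacSseUnrolled(const float* src, float scale, std::size_t len,
                     float* dest) {
  std::size_t i = 0;
#if defined(__SSE__) || defined(_M_X64)
  const __m128 s = _mm_set1_ps(scale);
  for (; i + 8 <= len; i += 8) {
    // Unaligned loads and stores, so callers need no alignment guarantee.
    const __m128 a0 = _mm_loadu_ps(src + i);
    const __m128 a1 = _mm_loadu_ps(src + i + 4);
    const __m128 d0 = _mm_loadu_ps(dest + i);
    const __m128 d1 = _mm_loadu_ps(dest + i + 4);
    _mm_storeu_ps(dest + i, _mm_add_ps(d0, _mm_mul_ps(a0, s)));
    _mm_storeu_ps(dest + i + 4, _mm_add_ps(d1, _mm_mul_ps(a1, s)));
  }
#endif
  // Remaining elements (or all of them when SSE is unavailable).
  for (; i < len; ++i)
    dest[i] += src[i] * scale;
}
```

The extra loads, stores, and tail handling are the kind of added complexity
the message calls "gnarlier" compared with the four-line scalar loop.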

BUG=none
TEST=none

Review URL: https://codereview.chromium.org/599693002

Cr-Commit-Position: refs/heads/master@{#297268}
parent b91e7865
...
@@ -13,8 +13,15 @@
 // NaCl does not allow intrinsics.
 #if defined(ARCH_CPU_X86_FAMILY) && !defined(OS_NACL)
 #include <xmmintrin.h>
+// Don't use custom SSE versions where the auto-vectorized C version performs
+// better, which is anywhere clang is used.
+#if !defined(__clang__)
 #define FMAC_FUNC FMAC_SSE
 #define FMUL_FUNC FMUL_SSE
+#else
+#define FMAC_FUNC FMAC_C
+#define FMUL_FUNC FMUL_C
+#endif
 #define EWMAAndMaxPower_FUNC EWMAAndMaxPower_SSE
 #elif defined(ARCH_CPU_ARM_FAMILY) && defined(USE_NEON)
 #include <arm_neon.h>
...
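
The compile-time selection in the hunk above can be sketched in isolation as
follows. This is a minimal stand-alone approximation, not the file's actual
contents: it swaps the Chromium build macros for the compiler-provided
__SSE__, _M_X64, and __clang__ checks, the function bodies are assumptions,
and the Fmac wrapper is a hypothetical caller.

```cpp
// Portable fallback; with clang this is the path the macro selects.
inline void FMAC_C(const float* src, float scale, int len, float* dest) {
  for (int i = 0; i < len; ++i)
    dest[i] += src[i] * scale;
}

#if (defined(__SSE__) || defined(_M_X64)) && !defined(__clang__)
#include <xmmintrin.h>
// Hand-written SSE path, used only on x86 with non-clang compilers.
// Body is an assumed one-vector-per-iteration loop with a scalar tail.
inline void FMAC_SSE(const float* src, float scale, int len, float* dest) {
  const __m128 s = _mm_set1_ps(scale);
  int i = 0;
  for (; i + 4 <= len; i += 4) {
    const __m128 d = _mm_loadu_ps(dest + i);
    _mm_storeu_ps(dest + i,
                  _mm_add_ps(d, _mm_mul_ps(_mm_loadu_ps(src + i), s)));
  }
  for (; i < len; ++i)
    dest[i] += src[i] * scale;
}
#define FMAC_FUNC FMAC_SSE
#else
#define FMAC_FUNC FMAC_C
#endif

// Hypothetical caller: the macro resolves to one implementation at
// compile time, so there is no runtime dispatch cost.
void Fmac(const float* src, float scale, int len, float* dest) {
  FMAC_FUNC(src, scale, len, dest);
}
```

Because the choice is made by the preprocessor, both implementations must
share one signature, and switching compilers changes which one is built
without touching any call site.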