Only use custom SSE FMUL and FMAC with non-clang compilers.

clang's auto-vectorized C version performs better according to the Chrome Performance Dashboard. Searching back through the logs, this occurred when we switched over to clang by default.

We could try to micro-optimize further, but it's less of a maintenance burden to just let the compiler do its thing! The main reason the clang version is faster is that it does 2x 128-bit operations per loop iteration. Simply copying this optimization yields ~97% similar performance, but the SIMD code gets a bit gnarlier. As such, I chose to simply use the C variant when clang is present.

BUG=none
TEST=none

Review URL: https://codereview.chromium.org/599693002
Cr-Commit-Position: refs/heads/master@{#297268}
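To make the "2x 128-bit operations per loop" remark concrete, here is a minimal sketch of the two shapes being compared. It is not the code from vector_math.cc: the names `FmacScalar` and `FmacUnrolled2x` and the `SKETCH_HAS_SSE` macro are hypothetical, and unaligned loads are used so the sketch makes no alignment assumptions. The first function is the kind of plain loop a compiler can auto-vectorize; the second does by hand what the message describes, processing two 4-float vectors per iteration.

```cpp
#include <cassert>

// Hypothetical feature check for this sketch only.
#if defined(__SSE__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0
#endif

// Plain multiply-accumulate: dest[i] += src[i] * scale.
// A loop of this shape is a candidate for compiler auto-vectorization.
void FmacScalar(const float* src, float scale, int len, float* dest) {
  assert(len >= 0);
  for (int i = 0; i < len; ++i)
    dest[i] += src[i] * scale;
}

// Hand-written variant that issues two 128-bit multiply-adds per loop
// iteration (8 floats), then finishes the remainder one float at a time.
// Without SSE it degrades to the scalar loop.
void FmacUnrolled2x(const float* src, float scale, int len, float* dest) {
  assert(len >= 0);
  int i = 0;
#if SKETCH_HAS_SSE
  const __m128 m_scale = _mm_set1_ps(scale);
  for (; i + 8 <= len; i += 8) {
    // First 128-bit lane group: elements i .. i+3.
    const __m128 lo = _mm_add_ps(
        _mm_loadu_ps(dest + i), _mm_mul_ps(_mm_loadu_ps(src + i), m_scale));
    // Second 128-bit lane group: elements i+4 .. i+7.
    const __m128 hi = _mm_add_ps(
        _mm_loadu_ps(dest + i + 4),
        _mm_mul_ps(_mm_loadu_ps(src + i + 4), m_scale));
    _mm_storeu_ps(dest + i, lo);
    _mm_storeu_ps(dest + i + 4, hi);
  }
#endif
  // Scalar tail (or the whole range when SSE is unavailable).
  for (; i < len; ++i)
    dest[i] += src[i] * scale;
}
```

The unrolled version needs a separate tail loop and explicit lane bookkeeping, which is roughly the extra "gnarliness" the message is weighing against leaving the scalar loop to the compiler.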

65707310 · dalecurtis · Commit bot · b91e7865 · 65707310
Commit 65707310 authored Sep 29, 2014 by dalecurtis Committed by Commit bot Sep 29, 2014

Showing with 7 additions and 0 deletions

media/base/vector_math.cc +7 -0

--- a/media/base/vector_math.cc
+++ b/media/base/vector_math.cc
@@ -13,8 +13,15 @@
 // NaCl does not allow intrinsics.
 #if defined(ARCH_CPU_X86_FAMILY) && !defined(OS_NACL)
 #include <xmmintrin.h>
+// Don't use custom SSE versions where the auto-vectorized C version performs
+// better, which is anywhere clang is used.
+#if !defined(__clang__)
 #define FMAC_FUNC FMAC_SSE
 #define FMUL_FUNC FMUL_SSE
+#else
+#define FMAC_FUNC FMAC_C
+#define FMUL_FUNC FMUL_C
+#endif
 #define EWMAAndMaxPower_FUNC EWMAAndMaxPower_SSE
 #elif defined(ARCH_CPU_ARM_FAMILY) && defined(USE_NEON)
 #include <arm_neon.h>