Vectorization using AVX-512, AVX-2 and AVX

With the latest AMD processors adding support for AVX-512, the benefits of vectorization are bigger than ever. A single instruction can now operate on 8 double-precision floating-point values in parallel, per thread. This includes multiply/add, and even square root!
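As a rough illustration of what one group of 8 looks like in source form, here is a minimal sketch in C. The function name madd_group8 and the GROUP constant are made up for this example and are not taken from the program. A compiler targeting AVX-512 may turn the loop into a single fused multiply/add on one zmm register, but whether it does depends on the compiler and its options.

```c
#include <assert.h>
#include <stddef.h>

#define GROUP 8  /* doubles that fit in one 512-bit register (8 x 64 bits) */

/* Hypothetical helper: out[i] = a[i] * b[i] + c[i] for one group of 8 lanes.
   Written as a plain loop so it runs on any CPU; a vectorizing compiler
   is free to map it onto a single packed multiply/add. */
void madd_group8(const double *a, const double *b, const double *c, double *out)
{
    assert(a != NULL && b != NULL && c != NULL && out != NULL);
    for (size_t i = 0; i < GROUP; ++i)
        out[i] = a[i] * b[i] + c[i];
}
```

Each array passed in is expected to hold at least 8 doubles.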
The program relies on the processor feature detection that the startup code supplied with Microsoft Visual Studio runs automatically. To see the AVX level supported on your machine, run the command line help:
sv help
Look for this line:
groupsize= limit double precision vectorization level [8]
The [8] shows that 8 double-precision operations will be issued in parallel. The possible values are:
8: AVX-512
4: AVX-2
2: AVX
The command line option groupsize= lets you force a vectorization level different from the one detected.
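The detection and the groupsize= override can be sketched roughly as follows. This is not the program's code: detect_groupsize, effective_groupsize and the GROUP_SCALAR fallback are hypothetical names, and the detection uses the GCC/Clang builtin __builtin_cpu_supports instead of the Visual Studio startup code (with other compilers the sketch simply reports the scalar fallback). It also assumes that a request above the detected level is clamped down, since the help text describes the option as a limit.

```c
#include <assert.h>
#include <stddef.h>

/* Group sizes as listed in the help text; GROUP_SCALAR is an assumed fallback. */
enum { GROUP_AVX512 = 8, GROUP_AVX2 = 4, GROUP_AVX = 2, GROUP_SCALAR = 1 };

/* Hypothetical detection helper. Uses GCC/Clang builtins on x86 only;
   anywhere else it returns the scalar fallback. */
int detect_groupsize(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) return GROUP_AVX512;
    if (__builtin_cpu_supports("avx2"))    return GROUP_AVX2;
    if (__builtin_cpu_supports("avx"))     return GROUP_AVX;
#endif
    return GROUP_SCALAR;
}

/* Apply a groupsize= request (assumed semantics): requested <= 0 means the
   option was not given; otherwise pick the largest known level that exceeds
   neither the request nor what the CPU supports. */
int effective_groupsize(int detected, int requested)
{
    static const int levels[] = { GROUP_AVX512, GROUP_AVX2, GROUP_AVX, GROUP_SCALAR };
    assert(detected >= GROUP_SCALAR);
    if (requested <= 0)
        return detected;
    for (size_t i = 0; i < sizeof levels / sizeof levels[0]; ++i)
        if (levels[i] <= requested && levels[i] <= detected)
            return levels[i];
    return GROUP_SCALAR;
}
```

For example, effective_groupsize(detect_groupsize(), 4) would cap an AVX-512 machine at groups of 4, and would leave a machine that only reports AVX at 2.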
Here is an example of the generated code. The function updateIIR2stage8wAVX512 updates 8 two-stage biquad filters in parallel. Thanks to the 16 extra registers available with AVX-512, the compiler is able to keep coefficients in registers (zmm18 through zmm20 below) and reuse them for the second stage without reloading them:

updateIIR2stage8wAVX512:
  0000000140005E50:   vmovupd     zmm2,zmmword ptr [rcx+2C0h]
  0000000140005E57:   vmovupd     zmm18,zmmword ptr [rcx+80h]
  0000000140005E5E:   vmovupd     zmm19,zmmword ptr [rcx+0C0h]
  0000000140005E65:   vmovupd     zmm20,zmmword ptr [rcx+100h]
  0000000140005E6C:   vmovupd     zmm1,zmmword ptr [rcx+240h]
  0000000140005E73:   vmovupd     zmm3,zmmword ptr [rdx]
  0000000140005E79:   vmulpd      zmm17,zmm3,zmmword ptr [rcx]
  0000000140005E7F:   vfmadd231pd zmm17,zmm2,zmmword ptr [rcx+40h]
  0000000140005E86:   vfmadd231pd zmm17,zmm18,zmmword ptr [rcx+300h]
  0000000140005E8D:   vfmadd231pd zmm17,zmm19,zmm1
  0000000140005E93:   vfmadd231pd zmm17,zmm20,zmmword ptr [rcx+280h]
  0000000140005E9A:   vmovupd     zmmword ptr [rcx+280h],zmm1
  0000000140005EA1:   vmovupd     zmmword ptr [rcx+240h],zmm17
  0000000140005EA8:   vmovupd     zmmword ptr [rcx+300h],zmm2
  0000000140005EAF:   vmovupd     zmmword ptr [rcx+2C0h],zmm3
  0000000140005EB6:   vmovupd     zmm4,zmmword ptr [rcx+1C0h]
  0000000140005EBD:   vmulpd      zmm3,zmm17,zmmword ptr [rcx]
  0000000140005EC3:   vfmadd231pd zmm3,zmm4,zmmword ptr [rcx+40h]
  0000000140005ECA:   vfmadd231pd zmm3,zmm18,zmmword ptr [rcx+200h]
  0000000140005ED1:   vmovupd     zmm2,zmmword ptr [rcx+140h]
  0000000140005ED8:   vfmadd231pd zmm3,zmm2,zmm19
  0000000140005EDE:   vfmadd231pd zmm3,zmm20,zmmword ptr [rcx+180h]
  0000000140005EE5:   vmovupd     zmmword ptr [rcx+180h],zmm2
  0000000140005EEC:   vmovupd     zmmword ptr [rcx+140h],zmm3
  0000000140005EF3:   vmovupd     zmmword ptr [rcx+200h],zmm4
  0000000140005EFA:   vmovupd     zmmword ptr [rcx+1C0h],zmm17
  0000000140005F01:   vmovupd     zmm0,zmm3
  0000000140005F07:   ret
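For readers who prefer C to disassembly, here is a scalar sketch of what the listing above appears to compute, reconstructed only from the instruction sequence and memory offsets. The struct layout, the field names and the function names are guesses made for illustration, not the author's source. In particular, which slots hold input history and which hold output history is inferred from what gets stored into them, and the feedback coefficients are assumed to have their sign already folded in, since the listing only adds.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 8  /* one zmm register holds 8 doubles */

/* Hypothetical layout: each field is one 64-byte slot, at the offset
   used in the listing (shown in hex on the right). */
typedef struct {
    double c[5][LANES];   /* 000h..100h: coefficients shared by both stages */
    double s2_y1[LANES];  /* 140h: stage 2 previous output   */
    double s2_y2[LANES];  /* 180h: stage 2 output before that */
    double s2_x1[LANES];  /* 1C0h: stage 2 previous input    */
    double s2_x2[LANES];  /* 200h: stage 2 input before that  */
    double s1_y1[LANES];  /* 240h */
    double s1_y2[LANES];  /* 280h */
    double s1_x1[LANES];  /* 2C0h */
    double s1_x2[LANES];  /* 300h */
} IIR2Stage8;

/* One biquad stage for lane i: a sum of five products, then shift the history. */
static double biquad_step(const IIR2Stage8 *f, size_t i, double x,
                          double *x1, double *x2, double *y1, double *y2)
{
    double y = x * f->c[0][i];
    y += *x1 * f->c[1][i];
    y += *x2 * f->c[2][i];
    y += *y1 * f->c[3][i];
    y += *y2 * f->c[4][i];
    *y2 = *y1;  *y1 = y;   /* output history */
    *x2 = *x1;  *x1 = x;   /* input history  */
    return y;
}

/* Scalar stand-in for the vector routine: stage 1 feeds stage 2, per lane. */
void update_iir2stage8(IIR2Stage8 *f, const double *x, double *out)
{
    assert(f != NULL && x != NULL && out != NULL);
    for (size_t i = 0; i < LANES; ++i) {
        double mid = biquad_step(f, i, x[i], &f->s1_x1[i], &f->s1_x2[i],
                                 &f->s1_y1[i], &f->s1_y2[i]);
        out[i] = biquad_step(f, i, mid, &f->s2_x1[i], &f->s2_x2[i],
                             &f->s2_y1[i], &f->s2_y2[i]);
    }
}
```

In the vector version the loop over lanes disappears: each line of biquad_step corresponds to one vmulpd or vfmadd231pd on a full zmm register. Because the hardware fuses the multiply and add, the last bits of the result may differ slightly from this scalar form.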
Posted by sduplichan 2025-01-06
