jey · 7 years ago
From the docs:

    1.2 Features of VCL
    ∙ Vectors of 8-, 16-, 32- and 64-bit integers, signed and unsigned
    ∙ Vectors of single and double precision floating point numbers
    ∙ Total vector size 128, 256, or 512 bits
    ∙ Defines almost all common operators
    ∙ Boolean operations and branches on vector elements
    ∙ Many arithmetic functions
    ∙ Standard mathematical functions
    ∙ Permute, blend, gather, scatter, and table look-up functions
    ∙ Fast integer division
    ∙ Can build code for different instruction sets from the same source code
    ∙ CPU dispatching to utilize higher instruction sets when available
    ∙ Uses metaprogramming to find the optimal implementation for the selected instruction set and parameter values of a given operator or function
    ∙ Includes extra add-on packages for special purposes and applications
(Tldr: this is for CPU vectors and isn't directly comparable to a linear algebra library like Eigen.)

throwaway542134 · 7 years ago
> ∙ CPU dispatching to utilize higher instruction sets when available

What's the advantage of hand-rolled CPU dispatching over compiler features like function multiversioning in GCC/Clang?

pdovy · 7 years ago
FMV is pretty neat but there are scenarios where it's not ideal, so I'm not surprised to see it not used in what is meant to be a lightweight high performance library.

Notably, the fact that the dispatching is done at runtime means you are trading convenience for code size and extraneous dispatching code in your critical path. Additionally, I've anecdotally seen that on modern Intel hardware the power heuristics can penalize you for even _speculatively_ running some of the wider instruction sets.

snowAbstraction · 7 years ago
Why would I choose this over http://eigen.tuxfamily.org/ or https://bitbucket.org/blaze-lib/blaze/src/master/ ?

I am just curious.

nn3 · 7 years ago
I think those are much higher level, with larger vectors.

By vectors, he means the low-level vector registers of the CPU.

So this is for when you want to write your own low-level algorithms without going down to the actual intrinsics.

dkersten · 7 years ago
This is a SIMD library (vector in the sense of doing many things at once), while what you linked are maths libraries (vector as in the mathematical concept). There’s clearly a bunch of overlap, both are used to do calculations, and afaik the libraries you link also use SIMD when they can, so the practical difference is that this library is lower level and more generally applicable (but presumably more complex or difficult to work with).
throwaway542134 · 7 years ago
If you don't actually need higher level math functions, or it's prohibitive to implement something at the higher level. For example, if you have an operation where you know the sparsity of a matrix at compile time but the values of the matrix can change, you may want to implement the matrix multiplications by hand using SIMD ops.
gameswithgo · 7 years ago
this is a great simd resource. even if you don’t use it directly, the source can be a great guide on how to implement various tricky things you often need when doing simd intrinsics programming.

vortico · 7 years ago
Interesting. I've pieced together something like this for https://github.com/VCVRack/Rack/tree/v1/include/simd in C++11, but it only works for SSE2.
CogitoCogito · 7 years ago
He mentions using new metaprogramming techniques enabled by C++14/17 to choose optimal tuning parameters at compile time. What is the main upshot of this? Are there benchmarks showing it improves performance? Does it improve maintainability? Possibly both?
jepler · 7 years ago
I haven't digested just how it works, but "if constexpr" is used to choose different, more efficient instructions when "special" permutations are requested. Here is the implementation for a 128-bit permutation: https://github.com/vectorclass/version2/blob/master/vectorf1...

Implementation and comments related to "perm_flags" function here: https://github.com/vectorclass/version2/blob/6b16b1aaa388067...

rurban · 7 years ago
Whether something is constexpr or not makes a huge difference in large-scale vectorizable operations for me. My memcpy is two times slower without it and two times faster with it.

Maintainability is a bit harder, and compilers suck. You always have to check for new compiler regressions, especially with the broken restrict/noalias handling in newer gcc versions or clang with -O3.