OpenBLAS and CPU detection
OpenBLAS is an excellent open-source BLAS and LAPACK library. It offers very good performance on a wide range of processors.
But in one respect it has a serious issue with its processor detection.
To get the best performance on an Intel/AMD CPU, it needs to know not only the supported instruction set (vector length of two, four or eight? does FMA exist? etc.), but also cache sizes and associativity, etc. It determines these from a compiled-in lookup table based on CPU model number.
But for CPUs missing from that table, which generally means CPUs launched after OpenBLAS was last updated, the defaults are poor: it shows a strong inclination to use generic non-AVX code on an unidentified AMD CPU.
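On Linux one can check what the CPU itself reports, to confirm that a fallback to non-AVX code is not justified by the hardware. The following is a minimal sketch, not part of OpenBLAS: the function name cpu_summary is made up for illustration, and it assumes the usual x86 /proc/cpuinfo layout (vendor_id, cpu family, model and flags fields).

```shell
#!/bin/sh
# Summarise the cpuinfo fields relevant to kernel selection.
# Usage: cpu_summary [FILE]   (FILE defaults to /proc/cpuinfo, Linux only)
# cpu_summary is a hypothetical helper name, not an OpenBLAS tool.
cpu_summary() {
    awk -F': *' '
        # Only the first processor entry is examined: stop at the first blank line.
        /^$/ { exit }
        {
            key = $1
            sub(/[ \t]+$/, "", key)      # strip the padding before the colon
        }
        key == "vendor_id"  { vendor = $2 }
        key == "cpu family" { family = $2 }
        key == "model"      { model = $2 }
        key == "flags"      { flags = " " $2 " " }
        END {
            printf "vendor=%s family=%s model=%s", vendor, family, model
            printf " avx2=%s", (index(flags, " avx2 ") ? "yes" : "no")
            printf " fma=%s\n", (index(flags, " fma ") ? "yes" : "no")
        }
    ' "${1:-/proc/cpuinfo}"
}
```

Calling cpu_summary with no argument reads /proc/cpuinfo. If it reports avx2=yes and fma=yes while OpenBLAS is running at generic speeds, the limitation lies in the detection table rather than the processor.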
The difference on a Zen3-based AMD CPU, using the version of OpenBLAS which ships with Ubuntu 20.04, is clear:
$ OMP_NUM_THREADS=1 ./linpack10k
20,500 MFLOPS
$ OMP_NUM_THREADS=1 OPENBLAS_CORETYPE=Haswell ./linpack10k
53,200 MFLOPS
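Rather than typing the override each time, it could be applied by a small wrapper. This is only a sketch under stated assumptions: the function names suggest_coretype and run_with_override are invented here, the rule "AMD with AVX2 and FMA implies Haswell kernels" is a heuristic drawn from the measurement above rather than anything OpenBLAS documents, and the variable is expected to matter only for OpenBLAS builds that select kernels at run time.

```shell
#!/bin/sh
# Suggest an OPENBLAS_CORETYPE value from a vendor string and a flags list.
# Prints nothing when no override looks safe. (Hypothetical helper.)
suggest_coretype() {
    [ "$1" = "AuthenticAMD" ] || return 0   # only AMD CPUs are considered
    case " $2 " in
        *" avx2 "*) ;;                      # the Haswell kernels need AVX2 ...
        *) return 0 ;;
    esac
    case " $2 " in
        *" fma "*) echo Haswell ;;          # ... and FMA
    esac
}

# Run a command with the suggested override, unless the user already set one.
# Reads /proc/cpuinfo, so this part is Linux-only.
run_with_override() {
    if [ -z "${OPENBLAS_CORETYPE:-}" ]; then
        vendor=$(awk -F': *' '/^vendor_id/ { print $2; exit }' /proc/cpuinfo)
        flags=$(awk -F': *' '/^flags/ { print $2; exit }' /proc/cpuinfo)
        core=$(suggest_coretype "$vendor" "$flags")
        if [ -n "$core" ]; then
            export OPENBLAS_CORETYPE="$core"
        fi
    fi
    "$@"
}
```

After sourcing the file, run_with_override ./linpack10k would then run the benchmark with the override in place. Note that the sketch overrides on every AVX2-capable AMD CPU, including ones a newer OpenBLAS would identify correctly, so it is best dropped once the library has been updated.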
One might also try setting OPENBLAS_CORETYPE to "Zen", which gives very similar performance to "Haswell".