OpenBLAS and CPU detection

OpenBLAS is an excellent open-source BLAS and LAPACK library. It offers very good performance on a wide range of processors.

In one respect, however, its processor detection has a serious issue.

To get the best performance on an Intel or AMD CPU, OpenBLAS needs to know not only the supported instruction set (vector length of two, four or eight? does FMA exist?), but also details such as cache sizes and associativity. It determines these from a compiled-in lookup table based on the CPU model number.

For CPUs missing from that table, which generally means CPUs launched after OpenBLAS was last updated, the defaults are strange. The file driver/others/dynamic.c shows a strong inclination to use generic non-AVX code on an unidentified AMD CPU.
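
To see which kernels a given build actually picks, OpenBLAS can be asked to report its choice at start-up. The following is a small sketch, assuming a build with runtime CPU detection that honours OPENBLAS_VERBOSE=2 and prints a line of the form "Core: <name>" to stderr. That output format is an assumption, and selected_core is a hypothetical helper, not part of OpenBLAS.

```shell
# selected_core: read OpenBLAS's verbose output on stdin and print the first
# reported core name. The "Core: <name>" line format is an assumption.
selected_core() {
    sed -n 's/^Core: *//p' | head -n 1
}

# Example use, with ./linpack10k standing in for any program linked to OpenBLAS:
#   OPENBLAS_VERBOSE=2 ./linpack10k 2>&1 | selected_core
```

If the name printed is a generic or pre-AVX core on a CPU that supports AVX2, the detection has probably fallen back to its defaults.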

The difference is clear when running the version of OpenBLAS which ships with Ubuntu 20.04 on a Zen3-based AMD CPU, first with the default detection and then with the core type forced to Haswell:

$ OMP_NUM_THREADS=1 ./linpack10k
  20,500 MFLOPS
$ OMP_NUM_THREADS=1 OPENBLAS_CORETYPE=Haswell ./linpack10k
  53,200 MFLOPS

One might also try setting OPENBLAS_CORETYPE to "Zen", which gives very similar performance to "Haswell".
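
Forcing a core type is only safe if the CPU really supports the instructions which that core's kernels use; the Haswell kernels rely on AVX2 and FMA. The following is a minimal sketch of a guarded override, assuming Linux's /proc/cpuinfo is available. The helpers has_flag and pick_coretype are hypothetical names introduced here, not part of OpenBLAS.

```shell
# has_flag NAME "FLAG LIST": succeed if NAME appears as a whole word in the list.
has_flag() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# pick_coretype "FLAG LIST": print Haswell if both AVX2 and FMA are listed,
# otherwise print nothing so that OpenBLAS keeps its own choice.
pick_coretype() {
    if has_flag avx2 "$1" && has_flag fma "$1"; then
        echo Haswell
    fi
}

# Assumption: Linux exposes the CPU feature flags in /proc/cpuinfo.
if [ -r /proc/cpuinfo ]; then
    flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
    core=$(pick_coretype "$flags")
    if [ -n "$core" ]; then
        OPENBLAS_CORETYPE=$core
        export OPENBLAS_CORETYPE
    fi
fi
```

Sourcing this before launching a program leaves OPENBLAS_CORETYPE unset on CPUs without AVX2 and FMA, so the override cannot select kernels the hardware is unable to run.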