C and vectorisation

C's for and Fortran's do have been compared, but neither is ideal for the modern world of vectorisation, SIMD instructions and multiple functional units. For a compiler to be able to vectorise a loop easily, it must be able to determine the total number of iterations before the loop starts executing (at run-time is early enough; it need not be known at compile time), and it must know that the iterations are independent of each other. There are two good ways of assisting the compiler.
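To make the independence requirement concrete, here is a minimal C sketch contrasting the two cases. The function names add_arrays and running_sum are hypothetical, chosen only for illustration. In both loops the iteration count n is known when the loop is entered, but only the first has independent iterations.

```c
#include <stddef.h>

/* Independent iterations: each a[i] depends only on b[i] and c[i].
   The restrict qualifiers promise the compiler that the arrays do
   not overlap, so the iterations may be done in any order. */
void add_arrays(size_t n, double *restrict a,
                const double *restrict b, const double *restrict c)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Loop-carried dependence: iteration i needs the result of
   iteration i-1, so the iterations cannot simply be reordered
   or overlapped. */
void running_sum(size_t n, double *a)
{
    for (size_t i = 1; i < n; i++)
        a[i] += a[i - 1];
}
```

Without the restrict qualifiers a compiler would have to allow for a overlapping b or c, which is exactly the sort of doubt that the directives discussed in this section are meant to remove.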

Fortran 2008

  do concurrent (i=1:n)

Whilst this example is rather trivial, the syntax specifies that the individual iterations may be processed in any order, and with any degree of overlap. The compiler is then free to use threads, SIMD instructions, or simply unrolling followed by interleaving the instructions of different iterations, as it sees fit. In this case, a decent compiler would not need the hint.

OpenMP 4

#pragma omp simd

!$omp simd
  do i=1,n

The above OpenMP simd directive makes these loops almost equivalent to the Fortran do concurrent example. One difference is that the body of a do concurrent loop is permitted to call any function declared to be pure, whereas a simd loop can call any function declared as a simd function using OpenMP's declare simd directive. There is also the possibility of doing reductions:

#pragma omp simd reduction(+:sum)

Fortran has no non-OpenMP equivalent, save that it has intrinsics for the very basic operations, such as summing a vector, and these should be appropriately vectorised.
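As a rough illustration of the reduction clause in C, the sketch below sums an array. The function name sum_array is hypothetical. The clause tells the compiler that sum may be accumulated as several partial sums which are combined after the loop.

```c
#include <stddef.h>

/* Sum the n elements of x. The reduction clause permits the
   compiler to keep one partial sum per SIMD lane and to add
   the partial sums together once the loop has finished. */
double sum_array(size_t n, const double *x)
{
    double sum = 0.0;

    #pragma omp simd reduction(+:sum)
    for (size_t i = 0; i < n; i++)
        sum += x[i];

    return sum;
}
```

Because floating-point addition is not associative, the vectorised result may differ in the last few bits from that of a strictly sequential sum.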

OpenMP provides a standardised set of directives which should have the same meaning across multiple compilers, and which have C and Fortran versions. They are much more portable than the various "ivdep" directives which may mean subtly different things to different compilers.
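The following minimal C sketch shows a simd loop calling a function declared as a simd function. The names scale_and_shift and transform are hypothetical. GCC and Clang accept the option -fopenmp-simd, which enables the simd directives without enabling OpenMP threading; if the directives are not enabled at all, the pragmas are ignored and the code remains a correct scalar loop.

```c
#include <stddef.h>

/* Ask the compiler to generate a vector version of this function
   in addition to the usual scalar one. */
#pragma omp declare simd
static double scale_and_shift(double x)
{
    return 2.0 * x + 1.0;
}

/* Apply scale_and_shift to every element of in, storing to out.
   The simd directive asserts that the iterations are independent. */
void transform(size_t n, double *restrict out, const double *restrict in)
{
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        out[i] = scale_and_shift(in[i]);
}
```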

Before version 4, which was released in 2013, OpenMP dealt with threading only.