C and vectorisation
C's for loop and Fortran's do loop have been compared, but neither is ideal for the modern world of vectorisation, SIMD instructions and multiple functional units. For a compiler to be able to vectorise a loop easily, it must know the total number of loop iterations at run-time, before the loop starts executing, and it must know that the iterations are independent. There are two good ways of assisting the compiler.
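The two requirements can be illustrated with a short C sketch. The function names here are hypothetical and chosen only for this illustration. The first loop meets both conditions as long as the arrays do not overlap, the second has a dependence between iterations, and the third has a trip count that is not known when the loop starts.

```c
#include <stddef.h>

/* Trip count n is known on entry, and each a[i] depends only on
   b[i] and c[i]. The iterations are independent provided a does
   not overlap b or c, which the compiler cannot in general tell
   from the pointer arguments alone. */
void multiply(size_t n, double *a, const double *b, const double *c)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] * c[i];
}

/* Loop-carried dependence: iteration i reads the value written by
   iteration i-1, so the iterations cannot be run in any order. */
void running_sum(size_t n, double *a)
{
    for (size_t i = 1; i < n; i++)
        a[i] += a[i - 1];
}

/* Trip count unknown on entry: the loop ends at the first zero
   element, which is only discovered while iterating.
   Assumes the array does contain a zero. */
size_t count_until_zero(const double *a)
{
    size_t i = 0;
    while (a[i] != 0.0)
        i++;
    return i;
}
```

Only the first of these is a natural candidate for vectorisation, and even there the compiler may need to be told that the arrays do not overlap.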
Fortran 2008
do concurrent (i=1:n)
  a(i)=b(i)*c(i)
enddo
Whilst this example is rather trivial, the syntax specifies that the individual iterations may be processed in any order, and with any degree of overlap. The compiler is then free to use threads, SIMD instructions, or simply unrolling followed by interleaving the instructions of different iterations, as it sees fit. In this case, a decent compiler would not need the hint.
OpenMP 4
#pragma omp simd
for(i=0;i<n;i++) a[i]=b[i]*c[i];

!$omp simd
do i=1,n
  a(i)=b(i)*c(i)
enddo
The above OpenMP simd directive makes these loops almost equivalent to the Fortran do concurrent example. One difference is that the body of a do concurrent loop is permitted to call any function declared to be pure, whereas the simd loop can call any function declared as a simd function (using the declare simd directive). There is also the possibility of doing reductions:
sum=0;
#pragma omp simd reduction(+:sum)
for(i=0;i<n;i++) sum+=a[i];
Fortran 2008 has no non-OpenMP equivalent (a reduce clause for do concurrent arrived only with Fortran 2023), save that it has intrinsics for the very basic operations, such as summing a vector, and these should be appropriately vectorised.
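The remark above about simd functions can be made concrete with a minimal C sketch. The names scaled_product and apply are hypothetical, invented for this illustration. The declare simd directive asks the compiler to generate a vector version of the function in addition to the scalar one, so that a call from inside a simd loop need not prevent vectorisation.

```c
#include <stddef.h>

/* Hypothetical helper. "declare simd" requests a vector variant
   that processes several (x, y) pairs per call. */
#pragma omp declare simd
static double scaled_product(double x, double y)
{
    return 0.5 * x * y;
}

/* The simd directive asserts that the iterations are independent,
   so the caller must ensure that a does not overlap b or c. */
void apply(size_t n, double *a, const double *b, const double *c)
{
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        a[i] = scaled_product(b[i], c[i]);
}
```

With GCC or Clang, the -fopenmp-simd option should enable these directives without bringing in the OpenMP threading runtime. If the directives are ignored altogether, the code still compiles and gives the same results, merely without the hint.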
OpenMP provides a standardised set of directives which should have the same meaning across multiple compilers, and which have C and Fortran versions. They are much more portable than the various "ivdep" directives which may mean subtly different things to different compilers.
Before version 4, OpenMP dealt with threading only. Version 4 was released in 2013.