Improving code performance with EXPLICIT_VECTORIZATION

Dear gadget-list members,

I would like to ask about your experiences with EXPLICIT_VECTORIZATION.
I am running cosmological hydrodynamics simulations on Intel
Xeon E5-2699A v4 nodes. To find out whether EXPLICIT_VECTORIZATION
speeds up my simulations, I set up two otherwise identical runs
-- one with and one without EXPLICIT_VECTORIZATION -- and compared
their run times.
Both runs took approximately the same time to complete, so I see no
measurable speed-up from the option.
Now I am wondering whether my compiler already vectorizes the loops
implicitly (I am using icpc (ICC) 17.0.3 and OpenMPI 2.0.0 with the
compiler flags -O3 -march=core-avx2 -Wall -Wno-format-security).
It would be interesting to see on which machines one can actually expect
an improvement with this option, so please share your experiences in
this thread!
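
P.S. Regarding the question of implicit vectorization: one way to check
it would be a small test kernel like the one below. This is only a
sketch. The function axpy is made up for illustration and is not taken
from the GADGET sources. Compiling it with the vectorization report
enabled (for icpc, -qopt-report=5 -qopt-report-phase=vec; for GCC the
counterpart should be -fopt-info-vec) should list which loops were
vectorized.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical test kernel (not from the GADGET code): computes
// y[i] += a * x[i]. The loop has no loop-carried dependencies, so a
// compiler run at -O3 can usually vectorize it without explicit help.
void axpy(double a, const std::vector<double> &x, std::vector<double> &y)
{
  assert(x.size() == y.size());

  // Raw pointers keep the loop body as simple as possible for the
  // compiler's vectorization report.
  const double *xp    = x.data();
  double *yp          = y.data();
  const std::size_t n = x.size();

  for(std::size_t i = 0; i < n; i++)
    yp[i] += a * xp[i];
}
```

If the report marks this loop as vectorized, the same report flags
could be added to the actual build to see whether the hot loops of the
simulation code are treated the same way.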

