Intel Training Exercise on Vectorization, Processes Affinity on NUMA Systems and Parallel I/O Performance Tuning
Instructed by Philippe Thierry, Intel
Wednesday 16 September
This training exercise will be offered as a part of the workshop programme
Achieving performance on supercomputers has many different aspects.
From programming models to programming languages, from math libraries to hardware-counter libraries, and from application tuning to scalability, there are many important points to consider.
In this training we focus on three topics of increasing importance:
* SIMD vector operations (so-called vectorization, or SIMDization),
o compiler reports and tools such as Intel Vector Advisor
* process affinity and pinning to avoid dramatic NUMA penalties on 2- or 4-socket systems
o the first-touch policy and its impact on threaded programming
o how to use environment variables for pinning and for hybrid MPI+OpenMP
* parallel I/O and parallel file systems
o tuning Lustre clients and servers for specific I/O patterns
o troubleshooting application I/O patterns using standard techniques (strace, SystemTap, etc.)
o Lustre-specific metrics for parallelizing and optimizing application I/O to achieve target performance