Mask Profiler
The Intel® AVX-512 architecture added mask registers, which control which elements of a vector register are written. They can also be used to avoid memory faults on accesses that extend beyond a page boundary, because masked-off elements do not raise memory faults.
Intel® SDE provides two mask profiling features: the simple mask profiler and the dynamic mask profiler, which has richer analysis capabilities.
Simple Mask Profiler
The simple mask profiler calculates mask usage for each thread, aggregated over the thread's entire execution. It provides the following mask usage information:
Total number of executed instructions
Total number of masked instructions
How many times all the mask bits were set (all ones)
How many times all the mask bits were off (all zeros)
Instructions with exactly one bit set
Instructions where only the low bit is set (like a scalar instruction)
Usage of each mask register
Breakdown per number of elements
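The categories above can be illustrated with a minimal Python sketch. It is not Intel® SDE code: the function names are hypothetical, and it assumes that a "scalarish" mask (only the low bit set) is also counted as "one_hot", which the list above does not state.

```python
from collections import Counter


def classify_mask(mask: int, num_elements: int) -> dict:
    """Classify one mask value into the categories listed above."""
    full = (1 << num_elements) - 1
    live = mask & full               # ignore bits beyond the element count
    popcnt = bin(live).count("1")    # number of enabled elements
    return {
        "all_ones": live == full,
        "all_zeros": live == 0,
        "one_hot": popcnt == 1,
        "scalarish": live == 1,      # assumption: also counted as one_hot
        "popcnt": popcnt,
    }


def summarize(masks, num_elements):
    """Accumulate category totals and a popcount histogram."""
    totals = Counter()
    popcnt_hist = Counter()
    for mask in masks:
        info = classify_mask(mask, num_elements)
        totals["uses"] += 1
        for key in ("all_ones", "all_zeros", "one_hot", "scalarish"):
            totals[key] += info[key]   # True adds 1, False adds 0
        popcnt_hist[info["popcnt"]] += 1
    return dict(totals), dict(popcnt_hist)
```

For example, summarize([0xFFFF, 0x0000, 0x0001, 0x0100], 16) counts four uses, with one all-ones mask, one all-zeros mask, two one-hot masks, and one scalar-like mask.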
The output looks like:
TID: 0
----------------------------------------------------------------
icount: 170734
uses: 1920
all_ones: 555
all_zeros: 1365
one_hot: 0
scalarish: 0
mask_regs_used: 2
regid_r[0]: 555
regid_r[1]: 1365
regid_w[1]: 1365
popcnt[0]: 1365
popcnt[16]: 555
VECTOR_ELEMENTS : 16
uses: 1110
all_ones: 555
all_zeros: 555
one_hot: 0
scalarish: 0
popcnt[16]: 555
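A report in this form can be post-processed with a short script. The following Python sketch is one possible approach, not part of Intel® SDE: the function name and the returned dictionary layout are hypothetical, and it assumes that every counter is a "name: integer" line, that "TID" starts a new thread, and that "VECTOR_ELEMENTS" starts a per-element-count section, as in the sample above.

```python
import re

# Matches lines such as "uses: 1920", "regid_r[0]: 555", "VECTOR_ELEMENTS : 16".
LINE = re.compile(r"([\w\[\]]+)\s*:\s*(\d+)")

SAMPLE = """TID: 0
----------------------------------------------------------------
icount: 170734
uses: 1920
all_ones: 555
all_zeros: 1365
regid_r[0]: 555
popcnt[16]: 555
VECTOR_ELEMENTS : 16
uses: 1110
all_ones: 555
all_zeros: 555
"""


def parse_mask_profile(text):
    """Return {tid: {"counters": {...}, "by_elements": {n: {...}}}}."""
    threads = {}
    thread = None    # thread currently being filled
    section = None   # dictionary that receives the next counters
    for raw in text.splitlines():
        match = LINE.fullmatch(raw.strip())
        if match is None:
            continue                      # separator or blank line
        key, value = match.group(1), int(match.group(2))
        if key == "TID":
            thread = {"counters": {}, "by_elements": {}}
            threads[value] = thread
            section = thread["counters"]
        elif thread is None:
            continue                      # counters before any TID are ignored
        elif key == "VECTOR_ELEMENTS":
            section = thread["by_elements"].setdefault(value, {})
        else:
            section[key] = value
    return threads
```

Calling parse_mask_profile(SAMPLE) keeps the thread-wide counters separate from the counters reported for 16-element vectors.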
This tool provides the following knobs:
- -mask_profile
Enable mask profiling [default 0]
- -omask_profile
Specify profile file output name [default sde-mask-profile.txt]
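As an illustration, a command line using these knobs could be assembled as in the Python sketch below. Several details are assumptions rather than facts taken from this page: the launcher name sde64, passing an explicit value of 1 to enable the knob, and the "--" separator before the application. The helper name is hypothetical, and any chip-selection knob is left out.

```python
def sde_mask_profile_cmd(app_argv, output="sde-mask-profile.txt", sde="sde64"):
    """Build an argument list that enables the simple mask profiler."""
    return [
        sde,                       # assumption: launcher executable name
        "-mask_profile", "1",      # assumption: knob enabled with explicit 1
        "-omask_profile", output,  # profile output file name
        "--",                      # assumption: separates knobs from the app
        *app_argv,
    ]
```

The resulting list can be passed to subprocess.run, for example sde_mask_profile_cmd(["./myapp"], output="masks.txt").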
Dynamic Mask Profiler
The dynamic mask profiler provides detailed mask usage information per instruction. This information includes:
The instruction address and disassembly
Image and function names
Execution count: how many times the instruction was executed
Computation count: how many vector elements were processed, based on the mask bits
Utilization: the computation count as a percentage of the maximum possible count (the execution count multiplied by the number of vector elements)
Specific analysis for sparse (gather and scatter) instructions, which includes an option to analyze converting gather to shuffle.
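The relationship between the three counts can be written as a small Python sketch. The function name is hypothetical, and taking the maximum count as the execution count multiplied by the number of vector elements is an assumption about how the profiler defines it.

```python
def mask_utilization(masks, num_elements):
    """Return (execution count, computation count, utilization percent).

    masks holds one mask value per execution of the instruction.
    """
    full = (1 << num_elements) - 1
    executions = len(masks)
    # Each set mask bit corresponds to one computed vector element.
    computation = sum(bin(mask & full).count("1") for mask in masks)
    # Assumption: the maximum is every element enabled on every execution.
    max_computation = executions * num_elements
    percent = 100.0 * computation / max_computation if max_computation else 0.0
    return executions, computation, percent
```

For an instruction with 16 elements that executes 60 times with two mask bits set each time, this gives 60 executions, 120 computed elements, and 12.5 percent utilization.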
Example of the dynamic mask profiler output:
<instruction-details>
<IP> 0x401546 </IP>
<disassembly> vgatherdps zmm0, k1, dword ptr [rax+zmm5*1] </disassembly>
<source-location>
<img> myapp </img>
<routine> foo </routine>
</source-location>
<dynamic-stats>
<execution-counts> 60 </execution-counts>
<computation-count> 120 </computation-count>
<percent-of-max-computation> 12.500 </percent-of-max-computation>
<popcount>
<popcount2> 60 </popcount2>
</popcount>
<sparse-stats>
<touched-pages>
<pages> 1 </pages>
<count> 60 </count>
</touched-pages>
<touched-cachelines>
<cachelines> 2 </cachelines>
<count> 60 </count>
</touched-cachelines>
</sparse-stats>
</dynamic-stats>
</instruction-details>
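A record like this can be read with a standard XML parser. The Python sketch below uses xml.etree.ElementTree on a shortened copy of the record above; it assumes that each instruction-details element is well-formed XML on its own, and the function name and the derived fields are hypothetical. It also recovers the implied vector width from the reported percentage, assuming the percentage is the computation count divided by executions times elements.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<instruction-details>
<IP> 0x401546 </IP>
<disassembly> vgatherdps zmm0, k1, dword ptr [rax+zmm5*1] </disassembly>
<dynamic-stats>
<execution-counts> 60 </execution-counts>
<computation-count> 120 </computation-count>
<percent-of-max-computation> 12.500 </percent-of-max-computation>
</dynamic-stats>
</instruction-details>"""


def read_instruction(xml_text):
    """Extract the main counters from one instruction-details record."""
    node = ET.fromstring(xml_text)
    stats = node.find("dynamic-stats")
    executions = int(stats.findtext("execution-counts"))
    computation = int(stats.findtext("computation-count"))
    percent = float(stats.findtext("percent-of-max-computation"))
    return {
        "ip": int(node.findtext("IP").strip(), 16),
        "disassembly": node.findtext("disassembly").strip(),
        "executions": executions,
        "computation": computation,
        "percent": percent,
        # Average number of enabled elements per execution.
        "elements_per_execution": computation / executions,
        # Assumption: percent = 100 * computation / (executions * elements).
        "implied_elements": round(100.0 * computation / (percent * executions)),
    }
```

For the record above, the sketch reports an average of 2 enabled elements per execution and an implied width of 16 elements, which is consistent with a zmm gather of dword elements.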
The knobs for the dynamic mask profiler:
- -dyn_mask_detect_g2s
Enable g2s (gather to shuffle) detection [default 0]
- -dyn_mask_page_bytes
Define the page size for sparse stats [default 4096]
- -dyn_mask_profile
Enable dynamic mask profiling [default 0]
- -odyn_mask_detect_g2s
Specify the g2s detection file output name [default sde-detect-g2s.txt]
- -odyn_mask_profile
Specify profile file output name [default sde-dyn-mask-profile.txt]
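To show what the page size set by -dyn_mask_page_bytes affects, the Python sketch below counts the distinct pages and cache lines touched by one gather or scatter. It is an illustration of the idea, not the profiler's implementation: the function name, the 64-byte cache line size, and the 4-byte element size are assumptions.

```python
def touched_units(base, indices, scale, mask,
                  elem_bytes=4, page_bytes=4096, line_bytes=64):
    """Count distinct pages and cache lines accessed by enabled elements."""
    pages, lines = set(), set()
    for i, index in enumerate(indices):
        if not (mask >> i) & 1:
            continue                      # masked-off element: no access
        first = base + index * scale      # first byte of the element
        last = first + elem_bytes - 1     # last byte of the element
        # An element that straddles a boundary touches both units.
        pages.update(range(first // page_bytes, last // page_bytes + 1))
        lines.update(range(first // line_bytes, last // line_bytes + 1))
    return len(pages), len(lines)
```

With two enabled elements located 64 bytes apart inside the same 4096-byte page, for example touched_units(0x1000, [0, 64] + [0] * 14, 1, 0b11), the result is one page and two cache lines.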