Mask Profiler

The Intel® AVX-512 architecture added mask registers. These registers control which elements of the destination vector register are written. They can also be used to avoid memory faults on masked-off elements, for example when a vector access extends beyond a page boundary.

Intel® SDE provides two mask profiling features: the simple mask profiler and the dynamic mask profiler, which has richer analysis capabilities.

Simple Mask Profiler

The simple mask profiler summarizes mask usage over the entire execution of each thread. It provides the following mask usage information:

  • Total number of executed instructions

  • Total number of masked instructions

  • How many times all the mask bits were set (all ones)

  • How many times all the mask bits were clear (all zeros)

  • Instructions with exactly one bit set

  • Instructions where only the low bit is set (like a scalar instruction)

  • Usage of each mask register

  • Breakdown per number of elements
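
The categories above can be illustrated with a minimal sketch that classifies a single mask value. It is not the Intel® SDE implementation: the function name classify_mask and its parameters are hypothetical, and the category names are borrowed from the fields of the profiler output. Whether the profiler counts a low-bit-only mask under both one_hot and scalarish is not stated here; this sketch reports both.

```python
def classify_mask(mask: int, num_elements: int) -> dict:
    """Classify one mask value for an instruction with num_elements elements.

    Hypothetical helper for illustration only; not part of Intel SDE.
    """
    full = (1 << num_elements) - 1      # every element enabled
    active = mask & full                # ignore bits beyond the vector length
    popcnt = bin(active).count("1")     # number of enabled elements
    return {
        "all_ones": active == full,
        "all_zeros": active == 0,
        "one_hot": popcnt == 1,         # exactly one bit set
        "scalarish": active == 1,       # only the low bit set
        "popcnt": popcnt,
    }


if __name__ == "__main__":
    for value in (0xFFFF, 0x0000, 0x0001, 0x0004):
        print(hex(value), classify_mask(value, 16))
```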

The simple mask profiler output looks like this:

TID: 0
----------------------------------------------------------------
icount: 170734

uses: 1920
all_ones: 555
all_zeros: 1365
one_hot: 0
scalarish: 0
mask_regs_used: 2
regid_r[0]: 555
regid_r[1]: 1365
regid_w[1]: 1365
popcnt[0]: 1365
popcnt[16]: 555

VECTOR_ELEMENTS : 16
uses: 1110
all_ones: 555
all_zeros: 555
one_hot: 0
scalarish: 0
popcnt[16]: 555
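
For post-processing, a report like the one above can be read into dictionaries. The following is a minimal sketch that assumes the file contains only "name: integer" lines, a "TID" line starting each thread, and a "VECTOR_ELEMENTS" line starting each per-element-count breakdown, as in the sample. The function name parse_simple_profile and the returned layout are hypothetical.

```python
SAMPLE = """\
TID: 0
----------------------------------------------------------------
icount: 170734

uses: 1920
all_ones: 555
all_zeros: 1365
one_hot: 0
scalarish: 0
mask_regs_used: 2
regid_r[0]: 555
regid_r[1]: 1365
regid_w[1]: 1365
popcnt[0]: 1365
popcnt[16]: 555

VECTOR_ELEMENTS : 16
uses: 1110
all_ones: 555
all_zeros: 555
one_hot: 0
scalarish: 0
popcnt[16]: 555
"""


def parse_simple_profile(text: str) -> dict:
    """Return {tid: {"total": {...}, "by_elements": {n: {...}}}}."""
    threads = {}
    current = None   # record of the thread being read
    section = None   # dictionary that receives the next counters
    for raw in text.splitlines():
        line = raw.strip()
        if not line or set(line) == {"-"}:
            continue                      # blank line or separator rule
        key, sep, value = line.partition(":")
        if not sep:
            continue                      # not a "name: value" line
        key, number = key.strip(), int(value)
        if key == "TID":
            current = threads.setdefault(number, {"total": {}, "by_elements": {}})
            section = current["total"]
        elif key == "VECTOR_ELEMENTS" and current is not None:
            section = current["by_elements"].setdefault(number, {})
        elif section is not None:
            section[key] = number
    return threads


if __name__ == "__main__":
    total = parse_simple_profile(SAMPLE)[0]["total"]
    # Share of masked instructions that executed with an all-zero mask.
    print(total["all_zeros"] / total["uses"])
```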

The simple mask profiler provides the following knobs:

-mask_profile

Enable mask profiling [default 0]

-omask_profile

Specify profile file output name [default sde-mask-profile.txt]

Dynamic Mask Profiler

The dynamic mask profiler provides detailed mask usage information per instruction. This information includes:

  • The instruction address and disassembly

  • Image and function names

  • Execution count: how many times the instruction was executed

  • Computation count: how many vector elements were actually processed, based on the mask bits

  • Utilization: the computation count as a percentage of the maximum possible count

  • Specific analysis for sparse (gather and scatter) instructions, including an option to analyze whether a gather could be converted to a shuffle
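
The relationship between the three counts can be written out as a short calculation. This sketch assumes that the maximum count is the execution count multiplied by the number of elements the instruction can process; the function name utilization_percent and its parameters are hypothetical.

```python
def utilization_percent(computation_count: int,
                        execution_count: int,
                        max_elements: int) -> float:
    """Computation count as a percentage of the maximum possible count.

    Assumes the maximum is execution_count * max_elements, that is,
    every element enabled on every execution.
    """
    max_count = execution_count * max_elements
    if max_count == 0:
        return 0.0                      # instruction never executed
    return 100.0 * computation_count / max_count


if __name__ == "__main__":
    # A 16-element instruction executed 60 times with 2 active elements
    # each time: 120 of a possible 960 element operations.
    print(utilization_percent(120, 60, 16))
```

With these inputs the result is 12.5 percent, since 120 / (60 * 16) = 0.125.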

Example of the dynamic mask profiler output:

<instruction-details>
   <IP> 0x401546 </IP>
   <disassembly> vgatherdps zmm0, k1, dword ptr [rax+zmm5*1] </disassembly>
   <source-location>
      <img> myapp </img>
      <routine> foo </routine>
   </source-location>
   <dynamic-stats>
      <execution-counts> 60 </execution-counts>
      <computation-count> 120 </computation-count>
      <percent-of-max-computation>  12.500 </percent-of-max-computation>
      <popcount>
         <popcount2> 60 </popcount2>
      </popcount>
      <sparse-stats>
         <touched-pages>
            <pages> 1 </pages>
            <count> 60 </count>
         </touched-pages>
         <touched-cachelines>
            <cachelines> 2 </cachelines>
            <count> 60 </count>
         </touched-cachelines>
      </sparse-stats>
   </dynamic-stats>
</instruction-details>
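
A record like the one above can be read with a standard XML parser. The sketch below parses a single instruction-details element with Python's xml.etree.ElementTree and cross-checks the popcount histogram against the computation count. It assumes that each popcountN entry gives the number of executions with exactly N mask bits set, which is consistent with this sample (2 * 60 = 120) but is not confirmed here. The layout of the complete output file is also not shown in this section, so only one record is handled, and the function name summarize is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the record above (sparse-stats omitted).
FRAGMENT = """
<instruction-details>
   <IP> 0x401546 </IP>
   <disassembly> vgatherdps zmm0, k1, dword ptr [rax+zmm5*1] </disassembly>
   <dynamic-stats>
      <execution-counts> 60 </execution-counts>
      <computation-count> 120 </computation-count>
      <percent-of-max-computation>  12.500 </percent-of-max-computation>
      <popcount>
         <popcount2> 60 </popcount2>
      </popcount>
   </dynamic-stats>
</instruction-details>
"""


def summarize(fragment: str) -> dict:
    """Extract the main counters from one instruction-details record."""
    node = ET.fromstring(fragment.strip())
    stats = node.find("dynamic-stats")
    histogram = {}
    popcount = stats.find("popcount")
    if popcount is not None:
        for entry in popcount:
            bits = int(entry.tag[len("popcount"):])   # "popcount2" -> 2
            histogram[bits] = int(entry.text)
    return {
        "ip": int(node.findtext("IP"), 16),
        "disassembly": node.findtext("disassembly").strip(),
        "executions": int(stats.findtext("execution-counts")),
        "computations": int(stats.findtext("computation-count")),
        "percent": float(stats.findtext("percent-of-max-computation")),
        "popcount": histogram,
    }


if __name__ == "__main__":
    info = summarize(FRAGMENT)
    # Under the stated assumption, active elements summed over the
    # histogram should equal the computation count.
    from_histogram = sum(bits * count for bits, count in info["popcount"].items())
    print(info["disassembly"], info["computations"], from_histogram)
```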

The knobs for the dynamic mask profiler:

-dyn_mask_detect_g2s

Enable g2s (gather to shuffle) detection [default 0]

-dyn_mask_page_bytes

Define the page size for sparse stats [default 4096]

-dyn_mask_profile

Enable dynamic mask profiling [default 0]

-odyn_mask_detect_g2s

Specify the g2s detection file output name [default sde-detect-g2s.txt]

-odyn_mask_profile

Specify profile file output name [default sde-dyn-mask-profile.txt]
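
To show how these knobs combine, the sketch below assembles a command line as an argument list without running it. Several details are assumptions rather than facts from this section: the launcher name sde (the name may differ by kit and platform), the "-knob value" syntax with 1 to enable a knob, and the "--" separator before the application. The function name dyn_mask_command and its parameters are hypothetical.

```python
def dyn_mask_command(app, sde="sde", profile_out=None,
                     page_bytes=None, detect_g2s=False, g2s_out=None):
    """Build an argument list that enables the dynamic mask profiler.

    app is the application command as a list, for example ["./myapp"].
    Knobs left at None or False are omitted, so their defaults apply.
    """
    cmd = [sde, "-dyn_mask_profile", "1"]
    if profile_out is not None:
        cmd += ["-odyn_mask_profile", profile_out]
    if page_bytes is not None:
        cmd += ["-dyn_mask_page_bytes", str(page_bytes)]
    if detect_g2s:
        cmd += ["-dyn_mask_detect_g2s", "1"]
        if g2s_out is not None:
            cmd += ["-odyn_mask_detect_g2s", g2s_out]
    return cmd + ["--"] + list(app)


if __name__ == "__main__":
    # Prints the command only; pass the list to subprocess.run to execute it.
    print(" ".join(dyn_mask_command(["./myapp"], detect_g2s=True)))
```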