Table of contents for Intel Core i7 processor
- Introduction
- Command line options
- Event counter mask
- Event inverse mask
- Event edge mask
- Any thread
- Counting uncore events
- Occupancy reset filter
- Sampling uncore events
- Precise Event-Based Sampling (PEBS)
- References
1. Introduction
Pfmon provides access to all the Intel Core i7 PMU-specific features.
The Core i7 PMU consists of:
- Intel Nehalem core PMU
- Intel Core i7 uncore PMU
The following Core i7 features are supported:
- 7 counters (3 fixed counters, 4 generic counters)
- Event counter mask
- Event inverse mask
- Event edge mask
- Any thread filter
- 16-entry deep Last Branch Record (LBR)
- 9 uncore counters (8 generic counters, 1 fixed counter)
- Occupancy filter
- Precise Event-Based Sampling (PEBS)
The three fixed counters are limited to measuring only one event each, namely:
INSTRUCTIONS_RETIRED, UNHALTED_CORE_CYCLES, UNHALTED_REFERENCE_CYCLES.
2. Command line options
The Intel Core i7 processor specific options of pfmon are as follows:
--counter-mask=msk1,msk2,... | set event counter mask (0,1,2,3)
--inv-mask=i1,i2,...         | set event inverse counter mask (y/n,0/1)
--edge-mask=e1,e2,...        | set event edge detect (y/n,0/1)
--anythr-mask=e1,e2,...      | set any thread filter (y/n,0/1)
--occ-mask=e1,e2,...         | set uncore occupancy reset filter (y/n,0/1)
--smpl-module=pebs           | use the kernel PEBS custom sampling format
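As background for the filter options, the sketch below packs the corresponding fields into a core event-select register value, following the IA32_PERFEVTSELx bit layout given in the Intel manual cited in the References section. It is an illustration only: the function name and its parameters are hypothetical, it is not pfmon's code, and the event code and unit mask in the usage line are placeholders that were not checked against the Core i7 event tables.

```python
def encode_perfevtsel(event, umask, counter_mask=0, inv=False, edge=False,
                      any_thread=False, user=True, kernel=False, enable=True):
    """Pack filter settings into an IA32_PERFEVTSELx-style value (hypothetical helper)."""
    if not 0 <= counter_mask <= 0xFF:
        raise ValueError("counter mask must fit in 8 bits")
    value = (event & 0xFF) | ((umask & 0xFF) << 8)  # bits 7:0 event, 15:8 unit mask
    value |= int(user) << 16        # USR: count at user level
    value |= int(kernel) << 17      # OS: count at kernel level
    value |= int(edge) << 18        # E: edge detect
    value |= int(any_thread) << 21  # AnyThread
    value |= int(enable) << 22      # EN: enable the counter
    value |= int(inv) << 23         # INV: invert the counter mask comparison
    value |= counter_mask << 24     # CMASK, bits 31:24
    return value


# Placeholder event code and unit mask, user level only, threshold of 2.
print(hex(encode_perfevtsel(0xC2, 0x01, counter_mask=2)))  # 0x24101c2
```

Each command line option in the table above would then correspond to one field per event, which is why the options take comma-separated lists.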
3. Event counter mask
Each counter supports a threshold value below which the occurrences of an event are not counted.
If the threshold is set to n, then the counter is incremented by 1 only if there are at least
n occurrences of the event in a cycle. So effectively, the counter counts qualifying cycles.
Pfmon supports this threshold mechanism with the --counter-mask option. As a baseline, first count the event without a threshold:
$ pfmon -euops_retired:any ls /dev/null
/dev/null
2288329 UOPS_RETIRED:ANY
This counts the total number of micro-operations retired by the ls command while executing at the user level.
Now we want to count the number of cycles where two or more micro-operations are retired:
$ pfmon -euops_retired:any --counter-mask=2 ls /dev/null
/dev/null
583666 UOPS_RETIRED:ANY
We can push it further:
$ pfmon -euops_retired:any --counter-mask=4 ls /dev/null
/dev/null
0 UOPS_RETIRED:ANY
When this option is not specified, the counter mask is set to zero, i.e., each occurrence increments
the counter. The threshold is 8 bits wide.
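The threshold behavior can be modeled in a few lines of Python. This is a minimal sketch, not pfmon or hardware code: the function name is hypothetical and the per-cycle trace is made up for illustration.

```python
def count_with_threshold(per_cycle, counter_mask=0):
    """Model one counter fed the number of event occurrences in each cycle.

    counter_mask == 0: every occurrence increments the counter.
    counter_mask >= 1: a cycle adds 1 if it has at least counter_mask occurrences.
    """
    if counter_mask == 0:
        return sum(per_cycle)
    return sum(1 for n in per_cycle if n >= counter_mask)


# Made-up trace: micro-operations retired in six consecutive cycles.
trace = [0, 3, 1, 2, 0, 4]

print(count_with_threshold(trace))                  # 10 occurrences in total
print(count_with_threshold(trace, counter_mask=2))  # 3 cycles with two or more
```

With a non-zero mask the result is a number of cycles, so it can never exceed the number of cycles in the run.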
It is possible to set the threshold for multiple events. However, only generic counters support thresholds.
Here is an example:
$ pfmon -euops_retired:any,mem_load_retired:L2_HIT --counter-mask=2,2 ls /dev/null
/dev/null
580877 UOPS_RETIRED:ANY
0 MEM_LOAD_RETIRED:L2_HIT
This command counts the number of cycles in which 2 or more micro-operations were retired and the number
of cycles in which there were 2 or more L2 data cache hits.
The threshold comparison can be inverted, i.e., from greater than or equal (>=) to less than (<), using the
--inv-mask option.
4. Event inverse mask
It is possible to invert what is measured using the --inv option. This option only makes sense
when combined with the threshold of --counter-mask. In that case it simply
inverts the filtering from >= to <. The option takes a boolean value 0/n (false, the default) or 1/y (true). Inversion
may be set for multiple events using a comma-separated list, e.g., --inv=0,n,1.
The following command, for instance, counts the number of cycles in which fewer than one
micro-operation was retired, i.e., no micro-operations were retired, for the ls command:
$ pfmon -euops_retired:any --counter-mask=1 --inv=1 ls /dev/null
/dev/null
1412913 UOPS_RETIRED:ANY
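The effect of the inversion on the threshold comparison can be sketched as follows. This is a simplified model with a hypothetical function name and a made-up per-cycle trace, and it assumes a counter mask of at least 1.

```python
def count_cycles(per_cycle, counter_mask, invert=False):
    """Count cycles that pass the threshold test (counter_mask >= 1 assumed).

    invert=False: a cycle qualifies when occurrences >= counter_mask.
    invert=True:  a cycle qualifies when occurrences <  counter_mask.
    """
    if invert:
        return sum(1 for n in per_cycle if n < counter_mask)
    return sum(1 for n in per_cycle if n >= counter_mask)


# Made-up trace: micro-operations retired in six consecutive cycles.
trace = [0, 3, 1, 2, 0, 4]

print(count_cycles(trace, counter_mask=1))               # 4 cycles with at least one
print(count_cycles(trace, counter_mask=1, invert=True))  # 2 cycles with none
```

For a given mask, the inverted and non-inverted counts add up to the total number of cycles observed.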
5. Event edge mask
Edge detection, instead of the default level detection, can be
enabled, per event, with the --edge-mask option. The true/false value can be expressed
with either 0/1 or y/n, or any combination thereof.
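The difference between level and edge detection can be illustrated with a small model. The sketch assumes the usual meaning of edge detection on Intel PMUs, namely that the counter increments when the counted condition goes from false to true rather than on every cycle in which it holds; this section does not spell that out, so treat it as an assumption. The function names and the trace are hypothetical.

```python
def count_level(per_cycle, counter_mask=1):
    """Level detection: count every cycle in which the condition holds."""
    return sum(1 for n in per_cycle if n >= counter_mask)


def count_edges(per_cycle, counter_mask=1):
    """Edge detection: count only false-to-true transitions of the condition."""
    edges = 0
    previous = False  # the condition is assumed false before the first cycle
    for n in per_cycle:
        current = n >= counter_mask
        if current and not previous:
            edges += 1
        previous = current
    return edges


# Made-up trace: micro-operations retired in six consecutive cycles.
trace = [0, 3, 1, 2, 0, 4]

print(count_level(trace))  # 4 cycles with at least one micro-operation
print(count_edges(trace))  # 2 separate bursts of activity
```

Dividing the level count by the edge count gives the average length of a burst, 2 cycles in this trace.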
6. Any thread filter
The any thread filter allows a measurement to span both hyperthreads of a core.
By default, events are only measured for the current hyperthread. In system-wide mode, that means
only on the current CPU. The --anythr option is used to enable monitoring events in both hyperthreads
at the same time. In system-wide mode, it means measuring occurrences on both CPUs.
In the following example, we run a busy loop program on CPU1. Then on CPU0,
we measure the ls command. We pin the command to CPU0. First, the run without the any thread
filter:
$ pfmon --pin-command -euops_retired:any --system-wide --cpu-list=0 --anythr=0 ls /dev/null
/dev/null
CPU0 2793729 UOPS_RETIRED:ANY
Next, the same run with the any thread filter enabled:
$ pfmon --pin-command -euops_retired:any --system-wide --cpu-list=0 --anythr=1 ls /dev/null
/dev/null
CPU0 5222722 UOPS_RETIRED:ANY
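The two runs differ only in which hyperthread's occurrences reach the counter. A deliberately simplified model, with a hypothetical function name and made-up per-cycle traces for the two sibling hyperthreads:

```python
def count_events(own_thread, sibling_thread, any_thread=False):
    """Count occurrences seen by a counter on one hyperthread.

    any_thread=False: only occurrences from the counter's own hyperthread.
    any_thread=True:  occurrences from both hyperthreads of the core.
    """
    total = sum(own_thread)
    if any_thread:
        total += sum(sibling_thread)
    return total


# Made-up traces: the measured command on one thread, a busy loop on its sibling.
cpu0 = [1, 0, 2, 1]
cpu1 = [3, 3, 3, 3]

print(count_events(cpu0, cpu1))                   # 4, own hyperthread only
print(count_events(cpu0, cpu1, any_thread=True))  # 16, both hyperthreads
```

The extra count in the second run above therefore comes from the busy loop running on the sibling hyperthread.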
7. Counting uncore events
The uncore PMU is a socket-level PMU; it is therefore shared by all cores and threads
on that socket. As a result, accessing the uncore PMU has certain constraints:
- It is only accessible to system-wide monitoring sessions. It is not possible to trace events back to
a core or hyperthread. Pfmon enforces the use of --system-wide.
- There is no privilege level filtering, therefore pfmon requires the -uk option.
- The underlying perfmon infrastructure allows only one monitoring session per socket.
Pfmon checks for that condition. Use of --cpu-list is recommended to pick a CPU on a socket.
All uncore events start with the UNC_ prefix. To list all uncore events,
simply use the -l option:
$ pfmon -lunc_
UNC_CLK_UNHALTED
UNC_DRAM_PAGE_CLOSE
UNC_DRAM_PAGE_MISS
UNC_DRAM_PRE_ALL
UNC_DRAM_READ_CAS
UNC_DRAM_REFRESH
UNC_DRAM_WRITE_CAS
UNC_GQ_ALLOC
UNC_GQ_CYCLES_FULL
...
A typical uncore counting session on the socket where CPU0 is located:
$ pfmon --system-wide -uk -eunc_clk_unhalted -t10 --cpu-list=0
8. Occupancy Reset Filter
9. Sampling uncore events
10. Precise Event-Based Sampling (PEBS)
11. References
Further documentation of performance monitoring for the Intel Core i7 processor
is available in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3B.