Table of contents for Intel Core-based processor
- Introduction
- Command line options
- Event counter mask
- Event inverse mask
- Event edge mask
- Precise Event-Based Sampling (PEBS)
- References
1. Introduction
Pfmon provides access to all the PMU-specific features of Intel Core-based processors,
as implemented by Core 2 processors. Note that Intel Core Duo/Solo processors use a different PMU model.
The following features are supported:
- 5 counters (3 fixed counters, 2 generic counters)
- Event counter mask
- Event inverse mask
- Event edge mask
- Precise Event-Based Sampling (PEBS)
The three fixed counters are limited to measuring only one event each, namely:
INSTRUCTIONS_RETIRED, UNHALTED_CORE_CYCLES, UNHALTED_REFERENCE_CYCLES.
2. Command line options
The Intel Core-based processor specific options of pfmon are as follows:
--counter-mask=msk1,msk2,...   set event counter mask (0,1,2,3)
--inv-mask=i1,i2,...           set event inverse counter mask (y/n,0/1)
--edge-mask=e1,e2,...          set event edge detect (y/n,0/1)
--smpl-module=pebs             use the kernel PEBS custom sampling format
3. Event counter mask
Each counter supports a threshold value below which the occurrences of an event are not counted.
If the threshold is set to n (with n greater than zero), then the counter is incremented by 1 in each
cycle in which there are at least n occurrences of the event. So, effectively, the counter counts qualifying cycles.
Pfmon supports this threshold mechanism with the --counter-mask option which is used as follows:
$ pfmon -ers_uops_dispatched ls /dev/null
/dev/null
1080827 RS_UOPS_DISPATCHED
This counts the total number of micro-operations dispatched for the command ls while executing at the user level.
Now we want to count the number of cycles in which two or more micro-operations are dispatched:
$ pfmon -ers_uops_dispatched --counter-mask=2 ls /dev/null
/dev/null
320478 RS_UOPS_DISPATCHED
We can push it further:
% pfmon -ers_uops_dispatched --counter-mask=6 ls /dev/null
/dev/null
790 RS_UOPS_DISPATCHED
When this option is not specified, the counter mask (threshold) is set to zero, i.e., each occurrence increments
the counter. The threshold is 8 bits wide.
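The threshold behavior can be illustrated with a small model. The following Python sketch is not pfmon code: the function name count_with_mask and the per-cycle trace are made up for illustration, and it assumes the "n or more occurrences per cycle" reading used in the examples of this section.

```python
def count_with_mask(events_per_cycle, counter_mask=0):
    """Toy model of one counter with an optional threshold (counter mask)."""
    if counter_mask == 0:
        # No threshold: every occurrence increments the counter.
        return sum(events_per_cycle)
    # Threshold n: add 1 for each cycle with at least n occurrences.
    return sum(1 for n in events_per_cycle if n >= counter_mask)


# Hypothetical trace: occurrences of some event in 7 consecutive cycles.
trace = [0, 3, 1, 6, 2, 0, 4]

print(count_with_mask(trace))      # total number of occurrences
print(count_with_mask(trace, 2))   # cycles with 2 or more occurrences
print(count_with_mask(trace, 6))   # cycles with 6 or more occurrences
```

As in the pfmon runs above, raising the threshold can only keep or lower the resulting count, because fewer cycles qualify.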
It is possible to set the threshold for multiple events. However, only the generic counters support thresholds.
Here is an example:
$ pfmon -ers_uops_dispatched,mem_load_retired:L1D_LINE_MISS --counter-mask=6,2 ls /dev/null
/dev/null
787 RS_UOPS_DISPATCHED
61 MEM_LOAD_RETIRED:L1D_LINE_MISS
This command counts the number of cycles in which 6 or more micro-operations were dispatched and the number
of cycles in which there were 2 or more L1 data cache line misses.
This threshold can be inverted, i.e., from greater than or equal (>=) to less than (<), using the
--inv-mask option.
4. Event inverse mask
It is possible to invert what is measured using the --inv-mask option (abbreviated to --inv in the examples
below). This option only makes real sense when combined with the threshold of --counter-mask. In that case it simply
inverts the filtering from >= to <. The option takes a boolean value 0/n (false, the default) or 1/y (true). Inversion
may be set for multiple events using a comma-separated list, e.g., --inv=0,n,1.
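As an illustration of the accepted boolean spellings, here is a minimal Python sketch of how a comma-separated list such as 0,n,1 could be interpreted. The helper parse_bool_list is hypothetical and is not pfmon's actual parser; treating the values as case-insensitive is also an assumption.

```python
def parse_bool_list(text):
    """Parse a comma-separated list such as "0,n,1" into booleans."""
    true_values = {"1", "y"}
    false_values = {"0", "n"}
    result = []
    for item in text.split(","):
        token = item.strip().lower()  # assumption: case does not matter
        if token in true_values:
            result.append(True)
        elif token in false_values:
            result.append(False)
        else:
            raise ValueError(f"not a boolean value: {item!r}")
    return result


# One value per event, in the order the events were given.
print(parse_bool_list("0,n,1"))
```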
The following command, for instance, counts the number of stall cycles for the ls command:
$ pfmon -uk -ers_uops_dispatched --inv=1 --counter-mask=1 ls /dev/null
/dev/null
2674704 RS_UOPS_DISPATCHED
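The effect of inversion on a threshold can be sketched with a toy model. The Python code below is only an illustration under stated assumptions: count_cycles and the trace are hypothetical, and the model simply ignores inversion when no threshold is set.

```python
def count_cycles(events_per_cycle, counter_mask, inv=False):
    """Toy model of a counter with a threshold and optional inversion."""
    if counter_mask == 0:
        # Assumption of this model: without a threshold, inversion is
        # ignored and every occurrence is counted.
        return sum(events_per_cycle)
    if inv:
        # Inverted filter: cycles with fewer than counter_mask occurrences.
        return sum(1 for n in events_per_cycle if n < counter_mask)
    # Default filter: cycles with at least counter_mask occurrences.
    return sum(1 for n in events_per_cycle if n >= counter_mask)


# Hypothetical trace: micro-operations dispatched in 7 consecutive cycles.
trace = [0, 3, 1, 6, 2, 0, 4]

busy = count_cycles(trace, 1)               # cycles with >= 1 occurrence
stalled = count_cycles(trace, 1, inv=True)  # cycles with < 1 occurrence
print(busy, stalled, busy + stalled == len(trace))
```

With a threshold of 1, the inverted count is the number of cycles in which nothing happened, which is why this combination is used above to measure stall cycles.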
5. Event edge mask
Edge detection, instead of the default level detection, can be
enabled per event with the --edge-mask option. The true/false value can be expressed
with either 0/1 or y/n, or any combination thereof.
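The difference between level and edge detection can be sketched as follows. This Python model is an illustration, not pfmon code: it assumes that level detection counts every cycle in which a condition holds, that edge detection counts only the transitions from "not holding" to "holding", and that the condition does not hold before the trace starts. The function names and the trace are hypothetical.

```python
def count_level(active_per_cycle):
    """Level detection: count every cycle in which the condition holds."""
    return sum(1 for active in active_per_cycle if active)


def count_edges(active_per_cycle):
    """Edge detection: count transitions from inactive to active."""
    edges = 0
    previous = False  # assumption: condition is inactive before the trace
    for active in active_per_cycle:
        if active and not previous:
            edges += 1
        previous = active
    return edges


# Hypothetical trace: whether some condition holds in each of 10 cycles.
trace = [False, True, True, False, True, False, False, True, True, True]

print(count_level(trace))  # number of cycles the condition held
print(count_edges(trace))  # number of times the condition started to hold
```

In this model, level detection measures how long a condition lasted, while edge detection measures how many times it occurred.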
6. Precise Event-Based Sampling (PEBS)
To make use of this feature, you must have kernel perfmon2 v2.7 (2.6.23) or higher.
PEBS is an advanced sampling feature of the Intel Core-based processors in which the
processor records samples directly into a designated memory region. The perfmon2 interface supports
this type of sampling via a custom sampling format. For the Intel Core-based processor, the
perfmon_pebs_core_smpl module must be loaded into the kernel, otherwise pfmon will fail. You can verify that
the format is available by checking /sys/kernel/perfmon/formats.
With PEBS, the format of the samples is mandated by the processor. Each sample contains the
machine state of the processor at the time the counter overflowed. The precision of PEBS comes from the fact that
the instruction pointer recorded in each sample is at most one instruction away from where the counter actually
overflowed. The skid is minimized compared to the regular interrupted instruction pointer. Another key advantage of PEBS
is that it minimizes the overhead because the Linux kernel is only involved when the PEBS buffer fills up, i.e.,
there is no interrupt until a lot of samples are available.
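The overhead argument can be made concrete with a toy model. The Python sketch below is not how the kernel or the processor is implemented: the class PebsBufferModel, the helper interrupts_for, and the buffer sizes are hypothetical, and the model only counts how often the kernel would be notified if an interrupt is raised each time the buffer becomes full.

```python
class PebsBufferModel:
    """Toy model: samples accumulate until the buffer is full."""

    def __init__(self, entries):
        self.entries = entries  # hypothetical buffer capacity in samples
        self.pending = 0        # samples recorded since the last interrupt
        self.interrupts = 0     # times the kernel had to get involved

    def record_sample(self):
        self.pending += 1
        if self.pending == self.entries:
            # Buffer full: one interrupt, then the buffer is drained.
            self.interrupts += 1
            self.pending = 0


def interrupts_for(samples, entries):
    model = PebsBufferModel(entries)
    for _ in range(samples):
        model.record_sample()
    return model.interrupts


print(interrupts_for(1000, 1))    # buffer of 1: one interrupt per sample
print(interrupts_for(1000, 100))  # larger buffer: far fewer interrupts
```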
A constraint of PEBS is that it works only with certain events, and there is no flexibility in
what is recorded in each sample. For instance, in system-wide mode the process identification is not recorded. Pfmon
rejects any event that cannot work with PEBS. You can find out which events support PEBS by using the
pfmon -i command. Suppose you want to sample on INSTRUCTIONS_RETIRED:
% pfmon -iinstructions_retired
Name : INSTRUCTIONS_RETIRED
Code : 0xc0
Counters : [ 0 1 16 ]
Desc : count the number of instructions at retirement...
PEBS : Yes
Notice the last field (PEBS). It is set to Yes. Only one counter supports PEBS, therefore only
one event can be sampled. Event options such as inversion or threshold are supported. For instance, to sample
on the number of cycles in which no instruction is retired (stalls):
% pfmon --smpl-module=pebs -einstructions_retired --inv=1 --counter-mask=1 --long-smpl-periods=2660000 -uk -- foo
# counts %self %cum code addr
1976 70.67% 70.67% 0x00000000004005e9
710 25.39% 96.07% 0x00000000004005f1
61 2.18% 98.25% 0x00000000004005e0
21 0.75% 99.00% 0x00000000004005ed
10 0.36% 99.36% 0x00000000004005f4
2 0.07% 99.43% 0xffffffff80583110
2 0.07% 99.50% 0xffffffff8024b4f8
...
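The columns of such a profile can be reproduced from raw sample counts. The Python sketch below assumes that %self is the share of all samples that fall on one address and that %cum is the running total of %self. This reading is consistent with the output above but is not taken from the pfmon source. The function profile_rows, the addresses, and the counts are made up for illustration.

```python
def profile_rows(counts_by_addr):
    """Return (count, %self, %cum, addr) rows, largest count first."""
    total = sum(counts_by_addr.values())
    rows = []
    cumulative = 0
    ordered = sorted(counts_by_addr.items(), key=lambda kv: kv[1], reverse=True)
    for addr, count in ordered:
        cumulative += count
        rows.append((count, 100.0 * count / total, 100.0 * cumulative / total, addr))
    return rows


# Hypothetical sample counts per code address.
samples = {0x1000: 50, 0x1008: 30, 0x1010: 15, 0x1018: 5}

print("# counts   %self    %cum  code addr")
for count, self_pct, cum_pct, addr in profile_rows(samples):
    print(f"{count:8d} {self_pct:6.2f}% {cum_pct:6.2f}%  0x{addr:016x}")
```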
By construction, with PEBS the processor itself resets the counter after each sample. Thus software randomization
of the sample period does not work per sample; the period can only be changed after each PEBS buffer overflow.
Consequently, the --smpl-periods-random option will not have quite the same effect as with regular sampling.
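The consequence for randomization can be sketched with a small model. The Python code below is an illustration only: sample_periods, the base period, the spread, and the buffer size are hypothetical, and the way the random period is drawn is not meant to match pfmon. It contrasts a new period for every sample with a period that can change only when the buffer has overflowed.

```python
import random


def sample_periods(n_samples, base_period, spread, buffer_entries=None, seed=0):
    """Return the period in effect for each of n_samples samples.

    buffer_entries=None models regular sampling (new period per sample).
    Otherwise the period changes only every buffer_entries samples,
    i.e., after each buffer overflow in this toy model.
    """
    rng = random.Random(seed)
    periods = []
    current = base_period
    for i in range(n_samples):
        per_sample = buffer_entries is None
        if per_sample or i % buffer_entries == 0:
            current = base_period + rng.randrange(spread)
        periods.append(current)
    return periods


regular = sample_periods(8, 100000, 1000)
pebs_like = sample_periods(8, 100000, 1000, buffer_entries=4)
print(regular)    # period may differ from one sample to the next
print(pebs_like)  # period is constant within each group of 4 samples
```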
7. References
Further documentation of performance monitoring for the Intel Core-based processor
is available in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3B.