Table of contents for Intel Atom processor
- Introduction
- Command line options
- Event counter mask
- Event inverse mask
- Event edge mask
- Any thread
- Precise Event-Based Sampling (PEBS)
- References
1. Introduction
Pfmon provides access to all the Intel Atom PMU-specific features.
The following features are supported:
- 5 counters (3 fixed counters, 2 generic counters)
- Event counter mask
- Event inverse mask
- Event edge mask
- Any thread filter
- Precise Event-Based Sampling (PEBS)
The three fixed counters are limited to measuring only one event each, namely:
INSTRUCTIONS_RETIRED, UNHALTED_CORE_CYCLES, UNHALTED_REFERENCE_CYCLES.
2. Command line options
The Intel Atom processor specific options of pfmon are as follows:
--counter-mask=msk1,msk2,...   set event counter mask (0,1,2,3)
--inv-mask=i1,i2,...           set event inverse counter mask (y/n,0/1)
--edge-mask=e1,e2,...          set event edge detect (y/n,0/1)
--anythr-mask=e1,e2,...        set any thread filter (y/n,0/1)
--smpl-module=pebs             use the kernel PEBS custom sampling format
3. Event counter mask
Each counter supports a threshold: cycles in which an event occurs fewer times than the threshold are not counted.
If the threshold is set to n, then the counter is incremented by 1 only in cycles with at least
n occurrences of the event. So effectively, the counter counts qualifying cycles.
Pfmon supports this threshold mechanism with the --counter-mask option. First, a run without a threshold:
$ pfmon -euops_retired:any ls /dev/null
/dev/null
2288329 UOPS_RETIRED:ANY
This counts the total number of micro-operations retired for the command ls while executing at the user level.
Now we want to count the number of cycles where two or more micro-operations are retired:
$ pfmon -euops_retired:any --counter-mask=2 ls /dev/null
/dev/null
583666 UOPS_RETIRED:ANY
We can push it further:
$ pfmon -euops_retired:any --counter-mask=4 ls /dev/null
/dev/null
0 UOPS_RETIRED:ANY
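The effect of the threshold in the three runs above can be modeled in a few lines. The sketch below is only an illustration, not pfmon or hardware code: the function name count_with_mask and the per-cycle trace are made up, and it assumes the greater-or-equal comparison described in this section.

```python
def count_with_mask(per_cycle, cmask=0):
    """Model a counter fed with the number of event occurrences in each cycle.

    cmask == 0: every occurrence is counted (plain event count).
    cmask  > 0: a cycle adds 1 when it has at least cmask occurrences.
    """
    if cmask == 0:
        return sum(per_cycle)
    return sum(1 for n in per_cycle if n >= cmask)


# Hypothetical trace: micro-operations retired in six consecutive cycles.
trace = [0, 2, 1, 2, 0, 1]

print(count_with_mask(trace))           # total occurrences
print(count_with_mask(trace, cmask=2))  # cycles with two or more occurrences
print(count_with_mask(trace, cmask=4))  # cycles with four or more occurrences
```

In this trace no cycle reaches four occurrences, so the last call returns zero, which is the situation seen in the --counter-mask=4 run.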
When this option is not specified, the counter mask is set to zero, i.e., each occurrence increments
the counter. The threshold is -bit wide.
It is possible to set the threshold for multiple events. However, only generic counters support thresholds.
Here is an example:
$ pfmon -euops_retired:any,mem_load_retired:L2_MISS --counter-mask=2,2 ls /dev/null
/dev/null
580877 UOPS_RETIRED:ANY
0 MEM_LOAD_RETIRED:L2_MISS
This command counts the number of cycles in which 2 or more micro-operations were retired and the number
of cycles in which there were 2 or more L2 data cache misses.
This threshold can be inverted, i.e., from greater than or equal (>=) to less than (<), using the
--inv-mask option.
4. Event inverse mask
It is possible to invert what is measured using the --inv-mask option, written as --inv in the examples
below. This option only makes real sense when combined with the threshold of --counter-mask. In that case it simply
inverts the filtering from >= to <. The option takes a boolean value: 0/n (false, the default) or 1/y (true). Inversion
may be set for multiple events using a comma-separated list, e.g., --inv=0,n,1.
The following command, for instance, counts the number of cycles in which fewer than one
micro-operation was retired, i.e., no micro-operations were retired, for the ls command:
$ pfmon -euops_retired:any --counter-mask=1 --inv=1 ls /dev/null
/dev/null
1412913 UOPS_RETIRED:ANY
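One consequence of the inversion is that, for a given threshold, every cycle passes exactly one of the two tests. The sketch below illustrates this with a made-up trace; count_cycles is a hypothetical helper, not part of pfmon.

```python
def count_cycles(per_cycle, cmask, invert=False):
    """Count the cycles whose occurrence count passes the threshold test.

    invert=False: occurrences >= cmask
    invert=True:  occurrences <  cmask
    """
    if invert:
        return sum(1 for n in per_cycle if n < cmask)
    return sum(1 for n in per_cycle if n >= cmask)


# Hypothetical trace: micro-operations retired in six consecutive cycles.
trace = [0, 2, 1, 2, 0, 1]

busy = count_cycles(trace, cmask=1)               # at least one retired
idle = count_cycles(trace, cmask=1, invert=True)  # none retired

print(busy, idle, len(trace))
```

Under this model the two counts always add up to the number of cycles, so an inverted count with a threshold of 1 can be read as the number of cycles with no occurrence at all.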
5. Event edge mask
Edge detection, instead of the default level detection, can be
enabled, per event, with the --edge-mask option. The true/false value can be expressed
with either 0/1 or y/n or any combination thereof.
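The difference between the two detection modes can be pictured with a small model. This sketch assumes that edge detection counts transitions of the monitored condition from inactive to active, which is the usual meaning of such a filter but is not spelled out in this document; the function names and the trace are hypothetical.

```python
def count_level(active):
    """Level detection: one count for every cycle the condition holds."""
    return sum(1 for a in active if a)


def count_edges(active):
    """Edge detection (assumed): one count each time the condition
    goes from inactive to active."""
    edges = 0
    previous = False
    for current in active:
        if current and not previous:
            edges += 1
        previous = current
    return edges


# Hypothetical condition sampled over eight cycles: two separate bursts.
active = [False, True, True, False, True, True, True, False]

print(count_level(active))  # cycles spent in the condition
print(count_edges(active))  # number of times the condition started
```

Under that assumption, level detection answers "for how long" while edge detection answers "how many times".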
6. Any thread filter
The any thread filter allows measurement to span the two logical hyperthreads.
By default, events are only measured for the current hyperthread. In system-wide mode, that means
only on the current CPU. The --anythr-mask option, written as --anythr in the examples below, is used
to enable monitoring events in both hyperthreads at the same time. In system-wide mode, it means
measuring occurrences in both CPUs.
In the following example, we run a busy loop program on CPU1. Then on CPU0,
we measure the ls command. We pin the command to CPU0. First, the run without the any thread
filter:
$ pfmon --pin-command -euops_retired:any --system-wide --cpu-list=0 --anythr=0 ls /dev/null
/dev/null
CPU0 2793729 UOPS_RETIRED:ANY
Next, the same run with the any thread filter enabled:
$ pfmon --pin-command -euops_retired:any --system-wide --cpu-list=0 --anythr=1 ls /dev/null
/dev/null
CPU0 5222722 UOPS_RETIRED:ANY
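A rough way to read these two results is to attribute the difference to the sibling hyperthread. The arithmetic below is only an estimate: the two counts come from separate runs, so they are not strictly comparable, and the variable names are made up for the illustration.

```python
# Counts copied from the two runs above.
own_thread = 2793729    # --anythr=0: CPU0 only
both_threads = 5222722  # --anythr=1: CPU0 and its sibling hyperthread

# Estimated contribution of the sibling (the busy loop on CPU1).
sibling_estimate = both_threads - own_thread
sibling_share = sibling_estimate / both_threads

print(sibling_estimate)
print(f"{sibling_share:.1%}")
```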
7. Precise Event-Based Sampling (PEBS)
To make use of this feature, you must have kernel perfmon2 v2.81 or higher.
PEBS is an advanced sampling feature of the Intel Core-based processors in which the
processor records samples directly into a designated memory region. The Intel Atom PEBS support is
identical to the Core PEBS support.
The perfmon2 interface supports this type of sampling via a custom sampling format.
For the Intel Atom processor, the perfmon_pebs_core_smpl module must be loaded into the kernel,
otherwise pfmon will fail. You can verify this by checking out /sys/kernel/perfmon/formats.
With PEBS, the format of the samples is mandated by the processor. Each sample contains the
machine state of the processor at the time the counter overflowed. The precision of PEBS comes from the fact that
the instruction pointer recorded in each sample is at most one instruction away from where the counter actually
overflowed. The skid is minimized compared to the regular interrupted instruction pointer. Another key advantage of PEBS
is that it minimizes the overhead because the Linux kernel is only involved when the PEBS buffer fills up, i.e.,
there is no interrupt until a lot of samples are available.
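The overhead argument can be made concrete with a back-of-the-envelope model. The buffer size of 256 entries and the sample count below are invented for the illustration, and interrupts_needed is a hypothetical helper; actual buffer sizes depend on the kernel configuration.

```python
def interrupts_needed(samples, buffer_entries=1):
    """Number of times the kernel is involved when it only intervenes
    once the sample buffer is full (buffer_entries=1 models one
    interrupt per sample)."""
    return samples // buffer_entries


samples = 10000  # hypothetical number of samples collected in a run

print(interrupts_needed(samples))                      # interrupt on every sample
print(interrupts_needed(samples, buffer_entries=256))  # interrupt on full buffer only
```

In this model the samples left in a partially filled buffer at the end of the run are not counted as an interrupt; they would be collected when the session stops.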
A constraint of PEBS is that it works only with certain events, and there is no flexibility in
what is recorded in each sample. For instance, in system-wide mode the process identification is not recorded. Pfmon
will reject any event that cannot work with PEBS. It is possible to know which events support PEBS by using the
pfmon -i command. Suppose you want to sample on INSTRUCTIONS_RETIRED:
$ pfmon -iinstructions_retired
Name : INSTRUCTIONS_RETIRED
Code : 0xc0
Counters : [ 0 1 16 ]
Desc : Instructions retired
PEBS : Yes
Notice the last field (PEBS). It is set to Yes. Only one counter supports PEBS, therefore only
one event can be sampled. Event options such as inversion or threshold are supported. For instance, to sample
on the number of cycles in which no instruction is retired (stalls):
$ pfmon --smpl-module=pebs -einstructions_retired --inv=1 --counter-mask=1 --long-smpl-periods=2660000 -uk -- foo
# counts %self %cum code addr
1976 70.67% 70.67% 0x00000000004005e9
710 25.39% 96.07% 0x00000000004005f1
61 2.18% 98.25% 0x00000000004005e0
21 0.75% 99.00% 0x00000000004005ed
10 0.36% 99.36% 0x00000000004005f4
2 0.07% 99.43% 0xffffffff80583110
2 0.07% 99.50% 0xffffffff8024b4f8
...
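The %self and %cum columns of such a listing follow directly from the per-address sample counts. The sketch below shows one way to derive them; flat_profile is a hypothetical helper, not pfmon's implementation, and the list of sampled addresses is made up.

```python
from collections import Counter


def flat_profile(addresses):
    """Return (count, %self, %cum, address) rows, most frequent first."""
    counts = Counter(addresses)
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for addr, n in counts.most_common():
        cumulative += n
        rows.append((n, 100.0 * n / total, 100.0 * cumulative / total, addr))
    return rows


# Hypothetical instruction pointers taken from ten samples.
samples = [0x4005E9] * 7 + [0x4005F1] * 2 + [0x4005E0]

print("# counts   %self    %cum          code addr")
for n, self_pct, cum_pct, addr in flat_profile(samples):
    print(f"{n:8d} {self_pct:6.2f}% {cum_pct:6.2f}% 0x{addr:016x}")
```

%self is the share of all samples that hit one address, and %cum is the running total of %self down the list.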
By construction, PEBS itself resets the counter after each sample. Thus software randomization
of the sampling period does not work, except when the period is reset after each PEBS buffer overflow. Consequently,
the --smpl-periods-random option will not have quite the same effect as with regular sampling.
8. References
Further documentation of performance monitoring for the Intel Atom processor
is available in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3B.