perfmon2
   the hardware-based performance monitoring interface for Linux
Pfmon Intel Atom processor documentation
Table of contents for Intel Atom processor
  1. Introduction
  2. Command line options
  3. Event counter mask
  4. Event inverse mask
  5. Event edge mask
  6. Any thread filter
  7. Precise Event-Based Sampling (PEBS)
  8. References
1. Introduction

Pfmon provides access to all of the Intel Atom PMU-specific features.

The following features are supported:

  • 5 counters (3 fixed counters, 2 generic counters)
  • Event counter mask
  • Event inverse mask
  • Event edge mask
  • Any thread filter
  • Precise Event-Based Sampling (PEBS)

The three fixed counters are limited to measuring only one event each, namely: INSTRUCTIONS_RETIRED, UNHALTED_CORE_CYCLES, UNHALTED_REFERENCE_CYCLES.


2. Command line options

The Intel Atom processor specific options of pfmon are as follows:

--counter-mask=msk1,msk2,...   set event counter mask (0,1,2,3)
--inv-mask=i1,i2,...           set event inverse counter mask (y/n,0/1)
--edge-mask=e1,e2,...          set event edge detect (y/n,0/1)
--anythr-mask=e1,e2,...        set any thread filter (y/n,0/1)
--smpl-module=pebs             use the kernel PEBS custom sampling format
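
On Intel processors, filters of this kind are typically fields of the event-select register that programs a generic counter. The following Python sketch packs them using the architectural IA32_PERFEVTSELx layout described in the Intel manual. The function name encode_evtsel and the correspondence between pfmon options and register fields are assumptions made for illustration; this is not pfmon's own code.

```python
def encode_evtsel(event, umask=0, usr=True, os=False,
                  edge=False, anythr=False, inv=False, cmask=0):
    """Pack event-select fields into a register value (illustrative only)."""
    for name, field in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits")
    value = event                 # bits 0-7: event code
    value |= umask << 8           # bits 8-15: unit mask
    value |= int(usr) << 16       # count at user level
    value |= int(os) << 17        # count at kernel level
    value |= int(edge) << 18      # edge detect (assumed to match --edge-mask)
    value |= int(anythr) << 21    # any thread (assumed to match --anythr-mask)
    value |= 1 << 22              # enable the counter
    value |= int(inv) << 23       # invert (assumed to match --inv-mask)
    value |= cmask << 24          # bits 24-31: counter mask (threshold)
    return value

# Event code 0xc0 with a threshold of 1, inverted, user level only.
print(hex(encode_evtsel(0xC0, inv=True, cmask=1)))  # 0x1c100c0
```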

3. Event counter mask

Each counter supports a threshold value below which occurrences of an event are not counted. If the threshold is set to n (with n greater than zero), the counter is incremented by 1 in each cycle in which at least n occurrences of the event take place. In effect, the counter counts qualifying cycles rather than occurrences.
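
The threshold behavior can be modeled in a few lines of Python. This is a minimal sketch that uses the greater-or-equal comparison described by the examples in this section; the function name and the per-cycle trace are made up for illustration and are not measured data.

```python
def count_with_threshold(per_cycle, cmask=0):
    """Model a counter fed with the number of occurrences in each cycle."""
    if cmask == 0:
        # No threshold: every occurrence increments the counter.
        return sum(per_cycle)
    # Threshold set: add 1 for each cycle with at least cmask occurrences.
    return sum(1 for n in per_cycle if n >= cmask)

# Hypothetical number of micro-operations retired in 8 consecutive cycles.
trace = [0, 1, 2, 0, 3, 1, 0, 2]

print(count_with_threshold(trace))           # 9 occurrences in total
print(count_with_threshold(trace, cmask=2))  # 3 cycles with 2 or more
print(count_with_threshold(trace, cmask=4))  # 0 cycles with 4 or more
```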

Pfmon supports this threshold mechanism with the --counter-mask option which is used as follows:

   $ pfmon -euops_retired:any  ls /dev/null
   /dev/null
   2288329 UOPS_RETIRED:ANY

This counts the total number of micro-operations retired by the command ls while executing at the user level. Now we want to count the number of cycles in which two or more micro-operations are retired:

   $ pfmon -euops_retired:any  --counter-mask=2 ls /dev/null
   /dev/null
   583666 UOPS_RETIRED:ANY

We can push it further:

   $ pfmon -euops_retired:any  --counter-mask=4 ls /dev/null
   /dev/null
   0 UOPS_RETIRED:ANY

When this option is not specified, the counter mask is set to zero, i.e., each occurrence increments the counter. The threshold is 8 bits wide.

It is possible to set the threshold for multiple events. However, only generic counters support thresholds. Here is an example:

   $ pfmon -euops_retired:any,mem_load_retired:L2_MISS --counter-mask=2,2 ls /dev/null
   /dev/null
   580877 UOPS_RETIRED:ANY
        0 MEM_LOAD_RETIRED:L2_MISS

This command counts the number of cycles in which 2 or more micro-operations were retired and the number of cycles in which there were 2 or more L2 data cache misses.

The threshold comparison can be inverted, i.e., changed from greater than or equal (>=) to less than (<), using the --inv-mask option.

4. Event inverse mask

It is possible to invert what is measured using the --inv-mask option (written in its abbreviated form --inv in the examples of this section). This option only makes real sense when combined with the threshold of --counter-mask. In that case, it simply inverts the filtering from >= to <. The option takes a boolean value 0/n (false, the default) or 1/y (true). Inversion may be set for multiple events using a comma-separated list, e.g., --inv=0,n,1.
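
A useful consequence of inversion is that, for a given threshold, the inverted and non-inverted counts partition the cycles: every cycle is counted by exactly one of the two. The Python sketch below illustrates this on a made-up trace; qualifying_cycles is a hypothetical helper, not a pfmon function.

```python
def qualifying_cycles(per_cycle, cmask, inv=False):
    """Count cycles passing the threshold test; cmask is assumed to be >= 1."""
    if inv:
        # Inverted comparison: fewer than cmask occurrences in the cycle.
        return sum(1 for n in per_cycle if n < cmask)
    # Default comparison: at least cmask occurrences in the cycle.
    return sum(1 for n in per_cycle if n >= cmask)

# Hypothetical number of micro-operations retired in 8 consecutive cycles.
trace = [0, 1, 2, 0, 3, 1, 0, 2]

busy = qualifying_cycles(trace, cmask=1)            # cycles retiring something
idle = qualifying_cycles(trace, cmask=1, inv=True)  # cycles retiring nothing
print(busy, idle, busy + idle == len(trace))        # 5 3 True
```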

The following command, for instance, counts the number of cycles in which fewer than one micro-operation was retired, i.e., no micro-operation was retired, for the ls command:

   $  pfmon -euops_retired:any  --counter-mask=1 --inv=1  ls /dev/null
   /dev/null
   1412913 UOPS_RETIRED:ANY

5. Event edge mask

Edge detection, instead of the default level detection, can be enabled per event with the --edge-mask option. The true/false value can be expressed with either 0/1 or y/n, or any combination thereof.
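
The difference between level and edge detection can be sketched in Python. The model below assumes that edge detection counts transitions of the event condition from false to true, which is how the Intel manual describes the edge-detect bit. The function names and the trace are hypothetical.

```python
def count_levels(per_cycle, cmask=1):
    """Level detection: count every cycle in which the condition holds."""
    return sum(1 for n in per_cycle if n >= cmask)

def count_edges(per_cycle, cmask=1):
    """Edge detection: count only false-to-true transitions of the condition."""
    edges = 0
    previous = False
    for n in per_cycle:
        current = n >= cmask
        if current and not previous:
            edges += 1
        previous = current
    return edges

# Hypothetical number of micro-operations retired in 8 consecutive cycles.
trace = [0, 1, 2, 0, 3, 1, 0, 2]

print(count_levels(trace))  # 5 cycles in which something retired
print(count_edges(trace))   # 3 separate bursts of retirement
```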

6. Any thread filter

The any thread filter allows a measurement to span the two logical hyperthreads of a core. By default, events are only measured for the current hyperthread. In system-wide mode, that means only on the current CPU. The --anythr-mask option (abbreviated --anythr in the examples below) enables monitoring of events in both hyperthreads at the same time. In system-wide mode, it means measuring occurrences on both CPUs.
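
As a rough mental model, the filter switches the counter from seeing only the occurrences of its own hyperthread to seeing those of both. The Python sketch below expresses this with two made-up per-cycle traces; count_event is a hypothetical helper, and real hardware counts are of course not obtained this way.

```python
def count_event(own_thread, sibling_thread, anythr=False):
    """Sum occurrences seen by a counter, with or without the any thread filter."""
    total = sum(own_thread)
    if anythr:
        # With the filter, occurrences from the sibling hyperthread count too.
        total += sum(sibling_thread)
    return total

# Hypothetical occurrences per cycle on each hyperthread.
measured_cpu = [2, 0, 1, 1]   # CPU running the measured command
sibling_cpu = [1, 1, 1, 1]    # CPU running a busy loop

print(count_event(measured_cpu, sibling_cpu))               # 4
print(count_event(measured_cpu, sibling_cpu, anythr=True))  # 8
```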

In the following example, we run a busy loop program on CPU1. Then on CPU0, we measure the ls command. We pin the command to CPU0. First, the run without the any thread filter:

   $  pfmon --pin-command  -euops_retired:any  --system-wide --cpu-list=0 --anythr=0 ls /dev/null
   /dev/null
   CPU0    2793729 UOPS_RETIRED:ANY

Next, the same run with the any thread filter enabled:

   $ pfmon --pin-command  -euops_retired:any  --system-wide --cpu-list=0 --anythr=1 ls /dev/null
   /dev/null
   CPU0    5222722 UOPS_RETIRED:ANY

7. Precise Event-Based Sampling (PEBS)

To make use of this feature, you must have kernel perfmon2 v2.81 or higher.

PEBS is an advanced sampling feature of the Intel Core-based processors in which the processor records samples directly into a designated memory region. The Intel Atom PEBS support is identical to the Core PEBS support.

The perfmon2 interface supports this type of sampling via a custom sampling format. For the Intel Atom processor, the perfmon_pebs_core_smpl module must be loaded into the kernel, otherwise pfmon will fail. You can verify that the format is available by checking /sys/kernel/perfmon/formats.

With PEBS, the format of the samples is mandated by the processor. Each sample contains the machine state of the processor at the time the counter overflowed. The precision of PEBS comes from the fact that the instruction pointer recorded in each sample is at most one instruction away from where the counter actually overflowed. The skid is minimized compared to the instruction pointer captured by a regular interrupt. Another key advantage of PEBS is that it minimizes the overhead, because the Linux kernel is only involved when the PEBS buffer fills up, i.e., there is no interrupt until a lot of samples are available.

A constraint of PEBS is that it works only with certain events, and there is no flexibility in what is recorded in each sample. For instance, in system-wide mode the process identification is not recorded. Pfmon will reject any event that cannot work with PEBS. It is possible to know which events support PEBS by using the pfmon -i command. Suppose you want to sample on INSTRUCTIONS_RETIRED:

   $  pfmon -iinstructions_retired
   Name     : INSTRUCTIONS_RETIRED
   Code     : 0xc0
   Counters : [ 0 1 16 ]
   Desc     : Instructions retired
   PEBS     : Yes

Notice the last field (PEBS): it is set to Yes. Only one counter supports PEBS, therefore only one event can be sampled at a time. Event options such as inversion or threshold are supported. For instance, to sample on the number of cycles in which no instruction is retired (stalls):

 $  pfmon --smpl-module=pebs -einstructions_retired --inv=1 --counter-mask=1 --long-smpl-periods=2660000 -uk -- foo
 # counts   %self    %cum          code addr
     1976  70.67%  70.67% 0x00000000004005e9 
      710  25.39%  96.07% 0x00000000004005f1 
       61   2.18%  98.25% 0x00000000004005e0 
       21   0.75%  99.00% 0x00000000004005ed 
       10   0.36%  99.36% 0x00000000004005f4 
        2   0.07%  99.43% 0xffffffff80583110 
        2   0.07%  99.50% 0xffffffff8024b4f8 
        ...
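
Because the run above samples at both user and kernel levels (-uk), it can be handy to split the histogram by privilege level. The Python sketch below parses the lines shown above and sums the counts on each side. It assumes an x86-64 Linux address layout in which kernel code addresses have the top bit set; the helper names are hypothetical, and the text is the excerpt printed above, not a complete profile.

```python
def parse_profile(text):
    """Return (count, address) pairs from pfmon's sample histogram output."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        # Data lines have 4 columns: counts, %self, %cum, code addr.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        samples.append((int(fields[0]), int(fields[3], 16)))
    return samples

def split_user_kernel(samples):
    """Sum counts per privilege level (assumes kernel addresses have bit 63 set)."""
    user = sum(count for count, addr in samples if not addr >> 63)
    kernel = sum(count for count, addr in samples if addr >> 63)
    return user, kernel

output = """\
 # counts   %self    %cum          code addr
     1976  70.67%  70.67% 0x00000000004005e9
      710  25.39%  96.07% 0x00000000004005f1
       61   2.18%  98.25% 0x00000000004005e0
       21   0.75%  99.00% 0x00000000004005ed
       10   0.36%  99.36% 0x00000000004005f4
        2   0.07%  99.43% 0xffffffff80583110
        2   0.07%  99.50% 0xffffffff8024b4f8
"""

print(split_user_kernel(parse_profile(output)))  # (2778, 4)
```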

By construction, PEBS resets the counter after each sample. Software randomization of the sampling period therefore does not apply to individual samples; the period can only be changed after each PEBS buffer overflow. Consequently, the --smpl-periods-random option does not have quite the same effect as with regular sampling.
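
The effect on randomization can be illustrated with a small Python simulation. It assumes that software can only pick a new period when the PEBS buffer overflows, so that all samples recorded in one buffer share the same period. The function name, the buffer size, and the way the random period is drawn are hypothetical and do not reflect pfmon's actual algorithm.

```python
import random

def sample_periods(n_samples, buffer_entries, base, spread, seed=0):
    """Return the period in effect for each sample under the assumed model."""
    rng = random.Random(seed)
    periods = []
    period = base
    for i in range(n_samples):
        if i > 0 and i % buffer_entries == 0:
            # Buffer overflow: software regains control and may re-randomize.
            period = base + rng.randrange(spread)
        # Within a buffer, the hardware keeps reloading the same period.
        periods.append(period)
    return periods

periods = sample_periods(n_samples=12, buffer_entries=4,
                         base=2660000, spread=1024)
# Print one set of periods per buffer: each set holds a single value.
for start in range(0, len(periods), 4):
    print(set(periods[start:start + 4]))
```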

8. References

Further documentation of performance monitoring for the Intel Atom processor is available in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3B.