NAME
libpfm_itanium2 - support for Itanium2 specific PMU features
SYNOPSIS
#include <perfmon/pfmlib.h>
#include <perfmon/pfmlib_itanium2.h>
int pfm_ita2_is_ear(unsigned int i);
int pfm_ita2_is_dear(unsigned int i);
int pfm_ita2_is_dear_tlb(unsigned int i);
int pfm_ita2_is_dear_cache(unsigned int i);
int pfm_ita2_is_dear_alat(unsigned int i);
int pfm_ita2_is_iear(unsigned int i);
int pfm_ita2_is_iear_tlb(unsigned int i);
int pfm_ita2_is_iear_cache(unsigned int i);
int pfm_ita2_is_btb(unsigned int i);
int pfm_ita2_support_opcm(unsigned int i);
int pfm_ita2_support_iarr(unsigned int i);
int pfm_ita2_support_darr(unsigned int i);
int pfm_ita2_get_event_maxincr(unsigned int i, unsigned int *maxincr);
int pfm_ita2_get_event_umask(unsigned int i, unsigned long *umask);
int pfm_ita2_get_event_group(unsigned int i, int *grp);
int pfm_ita2_get_event_set(unsigned int i, int *set);
int pfm_ita2_get_ear_mode(unsigned int i, pfmlib_ita2_ear_mode_t *mode);
int pfm_ita2_irange_is_fine(pfmlib_output_param_t *outp, pfmlib_ita2_output_param_t *mod_out);
DESCRIPTION
The libpfm library provides full support for all the Itanium 2 specific features
of the PMU. The interface is defined in pfmlib_itanium2.h. It consists
of a set of functions and structures which describe and allow access to the
Itanium 2 specific PMU features.
The Itanium 2 specific functions presented here are mostly used to retrieve
the characteristics of an event. Given an opaque event descriptor, obtained
by pfm_find_event or its derivatives, they return a boolean value
indicating whether this event supports this feature or is of a particular
kind.
The pfm_ita2_is_ear() function returns 1 if the event
designated by i corresponds to an EAR event, i.e., an Event Address Register
type of event, which can be either a data or an instruction EAR event. Otherwise 0 is returned.
For instance, DATA_EAR_CACHE_LAT4 is an EAR event, but CPU_CYCLES is not.
The pfm_ita2_is_dear() function returns 1 if the event
designated by i corresponds to a Data EAR event. Otherwise 0 is returned.
It can be a cache or TLB EAR event.
The pfm_ita2_is_dear_tlb() function returns 1 if the event
designated by i corresponds to a Data EAR TLB event. Otherwise 0 is returned.
The pfm_ita2_is_dear_cache() function returns 1 if the event
designated by i corresponds to a Data EAR cache event. Otherwise 0 is returned.
The pfm_ita2_is_dear_alat() function returns 1 if the event
designated by i corresponds to a Data EAR ALAT event. Otherwise 0 is returned.
The pfm_ita2_is_iear() function returns 1 if the event
designated by i corresponds to an instruction EAR event. Otherwise 0 is returned.
It can be a cache or TLB instruction EAR event.
The pfm_ita2_is_iear_tlb() function returns 1 if the event
designated by i corresponds to an instruction EAR TLB event. Otherwise 0 is returned.
The pfm_ita2_is_iear_cache() function returns 1 if the event
designated by i corresponds to an instruction EAR cache event. Otherwise 0 is returned.
The pfm_ita2_support_opcm() function returns 1 if the event
designated by i supports opcode matching, i.e., if this event can be measured accurately
when opcode matching via PMC8/PMC9 is active. Otherwise 0 is returned. Not all events support this feature.
The pfm_ita2_support_iarr() function returns 1 if the event
designated by i supports code address range restrictions, i.e., if this event can be measured accurately when
code range restriction is active. Otherwise 0 is returned. Not all events support this feature.
The pfm_ita2_support_darr() function returns 1 if the event
designated by i supports data address range restrictions, i.e., if this event can be measured accurately when
data range restriction is active. Otherwise 0 is returned. Not all events support this feature.
The pfm_ita2_get_event_maxincr() function returns in maxincr the maximum number of
occurrences per cycle for the event designated by i. Certain Itanium 2 events can occur more than
once per cycle. When an event occurs more than once per cycle, the PMD counter will be incremented accordingly.
It is possible to restrict measurement to cycles in which the event occurs more than a given number of times
by programming a threshold. For instance,
NOPS_RETIRED can happen up to 6 times/cycle which means that the threshold can be adjusted between 0 and 5,
where 5 would mean that the PMD counter would be incremented by 1 only when the nop instruction is executed more
than 5 times/cycle. This function returns the maximum number of occurrences of the event per cycle, which
is the non-inclusive upper bound for the threshold to program in the PMC register.
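The threshold rules above can be modeled in a few lines of C. This is a sketch of the counting semantics described in this page, not library code: thres_is_valid() and pmd_increment() are hypothetical helpers, and treating a threshold of 0 as "count every occurrence" is an assumption based on the default behavior described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: a threshold is valid when it is strictly below
 * the maxincr value reported by pfm_ita2_get_event_maxincr(). */
static bool thres_is_valid(unsigned int thres, unsigned int maxincr)
{
    return thres < maxincr;
}

/* Hypothetical model of how much the PMD counter advances in one cycle
 * in which the event occurred `occurrences` times.
 * Assumption: thres == 0 means no thresholding, every occurrence counts. */
static unsigned int pmd_increment(unsigned int occurrences, unsigned int thres)
{
    if (thres == 0)
        return occurrences;
    /* With a threshold, count 1 only when the event occurs more than thres times. */
    return occurrences > thres ? 1u : 0u;
}
```

For NOPS_RETIRED (maximum of 6 occurrences per cycle), thresholds 0 through 5 pass the check, and with a threshold of 5 the model only advances the counter in cycles where six nops retire.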
The pfm_ita2_get_event_umask() function returns in umask the umask for the event
designated by i.
The pfm_ita2_get_event_group() function returns in grp the group to which the
event designated by i belongs. The notion of group is used for L1 and L2 cache events only.
For all other events, the group is irrelevant and can be ignored. If the event is an L2
cache event then the value of grp will be PFMLIB_ITA2_EVT_L2_CACHE_GRP. Similarly,
if the event is an L1 cache event, the value of grp will be PFMLIB_ITA2_EVT_L1_CACHE_GRP.
In all other cases, the value of grp will be PFMLIB_ITA2_EVT_NO_GRP.
The pfm_ita2_get_event_set() function returns in set the set to which the
event designated by i belongs. A set is a subdivision of a group and is therefore
only relevant for L1 and L2 cache events. An event can only belong to one group and
one set. This partitioning of the cache events is due to hardware limitations which
restrict which events can be combined. For a given group, events from different sets
cannot be measured at the same time. If the event does not belong to a group
then the value of set is PFMLIB_ITA2_EVT_NO_SET.
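The group and set rule above lends itself to a simple pre-check before calling pfm_dispatch_events(). The following is a minimal sketch, assuming the caller has already collected each event's group and set; cache_sets_compatible() is a hypothetical helper, and the local EVT_* constants stand in for the real PFMLIB_ITA2_EVT_* definitions, whose numeric values are assumed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the group constants; the numeric values are
 * assumptions, use the real definitions from pfmlib_itanium2.h. */
enum { EVT_NO_GRP = 0, EVT_L1_CACHE_GRP = 1, EVT_L2_CACHE_GRP = 2 };

/* Hypothetical check: returns true if no two events share a cache group
 * while belonging to different sets. grp[i] and set[i] are the values
 * obtained for event i from the group and set query functions. */
static bool cache_sets_compatible(const int *grp, const int *set, size_t n)
{
    assert(n == 0 || (grp != NULL && set != NULL));
    for (size_t i = 0; i < n; i++) {
        if (grp[i] == EVT_NO_GRP)
            continue; /* sets only matter for L1/L2 cache events */
        for (size_t j = i + 1; j < n; j++) {
            if (grp[j] == grp[i] && set[j] != set[i])
                return false; /* same group, different sets: conflict */
        }
    }
    return true;
}
```

Events from different groups never conflict in this model, so an L1 cache event and an L2 cache event can be combined regardless of their sets.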
The pfm_ita2_irange_is_fine() function returns 1 if the configuration described by
outp, the generic output parameters, and mod_out, the Itanium 2 specific output parameters,
uses code range restriction in fine mode. Otherwise the function returns 0. This function can only be
called after a successful call to pfm_dispatch_events() that used the data
structures pointed to by outp and mod_out as output parameters.
The pfm_ita2_get_ear_mode() function returns in mode the EAR mode of the
event designated by i. If the event is not an EAR event, then PFMLIB_ERR_INVAL
is returned and mode is not updated. Otherwise mode can have the following values:
- PFMLIB_ITA2_EAR_TLB_MODE: The event is a TLB EAR. It can be either a data or an instruction TLB EAR.
- PFMLIB_ITA2_EAR_CACHE_MODE: The event is a cache EAR. It can be either data or instruction cache EAR.
- PFMLIB_ITA2_EAR_ALAT_MODE: The event is an ALAT EAR. It can only be a data EAR event.
When the Itanium 2 specific features are needed to support a measurement, their descriptions must be passed
as model-specific input arguments to the pfm_dispatch_events() call. The Itanium 2 specific
input arguments are described in the pfmlib_ita2_input_param_t structure and the output
parameters in pfmlib_ita2_output_param_t. They are defined as follows:
typedef enum {
PFMLIB_ITA2_ISM_BOTH=0,
PFMLIB_ITA2_ISM_IA32=1,
PFMLIB_ITA2_ISM_IA64=2
} pfmlib_ita2_ism_t;
typedef struct {
unsigned int flags;
unsigned int thres;
pfmlib_ita2_ism_t ism;
} pfmlib_ita2_counter_t;
typedef struct {
unsigned char opcm_used;
unsigned long pmc_val;
} pfmlib_ita2_opcm_t;
typedef struct {
unsigned char btb_used;
unsigned char btb_ds;
unsigned char btb_tm;
unsigned char btb_ptm;
unsigned char btb_ppm;
unsigned char btb_brt;
unsigned int btb_plm;
} pfmlib_ita2_btb_t;
typedef enum {
PFMLIB_ITA2_EAR_CACHE_MODE= 0,
PFMLIB_ITA2_EAR_TLB_MODE = 1,
PFMLIB_ITA2_EAR_ALAT_MODE = 2
} pfmlib_ita2_ear_mode_t;
typedef struct {
unsigned char ear_used;
pfmlib_ita2_ear_mode_t ear_mode;
pfmlib_ita2_ism_t ear_ism;
unsigned int ear_plm;
unsigned long ear_umask;
} pfmlib_ita2_ear_t;
typedef struct {
unsigned int rr_plm;
unsigned long rr_start;
unsigned long rr_end;
} pfmlib_ita2_input_rr_desc_t;
typedef struct {
unsigned long rr_soff;
unsigned long rr_eoff;
} pfmlib_ita2_output_rr_desc_t;
typedef struct {
unsigned int rr_flags;
pfmlib_ita2_input_rr_desc_t rr_limits[4];
unsigned char rr_used;
} pfmlib_ita2_input_rr_t;
typedef struct {
unsigned int rr_nbr_used;
pfmlib_ita2_output_rr_desc_t rr_infos[4];
pfmlib_reg_t rr_br[8];
} pfmlib_ita2_output_rr_t;
typedef struct {
pfmlib_ita2_counter_t pfp_ita2_counters[PMU_ITA2_NUM_COUNTERS];
unsigned long pfp_ita2_flags;
pfmlib_ita2_opcm_t pfp_ita2_pmc8;
pfmlib_ita2_opcm_t pfp_ita2_pmc9;
pfmlib_ita2_ear_t pfp_ita2_iear;
pfmlib_ita2_ear_t pfp_ita2_dear;
pfmlib_ita2_btb_t pfp_ita2_btb;
pfmlib_ita2_input_rr_t pfp_ita2_drange;
pfmlib_ita2_input_rr_t pfp_ita2_irange;
} pfmlib_ita2_input_param_t;
typedef struct {
pfmlib_ita2_output_rr_t pfp_ita2_drange;
pfmlib_ita2_output_rr_t pfp_ita2_irange;
} pfmlib_ita2_output_param_t;
PER-EVENT OPTIONS
The Itanium 2 processor provides two additional per-event features for
counters: thresholding and instruction set selection. They can be set using the
pfp_ita2_counters data structure for each event. The ism
field can be initialized as follows:
- PFMLIB_ITA2_ISM_BOTH : The event will be monitored during IA-64 and IA-32 execution
- PFMLIB_ITA2_ISM_IA32 : The event will only be monitored during IA-32 execution
- PFMLIB_ITA2_ISM_IA64 : The event will only be monitored during IA-64 execution
If ism has a value of zero, it will default to PFMLIB_ITA2_ISM_BOTH.
The thres field indicates the threshold for the event. A threshold of n means
that the counter will be incremented by one only when the event occurs more than n
times per cycle.
The flags field contains event-specific flags. The currently defined flags are:
- PFMLIB_ITA2_FL_EVT_NO_QUALCHECK:
When this flag is set, the library ignores the qualifier constraints
for this event. Qualifiers include opcode matching, code and data range restrictions. When an
event is marked as not supporting a particular qualifier, it usually means that the qualifier is ignored for that event, i.e.,
the extra level of filtering does not apply. For instance, the CPU_CYCLES event does not support code
range restrictions and by default the library will refuse to program it if range restriction is also
requested. Using the flag will override the check and the call to pfm_dispatch_events will succeed.
In this case, CPU_CYCLES will be measured for the entire program and not just for the code range requested.
For certain measurements this is perfectly acceptable as the range restriction will only be applied
to the events which support it. Make sure you understand which events do not support certain qualifiers before
using this flag.
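The qualifier check described above can be sketched as a loop over the events. This is only a model of the documented behavior, not the library's code: struct evt_quals and first_rejected_event() are hypothetical, and the caller is assumed to fill the support fields from pfm_ita2_support_opcm(), pfm_ita2_support_iarr() and pfm_ita2_support_darr().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-event summary built by the caller. */
struct evt_quals {
    bool opcm_ok;       /* event supports opcode matching */
    bool iarr_ok;       /* event supports code range restriction */
    bool darr_ok;       /* event supports data range restriction */
    bool no_qualcheck;  /* caller set PFMLIB_ITA2_FL_EVT_NO_QUALCHECK */
};

/* Sketch of the check: returns the index of the first event that would
 * be refused, or -1 if every event is acceptable. */
static int first_rejected_event(const struct evt_quals *ev, size_t n,
                                bool use_opcm, bool use_irange, bool use_drange)
{
    for (size_t i = 0; i < n; i++) {
        if (ev[i].no_qualcheck)
            continue; /* caller accepted unfiltered counts for this event */
        if ((use_opcm && !ev[i].opcm_ok) ||
            (use_irange && !ev[i].iarr_ok) ||
            (use_drange && !ev[i].darr_ok))
            return (int)i;
    }
    return -1;
}
```

In this model, an event like CPU_CYCLES (no range support) makes a code-range request fail until its no_qualcheck field is set, at which point it is accepted and counted without the range filter.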
OPCODE MATCHING
The pfp_ita2_pmc8 and pfp_ita2_pmc9 fields of type pfmlib_ita2_opcm_t contain
the description of what to do with the opcode matchers. Itanium 2 supports opcode matching via
PMC8 and PMC9. When this feature is used the opcm_used field must be set to 1, otherwise
it is ignored by the library. The pmc_val simply contains the raw value to store in
PMC8 or PMC9. The library may adjust the value to enable/disable some options depending on the set
of features being used. The final value for PMC8 and PMC9 will be stored in the pfp_pmcs
table of the generic output parameters.
EVENT ADDRESS REGISTERS
The pfp_ita2_iear field of type pfmlib_ita2_ear_t describes what to do with instruction
Event Address Registers (I-EARs). Again, if this feature is used the ear_used field must be set to 1,
otherwise it will be ignored by the library. The ear_mode field must be set to either
PFMLIB_ITA2_EAR_TLB_MODE or PFMLIB_ITA2_EAR_CACHE_MODE to indicate the type of EAR to program.
The umask to store into PMC10 must be in ear_umask. The privilege level mask at which the I-EAR will be
monitored must be set in ear_plm which can be any combination of PFM_PLM0, PFM_PLM1,
PFM_PLM2, PFM_PLM3. If ear_plm is 0 then the default privilege level mask in pfp_dfl_plm is used.
Finally the instruction set for which to monitor is in ear_ism and can be any one of
PFMLIB_ITA2_ISM_BOTH, PFMLIB_ITA2_ISM_IA32, or PFMLIB_ITA2_ISM_IA64.
The pfp_ita2_dear field of type pfmlib_ita2_ear_t describes what to do with data Event Address
Registers (D-EARs). The description is identical to the I-EARs except that it applies to PMC11 and
that an ear_mode of PFMLIB_ITA2_EAR_ALAT_MODE is possible.
In general, there are four different methods to program the EAR (data or instruction):
- Method 1 : There is an EAR event in the list of events to monitor and ear_used is cleared. In this
case the EAR will be programmed (PMC10 or PMC11) based on the information encoded in the event.
A counting monitor (PMC4/PMD4-PMC7/PMD7) will be programmed to count DATA_EAR_EVENT or L1I_EAR_EVENTS
depending on the type of EAR.
- Method 2 : There is an EAR event in the list of events to monitor and ear_used is set. In this
case the EAR will be programmed (PMC10 or PMC11) using the information in the pfp_ita2_iear or
pfp_ita2_dear structure because it contains more detailed information, such as privilege level and
instruction set. A counting monitor (PMC4/PMD4-PMC7/PMD7) will be programmed to count DATA_EAR_EVENT or
L1I_EAR_EVENTS depending on the type of EAR.
- Method 3 : There is no EAR event in the list of events to monitor and ear_used is cleared. In this case
no EAR is programmed.
- Method 4 : There is no EAR event in the list of events to monitor and ear_used is set. In this
case the EAR will be programmed (PMC10 or PMC11) using the information in the pfp_ita2_iear or
pfp_ita2_dear structure. This is the free running mode for the EAR.
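The four methods reduce to two independent inputs: whether an EAR event is listed and whether ear_used is set. The sketch below encodes that decision table; struct ear_plan and plan_ear() are hypothetical names used only to summarize the rules above, not part of the library.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what gets programmed for one EAR
 * (I-EAR in PMC10 or D-EAR in PMC11), following methods 1-4. */
struct ear_plan {
    bool program_ear;        /* PMC10/PMC11 is programmed */
    bool from_ear_struct;    /* configuration comes from pfp_ita2_iear/pfp_ita2_dear */
    bool counts_ear_events;  /* a counter in PMC4-PMC7 counts the EAR events */
};

static struct ear_plan plan_ear(bool ear_event_listed, bool ear_used)
{
    struct ear_plan p;
    p.program_ear       = ear_event_listed || ear_used; /* only method 3 skips it */
    p.from_ear_struct   = ear_used;                     /* methods 2 and 4 */
    p.counts_ear_events = ear_event_listed;             /* methods 1 and 2 */
    return p;
}
```

Method 4 is the only case where the EAR is programmed without an associated counter, which is what makes it free running.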
BRANCH TRACE BUFFER
The pfp_ita2_btb field, of type pfmlib_ita2_btb_t, is used to configure the Branch Trace Buffer (BTB). If
btb_used is set, then the library will take the configuration into account, otherwise any BTB configuration will be ignored.
The various fields in this structure provide means to filter out the kinds of branches that get recorded in the BTB.
Each one represents an element of the branch architecture of the Itanium 2 processor. Refer to the Itanium 2 specific
documentation for more details on the branch architecture. The fields are as follows:
- btb_ds: If the value of this field is 1, then detailed information about the branch prediction is recorded in place of information about the target
address. If the value is 0, then information about the target address of the branch is recorded instead.
- btb_tm: If this field is 0, then no branch is captured. If this field is 1, then not-taken branches are captured. If this field is 2, then
taken branches are captured. Finally if this field is 3 then all branches are captured.
- btb_ptm: If this field is 0, then no branch is captured. If this field is 1, then branches with a mispredicted target address are captured. If this field
is 2, then branches with a correctly predicted target address are captured. Finally if this field is 3 then all branches are captured regardless of
target address prediction.
- btb_ppm: If this field is 0, then no branch is captured. If this field is 1, then branches with a mispredicted path (taken/not taken) are captured. If this field
is 2, then branches with a correctly predicted path are captured. Finally if this field is 3 then all branches are captured regardless of
their path prediction.
- btb_brt: If this field is 0, then all branches are captured. If this field is 1, then only IP-relative branches are captured. If this field
is 2, then only return branches are captured. Finally if this field is 3 then only non-return indirect branches are captured.
- btb_plm: This is the privilege level mask at which the BTB captures branches. It can be any combination of PFM_PLM0, PFM_PLM1, PFM_PLM2,
PFM_PLM3. If btb_plm is 0 then the default privilege level mask in pfp_dfl_plm is used.
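The btb_tm, btb_ptm and btb_ppm fields share one encoding: bit 0 keeps the "negative" case (not taken, mispredicted), bit 1 keeps the "positive" case, 3 keeps both and 0 keeps none. The sketch below models that encoding. It assumes the three filters are combined with a logical AND, which this page does not state explicitly; struct branch, filter_accepts() and btb_would_capture() are hypothetical, and btb_brt, btb_plm and btb_ds are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of one branch, for illustration only. */
struct branch {
    bool taken;
    bool target_predicted_ok;  /* target address correctly predicted */
    bool path_predicted_ok;    /* taken/not-taken correctly predicted */
};

/* Decode one 2-bit filter field: bit 0 accepts the negative case,
 * bit 1 accepts the positive case. */
static bool filter_accepts(unsigned char field, bool positive_case)
{
    assert(field <= 3);
    return (field & (positive_case ? 2u : 1u)) != 0;
}

/* Assumed combination: a branch is captured only if every filter accepts it. */
static bool btb_would_capture(unsigned char tm, unsigned char ptm,
                              unsigned char ppm, const struct branch *b)
{
    return filter_accepts(tm, b->taken) &&
           filter_accepts(ptm, b->target_predicted_ok) &&
           filter_accepts(ppm, b->path_predicted_ok);
}
```

Under this model, setting all three fields to 3 captures every branch, while btb_tm = 2 with the other two at 3 keeps only taken branches.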
There are 4 methods to program the BTB and they are as follows:
- Method 1: The BRANCH_EVENT is in the list of events to monitor and btb_used is cleared. In this case,
the BTB will be configured (PMC12) to record ALL branches. A counting monitor (PMC4/PMD4-PMC7/PMD7) will be programmed to
count BRANCH_EVENT.
- Method 2: The BRANCH_EVENT is in the list of events to monitor and btb_used is set. In this case,
the BTB will be configured (PMC12) using the information in the pfp_ita2_btb structure. A counting monitor
(PMC4/PMD4-PMC7/PMD7) will be programmed to count BRANCH_EVENT.
- Method 3: The BRANCH_EVENT is not in the list of events to monitor and btb_used is set. In this case,
the BTB will be configured (PMC12) using the information in the pfp_ita2_btb structure. This is the
free running mode for the BTB.
- Method 4: The BRANCH_EVENT is not in the list of events to monitor and btb_used is cleared. In this case,
the BTB is not programmed.
DATA AND CODE RANGE RESTRICTIONS
The pfp_ita2_drange and pfp_ita2_irange fields control the range restrictions for the data and
code respectively. The idea is that the application passes a set of ranges, each designated by a start
and end address. Upon return from pfm_dispatch_events(), the application gets back the set of
registers and their values that need to be programmed via a kernel interface.
Range restriction is implemented using the debug registers. There is a limited number of debug registers and they go in pairs. With
8 data debug registers, a maximum of 4 distinct ranges can be specified. The same applies to code range restrictions. Moreover, there
are some severe constraints on the alignment and size of the ranges. Given that the size of a range is specified using a bitmask, there can
be situations where the actual range is larger than the requested range. For code ranges, the Itanium 2 processor can use what is called a fine mode,
where a range is designated using two pairs of code debug registers. In this mode, the bitmask is not used; the start and end
addresses are directly specified. Not all code ranges qualify for fine mode: the size of the range must be 4KB or less and the range
cannot cross a 4KB page boundary. The library will make a best effort in choosing the right mode for each range. For code ranges,
it will try the fine mode first and will default to using the bitmask mode otherwise. Fine mode applies to all code debug
registers or none, i.e., you cannot have a range using fine mode and another using the bitmask. The Itanium 2 processor also restricts the use
of multiple pairs to accurately cover a code range. This can only be done for IA64_INST_RETIRED and even then, you need several
events to collect the counts. For all other events, only one pair can be used, which leads to more inaccuracy due to
approximation. Data ranges can use multiple debug register pairs to gain more accuracy. The library will never cover less than what is
requested. The algorithm will use more than one pair of debug registers
whenever possible to get a more precise range. Hence, up to 4 pairs can be used to describe a single range.
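To see why a bitmask-described range can be larger than requested, consider the simplest case of one debug register pair covering a range with a single naturally aligned, power-of-two sized block. The sketch below computes that block and the resulting start and end overshoot. It is a simplified single-pair model, not the library's multi-pair algorithm; cover_one_pair() is hypothetical, and treating the end address as exclusive is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* One naturally aligned power-of-two block covering a requested range. */
struct cover {
    uint64_t base;  /* start of the actual range */
    uint64_t size;  /* power of two */
    uint64_t soff;  /* bytes the actual range starts before the requested start */
    uint64_t eoff;  /* bytes the actual range extends past the requested end */
};

/* Sketch: smallest aligned power-of-two block containing [start, end).
 * Assumes end is exclusive, start < end and end <= 2^63. */
static struct cover cover_one_pair(uint64_t start, uint64_t end)
{
    assert(start < end);
    assert(end <= (UINT64_C(1) << 63)); /* keeps the search below bounded */
    uint64_t size = 1;
    uint64_t base = start;
    for (;;) {
        base = start & ~(size - 1); /* align start down to a multiple of size */
        if (end - base <= size)     /* [base, base + size) reaches end */
            break;
        size <<= 1;
    }
    struct cover c = { base, size, start - base, base + size - end };
    return c;
}
```

A 32-byte range from 0xff0 to 0x1010 straddles an alignment boundary, so this single-block model needs an 8KB block starting at 0 (4080 bytes of overshoot on each side). That kind of blow-up is what additional debug register pairs, where usable, help avoid.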
If range restriction is to be used, the rr_used field must be set to 1, otherwise the settings will be ignored.
The ranges are described by the pfmlib_ita2_input_rr_t structure. Up to 4 ranges can be defined. Each
range is described by an entry in rr_limits. Flags applying to all ranges can be set in rr_flags.
Currently defined flags are:
- PFMLIB_ITA2_RR_INV: Invert the code ranges. The qualifying events will be measured when executing outside the specified
ranges.
- PFMLIB_ITA2_RR_NO_FINE_MODE: Force non-fine mode for all code ranges (mostly for debugging).
The pfmlib_ita2_input_rr_desc_t structure is defined as follows:
- rr_plm: The privilege level at which the range is active. It can be any combination of
PFM_PLM0, PFM_PLM1, PFM_PLM2, PFM_PLM3. If rr_plm is 0 then the
default privilege level mask in pfp_dfl_plm is used. The privilege level is only relevant
for code ranges; data ranges ignore the setting.
- rr_start: This is the start address of the range. Any address is supported but for a code range it
must be bundle aligned, i.e., 16-byte aligned.
- rr_end: This is the end address of the range. Any address is supported but for a code range it
must be bundle aligned, i.e., 16-byte aligned.
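Before filling rr_limits for a code range, a caller might check the alignment requirement and predict whether fine mode is possible. The two helpers below are hypothetical pre-checks derived from the rules on this page (16-byte bundles, fine mode limited to 4KB within one 4KB page); they do not call the library, and treating rr_end as exclusive is an assumption since this page does not say.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BUNDLE_SIZE     16u    /* Itanium instruction bundles are 16 bytes */
#define FINE_MODE_LIMIT 4096u  /* 4KB limit and page size for fine mode */

/* Hypothetical check for a code range [start, end), end exclusive. */
static bool code_range_is_bundle_aligned(uint64_t start, uint64_t end)
{
    return start < end && start % BUNDLE_SIZE == 0 && end % BUNDLE_SIZE == 0;
}

/* Fine mode needs a range of at most 4KB that stays inside one 4KB page. */
static bool code_range_fine_mode_eligible(uint64_t start, uint64_t end)
{
    assert(start < end);
    return end - start <= FINE_MODE_LIMIT &&
           start / FINE_MODE_LIMIT == (end - 1) / FINE_MODE_LIMIT;
}
```

Because fine mode applies to all code ranges or none, a caller using this check would expect fine mode only when every code range passes it.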
The library will provide the values for the debug registers as well as some information
about the actual ranges in the output parameters and more precisely in the pfmlib_ita2_output_rr_t
structure for each range. The structure is defined as follows:
- rr_nbr_used: Contains the number of debug registers used to cover the ranges. This is necessarily an even number
as debug registers always go in pairs. The value of this field is between 0 and 8.
- rr_br: This table contains the list of debug registers necessary to cover the ranges. Each element is
of type pfmlib_reg_t. The reg_num field contains the debug register index while
reg_value contains the debug register value. Both the index and value must be copied
into the kernel specific argument to program the debug registers. The library never programs them.
- rr_infos: Contains information about the ranges defined. Because of alignment restrictions, the actual range
covered by the debug registers may be larger than the requested range. This table describes the differences
between the requested and actual ranges expressed as offsets:
- rr_soff: Contains the start offset of the actual range described by the debug registers. If zero, it means
the library was able to match exactly the beginning of the range. Otherwise it represents the number
of bytes by which the actual range precedes the requested range.
- rr_eoff: Contains the end offset of the actual range described by the debug registers. If zero, it means
the library was able to match exactly the end of the range. Otherwise it represents the number of
bytes by which the actual range exceeds the requested range.
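Since the library never programs the debug registers itself, the caller copies the first rr_nbr_used entries of rr_br into whatever structure its kernel interface expects. The sketch below shows that copy with local stand-in types: struct reg_sketch mirrors only the reg_num and reg_value fields mentioned here (the real pfmlib_reg_t has more members), and struct kernel_dbreg is a hypothetical placeholder for the kernel argument, whose real layout depends on the kernel interface in use.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in with only the two pfmlib_reg_t fields this page mentions. */
struct reg_sketch {
    unsigned int  reg_num;    /* debug register index */
    unsigned long reg_value;  /* value to load into it */
};

/* Hypothetical kernel-side argument, for illustration only. */
struct kernel_dbreg {
    unsigned int  num;
    unsigned long value;
};

/* Copy the nbr_used registers produced by the library into the kernel
 * argument array. Returns the number copied, or 0 if they do not fit. */
static size_t copy_dbregs(const struct reg_sketch *rr_br, unsigned int nbr_used,
                          struct kernel_dbreg *out, size_t out_len)
{
    assert(nbr_used % 2 == 0); /* debug registers always come in pairs */
    if (nbr_used > out_len)
        return 0;
    for (unsigned int i = 0; i < nbr_used; i++) {
        out[i].num   = rr_br[i].reg_num;
        out[i].value = rr_br[i].reg_value;
    }
    return nbr_used;
}
```

Sizing the destination for 8 entries matches the size of rr_br, so every possible result of pfm_dispatch_events() fits.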
ERRORS
Refer to the description of pfm_dispatch_events() for errors when using the Itanium 2
specific input and output arguments.
SEE ALSO
pfm_dispatch_events() and the set of examples shipped with the library
AUTHOR
Stephane Eranian
This document was created by man2html, using the manual pages.
Time: 16:57:22 GMT, October 27, 2007