By default, the counters for measuring L3 and data fabric (df) events were not available on my AMD system. With some help from the likwid documentation, I figured out what was going on and how to enable them.
The first thing to do is check whether the perf subsystem knows about the l3 and df devices. This can be done with:
prompt% ls /sys/devices/*/format
/sys/devices/amd_iommu_0/format:
csource devid devid_mask domid domid_mask pasid pasid_mask
/sys/devices/cpu/format:
cmask edge event inv umask
/sys/devices/ibs_fetch/format:
l3missonly rand_en
/sys/devices/ibs_op/format:
cnt_ctl l3missonly
/sys/devices/kprobe/format:
retprobe
/sys/devices/msr/format:
event
/sys/devices/power/format:
event
/sys/devices/uprobe/format:
ref_ctr_offset retprobe
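The same check can be scripted. Below is a minimal sketch, assuming the devices are named amd_l3 and amd_df (the names that appear later in this post). The function names pmu_present and report_pmus are made up for this illustration.

```shell
# pmu_present ROOT NAME: succeeds if ROOT/NAME/format exists.
pmu_present() {
    [ -d "$1/$2/format" ]
}

# report_pmus [ROOT]: report whether the two uncore devices are visible.
# ROOT defaults to /sys/devices; the device names are assumptions.
report_pmus() {
    root=${1:-/sys/devices}
    for pmu in amd_l3 amd_df; do
        if pmu_present "$root" "$pmu"; then
            echo "$pmu: present"
        else
            echo "$pmu: missing"
        fi
    done
}

report_pmus
```

On a system in the state shown above, both lines should read "missing".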
Only devices that are available show up here. The amd_l3 and amd_df entries are missing from my listing, so the next step is to check what is compiled into the running kernel:
prompt% grep -i perf_events /boot/config-$(uname -r)
CONFIG_HAVE_PERF_EVENTS=y
CONFIG_GUEST_PERF_EVENTS=y
CONFIG_PERF_EVENTS=y
CONFIG_PERF_EVENTS_INTEL_UNCORE=y
CONFIG_PERF_EVENTS_INTEL_RAPL=m
CONFIG_PERF_EVENTS_INTEL_CSTATE=m
# CONFIG_PERF_EVENTS_AMD_POWER is not set
CONFIG_PERF_EVENTS_AMD_UNCORE=m
CONFIG_PERF_EVENTS_AMD_BRS=y
CONFIG_HAVE_PERF_EVENTS_NMI=y
CONFIG_SECURITY_PERF_EVENTS_RESTRICT=y
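In this listing, "=y" means built into the kernel, "=m" means built as a loadable module, and "is not set" means left out. A small sketch of pulling out one option's state follows. The helper name config_state is hypothetical, and the config file path is the one used in the grep above.

```shell
# config_state OPTION FILE: print how OPTION is configured in FILE.
config_state() {
    value=$(sed -n "s/^$1=//p" "$2" | head -n 1)
    case "$value" in
        y)  echo "built-in" ;;
        m)  echo "module" ;;
        "") echo "not set" ;;
        *)  echo "$value" ;;   # string or numeric options
    esac
}

cfg=/boot/config-$(uname -r)
if [ -r "$cfg" ]; then
    config_state CONFIG_PERF_EVENTS_AMD_UNCORE "$cfg"
fi
```

For the kernel shown above this would print "module".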
The l3 and df counters are uncore counters, and the config above shows that AMD uncore support (CONFIG_PERF_EVENTS_AMD_UNCORE=m) is built as a loadable module rather than into the kernel. So we load the module with the following command:
prompt% insmod /lib/modules/$(uname -r)/kernel/arch/x86/events/amd/amd-uncore.ko
Once the module is loaded, rerunning the ls command from earlier also shows /sys/devices/amd_l3/format and /sys/devices/amd_df/format. With the devices enabled, the perf list command reports the relevant counters. The command and the useful parts of its output are included below:
prompt% perf list -v --detail
l3_cache:
l3_cache_accesses
[l3_lookup_state.all_coherent_accesses_to_l3]
l3_misses
[l3_lookup_state.l3_miss]
l3_read_miss_latency
[l3_xi_sampled_latency.all * 10 / l3_xi_sampled_latency_requests.all]
Now using “perf stat” we can try the l3 counters and make sure they work.
prompt% perf stat -e l3_lookup_state.all_coherent_accesses_to_l3,l3_lookup_state.l3_hit /bin/ls
cpumask format perf_event_mux_interval_ms power subsystem type uevent
Performance counter stats for 'system wide':
80,264 l3_lookup_state.all_coherent_accesses_to_l3
70,798 l3_lookup_state.l3_hit
0.001688959 seconds time elapsed
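As a quick sanity check on the two counts above, the L3 hit ratio can be computed from them. This is only a sketch: hit_ratio is a made-up helper, and the counts are system wide (as perf's own header says), so the ratio describes the whole machine during the run, not just /bin/ls.

```shell
# hit_ratio ACCESSES HITS: print hits as a percentage of accesses.
# Thousands separators, as printed by perf stat, are stripped first.
hit_ratio() {
    awk -v a="$(echo "$1" | tr -d ,)" -v h="$(echo "$2" | tr -d ,)" \
        'BEGIN { if (a > 0) printf "%.1f\n", 100 * h / a; else print "n/a" }'
}

hit_ratio 80,264 70,798   # prints 88.2
```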
What remains is figuring out the right “config” values to make the equivalent call to perf_event_open. We can look these up by running perf stat under strace. This tells me the type field of struct perf_event_attr is 0xe, which is also the value shown in the /sys/devices/amd_l3/type file (the number is assigned dynamically, so it can differ between systems). I can work out the config for l3 accesses this way, but I am not yet sure which event to use for the data fabric to measure memory accesses.
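Another way to arrive at a config value is to build it from the bit ranges that the files under /sys/devices/amd_l3/format describe, each holding a spec such as "config:0-7". The sketch below assumes that the format directory has fields named event and umask, and that both live in the config word (not config1 or config2). The name encode_field and the values 0x04 and 0xff are placeholders, not verified encodings. Other fields in the format directory may also need values, so compare the result against what strace shows.

```shell
# encode_field SPEC VALUE: place VALUE into the bits named by SPEC.
# SPEC is the content of a sysfs format file, e.g. "config:0-7",
# "config:46" or "config:0-7,32-35". The body runs in a subshell so
# the IFS change and the variables stay local.
encode_field() (
    spec=${1#*:}          # drop the "config:" prefix
    value=$(( $2 ))
    out=0
    IFS=,
    for range in $spec; do
        lo=${range%-*}
        hi=${range#*-}
        width=$(( hi - lo + 1 ))
        out=$(( out | ((value & ((1 << width) - 1)) << lo) ))
        value=$(( value >> width ))
    done
    printf '0x%x\n' "$out"
)

pmu=/sys/devices/amd_l3
if [ -r "$pmu/format/event" ] && [ -r "$pmu/format/umask" ]; then
    ev=$(encode_field "$(cat "$pmu/format/event")" 0x04)   # placeholder event
    um=$(encode_field "$(cat "$pmu/format/umask")" 0xff)   # placeholder umask
    printf 'type=%s config=0x%x\n' "$(cat "$pmu/type")" "$(( $ev | $um ))"
fi
```

Reading the bit ranges from sysfs, rather than hard-coding them, keeps the script usable if the layout differs between CPU families.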
Success!
