A fast implementation of Ericsson Texture Compression (ETC), run here as etcpak. The benchmark has four workloads: the first two look multi-threaded and the last two look single-threaded.

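The topdown profile shows a relatively high retirement rate (32.8% of slots on AMD, 41.4% in the parenthesized column, and 43.2% on Intel), with backend stalls as the largest stall category (28.2% on AMD, 33.6% on Intel). Branch misses look higher than average: 5.91% on AMD and 4.19% on Intel.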

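AMD metrics show floating-point code (about 267 floating-point operations per 1000 instructions, essentially all 128-bit) and low levels of L2 access (about 8 accesses per 1000 instructions). Backend stalls are more CPU (17.7%) than memory (10.4%).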

elapsed              379.185
on_cpu               0.127          # 2.03 / 16 cores
utime                695.160
stime                73.650
nvcsw                33939          # 79.16%
nivcsw               8933           # 20.84%
inblock              0              # 0.00/sec
onblock              13072          # 34.47/sec
cpu-clock            768605623724   # 768.606 seconds
task-clock           768683156709   # 768.683 seconds
page faults          40265127       # 52381.956/sec
context switches     44571          # 57.984/sec
cpu migrations       728            # 0.947/sec
major page faults    2              # 0.003/sec
minor page faults    40265125       # 52381.953/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             322067504448   # 53.327 branches per 1000 inst
branch misses        19026737996    # 5.91% branch miss
conditional          261678731073   # 43.328 conditional branches per 1000 inst
indirect             2411440948     # 0.399 indirect branches per 1000 inst
cpu-cycles           3143167306570  # 0.52 GHz
instructions         6014440828700  # 1.91 IPC
slots                6299593694976  #
retiring             2064474000665  # 32.8% (41.4%)
-- ucode             17942446201    #     0.3%
-- fastpath          2046531554464  #    32.5%
frontend             809225698829   # 12.8% (16.2%)
-- latency           547271642676   #     8.7%
-- bandwidth         261954056153   #     4.2%
backend              1774056180250  # 28.2% (35.6%)
-- cpu               1117262901849  #    17.7%
-- memory            656793278401   #    10.4%
speculation          339046574585   #  5.4% ( 6.8%)
-- branch mispredict 337884265343   #     5.4%
-- pipeline restart  1162309242     #     0.0%
smt-contention       1312785174486  # 20.8% ( 0.0%)
cpu-cycles           3136549829895  # 0.52 GHz
instructions         6011087700746  # 1.92 IPC
instructions         2007447939680  # 8.024 l2 access per 1000 inst
l2 hit from l1       11062739284    # 26.83% l2 miss
l2 miss from l1      486228979      #
l2 hit from l2 pf    1209907993     #
l3 hit from l2 pf    708610652      #
l3 miss from l2 pf   3126294311     #
instructions         2010826750680  # 267.182 float per 1000 inst
float 512            49             # 0.000 AVX-512 per 1000 inst
float 256            380            # 0.000 AVX-256 per 1000 inst
float 128            537257423693   # 267.182 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         0              # 0.000 scalar per 1000 inst
instructions         6022694670998  #
opcache              927164558278   # 153.945 opcache per 1000 inst
opcache miss         56305763687    #  6.1% opcache miss rate
l1 dTLB miss         954245753      # 0.158 L1 dTLB per 1000 inst
l2 dTLB miss         206814759      # 0.034 L2 dTLB per 1000 inst
instructions         6022972249053  #
icache               110398259315   # 18.330 icache per 1000 inst
icache miss          9028214708     #  8.2% icache miss rate
l1 iTLB miss         9538498        # 0.002 L1 iTLB per 1000 inst
l2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst
tlb flush            23646          # 0.000 TLB flush per 1000 inst
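
The topdown percentages and the on_cpu figure in the AMD listing can be reproduced from the raw values. The following is a minimal sketch rather than the profiling tool's own code: the function and variable names are illustrative, and two readings are assumptions that happen to match the printed numbers, namely that the parenthesized column renormalizes by the slots left after removing smt-contention, and that on_cpu is (utime + stime) / elapsed divided by the core count.

```python
# Raw AMD topdown slot counts copied from the listing above.
SLOTS = 6299593694976
CATEGORIES = {
    "retiring": 2064474000665,
    "frontend": 809225698829,
    "backend": 1774056180250,
    "speculation": 339046574585,
    "smt-contention": 1312785174486,
}


def topdown_shares(slots, categories, exclude="smt-contention"):
    """Return {name: (pct_of_all_slots, pct_after_excluding_one_category)}.

    Assumption: the parenthesized column divides by the slots that remain
    once the `exclude` category has been subtracted.
    """
    remaining = slots - categories[exclude]
    shares = {}
    for name, count in categories.items():
        raw = 100.0 * count / slots
        renorm = 0.0 if name == exclude else 100.0 * count / remaining
        shares[name] = (round(raw, 1), round(renorm, 1))
    return shares


def on_cpu(utime, stime, elapsed, cores):
    """Return (average busy cores, that value as a fraction of all cores)."""
    busy = (utime + stime) / elapsed
    return round(busy, 2), round(busy / cores, 3)


if __name__ == "__main__":
    for name, (raw, renorm) in topdown_shares(SLOTS, CATEGORIES).items():
        print(f"{name:15s} {raw:5.1f}% ({renorm:5.1f}%)")
    print(on_cpu(utime=695.160, stime=73.650, elapsed=379.185, cores=16))
```

Read this way, roughly a fifth of all slots are lost to SMT contention, and the retiring share rises from 32.8% to 41.4% once those slots are set aside, which is close to the 43.2% reported on Intel where smt-contention is 0.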

Intel metrics

elapsed              475.332
on_cpu               0.130          # 2.09 / 16 cores
utime                939.204
stime                52.098
nvcsw                32300          # 71.25%
nivcsw               13036          # 28.75%
inblock              1136           # 2.39/sec
onblock              1736           # 3.65/sec
cpu-clock            990936944257   # 990.937 seconds
task-clock           990980198648   # 990.980 seconds
page faults          40260308       # 40626.753/sec
context switches     47515          # 47.947/sec
cpu migrations       5556           # 5.607/sec
major page faults    0              # 0.000/sec
minor page faults    40260308       # 40626.753/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             310052567082   # 49.041 branches per 1000 inst
branch misses        12993611831    # 4.19% branch miss
conditional          310052583722   # 49.041 conditional branches per 1000 inst
indirect             40474888197    # 6.402 indirect branches per 1000 inst
slots                11846758328900 #
retiring             5119661056273  # 43.2% (43.2%)
-- ucode             304518481434   #     2.6%
-- fastpath          4815142574839  #    40.6%
frontend             1266742842987  # 10.7% (10.7%)
-- latency           964654096534   #     8.1%
-- bandwidth         302088746453   #     2.5%
backend              3986359025539  # 33.6% (33.6%)
-- cpu               3603962998717  #    30.4%
-- memory            382396026822   #     3.2%
speculation          1471295695168  # 12.4% (12.4%) high
-- branch mispredict 1430113408581  #    12.1%
-- pipeline restart  41182286587    #     0.3%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           3696784401081  # 0.49 GHz
instructions         7221215707281  # 1.95 IPC
l2 access            34425025841    # 6.898 l2 access per 1000 inst
l2 miss              17807291225    # 51.73% l2 miss
cpu-cycles           2550674883434  #  8.5% memory latency
load stalls          203370913214   #  6.1% l1 bound
l1 miss              46953318482    #  1.3% l2 bound
l2 miss              12616747498    #  0.2% l3 bound
l3 miss              8507747116     #  0.3% dram bound
store_stalls         14693094374    #  0.6% store bound
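
The percentages at the end of the Intel listing are derived values rather than simple ratios of the counter on the same row. The sketch below shows one way to reproduce them, assuming each stall counter includes the stalls of all deeper cache levels, so that the share attributed to one level is the difference between neighbouring counters. The function name and this interpretation are assumptions; they are consistent with the printed values but are not taken from the tool's source.

```python
# Intel stall counters copied from the listing above (same counter group).
CYCLES = 2550674883434
LOAD_STALLS = 203370913214
L1_MISS = 46953318482
L2_MISS = 12616747498
L3_MISS = 8507747116
STORE_STALLS = 14693094374


def memory_bound_breakdown(cycles, load_stalls, l1_miss, l2_miss,
                           l3_miss, store_stalls):
    """Split stall cycles by the memory level being waited on.

    Assumption: each counter is cumulative over deeper levels, so the
    cycles spent at one level are the difference between neighbours.
    """
    def pct(count):
        return round(100.0 * count / cycles, 1)

    return {
        "memory latency": pct(load_stalls + store_stalls),
        "l1 bound": pct(load_stalls - l1_miss),
        "l2 bound": pct(l1_miss - l2_miss),
        "l3 bound": pct(l2_miss - l3_miss),
        "dram bound": pct(l3_miss),
        "store bound": pct(store_stalls),
    }


if __name__ == "__main__":
    breakdown = memory_bound_breakdown(
        CYCLES, LOAD_STALLS, L1_MISS, L2_MISS, L3_MISS, STORE_STALLS
    )
    for name, value in breakdown.items():
        print(f"{name:15s} {value:4.1f}%")
```

Under this reading most of the 8.5% memory latency is spent waiting on L1 (6.1%), with very little attributed to L3 or DRAM, which agrees with the small backend memory share (3.2%) in the Intel topdown breakdown.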

Process overview

480 processes
	 36 etcpak                 706.06    73.79
	 68 clinfo                  16.86     5.66
	 38 vulkaninfo               1.14     1.33
	  4 vulkani:disk$0           0.12     0.14
	  6 php                      0.09     0.13
	  6 glxinfo:gdrv0            0.08     0.09
	  6 glxinfo:gl0              0.07     0.09
	  2 llvmpipe-0               0.06     0.07
	  2 llvmpipe-1               0.06     0.07
	  2 llvmpipe-10              0.06     0.07
	  2 llvmpipe-11              0.06     0.07
	  2 llvmpipe-12              0.06     0.07
	  2 llvmpipe-13              0.06     0.07
	  2 llvmpipe-14              0.06     0.07
	  2 llvmpipe-15              0.06     0.07
	  2 llvmpipe-2               0.06     0.07
	  2 llvmpipe-3               0.06     0.07
	  2 llvmpipe-4               0.06     0.07
	  2 llvmpipe-5               0.06     0.07
	  2 llvmpipe-6               0.06     0.07
	  2 llvmpipe-7               0.06     0.07
	  2 llvmpipe-8               0.06     0.07
	  2 llvmpipe-9               0.06     0.07
	  6 clang                    0.06     0.06
	  2 glxinfo                  0.06     0.03
	  2 glxinfo:cs0              0.06     0.03
	  2 glxinfo:disk$0           0.06     0.03
	  2 glxinfo:sh0              0.06     0.03
	  2 glxinfo:shlo0            0.06     0.03
	  3 rocminfo                 0.03     0.00
	  6 Worker 0                 0.00   459.87
	  6 Worker 1                 0.00   459.87
	  6 Worker 13                0.00   459.87
	  6 Worker 2                 0.00   459.87
	  6 Worker 3                 0.00   459.87
	  6 Worker 6                 0.00   459.87
	  6 Worker 7                 0.00   459.87
	  6 Worker 8                 0.00   459.87
	  6 Worker 9                 0.00   459.87
	  6 Worker 10                0.00   459.86
	  6 Worker 11                0.00   459.86
	  6 Worker 12                0.00   459.86
	  6 Worker 14                0.00   459.86
	  6 Worker 4                 0.00   459.86
	  6 Worker 5                 0.00   459.86
	  1 lspci                    0.00     0.03
	  1 ps                       0.00     0.01
	 88 sh                       0.00     0.00
	 13 gcc                      0.00     0.00
	  9 gsettings                0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  6 llvm-link                0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  4 gmain                    0.00     0.00
	  3 dconf worker             0.00     0.00
	  2 cc                       0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 date                     0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 qdbus                    0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sed                      0.00     0.00
	  1 sort                     0.00     0.00
	  1 stty                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
0 processes running
47 maximum processes

Computation blocks

      998072) etcpak           cpu=2 start=5.66  finish=13.05
        998073) etcpak           cpu=2 start=5.66  finish=13.05
          998074) etcpak           cpu=1 start=5.66  finish=6.72 
          998075) Worker 0         cpu=-1 start=6.72  finish=13.04
          998076) Worker 1         cpu=-1 start=6.73  finish=13.04
          998077) Worker 2         cpu=-1 start=6.73  finish=13.04
          998078) Worker 3         cpu=-1 start=6.73  finish=13.04
          998079) Worker 4         cpu=-1 start=6.73  finish=13.04
          998080) Worker 5         cpu=-1 start=6.73  finish=13.04
          998081) Worker 6         cpu=-1 start=6.73  finish=13.04
          998082) Worker 7         cpu=-1 start=6.73  finish=13.04
          998083) Worker 8         cpu=-1 start=6.73  finish=13.04
          998084) Worker 9         cpu=-1 start=6.73  finish=13.04
          998085) Worker 10        cpu=-1 start=6.73  finish=13.04
          998086) Worker 11        cpu=-1 start=6.73  finish=13.04
          998087) Worker 12        cpu=-1 start=6.73  finish=13.04
          998088) Worker 13        cpu=-1 start=6.73  finish=13.04
          998089) Worker 14        cpu=-1 start=6.73  finish=13.04