Compressing and decompressing a tar file. The run below covers seven different tests with slightly different profiles, so it would be useful to separate them out later. Interestingly, none of the tests of the different compression tools use the same metrics and workload, so it is not easy to compare between tools.

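AMD metrics show multi-threaded code that keeps only 4.17 of 16 cores busy on average, with little I/O. There is a small amount of floating-point code (about 23 operations per 1000 instructions, all 128-bit); otherwise the program is memory-bound, with 49.7% of slots lost to backend memory stalls. The profile above shows that some workloads are more memory-bound than others.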

elapsed              2901.236
on_cpu               0.261          # 4.17 / 16 cores
utime                12051.991
stime                42.412
nvcsw                761019         # 85.50%
nivcsw               129106         # 14.50%
inblock              416416         # 143.53/sec
onblock              8296           # 2.86/sec
cpu-clock            12088006176826 # 12088.006 seconds
task-clock           12089423964530 # 12089.424 seconds
page faults          10563887       # 873.812/sec
context switches     904309         # 74.802/sec
cpu migrations       73731          # 6.099/sec
major page faults    19             # 0.002/sec
minor page faults    10563868       # 873.811/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             8500567397945  # 123.737 branches per 1000 inst
branch misses        235070446072   # 2.77% branch miss
conditional          7691845876170  # 111.965 conditional branches per 1000 inst
indirect             2728510194     # 0.040 indirect branches per 1000 inst
cpu-cycles           64399572067291 # 1.03 GHz
instructions         88389737628447 # 1.37 IPC
slots                128795715843420 #
retiring             28442626656254 # 22.1% (24.1%)
-- ucode             320626280      #     0.0%
-- fastpath          28442306029974 #    22.1%
frontend             11861452414971 #  9.2% (10.1%)
-- latency           7369823434362  #     5.7%
-- bandwidth         4491628980609  #     3.5%
backend              68935929188030 # 53.5% (58.5%)
-- cpu               4938287659500  #     3.8%
-- memory            63997641528530 #    49.7%
speculation          8580707704693  #  6.7% ( 7.3%)
-- branch mispredict 8456809584640  #     6.6%
-- pipeline restart  123898120053   #     0.1%
smt-contention       10974862393946 #  8.5% ( 0.0%)
cpu-cycles           55104213476379 # 1.18 GHz
instructions         71191798572066 # 1.29 IPC
instructions         23728504766934 # 34.177 l2 access per 1000 inst
l2 hit from l1       639125597952   # 39.52% l2 miss
l2 miss from l1      221137265226   #
l2 hit from l2 pf    72461497838    #
l3 hit from l2 pf    56428551426    #
l3 miss from l2 pf   42946931535    #
instructions         23721123454337 # 22.889 float per 1000 inst
float 512            135            # 0.000 AVX-512 per 1000 inst
float 256            274            # 0.000 AVX-256 per 1000 inst
float 128            542946205318   # 22.889 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         0              # 0.000 scalar per 1000 inst
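
The derived columns in the AMD block can be reproduced from the raw counters. The sketch below is a minimal check, not the profiling tool's own code: it assumes on_cpu is (utime + stime) / elapsed divided by the core count, and that the percentages in parentheses are shares of the slots left after removing smt-contention. Both assumptions are inferred from the numbers above; the function names are hypothetical.

```python
def cpu_utilization(elapsed, utime, stime, ncores):
    """Return (average cores busy, fraction of the machine busy)."""
    cores_used = (utime + stime) / elapsed
    return cores_used, cores_used / ncores


def topdown_shares(slots, smt_contention, categories):
    """Return {name: (share of all slots, share excluding SMT contention)} in percent."""
    usable = slots - smt_contention  # assumption: parenthesized values use this base
    return {
        name: (100.0 * count / slots, 100.0 * count / usable)
        for name, count in categories.items()
    }


# Raw counters copied from the AMD block above.
AMD_TOPDOWN = {
    "retiring": 28442626656254,
    "frontend": 11861452414971,
    "backend": 68935929188030,
    "speculation": 8580707704693,
}

cores_used, on_cpu = cpu_utilization(
    elapsed=2901.236, utime=12051.991, stime=42.412, ncores=16
)
shares = topdown_shares(
    slots=128795715843420, smt_contention=10974862393946, categories=AMD_TOPDOWN
)

print(f"cores used {cores_used:.2f}, on_cpu {on_cpu:.3f}")
for name, (raw, adjusted) in shares.items():
    print(f"{name:12s} {raw:5.1f}% ({adjusted:5.1f}%)")
```

With these inputs the sketch gives 4.17 cores and the same 22.1% (24.1%) retiring and 53.5% (58.5%) backend figures as the listing, which supports the two assumptions but does not prove them.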

Intel metrics

elapsed              1535.496
on_cpu               0.260          # 4.16 / 16 cores
utime                6370.957
stime                15.872
nvcsw                238930         # 79.50%
nivcsw               61627          # 20.50%
inblock              417560         # 271.94/sec
onblock              3752           # 2.44/sec
cpu-clock            6383952839045  # 6383.953 seconds
task-clock           6384328182880  # 6384.328 seconds
page faults          6088624        # 953.683/sec
context switches     307983         # 48.240/sec
cpu migrations       80353          # 12.586/sec
major page faults    22             # 0.003/sec
minor page faults    6088602        # 953.679/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             3627813375966  # 127.466 branches per 1000 inst
branch misses        100424353464   # 2.77% branch miss
conditional          3627813400830  # 127.466 conditional branches per 1000 inst
indirect             460528146902   # 16.181 indirect branches per 1000 inst
slots                69134049635162 #
retiring             21718349215816 # 31.4% (31.4%)
-- ucode             670291785478   #     1.0%
-- fastpath          21048057430338 #    30.4%
frontend             4655453627152  #  6.7% ( 6.7%)
-- latency           2225063048502  #     3.2%
-- bandwidth         2430390578650  #     3.5%
backend              31176250844073 # 45.1% (45.1%)
-- cpu               9246534763123  #    13.4%
-- memory            21929716080950 #    31.7%
speculation          11778658018445 # 17.0% (17.0%)
-- branch mispredict 11694875066244 #    16.9%
-- pipeline restart  83782952201    #     0.1%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           22497766397352 # 0.92 GHz
instructions         37035269756825 # 1.65 IPC
l2 access            554900697586   # 26.405 l2 access per 1000 inst
l2 miss              245652286024   # 44.27% l2 miss
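
Since both blocks report the same headline metrics, a side-by-side ratio makes the difference between the two runs easier to read. The sketch below only divides values copied from the AMD and Intel blocks above; the dictionary keys are hypothetical names chosen for illustration, and nothing here says why the runs differ.

```python
# Headline values copied from the AMD and Intel blocks above.
amd = {
    "elapsed": 2901.236,          # seconds
    "utime": 12051.991,           # seconds
    "ipc": 1.37,                  # from the first cpu-cycles/instructions pair
    "backend_memory_pct": 49.7,   # "-- memory" under backend
}
intel = {
    "elapsed": 1535.496,
    "utime": 6370.957,
    "ipc": 1.65,
    "backend_memory_pct": 31.7,
}

# Ratio above 1.0 means the AMD value is larger.
ratios = {key: amd[key] / intel[key] for key in amd}

for key, ratio in ratios.items():
    print(f"{key:20s} AMD/Intel = {ratio:.2f}")
```

Both runs keep about 4.2 of 16 cores busy, so the elapsed-time ratio of roughly 1.89 is mirrored almost exactly by the user-time ratio.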

Process tree structure. Overall, the runtime is longer than for the other compression tests.

989 processes
	561 zstd                 145543.77   430.84
	 64 clinfo                  10.88     5.44
	 38 vulkaninfo               1.14     0.76
	  6 php                      0.15     0.56
	  4 vulkani:disk$0           0.12     0.08
	  6 glxinfo:gdrv0            0.09     0.09
	  2 llvmpipe-0               0.06     0.04
	  2 llvmpipe-1               0.06     0.04
	  2 llvmpipe-10              0.06     0.04
	  2 llvmpipe-11              0.06     0.04
	  2 llvmpipe-12              0.06     0.04
	  2 llvmpipe-13              0.06     0.04
	  2 llvmpipe-14              0.06     0.04
	  2 llvmpipe-15              0.06     0.04
	  2 llvmpipe-2               0.06     0.04
	  2 llvmpipe-3               0.06     0.04
	  2 llvmpipe-4               0.06     0.04
	  2 llvmpipe-5               0.06     0.04
	  2 llvmpipe-6               0.06     0.04
	  2 llvmpipe-7               0.06     0.04
	  2 llvmpipe-8               0.06     0.04
	  2 llvmpipe-9               0.06     0.04
	  2 glxinfo                  0.05     0.04
	  2 glxinfo:cs0              0.05     0.03
	  2 glxinfo:disk$0           0.05     0.03
	  2 glxinfo:sh0              0.05     0.03
	  2 glxinfo:shlo0            0.05     0.03
	  6 clang                    0.04     0.03
	  1 lspci                    0.01     0.03
	101 sh                       0.00     0.00
	 34 sed                      0.00     0.00
	 33 compress-zstd            0.00     0.00
	 13 gcc                      0.00     0.00
	 10 gsettings                0.00     0.00
	  9 stty                     0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  6 llvm-link                0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  4 gmain                    0.00     0.00
	  2 cc                       0.00     0.00
	  2 dconf worker             0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 date                     0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 ps                       0.00     0.00
	  1 qdbus                    0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sort                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
0 processes running
47 maximum processes
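
The listing can be parsed to see how much of the process count belongs to the benchmark itself. The sketch below works on a short excerpt of the rows above; it reads only the count and the name, because the listing does not label the two numeric columns. Treating each compress-zstd entry as one benchmark run is an assumption, and `parse_tree` is a hypothetical helper.

```python
import re

# A few rows copied from the process listing above.
EXCERPT = """\
561 zstd                 145543.77   430.84
 64 clinfo                  10.88     5.44
101 sh                       0.00     0.00
 33 compress-zstd            0.00     0.00
  2 dconf worker             0.00     0.00
"""
TOTAL_PROCESSES = 989  # from the "989 processes" header line

# count, name (may contain spaces), then two unlabeled numeric columns
ROW = re.compile(r"^\s*(\d+)\s+(.+?)\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$")


def parse_tree(text):
    """Map process name to process count for every row that matches ROW."""
    counts = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


counts = parse_tree(EXCERPT)
zstd_share = 100.0 * counts["zstd"] / TOTAL_PROCESSES
zstd_per_run = counts["zstd"] / counts["compress-zstd"]

print(f"zstd is {zstd_share:.1f}% of all processes")
print(f"{zstd_per_run:.0f} zstd entries per compress-zstd entry")
```

Under that assumption, each run accounts for 17 zstd entries, and more than half of all processes are zstd; the remainder is mostly system-information tooling and shell helpers.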

The core computation blocks look as follows and show that one process is started per core.

      43700) compress-zstd start=71.84 finish=134.40
        43701) zstd start=71.84 finish=134.40
          43702) zstd start=72.47 finish=134.35
          43703) zstd start=72.47 finish=134.35
          43704) zstd start=72.47 finish=134.35
          43705) zstd start=72.47 finish=134.35
          43706) zstd start=72.47 finish=134.35
          43707) zstd start=72.47 finish=134.35
          43708) zstd start=72.47 finish=134.35
          43709) zstd start=72.47 finish=134.34
          43710) zstd start=72.47 finish=134.35
          43711) zstd start=72.47 finish=134.35
          43712) zstd start=72.47 finish=134.35
          43713) zstd start=72.47 finish=134.35
          43714) zstd start=72.47 finish=134.35
          43715) zstd start=72.47 finish=134.34
          43716) zstd start=72.47 finish=134.34
          43717) zstd start=72.47 finish=134.34
        43718) sed start=134.40 finish=134.40
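
The one-process-per-core claim can be checked by parsing the block above. The sketch below is a minimal parser for this listing format; it assumes that indentation encodes parent/child depth and that the most deeply indented entries are the workers. `parse_timeline` is a hypothetical helper, not part of the tool that produced the listing.

```python
import re

# The timeline block above, copied verbatim.
TIMELINE = """\
      43700) compress-zstd start=71.84 finish=134.40
        43701) zstd start=71.84 finish=134.40
          43702) zstd start=72.47 finish=134.35
          43703) zstd start=72.47 finish=134.35
          43704) zstd start=72.47 finish=134.35
          43705) zstd start=72.47 finish=134.35
          43706) zstd start=72.47 finish=134.35
          43707) zstd start=72.47 finish=134.35
          43708) zstd start=72.47 finish=134.35
          43709) zstd start=72.47 finish=134.34
          43710) zstd start=72.47 finish=134.35
          43711) zstd start=72.47 finish=134.35
          43712) zstd start=72.47 finish=134.35
          43713) zstd start=72.47 finish=134.35
          43714) zstd start=72.47 finish=134.35
          43715) zstd start=72.47 finish=134.34
          43716) zstd start=72.47 finish=134.34
          43717) zstd start=72.47 finish=134.34
        43718) sed start=134.40 finish=134.40
"""

LINE = re.compile(r"^(\s*)(\d+)\)\s+(\S+)\s+start=([\d.]+)\s+finish=([\d.]+)")


def parse_timeline(text):
    """Return a list of (depth, pid, name, start, finish) tuples."""
    rows = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            indent, pid, name, start, finish = match.groups()
            rows.append((len(indent), int(pid), name, float(start), float(finish)))
    return rows


rows = parse_timeline(TIMELINE)
deepest = max(depth for depth, *_ in rows)          # assumption: deepest level = workers
workers = [row for row in rows if row[0] == deepest]
longest = max(finish - start for *_, start, finish in workers)

print(f"{len(workers)} workers, longest ran {longest:.2f} s")
```

The 16 workers match the 16 cores reported in the on_cpu line, and all of them start together and finish within 0.01 s of each other.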