bork is a cross-platform file encryption utility written in Java. There is one workload, which runs in seconds, and it appears to be single-threaded.

The AMD topdown profile is sparse because the workload runs so quickly.

AMD metrics show that this is single-threaded (on average 0.67 of the 16 cores is busy) and runs in less than a minute. Backend stalls are low, and there is little floating-point or L2 activity.
elapsed 41.954
on_cpu 0.042 # 0.67 / 16 cores
utime 14.407
stime 13.551
nvcsw 3667 # 86.04%
nivcsw 595 # 13.96%
inblock 0 # 0.00/sec
onblock 16790088 # 400198.95/sec
cpu-clock 27966981919 # 27.967 seconds
task-clock 27971618112 # 27.972 seconds
page faults 165110 # 5902.769/sec
context switches 4280 # 153.012/sec
cpu migrations 319 # 11.404/sec
major page faults 2 # 0.072/sec
minor page faults 165103 # 5902.519/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 36500002265 # 128.750 branches per 1000 inst
branch misses 1369450142 # 3.75% branch miss
conditional 30093082710 # 106.150 conditional branches per 1000 inst
indirect 189413110 # 0.668 indirect branches per 1000 inst
cpu-cycles 121264096404 # 0.18 GHz
instructions 280489555903 # 2.31 IPC
slots 246520840950 #
retiring 108081861429 # 43.8% (43.8%)
-- ucode 208269074 # 0.1%
-- fastpath 107873592355 # 43.8%
frontend 97172556531 # 39.4% (39.4%)
-- latency 78129118386 # 31.7%
-- bandwidth 19043438145 # 7.7%
backend 39194677724 # 15.9% (15.9%) low
-- cpu 9679470926 # 3.9%
-- memory 29515206798 # 12.0%
speculation 2034510364 # 0.8% ( 0.8%) low
-- branch mispredict 2009213869 # 0.8%
-- pipeline restart 25296495 # 0.0%
smt-contention 36983215 # 0.0% ( 0.0%)
cpu-cycles 121365883043 # 0.18 GHz
instructions 281240028279 # 2.32 IPC
instructions 94364096212 # 16.955 l2 access per 1000 inst
l2 hit from l1 1517135255 # 5.84% l2 miss
l2 miss from l1 34263921 #
l2 hit from l2 pf 23599948 #
l3 hit from l2 pf 5971599 #
l3 miss from l2 pf 53254420 #
instructions 94178737335 # 2.360 float per 1000 inst
float 512 41 # 0.000 AVX-512 per 1000 inst
float 256 10 # 0.000 AVX-256 per 1000 inst
float 128 222295322 # 2.360 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 0 # 0.000 scalar per 1000 inst
instructions 2697638 #
opcache 1003361 # 371.941 opcache per 1000 inst
opcache miss 536407 # 53.5% opcache miss rate
l1 dTLB miss 6475 # 2.400 L1 dTLB per 1000 inst
l2 dTLB miss 1209 # 0.448 L2 dTLB per 1000 inst
instructions 2698623 #
icache 1306796 # 484.245 icache per 1000 inst
icache miss 107669 # 8.2% icache miss rate
l1 iTLB miss 8 # 0.003 L1 iTLB per 1000 inst
l2 iTLB miss 0 # 0.000 L2 iTLB per 1000 inst
tlb flush 19 # 0.007 TLB flush per 1000 inst
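The derived figures in the AMD listing above can be rechecked from the raw counters. The following is a minimal sketch, assuming that on_cpu is (utime + stime) / elapsed divided by the core count and that IPC is instructions / cpu-cycles; the variable names are illustrative and are not taken from the measurement tool.

```python
# Counter values copied from the AMD listing above.
elapsed = 41.954              # wall-clock seconds
utime = 14.407                # user CPU seconds
stime = 13.551                # system CPU seconds
cores = 16
cpu_cycles = 121264096404
instructions = 280489555903
branches = 36500002265
branch_misses = 1369450142

# Assumed definition: average number of busy cores over the run,
# then the same figure as a fraction of the whole machine.
busy_cores = (utime + stime) / elapsed
on_cpu = busy_cores / cores

# Instructions per cycle and branch miss rate.
ipc = instructions / cpu_cycles
branch_miss_pct = 100.0 * branch_misses / branches

print(f"busy cores   {busy_cores:.2f} of {cores}")
print(f"on_cpu       {on_cpu:.3f}")
print(f"IPC          {ipc:.2f}")
print(f"branch miss  {branch_miss_pct:.2f}%")
```

Under these assumptions the results agree with the listing: less than one core is busy on average, which is consistent with a single-threaded workload.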
Intel metrics
elapsed 39.446
on_cpu 0.040 # 0.65 / 16 cores
utime 18.776
stime 6.749
nvcsw 4047 # 93.68%
nivcsw 273 # 6.32%
inblock 59048 # 1496.92/sec
onblock 16778872 # 425360.42/sec
cpu-clock 25511570118 # 25.512 seconds
task-clock 25515172774 # 25.515 seconds
page faults 152548 # 5978.717/sec
context switches 4331 # 169.742/sec
cpu migrations 344 # 13.482/sec
major page faults 394 # 15.442/sec
minor page faults 152147 # 5963.001/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 33410561257 # 121.798 branches per 1000 inst
branch misses 38726732 # 0.12% branch miss
conditional 33410575849 # 121.798 conditional branches per 1000 inst
indirect 194086180 # 0.708 indirect branches per 1000 inst
slots 568310686370 #
retiring 253168976143 # 44.5% (44.5%)
-- ucode 9482689758 # 1.7%
-- fastpath 243686286385 # 42.9%
frontend 68582457368 # 12.1% (12.1%)
-- latency 27346206309 # 4.8%
-- bandwidth 41236251059 # 7.3%
backend 208066438235 # 36.6% (36.6%)
-- cpu 162560138594 # 28.6%
-- memory 45506299641 # 8.0%
speculation 37059301540 # 6.5% ( 6.5%)
-- branch mispredict 5728184679 # 1.0%
-- pipeline restart 31331116861 # 5.5%
smt-contention 0 # 0.0% ( 0.0%)
cpu-cycles 94976267658 # 0.15 GHz
instructions 274031527927 # 2.89 IPC
l2 access 3796563893 # 13.862 l2 access per 1000 inst
l2 miss 588428056 # 15.50% l2 miss
cpu-cycles 94762245616 # 11.8% memory latency
load stalls 9914219291 # 6.0% l1 bound
l1 miss 4275852025 # 2.0% l2 bound
l2 miss 2381220001 # 0.4% l3 bound
l3 miss 1955017893 # 2.1% dram bound
store_stalls 1244352811 # 1.3% store bound
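The two topdown breakdowns are easier to compare as shares of total slots than as raw counts, because the slot totals differ between the platforms. The following is a minimal sketch that recomputes the top-level shares from the slot counts in the two listings above; the dictionary layout and the shares helper are illustrative and are not part of the measurement tool.

```python
# Slot counts copied from the AMD and Intel listings above.
amd = {
    "slots": 246520840950,
    "retiring": 108081861429,
    "frontend": 97172556531,
    "backend": 39194677724,
    "speculation": 2034510364,
}
intel = {
    "slots": 568310686370,
    "retiring": 253168976143,
    "frontend": 68582457368,
    "backend": 208066438235,
    "speculation": 37059301540,
}

def shares(counts):
    """Return each top-level category as a percentage of total slots."""
    total = counts["slots"]
    return {k: 100.0 * v / total for k, v in counts.items() if k != "slots"}

amd_pct = shares(amd)
intel_pct = shares(intel)

print(f"{'category':<12} {'AMD':>6} {'Intel':>6}")
for category in ("retiring", "frontend", "backend", "speculation"):
    print(f"{category:<12} {amd_pct[category]:>5.1f}% {intel_pct[category]:>5.1f}%")
```

Both platforms retire a similar share of slots. The lost slots fall mainly in the frontend on AMD and in the backend on Intel.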
The process overview shows a few short-lived Java invocations. Test-harness overhead, such as the clinfo and vulkaninfo processes listed below, also contributes to the metrics above.
437 processes
8 java 26.96 16.68
68 clinfo 17.52 5.00
4 Finalizer 13.48 8.34
3 Common-Cleaner 13.47 8.31
38 vulkaninfo 1.14 1.31
1 dd 0.16 3.00
4 vulkani:disk$0 0.12 0.13
6 glxinfo:gdrv0 0.10 0.07
6 glxinfo:gl0 0.10 0.07
2 llvmpipe-0 0.06 0.07
2 llvmpipe-1 0.06 0.07
2 llvmpipe-10 0.06 0.07
2 llvmpipe-11 0.06 0.07
2 llvmpipe-12 0.06 0.07
2 llvmpipe-13 0.06 0.07
2 llvmpipe-14 0.06 0.07
2 llvmpipe-15 0.06 0.07
2 llvmpipe-2 0.06 0.07
2 llvmpipe-3 0.06 0.07
2 llvmpipe-4 0.06 0.07
2 llvmpipe-5 0.06 0.07
2 llvmpipe-6 0.06 0.07
2 llvmpipe-7 0.06 0.07
2 llvmpipe-8 0.06 0.07
2 llvmpipe-9 0.06 0.07
6 php 0.05 0.08
2 glxinfo 0.05 0.03
2 glxinfo:cs0 0.05 0.03
2 glxinfo:disk$0 0.05 0.03
2 glxinfo:sh0 0.05 0.03
2 glxinfo:shlo0 0.05 0.03
6 clang 0.04 0.08
1 lspci 0.01 0.02
8 C1 CompilerThre 0.00 16.29
4 C2 CompilerThre 0.00 13.48
4 G1 Conc#0 0.00 13.48
4 G1 Refine#0 0.00 13.48
4 GC Thread#0 0.00 13.48
4 Reference Handl 0.00 13.48
4 Service Thread 0.00 13.48
4 Signal Dispatch 0.00 13.48
4 Sweeper thread 0.00 13.48
4 VM Thread 0.00 13.48
5 rm 0.00 1.19
3 rocminfo 0.00 0.03
86 sh 0.00 0.00
12 gcc 0.00 0.00
8 gsettings 0.00 0.00
8 stat 0.00 0.00
8 systemd-detect- 0.00 0.00
6 llvm-link 0.00 0.00
5 gmain 0.00 0.00
5 phoronix-test-s 0.00 0.00
4 G1 Main Marker 0.00 0.00
4 G1 Young RemSet 0.00 0.00
4 VM Periodic Tas 0.00 0.00
4 bash 0.00 0.00
3 bork 0.00 0.00
3 bork.sh 0.00 0.00
3 dconf worker 0.00 0.00
2 lscpu 0.00 0.00
2 uname 0.00 0.00
2 which 0.00 0.00
2 xset 0.00 0.00
1 cc 0.00 0.00
1 date 0.00 0.00
1 dirname 0.00 0.00
1 dmesg 0.00 0.00
1 dmidecode 0.00 0.00
1 grep 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lsmod 0.00 0.00
1 mktemp 0.00 0.00
1 ps 0.00 0.00
1 qdbus 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sed 0.00 0.00
1 sort 0.00 0.00
1 stty 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
1 xrandr 0.00 0.00
0 processes running
47 maximum processes
Computation structure
720779) bork cpu=6 start=9.07 finish=16.24
720780) bork.sh cpu=15 start=9.07 finish=16.24
720781) java cpu=0 start=9.07 finish=16.24
720782) java cpu=5 start=9.08 finish=16.24
720783) GC Thread#0 cpu=-1 start=9.08 finish=16.24
720784) G1 Main Marker cpu=0 start=9.08 finish=16.24
720785) G1 Conc#0 cpu=-1 start=9.08 finish=16.24
720786) G1 Refine#0 cpu=-1 start=9.09 finish=16.23
720787) G1 Young RemSet cpu=0 start=9.09 finish=16.23
720788) VM Thread cpu=-1 start=9.09 finish=16.24
720789) Reference Handl cpu=-1 start=9.10 finish=16.24
720790) Finalizer cpu=3 start=9.10 finish=16.24
720791) Signal Dispatch cpu=-1 start=9.10 finish=16.24
720792) Service Thread cpu=-1 start=9.10 finish=16.24
720793) C2 CompilerThre cpu=-1 start=9.10 finish=16.24
720796) C1 CompilerThre cpu=-1 start=9.10 finish=10.64
720794) C1 CompilerThre cpu=-1 start=9.10 finish=16.24
720795) Sweeper thread cpu=-1 start=9.10 finish=16.24
720797) VM Periodic Tas cpu=0 start=9.11 finish=16.23
720798) Common-Cleaner cpu=12 start=9.11 finish=16.24
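The lifetime of each process or thread follows from the start and finish fields in the listing above. The following is a minimal sketch that parses a few of those lines and computes finish minus start; the regular expression and the parse helper are illustrative, and the cpu field is kept as-is because the listing does not say what -1 means.

```python
import re

# One record per line: "<id>) <name> cpu=<n> start=<s> finish=<s>".
# The name may contain spaces, so it is matched non-greedily up to " cpu=".
LINE = re.compile(
    r"(?P<pid>\d+)\)\s+(?P<name>.+?)\s+cpu=(?P<cpu>-?\d+)"
    r"\s+start=(?P<start>[\d.]+)\s+finish=(?P<finish>[\d.]+)"
)

# A few lines copied from the listing above.
SAMPLE = """\
720779) bork cpu=6 start=9.07 finish=16.24
720781) java cpu=0 start=9.07 finish=16.24
720783) GC Thread#0 cpu=-1 start=9.08 finish=16.24
720796) C1 CompilerThre cpu=-1 start=9.10 finish=10.64
"""

def parse(text):
    """Parse listing lines into dicts with a computed duration in seconds."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m is None:
            continue  # skip anything that is not a record line
        start, finish = float(m["start"]), float(m["finish"])
        rows.append({
            "pid": int(m["pid"]),
            "name": m["name"],
            "cpu": int(m["cpu"]),
            "start": start,
            "finish": finish,
            "duration": finish - start,
        })
    return rows

rows = parse(SAMPLE)
for r in rows:
    print(f"{r['pid']} {r['name']:<16} {r['duration']:.2f} s")
```

For the sampled lines, the bork process lives for about 7.17 seconds, which matches the observation that the workload runs in seconds.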
