A benchmark of Apache Spark, driven through the PySpark interface. Apache Spark is an open-source unified analytics engine. The benchmark has four tests, each with different sub-scenarios.

The topdown profile shows a high retirement rate with some backend stalls, most of them memory-bound.

AMD metrics show about 1.6 of 16 cores active overall, little floating point, and high retirement.
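The active-core figure appears to follow from the CPU time counters in the AMD listing: user plus system time divided by elapsed time, and divided again by the 16 cores to give on_cpu. This is inferred from the numbers, not a documented formula, and the function names are illustrative only. A minimal Python sketch:

```python
def active_cores(utime_s: float, stime_s: float, elapsed_s: float) -> float:
    """Average number of busy cores: CPU seconds consumed per wall-clock second."""
    return (utime_s + stime_s) / elapsed_s


def on_cpu(utime_s: float, stime_s: float, elapsed_s: float, n_cores: int) -> float:
    """Fraction of the whole machine that was busy (assumed definition)."""
    return active_cores(utime_s, stime_s, elapsed_s) / n_cores


# utime, stime and elapsed taken from the AMD listing; 16 cores as reported.
amd_cores = active_cores(4425.534, 328.609, 2951.391)
amd_on_cpu = on_cpu(4425.534, 328.609, 2951.391, 16)
print(f"{amd_cores:.2f} cores active, on_cpu {amd_on_cpu:.3f}")
```

With these inputs the result agrees with the reported 1.61 cores and on_cpu of 0.101.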

elapsed              2951.391
on_cpu               0.101          # 1.61 / 16 cores
utime                4425.534
stime                328.609
nvcsw                4856288        # 86.78%
nivcsw               739739         # 13.22%
inblock              8              # 0.00/sec
onblock              13977128       # 4735.78/sec
cpu-clock            42663814969283 # 42663.815 seconds
task-clock           42665967867809 # 42665.968 seconds
page faults          44053031       # 1032.510/sec
context switches     6717843        # 157.452/sec
cpu migrations       931799         # 21.839/sec
major page faults    1186           # 0.028/sec
minor page faults    44006031       # 1031.408/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             64285744518706 # 177.115 branches per 1000 inst
branch misses        116204596704   # 0.18% branch miss
conditional          50330874718377 # 138.668 conditional branches per 1000 inst
indirect             5391625116869  # 14.855 indirect branches per 1000 inst
cpu-cycles           219995811005385 # 3.55 GHz
instructions         474805279426321 # 2.16 IPC
slots                441582327330810 #
retiring             157423348696856 # 35.6% (58.5%) high
-- ucode             567875726893   #     0.1%
-- fastpath          156855472969963 #    35.5%
frontend             19549579490013 #  4.4% ( 7.3%)
-- latency           13520739667860 #     3.1%
-- bandwidth         6028839822153  #     1.4%
backend              88382195394937 # 20.0% (32.8%)
-- cpu               11254725399387 #     2.5%
-- memory            77127469995550 #    17.5%
speculation          3802926667430  #  0.9% ( 1.4%)
-- branch mispredict 2933296078179  #     0.7%
-- pipeline restart  869630589251   #     0.2%
smt-contention       172423207228753 # 39.0% ( 0.0%)
cpu-cycles           218890360802921 # 3.54 GHz
instructions         474914641892602 # 2.17 IPC
instructions         158472945209461 # 13.127 l2 access per 1000 inst
l2 hit from l1       2033359144055  # 3.50% l2 miss
l2 miss from l1      46451580342    #
l2 hit from l2 pf    20499319858    #
l3 hit from l2 pf    12301929165    #
l3 miss from l2 pf   14068223396    #
instructions         158458706917338 # 19.971 float per 1000 inst
float 512            3950           # 0.000 AVX-512 per 1000 inst
float 256            503524         # 0.000 AVX-256 per 1000 inst
float 128            3164589063781  # 19.971 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         37917          # 0.000 scalar per 1000 inst
instructions         2709961        #
opcache              1001966        # 369.734 opcache per 1000 inst
opcache miss         534211         # 53.3% opcache miss rate
l1 dTLB miss         7266           # 2.681 L1 dTLB per 1000 inst
l2 dTLB miss         1171           # 0.432 L2 dTLB per 1000 inst
instructions         2703521        #
icache               1299232        # 480.570 icache per 1000 inst
icache miss          111019         #  8.5% icache miss rate
l1 iTLB miss         13             # 0.005 L1 iTLB per 1000 inst
l2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst
tlb flush            19             # 0.007 TLB flush per 1000 inst
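Each topdown line in the AMD listing carries two percentages. The first matches the category's share of all slots. The second, in parentheses, matches its share once the smt-contention slots are excluded. That reading is inferred from the numbers, not stated by the tool, and the names below are illustrative. A short Python sketch reproduces both columns:

```python
# Slot counts copied from the AMD topdown listing.
SLOTS = 441582327330810
SMT_CONTENTION = 172423207228753
CATEGORIES = {
    "retiring": 157423348696856,
    "frontend": 19549579490013,
    "backend": 88382195394937,
    "speculation": 3802926667430,
}


def topdown_shares(slots, smt_contention, categories):
    """Return {name: (pct of all slots, pct of slots not lost to SMT contention)}."""
    usable = slots - smt_contention
    return {
        name: (100.0 * count / slots, 100.0 * count / usable)
        for name, count in categories.items()
    }


shares = topdown_shares(SLOTS, SMT_CONTENTION, CATEGORIES)
for name, (raw_pct, normalized_pct) in shares.items():
    print(f"{name:12s} {raw_pct:5.1f}% ({normalized_pct:5.1f}%)")
```

On the Intel listing smt-contention is 0, so both columns coincide there.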

Intel metrics

elapsed              16920.126
on_cpu               0.261          # 4.17 / 16 cores
utime                68536.052
stime                2033.421
nvcsw                20512849       # 60.38%
nivcsw               13458961       # 39.62%
inblock              348193440      # 20578.66/sec
onblock              533099000      # 31506.80/sec
cpu-clock            242347906374008 # 242347.906 seconds
task-clock           242358478400143 # 242358.478 seconds
page faults          351178559      # 1449.005/sec
context switches     47074935       # 194.237/sec
cpu migrations       5096975        # 21.031/sec
major page faults    29159          # 0.120/sec
minor page faults    350541678      # 1446.377/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             278766653515579 # 177.353 branches per 1000 inst
branch misses        798092266132   # 0.29% branch miss
conditional          278766666338011 # 177.353 conditional branches per 1000 inst
indirect             62205170112713 # 39.575 indirect branches per 1000 inst
slots                1007749798569482 #
retiring             554744105785745 # 55.0% (55.0%) high
-- ucode             31026419537826 #     3.1%
-- fastpath          523717686247919 #    52.0%
frontend             291111717445577 # 28.9% (28.9%)
-- latency           100773439754211 #    10.0%
-- bandwidth         190338277691366 #    18.9%
backend              123976744622939 # 12.3% (12.3%) low
-- cpu               39235397453279 #     3.9%
-- memory            84741347169660 #     8.4%
speculation          38960763394989 #  3.9% ( 3.9%)
-- branch mispredict 35041534228150 #     3.5%
-- pipeline restart  3919229166839  #     0.4%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           481594355727426 # 1.78 GHz
instructions         915628982456618 # 1.90 IPC
l2 access            6943696813220  # 10.398 l2 access per 1000 inst
l2 miss              1685572410210  # 24.27% l2 miss
cpu-cycles           364846275080470 # 25.2% memory latency
load stalls          90138769389868 # 13.4% l1 bound
l1 miss              41084570173389 #  3.7% l2 bound
l2 miss              27487531859349 #  1.6% l3 bound
l3 miss              21589200763974 #  5.9% dram bound
store_stalls         1939852578337  #  0.5% store bound
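Several derived values in the Intel listing can be cross-checked from the raw counts on the same or adjacent lines. The Python sketch below recomputes IPC, the L2 miss rate, the branch miss rate, and the active-core count. The formulas are the conventional ones and are assumed to be what the tool uses. Counters from different groups may be sampled separately, so only counts reported together are combined.

```python
# Raw values copied from the Intel listing.
intel = {
    "cycles": 481594355727426,
    "instructions": 915628982456618,
    "l2_access": 6943696813220,
    "l2_miss": 1685572410210,
    "branches": 278766653515579,
    "branch_misses": 798092266132,
    "utime": 68536.052,
    "stime": 2033.421,
    "elapsed": 16920.126,
}

# Instructions retired per cycle.
ipc = intel["instructions"] / intel["cycles"]
# Share of L2 accesses that missed, in percent.
l2_miss_pct = 100.0 * intel["l2_miss"] / intel["l2_access"]
# Share of branches that were mispredicted, in percent.
branch_miss_pct = 100.0 * intel["branch_misses"] / intel["branches"]
# Average busy cores: CPU seconds per wall-clock second.
cores_active = (intel["utime"] + intel["stime"]) / intel["elapsed"]

print(f"IPC {ipc:.2f}, L2 miss {l2_miss_pct:.2f}%, "
      f"branch miss {branch_miss_pct:.2f}%, {cores_active:.2f} cores active")
```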

The process overview includes many processes and a set of Java threads listed by thread name; only the first part is shown here.

114616 processes
	640 dispatcher-even      76964.32  3834.19
	741 block-manager-s      66938.04  6289.06
	494 block-manager-a      44631.10  4193.59
	1994 python3              42767.48  1275.63
	320 shuffle-client-      38489.52  1920.40
	320 shuffle-server-      38489.52  1920.40
	320 map-output-disp      38464.08  1912.89
	160 task-result-get      19229.00   955.73
	162 QueryStageCreat      15328.79  1250.05
	120 spark-listener-      14421.84   716.89
	162 java                  9638.24   483.08
	 80 dispatcher-Bloc       9619.42   479.07
	 81 Finalizer             4819.24   241.59
	 80 Common-Cleaner        4819.23   241.56
	 40 org.apache.hado       4811.23   240.09
	 40 rpc-boss-3-1          4811.23   240.09
	 40 shuffle-boss-6-       4811.23   240.09
	 40 rpc-server-4-1        4811.19   240.05
	 40 rpc-server-4-2        4811.18   240.05
	 40 rpc-server-4-3        4811.18   240.05
	 40 rpc-server-4-4        4811.18   240.05
	 40 rpc-server-4-5        4811.18   240.05
	 40 rpc-server-4-6        4811.18   240.05
	 40 rpc-server-4-7        4811.18   240.05
	 40 rpc-server-4-8        4811.18   240.05
	 40 rpc-client-1-1        4811.17   240.04
	 40 rpc-client-1-2        4811.17   240.04
	 40 rpc-client-1-3        4811.16   240.04
	 40 rpc-client-1-4        4811.16   240.04
	 40 rpc-client-1-5        4811.16   240.04
	 40 rpc-client-1-6        4811.16   240.04
	 40 rpc-client-1-7        4811.16   240.04
	 40 rpc-client-1-8        4811.16   240.04
	 40 Thread-0              4810.92   239.96
	 40 shutdown-hook-0       4810.73   239.88
	 40 element-trackin       4810.44   239.76
	 40 netty-rpc-env-t       4810.13   239.58
	 40 RemoteBlock-tem       4808.39   239.21
	 40 heartbeat-recei       4807.42   238.98
	 40 driver-heartbea       4807.34   238.97
	 40 task-starvation       4807.25   238.92
	 40 task-abort-time       4807.20   238.93
	 40 dag-scheduler-e       4807.03   238.90
	 40 context-cleaner       4807.01   238.90
	 40 SparkUI-57            4806.92   238.86
	 40 SparkUI-58            4806.91   238.86
	 40 SparkUI-59            4806.90   238.86
	 40 SparkUI-60            4806.89   238.85
	 40 SparkUI-61            4806.89   238.85
	 40 SparkUI-62            4806.88   238.85
	 40 SparkUI-63            4806.86   238.85
	 40 SparkUI-64            4806.85   238.85
	 40 SparkUI-65            4806.85   238.85
	 40 SparkUI-66            4806.85   238.85
	 40 executor-kill-m       4806.31   238.73
	 40 executor-heartb       4806.04   238.64
	 40 Logging-Cleaner       4805.67   238.61
	 40 Thread-2              4805.59   238.60
	 40 Thread-4              4804.58   238.53
	 36 Thread-20             4638.91   215.45
	 18 serve-DataFrame       3116.02    68.72
	 36 broadcast-excha       3052.28   295.16
	 72 checkPathsExist        391.91    32.46
	  4 Thread-19              164.64    22.95
	144 process reaper         138.68 12318.76
	  3 Thread-1526            104.07    11.51
	  3 Thread-1525            104.03    11.49
	  3 Thread-1523            103.75    11.43
	  3 Thread-1524            103.59    11.42
	  3 Thread-1519            103.58    11.41
	  3 Thread-1522            103.53    11.41
	  3 Thread-1521            103.51    11.41
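Because the full overview is long, it can help to load the rows into a structure and sort them. The Python sketch below parses rows of the form shown above: a count, a name that may contain spaces, and two numeric columns. The listing does not label the two numeric columns, so they are kept as `first` and `second`. The function and field names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Row(NamedTuple):
    count: int
    name: str
    first: float   # first numeric column (meaning not stated in the listing)
    second: float  # second numeric column (meaning not stated in the listing)


# count, name (lazy, may contain spaces), then two decimal numbers.
ROW_RE = re.compile(r"\s*(\d+)\s+(.+?)\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$")


def parse_row(line: str) -> Optional[Row]:
    """Parse one overview row; return None for lines that do not match."""
    m = ROW_RE.match(line)
    if m is None:
        return None
    return Row(int(m.group(1)), m.group(2), float(m.group(3)), float(m.group(4)))


sample = [
    "\t640 dispatcher-even      76964.32  3834.19",
    "\t1994 python3              42767.48  1275.63",
    "\t144 process reaper         138.68 12318.76",
    "114616 processes",  # header line, skipped by the parser
]
rows = [r for r in map(parse_row, sample) if r is not None]
# Largest first-column value first.
for r in sorted(rows, key=lambda r: r.first, reverse=True):
    print(f"{r.count:5d} {r.name:16s} {r.first:10.2f} {r.second:10.2f}")
```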