I now have the ability to create summary histograms characterizing the workloads. These are (re)generated as I update performance reports; the following values reflect ~170 workloads. Walking through the histograms and what they describe…
Most of the runs are fairly quick, though a few benchmarks run for several hours. This is elapsed time for a single benchmark run, which often executes the workload three times internally. I then repeat each benchmark ~6 times while collecting various metrics.
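The binning behind a summary histogram like this is simple enough to sketch in a few lines. The workload names and elapsed times below are made up for illustration; the real values come from the benchmark runs described above.

```python
# Sketch: bin per-workload elapsed times into a summary histogram.
# Workloads and times are hypothetical, not measured values.
elapsed = {"stream": 45, "lulesh": 620, "openjpeg": 95,
           "gimp": 310, "compress-rar": 1800}

bins = [0, 60, 300, 900, 3600]           # bucket edges in seconds
labels = ["<1m", "1-5m", "5-15m", "15m-1h"]
counts = [0] * len(labels)
for t in elapsed.values():
    for i in range(len(labels)):
        if bins[i] <= t < bins[i + 1]:
            counts[i] += 1
            break

for label, n in zip(labels, counts):
    print(f"{label:>7}: {'#' * n} ({n})")
```

The same binning works for every metric below; only the bucket edges change.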

The distribution of workloads shows a small number of single-threaded workloads, a cluster around the number of cores without hyperthreading, and then some that use as many cores as possible.

The number of page faults has a few outliers that are interesting for their own analysis: octave-benchmark, gimp, lulesh, openjpeg, tungsten… are these bringing file information into memory and operating on it? There is a similar story with context switches and stress-ng, wireguard, compress-rar, which I assume are all more interrupt-driven than CPU-bound.

IPC shows a range that is lower than I expected, but presumably some of these workloads can't take full advantage of the core's execution resources.

A similar picture for GHz, which I calculate as the number of cycles divided by seconds. For some of those on the low end, it is similar to stream – waiting on memory traffic or a similar reason? For some others I assume we have power limitations. Given how dynamic power is, I assume the combination of IPC and GHz is more important – perhaps try an X/Y scatter plot with both variables?
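The two derived metrics are straightforward ratios of the raw counters. A minimal sketch, with hypothetical counter values standing in for what `perf stat` would report:

```python
# Effective frequency and IPC from raw counters.
# These counter values are hypothetical, for illustration only.
instructions = 1.2e12
cycles       = 0.9e12
seconds      = 300.0

ipc = instructions / cycles
ghz = cycles / seconds / 1e9     # effective clock averaged over the run

print(f"IPC: {ipc:.2f}, effective GHz: {ghz:.2f}")
# Reading the scatter: low GHz with low IPC may point at memory stalls,
# while low GHz with high IPC may point at power limits instead.
```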

Retirement rate, as a percent of available slots, shows more of a bell curve.

Frontend stalls show a decreasing distribution; those at the high end might be a subset to dive into more deeply.

Backend stalls are more of a bell curve: every workload shows at least a minimal amount, plus a small subset with a very high percentage.

Speculative stalls are low for most workloads, with a small number of outliers.

Float density has a large bucket at the low end for code with little floating point, with the rest spread across a distribution.

Both the op-cache and the i-cache miss rates surprise me, mostly by how narrow the range of miss rates is. It seems this doesn't by itself contribute to frontend stalls as much as other factors, e.g. the TLB? Separately, is the miss rate the right metric, or is there a more distilled one?
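On the question of a more distilled metric: one common alternative is MPKI (misses per 1000 instructions), which normalizes to work done rather than to cache accesses, so two workloads with the same miss rate can show very different MPKI. A sketch with hypothetical counter values:

```python
# Miss *rate* (misses per access) vs. MPKI (misses per 1000 instructions).
# Counter values are hypothetical, for illustration only.
instructions  = 1.0e12
icache_access = 3.0e11
icache_miss   = 6.0e9

miss_rate = icache_miss / icache_access          # fraction of accesses
mpki      = icache_miss / instructions * 1000    # normalized to work done

print(f"miss rate: {miss_rate:.1%}, MPKI: {mpki:.1f}")
```

The same per-1000-instructions normalization is what the L2 density and branch plots below already use.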

A related picture with the i-cache miss rates.

The L2 cache density (accesses per 1000 instructions) shows where various benchmarks use the L2.

Branch miss rates have a similar distribution as frontend stalls with most having a low miss rate and then a tail of a few benchmarks with higher miss rates.

How branchy the code is, as determined by the number of retired branches per 1000 instructions.

SMT contention is the number of slots going to the “other” core in a hyperthread pair. The large bar on the left reflects both single-threaded workloads and MPI workloads pinned to physical cores.
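The metric itself is just the sibling thread's share of the pipeline slots; a single-threaded or core-pinned run should land near zero. A minimal sketch with hypothetical slot counts:

```python
# SMT contention as the share of pipeline slots issued to the sibling
# hyperthread. Slot counts below are hypothetical.
slots_this_thread = 7.5e12
slots_sibling     = 2.5e12

contention = slots_sibling / (slots_this_thread + slots_sibling)
print(f"SMT contention: {contention:.0%}")
```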

There is a similar set of Intel 13500H benchmark plots. I won’t include them here because they reflect similar profiles (fortunately).
Overall, the histograms provide a nice summary of a population of workloads (Phoronix); it would also be interesting to compare/contrast with different workload sets such as SPEC. It could also be interesting to aggregate the subset of benchmarks used for a specific article, or to dive deeper on the outliers to understand how they affect things and how best to optimize. So many different avenues opened from this…
