
eBACS: ECRYPT Benchmarking of Cryptographic Systems

SUPERCOP

SUPERCOP is a toolkit developed by the VAMPIRE lab for measuring the performance of cryptographic software. SUPERCOP stands for System for Unified Performance Evaluation Related to Cryptographic Operations and Primitives; the name was suggested by Paul Bakker.

The latest release of SUPERCOP measures the performance of hash functions, secret-key stream ciphers, public-key encryption systems, public-key signature systems, and public-key secret-sharing systems. SUPERCOP integrates and improves upon

STVL's benchmarking suite for stream ciphers submitted to eSTREAM, the ECRYPT Stream Cipher Project (which finished in April 2008);
VAMPIRE's BATMAN (Benchmarking of Asymmetric Tools on Multiple Architectures, Non-interactively) suite for public-key systems submitted to the eBATS (ECRYPT Benchmarking of Asymmetric Systems) project; and
additional tools developed for VAMPIRE's new eBASH (ECRYPT Benchmarking of All Submitted Hashes) project.

Specifically, SUPERCOP measures cryptographic primitives according to several criteria:

Time to hash a very short packet of data.
Time to hash a typical-size Internet packet.
Time to hash a long message.
Length of the hash output.
Time to encrypt a very short packet of data using a secret key and a nonce.
Time to encrypt a typical-size Internet packet.
Time to encrypt a long message.
Length of the secret key.
Length of the nonce.
Time to generate a key pair (a private key and a corresponding public key).
Length of the private key.
Length of the public key.
Time to generate a shared secret from a private key and another user's public key.
Length of the shared secret.
Time to encrypt a message using a public key.
Length of the encrypted message.
Time to decrypt a message using a private key.
Time to sign a message using a private key.
Length of the signed message.
Time to verify a signed message using a public key.

"Time" refers to time on real computers: time on an Intel Core 2 Quad, time on an AMD Athlon 64 X2, time on an IBM PowerPC G5 970, etc. The point of these cost measures is that they are directly visible to the cryptographic user.

Contributing computer time to benchmarking

Do you have a computer that has enough time to benchmark all the available cryptographic software, that has no other tasks consuming CPU power, and that will have time in the future for updated benchmarks? Would you like to contribute CPU cycles to benchmarking? Perhaps your favorite type of computer isn't included in the current list of benchmarking platforms. Even if all of your computers are similar to computers in the list, you can help by providing independent verification of the speed measurements.

To collect measurements, simply download, unpack, and run SUPERCOP:

     wget http://hyperelliptic.org/ebats/supercop-20141124.tar.bz2
     bunzip2 < supercop-20141124.tar.bz2 | tar -xf -
     cd supercop-20141124
     nohup sh do &

Put the resulting supercop-20141124/bench/*/data.gz file on the web, and send the URL to the eBACS/eBATS/eBASC/eBASH mailing list.

Multiple computers that share filesystems can run SUPERCOP in the same directory. Each computer will create its own subdirectory of bench, labelled by the computer's name, and will perform all work inside that subdirectory.

Alternative: Incremental benchmarks

Here is a different method of collecting measurements:

     wget http://hyperelliptic.org/ebats/supercop-20141124.tar.bz2
     bunzip2 < supercop-20141124.tar.bz2 | tar -xf -
     cd supercop-20141124
     nohup sh data-do &

Put the resulting supercop-data/*/data.gz file on the web, and send the URL to the eBACS/eBATS/eBASC/eBASH mailing list.

The disadvantage of this method is that it consumes extra disk space (typically 20 gigabytes or more, and many inodes). The big advantage of this method is incrementality: an updated version of SUPERCOP will automatically reuse most of the work from the supercop-20141124 run, benchmarking only new code and changed code, so the new benchmark run will finish much more quickly. (However, if an OS update has changed the compiler version, everything will be automatically re-benchmarked.)

Another advantage of this method is parallelizability: on (e.g.) a 4-core machine you can run

     nohup sh data-do 4 &

to finish the benchmarks almost 4 times as quickly.

Reducing randomness in benchmarks

There are many random effects that can make identical computations take variable amounts of time on the same machine.

To detect randomness, SUPERCOP runs each computation several times within the measuring program, runs the measuring program several times, and records all of the resulting measurements. Medians and quartiles are reported on the web pages, and any severe discrepancies are flagged in red. (There are a few cryptographic operations whose running time is intrinsically random; RSA key generation is the classic example. In theory SUPERCOP could perform these computations enough times to see that their time follows a stable distribution, and could then remove the red flags.)
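
To illustrate why medians and quartiles are robust against occasional slow runs, here is a minimal Python sketch using the standard statistics module. The function name summarize, the sample values, and the choice of the "inclusive" quartile method are assumptions made for this example; the web pages may compute quartiles differently.

```python
import statistics

def summarize(cycles):
    """Return (lower quartile, median, upper quartile) of a list of cycle counts."""
    q1, q2, q3 = statistics.quantiles(cycles, n=4, method="inclusive")
    return q1, q2, q3

# Hypothetical measurements: one run was interrupted and took far longer.
samples = [3159, 3222, 3159, 4131, 3160, 3158, 90211]
q1, median, q3 = summarize(samples)
print(q1, median, q3)
```

The single interrupted run barely affects the median, whereas it would dominate an average.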

You can take several steps to reduce randomness:

Turn off hyperthreading. On typical Linux machines hyperthreading is indicated by dashes or commas in the files in /sys/devices/system/cpu/cpu*/topology/thread_siblings_list: for example, if cpu2 and cpu6 both say 2,6 then they are actually one physical core running two threads. If the kernel is new enough to support "CPU hotplugging" then the system administrator can type (e.g.)
```
     echo 0 > /sys/devices/system/cpu/cpu6/online
```
to turn off cpu6. This effect disappears at the next boot; for a longer-lasting effect the system administrator can disable hyperthreading in the BIOS.
Turn off overclocking (e.g., Turbo Boost) and underclocking. On typical Linux machines the system administrator can set a fixed frequency of (e.g.) 2GHz by typing
```
     # Pin every core to a fixed 2GHz clock (run as root).
     for i in /sys/devices/system/cpu/cpu*/cpufreq
     do
       echo userspace > "$i/scaling_governor"
       echo 2000000 > "$i/scaling_setspeed"
     done
```
You can see the available frequencies listed in /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies. When the top two frequencies are separated by just 1000, the larger is Turbo Boost; use the smaller. These effects also disappear at the next boot; for a longer-lasting effect the system administrator can disable overclocking and underclocking in the BIOS.
Run benchmarks while the computer is idle. This is much less important for SUPERCOP than it is for most benchmarking tools: a typical cryptographic operation measured by SUPERCOP completes much more quickly than an operating-system clock tick, and medians filter out occasional interruptions. However, reliably measuring very long operations requires the computer to be idle.
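
The two checks described above (spotting hyperthreading siblings and choosing a non-Turbo frequency) can be sketched in Python. This is only an illustration: the function names are hypothetical, the functions take the file contents as strings so that they can be tried on any machine, and falling back to the highest listed frequency when no Turbo Boost entry is present is an assumption of this sketch.

```python
def parse_cpu_list(text):
    """Expand a sibling list such as '2,6' or '0-1' into a sorted list of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)

def pick_fixed_frequency(text):
    """Choose a frequency in kHz from the contents of scaling_available_frequencies."""
    freqs = sorted((int(word) for word in text.split()), reverse=True)
    if len(freqs) >= 2 and freqs[0] - freqs[1] == 1000:
        return freqs[1]  # the top entry is Turbo Boost; use the smaller one
    return freqs[0]      # assumption: otherwise take the highest listed frequency

# More than one CPU number in a sibling list means one physical core is running several threads.
print(len(parse_cpu_list("2,6")) > 1)
```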

Some machines have high-precision cycle counters that can only be enabled by the kernel and that are disabled by default. On these machines you can improve benchmark quality by enabling the cycle counters. Details depend on the machine.

Database format

The output of SUPERCOP is an extensive database of measurements in a form suitable for easy computer processing. The database is currently stored as a separate compressed data.gz file for each machine. Version 20100702 of SUPERCOP, on a typical 2.4GHz Core 2 (two architectures, amd64 and x86, but with only 64-bit OpenSSL), produces a 94-megabyte data.gz that uncompresses to 734 megabytes.
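
Since the uncompressed database is large, it is usually more convenient to stream data.gz than to unpack it on disk. A minimal sketch using Python's standard gzip module follows; the function name read_entries is hypothetical, and the encoding handling is an assumption (undecodable bytes are replaced rather than rejected).

```python
import gzip

def read_entries(path):
    """Yield each non-empty line of a data.gz file as a list of space-separated words."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as lines:
        for line in lines:
            words = line.split()
            if words:
                yield words
```

For example, sum(1 for _ in read_entries("data.gz")) counts the entries without holding the whole file in memory.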

The database, in uncompressed form, consists of a series of database entries. Each database entry is a line consisting of the following space-separated words:

SUPERCOP version; e.g., 20100702.
Computer name; e.g., utrecht. There is a separate page providing more information about the computers: e.g., utrecht's CPU is a 2400MHz Intel Core 2 Quad Q6600 (6fb). Benchmarks on multiple-CPU machines use just one CPU, and benchmarks on multiple-core CPUs use just one core.
Application Binary Interface (ABI); e.g., amd64. On a computer that supports multiple incompatible ABIs (e.g., 32-bit x86 and 64-bit amd64), SUPERCOP automatically collects separate measurements for each ABI. Beware that the ABI names are not standardized.
Benchmark start date; e.g., 20100703.
Operation (type of primitive) measured; e.g., crypto_hash.
Primitive measured; e.g., sha256.
Additional words giving details of the measurements.

There are also database entries whose first word is a plus sign. These entries are meant for human consumption and are not in a documented format.
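
As an illustration, the six leading words can be separated from the remaining details as follows. This is a sketch, not part of SUPERCOP: the names Entry and parse_entry are hypothetical, and entries beginning with a plus sign are simply skipped.

```python
from typing import NamedTuple

class Entry(NamedTuple):
    version: str
    computer: str
    abi: str
    start_date: str
    operation: str
    primitive: str
    details: list  # all remaining words, whose meaning depends on the entry type

def parse_entry(line):
    """Split one database line into its common fields; return None for other lines."""
    words = line.split()
    if len(words) < 6 or words[0] == "+":
        return None  # too short, or a human-readable entry in an undocumented format
    return Entry(*words[:6], words[6:])

entry = parse_entry("20100702 utrecht amd64 20100703 crypto_hash sha256 cpuid GenuineIntel-000006fb-bfebfbff_")
print(entry.operation, entry.primitive, entry.details)
```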

SUPERCOP automatically tries all available implementations of each primitive, and many compilers for each implementation, to select the fastest combination of implementation and compiler. Each try produces a database entry with the following words:

SUPERCOP version.
Computer name.
ABI.
Benchmark start date.
Operation measured.
Primitive measured.
The word try.
A checksum of various outputs of the implementation; e.g., 86df8bd202b2a2b5fdc04a7f50a591e43a345849c12fef08d487109648a08e05.
The word ok if the checksum is correct, or fails if the checksum is incorrect, or unknown if the correct checksum is not known. An implementation+compiler combination that produces an incorrect checksum is skipped.
The number of cycles used for a typical cryptographic operation; e.g., 35289. This is actually the median of many measurements. SUPERCOP selects the implementation+compiler combination that minimizes this number. For example, for hash functions, SUPERCOP selects the implementation+compiler combination that minimizes the time to hash 1536 bytes.
The number of cycles used for computing the checksum; e.g., 220716990.
The number of cycles per second; e.g., 2405453000.
The implementation used; e.g., crypto_hash/sha256/openssl.
The compiler used; e.g., gcc_-m64_-march=k8_-O3_-fomit-frame-pointer.
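
The selection rule can be mimicked when reading the database. The following Python sketch assumes that the lines passed in all belong to one primitive on one computer and ABI; the function name fastest_try is hypothetical. Entries marked fails are skipped, while entries marked unknown are kept, which is an assumption based on the description above.

```python
def fastest_try(lines):
    """Return (cycles, implementation, compiler) for the fastest acceptable try, or None."""
    best = None
    for line in lines:
        words = line.split()
        if len(words) < 14 or words[6] != "try":
            continue
        status, cycles = words[8], int(words[9])
        if status == "fails":
            continue  # incorrect checksum: this combination is skipped
        candidate = (cycles, words[12], words[13])
        if best is None or candidate < best:
            best = candidate
    return best
```

Dividing the selected cycle count by the cycles-per-second word of the same entry converts it to seconds.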

If a compiler issues an error message (or a warning or any other output), SUPERCOP produces a database entry with the following words:

SUPERCOP version.
Computer name.
ABI.
Benchmark start date.
Operation measured.
Primitive measured.
The word fromcompiler.
The implementation used; e.g., crypto_hash/shavite3512/lower-mem.
The compiler used; e.g., gcc_-march=nocona_-Os_-fomit-frame-pointer.
The file being compiled; e.g., SHAvite3.c.
One or more words of output repeating the error message: e.g., portable.h:109:2: warning: #warning NEITHER NESSIE_LITTLE_ENDIAN NOR NESSIE_BIG_ENDIAN ARE DEFINED!!!!! Several error messages will produce several database entries (in the same order).
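
Because each line of compiler output becomes its own entry, a reader of the database typically wants to regroup the messages. Here is a hedged Python sketch; the function name compiler_messages is hypothetical, and rejoining the words with single spaces does not restore the compiler's original spacing.

```python
from collections import defaultdict

def compiler_messages(lines):
    """Group fromcompiler output by (implementation, compiler, file), keeping entry order."""
    messages = defaultdict(list)
    for line in lines:
        words = line.split()
        if len(words) >= 11 and words[6] == "fromcompiler":
            key = (words[7], words[8], words[9])
            messages[key].append(" ".join(words[10:]))
    return dict(messages)
```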

If an implementation fails to run (for example, because it uses machine instructions not supported by the CPU), SUPERCOP produces a database entry with the following words:

SUPERCOP version.
Computer name.
ABI.
Benchmark start date.
Operation measured.
Primitive measured.
The word tryfails.
The implementation used; e.g., crypto_hash/fugue256/SSE4.1.
The compiler used; e.g., gcc_-m64_-march=core2_-msse4_-O3_-fomit-frame-pointer.
One or more words of output describing the failure; e.g., Illegal instruction. Several lines of output will produce several database entries (in the same order).
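
A quick way to see why implementations fail on a given machine is to tally the failure text. The sketch below is an illustration only; the function name failure_reasons is hypothetical.

```python
from collections import Counter

def failure_reasons(lines):
    """Count tryfails entries by their failure text."""
    reasons = Counter()
    for line in lines:
        words = line.split()
        if len(words) >= 10 and words[6] == "tryfails":
            reasons[" ".join(words[9:])] += 1
    return reasons
```

Calling failure_reasons(lines).most_common() lists the most frequent failure texts first.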

SUPERCOP then measures the performance of the selected implementation and compiler on a wider variety of specific operations; for example, hash functions are selected on the basis of 1536-byte hashing, but are then measured for hashing 0 bytes, 1 byte, 2 bytes, 3 bytes, etc. Each specific operation produces a database entry with the following words:

SUPERCOP version.
Computer name.
ABI.
Benchmark start date.
Operation measured.
Primitive measured.
One of the following words:
- cycles: generating a shared secret (crypto_dh), hashing (crypto_hash), encrypting a message under a public key (crypto_encrypt), signing a message (crypto_sign), or generating a stream from a secret key (crypto_stream);
- afternm_cycles: currently undocumented;
- base_cycles: currently undocumented;
- beforenm_cycles: currently undocumented;
- keypair_cycles: generating a key pair (crypto_dh_keypair or crypto_encrypt_keypair or crypto_sign_keypair);
- forgery_open_afternm_cycles: currently undocumented;
- forgery_open_cycles: currently undocumented;
- open_afternm_cycles: currently undocumented;
- open_cycles: decrypting a message (crypto_encrypt_open) or verifying a signed message (crypto_sign_open);
- verify_cycles: currently undocumented;
- xor_cycles: encrypting a message under a secret key (crypto_stream_xor);
- bytes: the length of an encrypted or signed message;
- beforenmbytes: currently undocumented;
- boxzerobytes: currently undocumented;
- bssbytes: the number of bytes in the bss section;
- constbytes: currently undocumented;
- databytes: the number of bytes in the data section;
- inputbytes: currently undocumented;
- keybytes: currently undocumented;
- noncebytes: currently undocumented;
- open_bytes: the length of a decrypted or verified message;
- outputbytes: currently undocumented;
- publickeybytes: currently undocumented;
- scalarbytes: currently undocumented;
- secretkeybytes: currently undocumented;
- statebytes: currently undocumented;
- textbytes: the number of bytes in the text section;
- zerobytes: currently undocumented.
The number of original message bytes hashed, encrypted, etc.; e.g., 96.
The median of many successive measurements; e.g., 3159.
The first measurement; e.g., 4131.
The second measurement; e.g., 3222.
...
The last measurement; e.g., 3159.
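
The following Python sketch reads one timing entry of this kind and derives cycles per byte from the reported median. The function name parse_measurement is hypothetical, only words that are cycles or end in _cycles are handled, and the sketch assumes that every number in the entry is an integer. The sample line is shortened for illustration; real entries contain more measurements.

```python
def parse_measurement(line):
    """Parse a timing entry into a dict, or return None if the line is not one."""
    words = line.split()
    if len(words) < 10:
        return None
    kind = words[6]
    if kind != "cycles" and not kind.endswith("_cycles"):
        return None
    length = int(words[7])   # number of original message bytes
    median = int(words[8])   # median reported by SUPERCOP
    return {
        "operation": words[4],
        "primitive": words[5],
        "kind": kind,
        "length": length,
        "median": median,
        "samples": [int(word) for word in words[9:]],
        "cycles_per_byte": median / length if length > 0 else None,
    }

result = parse_measurement("20100702 utrecht amd64 20100703 crypto_hash sha256 cycles 96 3159 4131 3222 3159")
print(result["median"], result["cycles_per_byte"])
```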

SUPERCOP also records the selected implementation in a separate database entry with the following words:

SUPERCOP version.
Computer name.
ABI.
Benchmark start date.
Operation measured.
Primitive measured.
The word implementation.
The implementation used; e.g., crypto_hash/sha256/openssl.
The implementation version (the CRYPTO_VERSION macro) or - if no version number is defined by the implementation.

SUPERCOP also records the selected compiler in a separate database entry with the following words:

SUPERCOP version.
Computer name.
ABI.
Benchmark start date.
Operation measured.
Primitive measured.
The word compiler.
The compiler used; e.g., g++_-m64_-march=nocona_-O2_-fomit-frame-pointer.
The compiler version; e.g., 4.3.3.

SUPERCOP also records the CPU identifier in a separate database entry with the following words:

SUPERCOP version.
Computer name.
ABI.
Benchmark start date.
Operation measured.
Primitive measured.
The word cpuid.
The CPU identifier; e.g., GenuineIntel-000006fb-bfebfbff_.

SUPERCOP also records the number of CPU cycles per second in a separate database entry with the following words:

SUPERCOP version.
Computer name.
ABI.
Benchmark start date.
Operation measured.
Primitive measured.
The word cpucycles_persecond.
The number of CPU cycles per second; e.g., 2394000000.

SUPERCOP also records the cycle-counting mechanism in a separate database entry with the following words:

SUPERCOP version.
Computer name.
ABI.
Benchmark start date.
Operation measured.
Primitive measured.
The word cpucycles_implementation.
The mechanism; e.g., amd64cpuinfo.
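
The five single-purpose entries above (implementation, compiler, cpuid, cpucycles_persecond, cpucycles_implementation) can be gathered into one record per primitive. The Python sketch below is an illustration under stated assumptions: the names METADATA_WORDS and collect_metadata are hypothetical, and records are keyed by computer, ABI, operation, and primitive.

```python
METADATA_WORDS = {
    "implementation",
    "compiler",
    "cpuid",
    "cpucycles_persecond",
    "cpucycles_implementation",
}

def collect_metadata(lines):
    """Map (computer, ABI, operation, primitive) to a dict of its metadata entries."""
    info = {}
    for line in lines:
        words = line.split()
        if len(words) >= 8 and words[6] in METADATA_WORDS:
            key = (words[1], words[2], words[4], words[5])
            # Keep all words after the entry type, e.g. implementation and version.
            info.setdefault(key, {})[words[6]] = words[7:]
    return info
```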

Version

This is version 2014.11.24 of the supercop.html web page. This web page is in the public domain.