SUPERCOP
is a toolkit developed by the VAMPIRE lab
for measuring the performance of cryptographic software.
SUPERCOP stands for
System for
Unified
Performance
Evaluation
Related to
Cryptographic
Operations
and Primitives;
the name was suggested by Paul Bakker.
The latest release of SUPERCOP
measures the performance of hash functions,
secret-key stream ciphers,
public-key encryption systems,
public-key signature systems,
and public-key secret-sharing systems.
SUPERCOP integrates and improves upon
- STVL's benchmarking suite for stream ciphers submitted to eSTREAM,
the ECRYPT Stream Cipher Project (which finished in April 2008);
- VAMPIRE's BATMAN
(Benchmarking of Asymmetric Tools
on Multiple Architectures,
Non-interactively) suite
for public-key systems submitted to the eBATS
(ECRYPT Benchmarking
of Asymmetric Systems) project;
and
- additional tools developed for VAMPIRE's new eBASH
(ECRYPT Benchmarking
of All Submitted Hashes) project.
Specifically, SUPERCOP measures cryptographic primitives according to several criteria:
- Time to hash a very short packet of data.
- Time to hash a typical-size Internet packet.
- Time to hash a long message.
- Length of the hash output.
- Time to encrypt a very short packet of data using a secret key and a nonce.
- Time to encrypt a typical-size Internet packet.
- Time to encrypt a long message.
- Length of the secret key.
- Length of the nonce.
- Time to generate a key pair (a private key and a corresponding public key).
- Length of the private key.
- Length of the public key.
- Time to generate a shared secret from a private key and another user's public key.
- Length of the shared secret.
- Time to encrypt a message using a public key.
- Length of the encrypted message.
- Time to decrypt a message using a private key.
- Time to sign a message using a private key.
- Length of the signed message.
- Time to verify a signed message using a public key.
"Time" refers to time on real computers:
time on an Intel Core 2 Quad,
time on an AMD Athlon 64 X2,
time on an IBM PowerPC G5 970,
etc.
The point of these cost measures
is that they are directly visible to the cryptographic user.
Contributing computer time to benchmarking
Do you have a computer
that has enough time to benchmark all the available cryptographic software,
that has no other tasks consuming CPU power,
and that will have time in the future for updated benchmarks?
Would you like to contribute CPU cycles to benchmarking?
Perhaps your favorite type of computer isn't included in the
current list of benchmarking platforms.
Even if all of your computers are similar to computers in the list,
you can help by providing independent verification of the speed measurements.
To collect measurements,
simply download, unpack, and run SUPERCOP:
wget http://hyperelliptic.org/ebats/supercop-20141124.tar.bz2
bunzip2 < supercop-20141124.tar.bz2 | tar -xf -
cd supercop-20141124
nohup sh do &
Put the resulting supercop-20141124/bench/*/data.gz file on the web,
and send the URL to the eBACS/eBATS/eBASC/eBASH mailing list.
Multiple computers that share filesystems
can run SUPERCOP in the same directory.
Each computer will create its own subdirectory of bench,
labelled by the computer's name,
and will perform all work inside that subdirectory.
Alternative: Incremental benchmarks
Here is a different method of collecting measurements:
wget http://hyperelliptic.org/ebats/supercop-20141124.tar.bz2
bunzip2 < supercop-20141124.tar.bz2 | tar -xf -
cd supercop-20141124
nohup sh data-do &
Put the resulting supercop-data/*/data.gz file on the web,
and send the URL to the eBACS/eBATS/eBASC/eBASH mailing list.
The disadvantage of this method is that it consumes extra disk space
(typically 20 gigabytes or more, and many inodes).
The big advantage of this method is incrementality:
an updated version of SUPERCOP
will automatically reuse
most of the work from the supercop-20141124 run,
benchmarking only new code and changed code,
so the new benchmark run will finish much more quickly.
(However, if an OS update has changed the compiler version,
everything will be automatically re-benchmarked.)
Another advantage of this method is parallelizability:
on (e.g.) a 4-core machine you can run
nohup sh data-do 4 &
to finish the benchmarks almost 4 times as quickly.
Reducing randomness in benchmarks
There are many random effects that can make identical computations
take variable amounts of time on the same machine.
To detect randomness,
SUPERCOP runs each computation several times within the measuring program,
runs the measuring program several times,
and records all of the resulting measurements.
Medians and quartiles are reported on the web pages,
and any severe discrepancies are flagged in red.
(There are a few cryptographic operations
whose running time is intrinsically random;
RSA key generation is the classic example.
In theory SUPERCOP could perform these computations enough times
to see that their time follows a stable distribution,
and could then remove the red flags.)
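As a rough illustration of the summary described above, the following Python sketch computes the median and quartiles of a list of cycle counts and flags a wide spread. The function name `summarize` and the 10% threshold are assumptions made for this example; the rule actually used on the web pages to decide what counts as a severe discrepancy is not stated here and may differ.

```python
import statistics

def summarize(measurements, max_spread=0.1):
    """Return (q1, median, q3, flagged) for a list of cycle counts.

    flagged is True when the interquartile range exceeds max_spread
    times the median. The threshold is an illustrative assumption.
    """
    q1, med, q3 = statistics.quantiles(measurements, n=4, method="inclusive")
    flagged = (q3 - q1) > max_spread * med
    return q1, med, q3, flagged

# Made-up cycle counts: one quiet run and one run disturbed by other activity.
stable = [3159, 3160, 3159, 3161, 3159, 3158, 3159]
noisy = [3159, 4131, 3222, 5200, 3159, 6100, 3159]

print(summarize(stable))
print(summarize(noisy))
```

Reporting quartiles alongside the median shows whether the median is representative: a narrow interquartile range suggests a stable measurement, while a wide one suggests interference or an intrinsically random operation.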
You can take several steps to reduce randomness:
Some machines have high-precision cycle counters
that can only be enabled by the kernel
and that are disabled by default.
On these machines you can improve benchmark quality by enabling the cycle counters.
Details depend on the machine:
Database format
The output of SUPERCOP is an extensive database of measurements
in a form suitable for easy computer processing.
The database is currently stored as a separate compressed data.gz file for each machine.
Version 20100702 of SUPERCOP,
on a typical 2.4GHz Core 2 (two architectures, amd64 and x86, but with only 64-bit OpenSSL),
produces a 94-megabyte data.gz
that uncompresses to 734 megabytes.
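Because a single data.gz can expand to several hundred megabytes, it is usually more practical to stream it than to uncompress it to disk first. A minimal Python sketch, assuming the file is ordinary gzip-compressed text with one entry per line (the helper name `iter_entries` is hypothetical):

```python
import gzip

def iter_entries(path):
    """Yield each non-empty line of a data.gz file as a list of words.

    The file is decompressed on the fly, so memory use stays small
    even when the uncompressed database is very large.
    """
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            words = line.split()
            if words:
                yield words
```

A caller could then write, for example, `for words in iter_entries("data.gz"): ...` and inspect the words of each entry in turn.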
The database, in uncompressed form, consists of a series of database entries.
Each database entry is a line consisting of the following space-separated words:
- SUPERCOP version; e.g., 20100702.
- Computer name; e.g., utrecht.
There is a
separate page providing more information about the computers:
e.g., utrecht's CPU is a 2400MHz Intel Core 2 Quad Q6600 (6fb).
Benchmarks on multiple-CPU machines use just one CPU,
and benchmarks on multiple-core CPUs use just one core.
- Application Binary Interface (ABI); e.g., amd64.
On a computer that supports multiple incompatible ABIs
(e.g., 32-bit x86 and 64-bit amd64),
SUPERCOP automatically collects separate measurements for each ABI.
Beware that the ABI names are not standardized.
- Benchmark start date; e.g., 20100703.
- Operation (type of primitive) measured; e.g., crypto_hash.
- Primitive measured; e.g., sha256.
- Additional words giving details of the measurements.
There are also database entries whose first word is a plus sign.
These entries are meant for human consumption
and are not in a documented format.
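The common prefix of every entry can be split off with a few lines of Python. The sketch below is one possible reading of the format described above, not code from SUPERCOP; the `Entry` type and `parse_entry` function are names invented for this example, and the sample line is assembled from the example values given in the list.

```python
from typing import List, NamedTuple, Optional

class Entry(NamedTuple):
    version: str
    computer: str
    abi: str
    start_date: str
    operation: str
    primitive: str
    details: List[str]  # the remaining words; meaning depends on details[0]

def parse_entry(line: str) -> Optional[Entry]:
    """Parse one database line; return None for lines that do not fit.

    Entries whose first word is a plus sign are meant for humans and
    are skipped, as are lines with fewer than six words.
    """
    words = line.split()
    if len(words) < 6 or words[0] == "+":
        return None
    return Entry(*words[:6], words[6:])

sample = "20100702 utrecht amd64 20100703 crypto_hash sha256 cycles 96 3159 4131 3222 3159"
entry = parse_entry(sample)
print(entry.computer, entry.operation, entry.primitive, entry.details[0])
```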
SUPERCOP automatically tries all available implementations of each primitive,
and many compilers for each implementation,
to select the fastest combination of implementation and compiler.
Each try produces a database entry with the following words:
- SUPERCOP version.
- Computer name.
- ABI.
- Benchmark start date.
- Operation measured.
- Primitive measured.
- The word try.
- A checksum of various outputs of the implementation; e.g.,
86df8bd202b2a2b5fdc04a7f50a591e43a345849c12fef08d487109648a08e05.
- The word ok if the checksum is correct,
or fails if the checksum is incorrect,
or unknown if the correct checksum is not known.
An implementation+compiler combination that produces an incorrect checksum is skipped.
- The number of cycles used for a typical cryptographic operation; e.g., 35289.
This is actually the median of many measurements.
SUPERCOP selects the implementation+compiler combination that minimizes this number.
For example, for hash functions, SUPERCOP selects the implementation+compiler combination
that minimizes the time to hash 1536 bytes.
- The number of cycles used for computing the checksum; e.g., 220716990.
- The number of cycles per second; e.g., 2405453000.
- The implementation used; e.g., crypto_hash/sha256/openssl.
- The compiler used; e.g., gcc_-m64_-march=k8_-O3_-fomit-frame-pointer.
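The selection rule can be reproduced from the try entries after the fact. The following Python sketch keeps, for each computer, ABI, operation, and primitive, the try with the fewest cycles among those whose checksum did not fail. It assumes that tries marked unknown remain eligible, which the description above does not state explicitly; the function name, the shortened checksums, and the second and third sample lines are hypothetical.

```python
def fastest_tries(lines):
    """Map (computer, ABI, operation, primitive) to the fastest acceptable try."""
    best = {}
    for line in lines:
        w = line.split()
        if len(w) < 14 or w[6] != "try":
            continue
        checksum_status, cycles = w[8], int(w[9])
        if checksum_status == "fails":
            continue  # incorrect checksum: this combination is skipped
        key = (w[1], w[2], w[4], w[5])
        if key not in best or cycles < best[key]["cycles"]:
            best[key] = {"cycles": cycles, "implementation": w[12], "compiler": w[13]}
    return best

prefix = "20100702 utrecht amd64 20100703 crypto_hash sha256 try"
lines = [
    prefix + " 86df ok 35289 220716990 2405453000 crypto_hash/sha256/openssl gcc_-m64_-march=k8_-O3_-fomit-frame-pointer",
    prefix + " 86df ok 40000 250000000 2405453000 crypto_hash/sha256/example gcc_-m64_-O2_-fomit-frame-pointer",
    prefix + " 0000 fails 100 1000 2405453000 crypto_hash/sha256/broken gcc_-m64_-O2_-fomit-frame-pointer",
]
result = fastest_tries(lines)
print(result[("utrecht", "amd64", "crypto_hash", "sha256")])
```

Note that the failing try is ignored even though it reports the smallest cycle count.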
If a compiler issues an error message (or a warning or any other output),
SUPERCOP produces a database entry with the following words:
- SUPERCOP version.
- Computer name.
- ABI.
- Benchmark start date.
- Operation measured.
- Primitive measured.
- The word fromcompiler.
- The implementation used; e.g., crypto_hash/shavite3512/lower-mem.
- The compiler used; e.g., gcc_-march=nocona_-Os_-fomit-frame-pointer.
- The file being compiled; e.g., SHAvite3.c.
- One or more words of output repeating the error message:
e.g.,
portable.h:109:2: warning: #warning NEITHER NESSIE_LITTLE_ENDIAN NOR NESSIE_BIG_ENDIAN ARE DEFINED!!!!!
Several error messages will produce several database entries (in the same order).
If an implementation fails to run
(for example, because it uses machine instructions not supported by the CPU),
SUPERCOP produces a database entry with the following words:
- SUPERCOP version.
- Computer name.
- ABI.
- Benchmark start date.
- Operation measured.
- Primitive measured.
- The word tryfails.
- The implementation used; e.g., crypto_hash/fugue256/SSE4.1.
- The compiler used; e.g., gcc_-m64_-march=core2_-msse4_-O3_-fomit-frame-pointer.
- One or more words of output describing the failure;
e.g., Illegal instruction.
Several lines of output will produce several database entries (in the same order).
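To see why a particular implementation was not benchmarked, the fromcompiler and tryfails entries can be grouped by implementation and compiler. This is a minimal Python sketch under the word layouts listed above; `collect_failures` is a hypothetical helper, and the sample lines combine the example values from the two lists.

```python
from collections import defaultdict

def collect_failures(lines):
    """Group compiler output and run failures by (implementation, compiler).

    Each value is a list of (kind, file, message) tuples in database order;
    file is None for tryfails entries, which do not name a source file.
    """
    failures = defaultdict(list)
    for line in lines:
        w = line.split()
        if len(w) < 7:
            continue
        kind = w[6]
        if kind == "fromcompiler" and len(w) >= 11:
            failures[(w[7], w[8])].append((kind, w[9], " ".join(w[10:])))
        elif kind == "tryfails" and len(w) >= 10:
            failures[(w[7], w[8])].append((kind, None, " ".join(w[9:])))
    return dict(failures)

lines = [
    "20100702 utrecht amd64 20100703 crypto_hash shavite3512 fromcompiler "
    "crypto_hash/shavite3512/lower-mem gcc_-march=nocona_-Os_-fomit-frame-pointer "
    "SHAvite3.c portable.h:109:2: warning: #warning NEITHER NESSIE_LITTLE_ENDIAN "
    "NOR NESSIE_BIG_ENDIAN ARE DEFINED!!!!!",
    "20100702 utrecht amd64 20100703 crypto_hash fugue256 tryfails "
    "crypto_hash/fugue256/SSE4.1 gcc_-m64_-march=core2_-msse4_-O3_-fomit-frame-pointer "
    "Illegal instruction",
]
report = collect_failures(lines)
for (implementation, compiler), events in report.items():
    print(implementation, compiler, events)
```

Because entries appear in the same order as the original output, appending them in file order preserves multi-line messages.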
SUPERCOP then measures
the performance of the selected implementation and compiler
on a wider variety of specific operations;
for example, hash functions are selected on the basis of 1536-byte hashing,
but are then measured for hashing 0 bytes, 1 byte, 2 bytes, 3 bytes, etc.
Each specific operation produces a database entry with the following words:
- SUPERCOP version.
- Computer name.
- ABI.
- Benchmark start date.
- Operation measured.
- Primitive measured.
- One of the following words:
  - cycles: generating a shared secret (crypto_dh), hashing (crypto_hash), encrypting a message under a public key (crypto_encrypt), signing a message (crypto_sign), or generating a stream from a secret key (crypto_stream);
  - afternm_cycles: currently undocumented;
  - base_cycles: currently undocumented;
  - beforenm_cycles: currently undocumented;
  - keypair_cycles: generating a key pair (crypto_dh_keypair or crypto_encrypt_keypair or crypto_sign_keypair);
  - forgery_open_afternm_cycles: currently undocumented;
  - forgery_open_cycles: currently undocumented;
  - open_afternm_cycles: currently undocumented;
  - open_cycles: decrypting a message (crypto_encrypt_open) or verifying a signed message (crypto_sign_open);
  - verify_cycles: currently undocumented;
  - xor_cycles: encrypting a message under a secret key (crypto_stream_xor);
  - bytes: the length of an encrypted or signed message;
  - beforenmbytes: currently undocumented;
  - boxzerobytes: currently undocumented;
  - bssbytes: the number of bytes in the bss section;
  - constbytes: currently undocumented;
  - databytes: the number of bytes in the data section;
  - inputbytes: currently undocumented;
  - keybytes: currently undocumented;
  - noncebytes: currently undocumented;
  - open_bytes: the length of a decrypted or verified message;
  - outputbytes: currently undocumented;
  - publickeybytes: currently undocumented;
  - scalarbytes: currently undocumented;
  - secretkeybytes: currently undocumented;
  - statebytes: currently undocumented;
  - textbytes: the number of bytes in the text section;
  - zerobytes: currently undocumented.
- The number of original message bytes hashed, encrypted, etc.; e.g., 96.
- The median of many successive measurements; e.g., 3159.
- The first measurement; e.g., 4131.
- The second measurement; e.g., 3222.
- ...
- The last measurement; e.g., 3159.
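A measurement entry of this shape can be unpacked as in the Python sketch below. It handles only entries whose keyword ends in "cycles", since the layout of the other keywords is not spelled out above. The function name is hypothetical, and the sample line is illustrative: it reuses the example values from the list but fills in the elided measurements with invented numbers.

```python
import statistics

def parse_measurement(line):
    """Unpack a *cycles entry into message length, median, and samples.

    Returns None for lines that are not cycle measurements.
    """
    w = line.split()
    if len(w) < 10 or not w[6].endswith("cycles"):
        return None
    message_bytes = int(w[7])
    recorded_median = int(w[8])
    samples = [int(x) for x in w[9:]]
    return {
        "kind": w[6],
        "message_bytes": message_bytes,
        "recorded_median": recorded_median,
        "samples": samples,
        # Recomputed here as a sanity check on the recorded value.
        "recomputed_median": statistics.median(samples),
        "cycles_per_byte": recorded_median / message_bytes if message_bytes else None,
    }

sample = "20100702 utrecht amd64 20100703 crypto_hash sha256 cycles 96 3159 4131 3222 3159 3159 3159"
m = parse_measurement(sample)
print(m["recorded_median"], m["recomputed_median"], m["cycles_per_byte"])
```

Keeping the individual samples next to the median makes it possible to spot cases such as a slow first measurement, which often reflects cache warm-up rather than the steady-state cost.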
SUPERCOP also records the selected implementation
in a separate database entry with the following words:
- SUPERCOP version.
- Computer name.
- ABI.
- Benchmark start date.
- Operation measured.
- Primitive measured.
- The word implementation.
- The implementation used; e.g., crypto_hash/sha256/openssl.
- The implementation version (the CRYPTO_VERSION macro),
or a hyphen (-) if the implementation defines no version number.
SUPERCOP also records the selected compiler
in a separate database entry with the following words:
- SUPERCOP version.
- Computer name.
- ABI.
- Benchmark start date.
- Operation measured.
- Primitive measured.
- The word compiler.
- The compiler used; e.g., g++_-m64_-march=nocona_-O2_-fomit-frame-pointer.
- The compiler version; e.g., 4.3.3.
SUPERCOP also records the CPU identifier
in a separate database entry with the following words:
- SUPERCOP version.
- Computer name.
- ABI.
- Benchmark start date.
- Operation measured.
- Primitive measured.
- The word cpuid.
- The CPU identifier; e.g., GenuineIntel-000006fb-bfebfbff_.
SUPERCOP also records the number of CPU cycles per second
in a separate database entry with the following words:
- SUPERCOP version.
- Computer name.
- ABI.
- Benchmark start date.
- Operation measured.
- Primitive measured.
- The word cpucycles_persecond.
- The number of CPU cycles per second; e.g., 2394000000.
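The recorded clock rate makes it possible to convert cycle counts into wall-clock time. A small Python sketch, with hypothetical helper names and a sample line built from the example values in this section (the cycle count 35289 is the example from the try entries above):

```python
def cycles_per_second(lines):
    """Return the first recorded cpucycles_persecond value, or None."""
    for line in lines:
        w = line.split()
        if len(w) >= 8 and w[6] == "cpucycles_persecond":
            return int(w[7])
    return None

def cycles_to_microseconds(cycles, hz):
    """Convert a cycle count to microseconds at a clock rate of hz cycles per second."""
    return cycles * 1e6 / hz

lines = ["20100702 utrecht amd64 20100703 crypto_hash sha256 cpucycles_persecond 2394000000"]
hz = cycles_per_second(lines)
print(round(cycles_to_microseconds(35289, hz), 2))  # about 14.74 microseconds
```

The conversion is only as accurate as the recorded clock rate, so cycle counts remain the more reliable basis for comparing implementations on the same machine.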
SUPERCOP also records the cycle-counting mechanism
in a separate database entry with the following words:
- SUPERCOP version.
- Computer name.
- ABI.
- Benchmark start date.
- Operation measured.
- Primitive measured.
- The word cpucycles_implementation.
- The mechanism; e.g., amd64cpuinfo.