Performance
Measured with python/bench_certificate.py. Run it after maturin develop --release, or
after maturin build --release --interpreter python3 followed by installing the wheel.
The benchmark uses Criterion’s Linear sampling mode (100 samples, linearly
increasing iteration counts; see python/criterion_compat.py), so timings are
directly comparable to the Rust Criterion benchmarks in synta-bench.
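
For readers unfamiliar with Linear sampling, the idea can be sketched in a few lines: sample k runs k·d iterations, and the per-iteration time is estimated as the slope of a least-squares line through the origin. This is an illustration under those assumptions, not the contents of python/criterion_compat.py; the names `linear_samples` and `slope_through_origin` and the step parameter `d` are hypothetical.

```python
import time
from typing import Callable, List, Tuple


def linear_samples(
    fn: Callable[[], object], n_samples: int = 100, d: int = 1
) -> Tuple[List[int], List[int]]:
    """Time fn in n_samples batches; batch k runs k * d iterations."""
    iters: List[int] = []
    elapsed_ns: List[int] = []
    for k in range(1, n_samples + 1):
        n = k * d
        start = time.perf_counter_ns()
        for _ in range(n):
            fn()
        elapsed_ns.append(time.perf_counter_ns() - start)
        iters.append(n)
    return iters, elapsed_ns


def slope_through_origin(iters: List[int], elapsed_ns: List[int]) -> float:
    """Least-squares slope of elapsed = slope * iters, in ns per iteration."""
    num = sum(x * y for x, y in zip(iters, elapsed_ns))
    den = sum(x * x for x in iters)
    return num / den


# Usage: estimate the cost of a small workload (hypothetical example workload).
iters, elapsed = linear_samples(lambda: sum(range(10)), n_samples=20, d=50)
print(f"{slope_through_origin(iters, elapsed):.1f} ns/iter")
```

Fitting a slope rather than averaging each batch separately means that fixed per-batch costs, such as the timer calls, have less influence on the estimate.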
Note: Always verify that the installed .so is a release build after
maturin develop --release: the file size should be ≈1.4 MB (release), not ≈20 MB
(debug). If in doubt, copy target/release/lib_synta.so manually to
python/synta/_synta.abi3.so.
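
The size check in the note can be scripted. The sketch below is a minimal example; the 5 MB cut-off is an assumption chosen only because it sits between the two sizes quoted above, and `looks_like_release_build` is a hypothetical helper name.

```python
import os

# Assumed cut-off between the ≈1.4 MB release and ≈20 MB debug sizes quoted above.
RELEASE_MAX_BYTES = 5 * 1024 * 1024


def looks_like_release_build(so_path: str, max_bytes: int = RELEASE_MAX_BYTES) -> bool:
    """Return True if the extension module is small enough to be a release build."""
    return os.path.getsize(so_path) <= max_bytes


# Usage (path taken from the note above; adjust to your checkout):
# looks_like_release_build("python/synta/_synta.abi3.so")
```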
Parse-only (call only, no field access)
Two Python parse-only modes are benchmarked, corresponding to the two Rust variants:
| Certificate set | synta lazy (from_der) | synta_full eager (full_from_der) | cryptography.x509 | Rust rust_shallow | Rust rust_typed |
|---|---|---|---|---|---|
| Traditional (~900 B, 5 certs) | 0.16 µs | 0.72–0.76 µs | 1.62–1.68 µs | 0.019–0.020 µs | 0.50–0.51 µs |
| ML-DSA-44 (3,992 B) | 0.17 µs | 0.74 µs | 1.55 µs | 0.020 µs | 0.54 µs |
| ML-DSA-65 (5,521 B) | 0.17 µs | 0.73 µs | 1.55 µs | 0.019 µs | 0.49 µs |
| ML-DSA-87 (7,479 B) | 0.17 µs | 0.73 µs | 1.54 µs | 0.020 µs | 0.49 µs |
Comparison note: synta (from_der) and Rust rust_shallow perform the same
4-operation envelope scan. The ~0.14 µs gap between them (0.16 µs vs 0.019 µs) is
pure PyO3 overhead: GIL acquisition, Py<PyBytes> setup, and constructing a
PyCertificate struct with 17 OnceLock fields. synta_full (full_from_der) and
Rust rust_typed both do a complete RFC 5280 decode; the ~0.23 µs gap is the same
PyO3 overhead amortised over the full parse.
For parse-only workloads the honest comparison with cryptography.x509 (also lazy) is
synta vs cryptography.x509: ~10× faster for traditional certs, ~9× faster
for ML-DSA certs. For full-decode workloads the apples-to-apples comparison is
synta_full vs cryptography.x509: ~2.1–2.3× faster.
from_der holds a Py<PyBytes> reference to the caller’s bytes object — no copy of
the DER data. Parse time is flat across ML-DSA cert sizes: the large signature BIT
STRING is not decoded at all during the shallow scan.
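
To see why the scan cost does not depend on certificate size, here is a pure-Python sketch of a 4-operation envelope scan: two tag reads and two length reads. It is an illustration, not synta's Rust code. The function names are hypothetical, only single-byte tags and definite lengths are handled, and returning the TBS range with its header included is an assumption.

```python
from typing import Tuple

SEQUENCE = 0x30  # universal, constructed SEQUENCE tag


def read_header(buf: bytes, pos: int) -> Tuple[int, int, int]:
    """Read one DER tag+length header at pos.

    Returns (tag, content_start, content_length).
    """
    if pos + 2 > len(buf):
        raise ValueError("truncated header")
    tag = buf[pos]
    first = buf[pos + 1]
    if first < 0x80:  # short form: the length fits in one octet
        return tag, pos + 2, first
    n = first & 0x7F  # long form: the next n octets hold the length
    if n == 0 or pos + 2 + n > len(buf):
        raise ValueError("indefinite or truncated length")
    return tag, pos + 2 + n, int.from_bytes(buf[pos + 2 : pos + 2 + n], "big")


def scan_envelope(der: bytes) -> Tuple[int, int]:
    """Return the (start, end) byte range of the TBS element, header included."""
    tag, start, length = read_header(der, 0)  # operations 1 and 2
    if tag != SEQUENCE or start + length != len(der):
        raise ValueError("outer element is not a single SEQUENCE")
    tbs_tag, tbs_start, tbs_len = read_header(der, start)  # operations 3 and 4
    tbs_end = tbs_start + tbs_len
    if tbs_tag != SEQUENCE or tbs_end > len(der):
        raise ValueError("TBSCertificate is not a well-formed SEQUENCE")
    return start, tbs_end
```

Only the two headers are read, so the signature and key bytes that follow are never touched, whatever their length.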
Parse + access all fields (cold cache — fresh cert each iteration)
| Certificate set | synta (PyO3, 19 fields) | cryptography.x509 (9 fields) |
|---|---|---|
| Traditional (~900 B, 5 certs) | 3.30–3.42 µs | 15.90–16.29 µs |
| ML-DSA-44 (3,992 B) | 3.71 µs | 13.41 µs |
| ML-DSA-65 (5,521 B) | 3.63 µs | 13.14 µs |
| ML-DSA-87 (7,479 B) | 3.71 µs | 13.38 µs |
synta is ~4.8× faster than cryptography.x509 for traditional certs (parse+fields,
19 vs 9 fields). ML-DSA certs measure ~3.6× faster — the LAMPS WG test certs are
simpler (fewer extensions) than the NIST PKITS certs, so cryptography.x509 processes
fewer extension fields, reducing its parse+fields time. Requires cryptography ≥ 44.0
(FIPS 204) for ML-DSA public_key() decoding; older versions raise UnsupportedAlgorithm.
Field access only (warm cache — same cert, repeated access)
| Certificate set | synta (PyO3, 19 fields) | cryptography.x509 stock | cryptography.x509 perf-opt |
|---|---|---|---|
| Traditional (~900 B, 5 certs) | 0.41–0.43 µs | ~14.8 µs | 0.91–0.95 µs |
| ML-DSA-44 (3,992 B) | 0.41 µs | ~11.2 µs | 2.21–2.28 µs |
| ML-DSA-65 (5,521 B) | 0.41 µs | ~11.2 µs | 2.21–2.28 µs |
| ML-DSA-87 (7,479 B) | 0.42 µs | ~11.1 µs | 2.21–2.28 µs |
synta warm-cache access (~0.42 µs) is 19 clone_ref calls — essentially free per field.
Stock cryptography.x509 memoises only extensions (via PyOnceLock). The other
8 fields (subject, issuer, serial_number, not_valid_before_utc,
not_valid_after_utc, signature, signature_hash_algorithm, public_key) are
re-derived from the zero-copy in-memory ASN.1 structure on every Python access, each
allocating a new Python object. Warm and cold times are therefore close (~11–15 µs).
cryptography PR #14441 adds
OnceLock caching for issuer, subject, public_key, and signature_algorithm on
Certificate, and caching on CertificationRequest, CertificateList, and
OCSPResponse. This brings traditional warm access from ~14.8 µs down to
0.91–0.95 µs — a ~16× improvement. synta remains ~2.2× faster for traditional
warm access and ~5× faster for ML-DSA (larger key objects have higher PyBytes copy
cost even on the cached path). Parse-only and cold parse+fields times are unaffected by
the caching changes.
PKCS#7 and PKCS#12 extraction
Measured with python/bench_pkcs.py. Benchmark IDs match the Rust Criterion IDs in
synta-bench/benches/pkcs_formats.rs exactly (pkcs7/{synta,cryptography}/value,
pkcs12/{synta,cryptography}/value).
| Benchmark ID | Input | synta | cryptography | Speedup |
|---|---|---|---|---|
| pkcs7/…/amazon_roots | 1,848 B DER/BER, 2 certs | 0.82 µs (Rust) / 1.27 µs (Py) | 41.7 µs | ~33× |
| pkcs7/…/pem_isrg | 1,992 B PEM, 1 cert | 3.58 µs (Rust) / 4.14 µs (Py) | 32.1 µs | ~8× |
| pkcs12/…/unencrypted_3certs | 3,539 B, 3 certs | 1.10 µs (Rust) / 1.80 µs (Py) | 134 µs | ~75× |
| pkcs12/…/unencrypted_1cert_with_key | 756 B, 1 cert + key | 0.64 µs (Rust) / 0.96 µs (Py) | n/a | n/a |
The unencrypted_1cert_with_key vector (cert-none-key-none.p12) uses a non-standard
format that cryptography cannot parse; it is benchmarked as synta-only to exercise
the key-bag-skipping code path.
The ~0.3–0.7 µs gap between Rust and Python times is the usual PyO3 overhead: GIL
acquisition, PyBytes allocation for each returned certificate, and PyList construction.
The pem_isrg Python time (4.14 µs) is higher than pure PKCS#7 parse (1.27 µs) because
PEM decoding and base64 allocation are included in the timed loop.
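
The extra work in the PEM path can be pictured with a standard-library sketch: strip the armor lines, join the base64 body, and decode it into a freshly allocated DER buffer. This is not synta's PEM decoder, and `pem_to_der` is a hypothetical helper name.

```python
import base64
import re

# Matches one PEM block; group 1 is the label, group 2 the base64 body.
_PEM_RE = re.compile(
    rb"-----BEGIN ([A-Z0-9 ]+)-----\s*(.*?)\s*-----END \1-----", re.DOTALL
)


def pem_to_der(pem: bytes) -> bytes:
    """Decode the first PEM block in pem to DER bytes."""
    match = _PEM_RE.search(pem)
    if match is None:
        raise ValueError("no PEM block found")
    body = b"".join(match.group(2).split())  # drop line breaks and padding spaces
    return base64.b64decode(body, validate=True)  # allocates the DER buffer
```

A DER input skips all of these steps, which is consistent with the DER benchmark being the faster of the two PKCS#7 cases.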
cryptography comparison note: The amazon_roots DER benchmark triggers a
UserWarning from cryptography — the file uses BER indefinite-length encoding
(0x30 0x80…), which cryptography handles via an internal fallback with a deprecation
warning. synta accepts BER transparently with no warning.
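
To check whether an input will hit this BER path before handing it to either library, it is enough to look at the second byte. The helper below is a hypothetical sketch that assumes a single-byte outer tag, which holds for a SEQUENCE (0x30).

```python
def uses_indefinite_length(data: bytes) -> bool:
    """True if the outermost element uses the BER indefinite-length form.

    In that form the length octet is exactly 0x80 and the content is closed
    by an end-of-contents marker (0x00 0x00), which DER does not allow.
    """
    return len(data) >= 2 and data[1] == 0x80
```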
Architecture notes
Parse-only: synta.Certificate.from_der() holds a Py<PyBytes> strong
reference to the caller’s bytes object (no copy). It performs only a 4-operation shallow
envelope scan (outer SEQUENCE tag+length, TBSCertificate SEQUENCE tag+length) — equivalent
to synta_certificate::validate_envelope() in Rust — and records the TBS byte range.
The full recursive Certificate::decode() is deferred to the first getter call.
synta.Certificate.full_from_der() performs the same shallow scan and then immediately
triggers the full decode, so all 19 field caches are warm before any Python code accesses them.
Parse+fields: All 19 getters cache their Python object in an
OnceLock<Py<T>>. String-returning getters (issuer, subject, signature_algorithm,
signature_algorithm_oid, public_key_algorithm, public_key_algorithm_oid, not_before,
not_after) store Py<PyString>. Bytes-returning getters (signature_value, public_key,
tbs_bytes, issuer_raw_der, subject_raw_der) store Py<PyBytes>. Optional getters
(signature_algorithm_params, public_key_algorithm_params, extensions_der) store
Option<Py<PyBytes>>. to_der skips the lock entirely — direct clone_ref of the
stored Py<PyBytes> that was passed to from_der.
#[pyclass(frozen)]: PyCertificate is declared frozen, which removes the
8-byte PyO3 borrow-tracking field (borrow_flag: Cell<isize>) from every instance
and eliminates per-getter Cell::get/set calls. This gives a ~26% speedup on the
warm-path field-access benchmark.
ML-DSA parse+fields: The first-call PyBytes copies of signature_value
(2,420–4,627 bytes) and public_key (1,312–2,592 bytes) add ~0.3 µs over traditional
certs (3.63–3.71 µs vs 3.30–3.42 µs). Subsequent accesses use clone_ref at
~0.41 µs total regardless of cert size.
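
The pattern these notes describe (keep a reference to the input, decode on first access, cache each getter's result) has a rough Python analogue. The class below is only an analogy to show the cold and warm behaviour; it is not how PyCertificate is implemented, and every name in it is hypothetical.

```python
from functools import cached_property


class LazyCertSketch:
    """Keeps the caller's bytes, decodes on first access, caches per getter."""

    def __init__(self, der: bytes) -> None:
        self._der = der  # reference to the caller's object, not a copy
        self.decode_calls = 0  # instrumentation for this example only

    @cached_property
    def _decoded(self) -> dict:
        # Stand-in for the deferred full decode; runs once, on the first getter.
        self.decode_calls += 1
        return {"first_tag": self._der[0], "size": len(self._der)}

    @cached_property
    def first_tag_hex(self) -> str:
        # Stand-in for a string getter: the object is built once, then reused.
        return format(self._decoded["first_tag"], "02x")

    def to_der(self) -> bytes:
        # No cache lookup needed: hand back the stored object itself.
        return self._der


data = bytes([0x30, 0x00])
cert = LazyCertSketch(data)
cold = cert.first_tag_hex  # first access triggers the decode
warm = cert.first_tag_hex  # later accesses return the cached object
```

After the first access `cert.decode_calls` stays at 1, which mirrors the gap between the cold and warm tables above: the first access pays for the decode, and later ones only hand out a reference.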
For the full binding-layer comparison (Rust rust_shallow / rust_typed / rust_element /
c_ffi / Python synta / synta_full / cryptography_x509, parse-only and parse+fields)
see docs/performance.md.
See also Development for how to run benchmarks.