Performance

Measured with python/bench_certificate.py. Run after maturin develop --release or maturin build --release --interpreter python3 and installing the wheel. The benchmark uses Criterion’s Linear sampling mode (100 samples, linearly increasing iteration counts; see python/criterion_compat.py), so timings are directly comparable to the Rust Criterion benchmarks in synta-bench.
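The Linear sampling mode can be sketched in a few lines of Python. This is a minimal illustration, not the contents of python/criterion_compat.py, and the function names are hypothetical. It assumes the mode mirrors Criterion's approach: sample i runs i × d iterations, and the per-iteration time is the least-squares slope of elapsed time against iteration count, fitted through the origin.

```python
import time
from typing import Callable, List, Tuple


def slope_through_origin(iters: List[int], times_ns: List[float]) -> float:
    """Least-squares slope of elapsed time vs. iteration count, fitted through the origin.

    Computes sum(x*y) / sum(x*x), the per-iteration estimate of a linear-sampling run.
    """
    num = sum(x * y for x, y in zip(iters, times_ns))
    den = sum(x * x for x in iters)
    return num / den


def linear_sample(fn: Callable[[], object], samples: int = 100, d: int = 1) -> List[Tuple[int, int]]:
    """Time fn for d, 2d, ..., samples*d iterations; return (iterations, elapsed_ns) pairs."""
    results = []
    for i in range(1, samples + 1):
        n = d * i  # linearly increasing iteration count
        start = time.perf_counter_ns()
        for _ in range(n):
            fn()
        results.append((n, time.perf_counter_ns() - start))
    return results
```

Usage: `iters, times = zip(*linear_sample(lambda: sum(range(10)), samples=20))`, then `slope_through_origin(list(iters), list(times))` gives nanoseconds per call.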

Note: Always verify the installed .so is a release build after maturin develop --release — the file size should be ≈1.4 MB (release) not ≈20 MB (debug). If in doubt, copy target/release/lib_synta.so manually to python/synta/_synta.abi3.so.
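A quick way to check which build is installed is to locate the extension module and look at its file size. The sketch below assumes the module is importable as synta._synta (matching the _synta.abi3.so file name above). The 5 MB cut-off between the quoted ~1.4 MB and ~20 MB sizes is an arbitrary choice, and both helper names are hypothetical.

```python
import importlib.util
import os
from typing import Optional

# Arbitrary threshold between the ~1.4 MB (release) and ~20 MB (debug) sizes quoted above.
DEBUG_SIZE_THRESHOLD = 5 * 1024 * 1024


def classify_build(size_bytes: int) -> str:
    """Guess the build profile of an extension module from its file size."""
    return "release" if size_bytes < DEBUG_SIZE_THRESHOLD else "debug"


def installed_extension_path(module: str = "synta._synta") -> Optional[str]:
    """Return the file path of an installed extension module, or None if it is missing."""
    try:
        spec = importlib.util.find_spec(module)
    except ModuleNotFoundError:  # parent package not installed
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    path = installed_extension_path()
    if path is None:
        print("synta._synta is not installed")
    else:
        print(path, classify_build(os.path.getsize(path)))
```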

Parse-only (call only, no field access)

Two Python parse-only modes are benchmarked, corresponding to the two Rust variants:

| Certificate set | synta lazy (from_der) | synta_full eager (full_from_der) | cryptography.x509 | Rust rust_shallow | Rust rust_typed |
|---|---|---|---|---|---|
| Traditional (~900 B, 5 certs) | 0.16 µs | 0.72–0.76 µs | 1.62–1.68 µs | 0.019–0.020 µs | 0.50–0.51 µs |
| ML-DSA-44 (3,992 B) | 0.17 µs | 0.74 µs | 1.55 µs | 0.020 µs | 0.54 µs |
| ML-DSA-65 (5,521 B) | 0.17 µs | 0.73 µs | 1.55 µs | 0.019 µs | 0.49 µs |
| ML-DSA-87 (7,479 B) | 0.17 µs | 0.73 µs | 1.54 µs | 0.020 µs | 0.49 µs |

Comparison note: synta (from_der) and Rust rust_shallow perform the same 4-operation envelope scan. The ~0.14 µs gap between them (0.16 µs vs 0.019 µs) is pure PyO3 overhead: GIL acquisition, Py<PyBytes> setup, and constructing a PyCertificate struct with 17 OnceLock fields. synta_full (full_from_der) and Rust rust_typed both do a complete RFC 5280 decode; the ~0.23 µs gap between them (0.72–0.76 µs vs 0.49–0.54 µs) is the corresponding PyO3 overhead on the eager path.

For parse-only workloads, the honest comparison with cryptography.x509 (which is also lazy) is synta vs cryptography.x509: ~10× faster for traditional certs and ~9× faster for ML-DSA certs. For full-decode workloads, the apples-to-apples comparison is synta_full vs cryptography.x509: ~2.1–2.3× faster.
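The speedup figures follow directly from the table. A small sketch of the arithmetic, using a hypothetical helper and µs values copied from the table; the low end pairs the fastest cryptography.x509 time with the slowest synta time, and the high end the reverse:

```python
from typing import Tuple

Range = Tuple[float, float]  # (low, high) in µs; single values are written as (v, v)


def speedup_range(baseline: Range, candidate: Range) -> Range:
    """Conservative and optimistic speedup of candidate over baseline."""
    return baseline[0] / candidate[1], baseline[1] / candidate[0]


rows = [
    ("synta vs cryptography.x509, traditional", (1.62, 1.68), (0.16, 0.16)),
    ("synta_full vs cryptography.x509, traditional", (1.62, 1.68), (0.72, 0.76)),
    ("synta vs cryptography.x509, ML-DSA-44", (1.55, 1.55), (0.17, 0.17)),
    ("synta_full vs cryptography.x509, ML-DSA-44", (1.55, 1.55), (0.74, 0.74)),
]

for label, base, cand in rows:
    lo, hi = speedup_range(base, cand)
    print(f"{label}: {lo:.1f}x to {hi:.1f}x")
```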

from_der holds a Py<PyBytes> reference to the caller’s bytes object — no copy of the DER data. Parse time is flat across ML-DSA cert sizes: the large signature BIT STRING is not decoded at all during the shallow scan.
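Holding a reference instead of copying has a direct Python analogue in memoryview, shown here purely as an illustration (synta does this on the Rust side with Py<PyBytes>; tbs_view is a hypothetical name). A slice of a memoryview shares the original buffer, so no DER bytes are duplicated:

```python
def tbs_view(der: bytes, start: int, end: int) -> memoryview:
    """Zero-copy view of der[start:end]; the view keeps the original bytes object alive."""
    return memoryview(der)[start:end]
```

By contrast, `der[start:end]` on a bytes object allocates and copies a new bytes object.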

Parse + access all fields (cold cache — fresh cert each iteration)

| Certificate set | synta (PyO3, 19 fields) | cryptography.x509 (9 fields) |
|---|---|---|
| Traditional (~900 B, 5 certs) | 3.30–3.42 µs | 15.90–16.29 µs |
| ML-DSA-44 (3,992 B) | 3.71 µs | 13.41 µs |
| ML-DSA-65 (5,521 B) | 3.63 µs | 13.14 µs |
| ML-DSA-87 (7,479 B) | 3.71 µs | 13.38 µs |

synta is ~4.8× faster than cryptography.x509 for traditional certs (parse+fields, 19 vs 9 fields). ML-DSA certs measure ~3.6× faster — the LAMPS WG test certs are simpler (fewer extensions) than the NIST PKITS certs, so cryptography.x509 processes fewer extension fields, reducing its parse+fields time. Requires cryptography ≥ 44.0 (FIPS 204) for ML-DSA public_key() decoding; older versions raise UnsupportedAlgorithm.
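Because older cryptography releases raise UnsupportedAlgorithm for ML-DSA keys, a benchmark script may want to skip that comparison up front. A minimal version gate, assuming the numeric major version is enough to decide (helper names are hypothetical):

```python
from importlib import metadata
from typing import Optional


def major_version(version: str) -> int:
    """Leading integer of a version string such as '44.0.1'."""
    return int(version.split(".", 1)[0])


def supports_ml_dsa(version: str, minimum_major: int = 44) -> bool:
    """True if this cryptography version meets the >= 44.0 requirement stated above."""
    return major_version(version) >= minimum_major


def installed_cryptography_version() -> Optional[str]:
    """Installed cryptography version string, or None if it is not installed."""
    try:
        return metadata.version("cryptography")
    except metadata.PackageNotFoundError:
        return None
```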

Field access only (warm cache — same cert, repeated access)

| Certificate set | synta (PyO3, 19 fields) | cryptography.x509 stock | cryptography.x509 perf-opt (PR #14441) |
|---|---|---|---|
| Traditional (~900 B, 5 certs) | 0.41–0.43 µs | ~14.8 µs | 0.91–0.95 µs |
| ML-DSA-44 (3,992 B) | 0.41 µs | ~11.2 µs | 2.21–2.28 µs |
| ML-DSA-65 (5,521 B) | 0.41 µs | ~11.2 µs | 2.21–2.28 µs |
| ML-DSA-87 (7,479 B) | 0.42 µs | ~11.1 µs | 2.21–2.28 µs |

synta warm-cache access (~0.42 µs) is 19 getter calls, each a clone_ref of an already-cached object: roughly 22 ns per field.

Stock cryptography.x509 memoises only extensions (via PyOnceLock). The other 8 fields (subject, issuer, serial_number, not_valid_before_utc, not_valid_after_utc, signature, signature_hash_algorithm, public_key) are re-derived from the zero-copy in-memory ASN.1 structure on every Python access, each allocating a new Python object. Warm and cold times are therefore close (~11–15 µs).
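The difference between memoised and re-derived fields can be modelled with functools.cached_property versus a plain property. This toy class is hypothetical and only mirrors the access pattern described above; it is not cryptography's Rust implementation:

```python
from functools import cached_property


class Field:
    """Stand-in for a freshly allocated Python object (e.g. a Name)."""

    def __init__(self, der: bytes) -> None:
        self.der = der


class StockStyleCert:
    """Only `extensions` is memoised; `subject` is rebuilt on every access."""

    def __init__(self, subject_der: bytes, extensions_der: bytes) -> None:
        self._subject_der = subject_der
        self._extensions_der = extensions_der
        self.derivations = 0  # counts how often a field object is built

    @property
    def subject(self) -> Field:
        # Re-derived on every access: a new object each time.
        self.derivations += 1
        return Field(self._subject_der)

    @cached_property
    def extensions(self) -> Field:
        # Memoised: built on first access, then stored in the instance __dict__.
        self.derivations += 1
        return Field(self._extensions_der)
```

In this model, caching more fields (as the PR below does) means moving more getters from the first pattern to the second.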

cryptography PR #14441 adds OnceLock caching for issuer, subject, public_key, and signature_algorithm on Certificate, plus caching on CertificationRequest, CertificateList, and OCSPResponse. This brings traditional warm access down from ~14.8 µs to 0.91–0.95 µs, a ~16× improvement. synta remains ~2.2× faster for traditional warm access and ~5.4× faster for ML-DSA (larger key objects have higher PyBytes copy cost even on the cached path). Parse-only and cold parse+fields times are unaffected by the caching changes.

PKCS#7 and PKCS#12 extraction

Measured with python/bench_pkcs.py. Benchmark IDs match the Rust Criterion IDs in synta-bench/benches/pkcs_formats.rs exactly (pkcs7/{synta,cryptography}/value, pkcs12/{synta,cryptography}/value).

| Benchmark ID | Input | synta | cryptography | Speedup (synta Python vs cryptography) |
|---|---|---|---|---|
| pkcs7/…/amazon_roots | 1,848 B DER/BER, 2 certs | 0.82 µs (Rust) / 1.27 µs (Py) | 41.7 µs | ~33× |
| pkcs7/…/pem_isrg | 1,992 B PEM, 1 cert | 3.58 µs (Rust) / 4.14 µs (Py) | 32.1 µs | ~8× |
| pkcs12/…/unencrypted_3certs | 3,539 B, 3 certs | 1.10 µs (Rust) / 1.80 µs (Py) | 134 µs | ~75× |
| pkcs12/…/unencrypted_1cert_with_key | 756 B, 1 cert + key | 0.639 µs (Rust) / 0.96 µs (Py) | n/a | n/a |

The unencrypted_1cert_with_key vector (cert-none-key-none.p12) uses a non-standard format that cryptography cannot parse; it is benchmarked as synta-only to exercise the key-bag-skipping code path.
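The key-bag-skipping path amounts to filtering SafeBags by their bag-type OID. The sketch below assumes the bags have already been decoded into hypothetical (OID, payload) pairs; the OIDs are the RFC 7292 bag types, and a real parser must still unwrap each CertBag's certValue:

```python
from typing import Iterable, List, Tuple

# SafeBag type OIDs from RFC 7292 (bagtypes arc 1.2.840.113549.1.12.10.1).
KEY_BAG = "1.2.840.113549.1.12.10.1.1"
PKCS8_SHROUDED_KEY_BAG = "1.2.840.113549.1.12.10.1.2"
CERT_BAG = "1.2.840.113549.1.12.10.1.3"


def extract_certs(bags: Iterable[Tuple[str, bytes]]) -> List[bytes]:
    """Keep certBag payloads in order; skip key bags and every other bag type."""
    return [value for bag_oid, value in bags if bag_oid == CERT_BAG]
```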

The ~0.3–0.7 µs gap between Rust and Python times is the usual PyO3 overhead: GIL acquisition, PyBytes allocation for each returned certificate, and PyList construction. The pem_isrg Python time (4.14 µs) is well above the DER amazon_roots time (1.27 µs) because PEM decoding and base64 allocation are included in the timed loop.
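The PEM step included in the pem_isrg timing is essentially a base64 decode of each armoured block. A standard-library sketch of that step (pem_to_der is a hypothetical helper, not synta's API; RFC 7468 uses the PKCS7 label for PKCS#7 data):

```python
import base64
import re
from typing import List

# Matches "-----BEGIN <label>-----", the base64 body, and the matching END line.
_PEM_RE = re.compile(
    rb"-----BEGIN ([A-Z0-9 ]+)-----\s*(.*?)\s*-----END \1-----", re.DOTALL
)


def pem_to_der(pem: bytes, label: bytes = b"PKCS7") -> List[bytes]:
    """Decode every PEM block with the given label into DER bytes."""
    # b64decode (validate=False by default) discards the embedded line breaks.
    return [
        base64.b64decode(body)
        for found_label, body in _PEM_RE.findall(pem)
        if found_label == label
    ]
```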

cryptography comparison note: The amazon_roots DER benchmark triggers a UserWarning from cryptography. The file uses BER indefinite-length encoding (0x30 0x80…), which cryptography handles via an internal BER fallback that emits this warning. synta accepts BER transparently with no warning.
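Whether an input starts with a BER indefinite-length SEQUENCE can be read from its first two octets. A hypothetical helper that checks only the outermost value (nested values may also be indefinite-length, which this does not detect):

```python
def uses_indefinite_length(data: bytes) -> bool:
    """True if the outermost TLV is a SEQUENCE (0x30) with BER indefinite length (0x80).

    DER forbids the 0x80 length octet; BER allows it, with the value ending in 0x00 0x00.
    """
    return len(data) >= 2 and data[0] == 0x30 and data[1] == 0x80
```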

Architecture notes

Parse-only: synta.Certificate.from_der() holds a Py<PyBytes> strong reference to the caller’s bytes object (no copy). It performs only a 4-operation shallow envelope scan (outer SEQUENCE tag+length, TBSCertificate SEQUENCE tag+length) — equivalent to synta_certificate::validate_envelope() in Rust — and records the TBS byte range. The full recursive Certificate::decode() is deferred to the first getter call. synta.Certificate.full_from_der() performs the same shallow scan and then immediately triggers the full decode, so all 19 field caches are warm before any Python code accesses them.
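The 4-operation envelope scan can be sketched in Python as follows. Names are hypothetical and this is not synta_certificate::validate_envelope(). It reads the outer SEQUENCE tag and length, then the TBSCertificate tag and length, and returns the TBS byte range with its header included. Only a few header octets are touched, so the cost does not depend on the size of the signature BIT STRING, consistent with the flat parse times above.

```python
from typing import Tuple


class EnvelopeError(ValueError):
    """Raised when the input does not look like a DER Certificate envelope."""


def _read_header(data: bytes, pos: int, expected_tag: int) -> Tuple[int, int]:
    """Check the tag at pos and decode a DER length; return (content_start, content_len)."""
    if pos >= len(data) or data[pos] != expected_tag:
        raise EnvelopeError(f"expected tag 0x{expected_tag:02x} at offset {pos}")
    pos += 1
    if pos >= len(data):
        raise EnvelopeError("truncated length")
    first = data[pos]
    pos += 1
    if first < 0x80:  # short form: length fits in 7 bits
        return pos, first
    n = first & 0x7F
    if n == 0 or n > 4:  # 0x80 is BER indefinite length; >4 octets is rejected here
        raise EnvelopeError("unsupported length encoding")
    if pos + n > len(data):
        raise EnvelopeError("truncated length")
    return pos + n, int.from_bytes(data[pos:pos + n], "big")


def shallow_scan(der: bytes) -> Tuple[int, int]:
    """Return the (start, end) byte range of the TBSCertificate, header included."""
    sequence = 0x30
    content_start, content_len = _read_header(der, 0, sequence)  # ops 1-2: outer tag + length
    if content_start + content_len != len(der):
        raise EnvelopeError("outer length does not match input size")
    tbs_start = content_start
    tbs_content, tbs_len = _read_header(der, tbs_start, sequence)  # ops 3-4: TBS tag + length
    tbs_end = tbs_content + tbs_len
    if tbs_end > len(der):
        raise EnvelopeError("TBSCertificate overruns the certificate")
    return tbs_start, tbs_end
```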

Parse+fields: All 19 getters cache their Python object in an OnceLock<Py<T>>. String-returning getters (issuer, subject, signature_algorithm, signature_algorithm_oid, public_key_algorithm, public_key_algorithm_oid, not_before, not_after) store Py<PyString>. Bytes-returning getters (signature_value, public_key, tbs_bytes, issuer_raw_der, subject_raw_der) store Py<PyBytes>. Optional getters (signature_algorithm_params, public_key_algorithm_params, extensions_der) store Option<Py<PyBytes>>. to_der skips the lock entirely — direct clone_ref of the stored Py<PyBytes> that was passed to from_der.
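The lazy-decode-then-cache pattern can be modelled in Python with functools.cached_property standing in for OnceLock. LazyCert and its decode callback are hypothetical; the sketch shows one deferred full decode, per-field caching, and a to_der that hands back the caller's original object without any cache:

```python
from functools import cached_property
from typing import Callable, Dict


class LazyCert:
    """Toy model: store a reference, decode once on first field access, cache each field."""

    def __init__(self, der: bytes, decode: Callable[[bytes], Dict[str, str]]) -> None:
        self._der = der  # reference to the caller's bytes, no copy
        self._decode_fn = decode
        self.decode_calls = 0

    @cached_property
    def _fields(self) -> Dict[str, str]:
        # Deferred full decode: runs at most once per instance (single-threaded use).
        self.decode_calls += 1
        return self._decode_fn(self._der)

    @cached_property
    def issuer(self) -> str:
        return self._fields["issuer"]

    @cached_property
    def subject(self) -> str:
        return self._fields["subject"]

    def to_der(self) -> bytes:
        # No cache involved: return the original object directly.
        return self._der
```

cached_property is only an approximation here: unlike OnceLock, it does not guarantee once-only initialisation if two threads make the first access concurrently.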

#[pyclass(frozen)]: PyCertificate is declared frozen, which removes the 8-byte PyO3 borrow-tracking field (borrow_flag: Cell<isize>) from every instance and eliminates per-getter Cell::get/set calls. This gives a ~26% speedup on the warm-path field-access benchmark.

ML-DSA parse+fields: The first-call PyBytes copies of signature_value (2,420–4,627 bytes) and public_key (1,312–2,592 bytes) add ~0.3 µs over traditional certs (3.63–3.71 µs vs 3.30–3.42 µs). Subsequent accesses use clone_ref at ~0.41 µs total regardless of cert size.
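The first-access copy cost scales with the FIPS 204 public-key and signature sizes. The arithmetic below (variable and function names are illustrative) also shows that these two fields account for all but 260 bytes of each ML-DSA test certificate:

```python
# FIPS 204 public-key and signature sizes in bytes.
ML_DSA_SIZES = {
    "ML-DSA-44": {"public_key": 1312, "signature": 2420},
    "ML-DSA-65": {"public_key": 1952, "signature": 3309},
    "ML-DSA-87": {"public_key": 2592, "signature": 4627},
}

# Certificate sizes in bytes, from the tables above.
CERT_SIZES = {"ML-DSA-44": 3992, "ML-DSA-65": 5521, "ML-DSA-87": 7479}


def first_access_copy_bytes(level: str) -> int:
    """Bytes copied into new PyBytes on the first public_key + signature_value access."""
    sizes = ML_DSA_SIZES[level]
    return sizes["public_key"] + sizes["signature"]


for level, cert_size in CERT_SIZES.items():
    copied = first_access_copy_bytes(level)
    print(f"{level}: {copied} B copied, {cert_size - copied} B for everything else")
```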

For the full binding-layer comparison (Rust rust_shallow / rust_typed / rust_element / c_ffi / Python synta / synta_full / cryptography_x509, parse-only and parse+fields) see docs/performance.md.

See also Development for how to run benchmarks.