Performance

Measured with python/bench_certificate.py. Run it after maturin develop --release, or after maturin build --release --interpreter python3 and installing the resulting wheel. The benchmark uses Criterion’s Linear sampling mode (100 samples, linearly increasing iteration counts; see python/criterion_compat.py), so timings are directly comparable to the Rust Criterion benchmarks in synta-bench.
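As a rough illustration of what Linear sampling means, the sketch below runs a callable with iteration counts step, 2·step, ..., samples·step and estimates the per-iteration cost as the least-squares slope of a line through the origin, which is, as far as we understand it, how Criterion summarises linear samples. The function names are hypothetical and this is not the code in python/criterion_compat.py.

```python
import time


def linear_sample_counts(samples=100, step=1):
    """Iteration counts step, 2*step, ..., samples*step."""
    return [step * i for i in range(1, samples + 1)]


def slope_through_origin(xs, ys):
    """Least-squares slope k of the model y = k * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def bench(fn, samples=100, step=1):
    """Estimate the cost of one call to fn, in nanoseconds."""
    counts = linear_sample_counts(samples, step)
    elapsed = []
    for n in counts:
        start = time.perf_counter_ns()
        for _ in range(n):
            fn()
        elapsed.append(time.perf_counter_ns() - start)
    # The slope relates total elapsed time to iteration count.
    return slope_through_origin(counts, elapsed)
```

Fitting a slope rather than averaging each sample reduces the influence of fixed per-sample overhead such as the timer calls themselves.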

Note: Always verify the installed .so is a release build after maturin develop --release — the file size should be ≈1.4 MB (release) not ≈20 MB (debug). If in doubt, copy target/release/lib_synta.so manually to python/synta/_synta.abi3.so.
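The size check can be scripted. This is a minimal sketch: the 5 MiB cut-off is an arbitrary assumption chosen to sit between the ≈1.4 MB and ≈20 MB figures above, and looks_like_release_build is a hypothetical helper, not part of synta.

```python
import os


def looks_like_release_build(path, limit_bytes=5 * 1024 * 1024):
    """True if the extension module is small enough to be a release build.

    limit_bytes is an assumed threshold between the release and debug sizes.
    """
    return os.path.getsize(path) < limit_bytes


# Example call, using the install path mentioned in the note above:
# looks_like_release_build("python/synta/_synta.abi3.so")
```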

Parse-only (call only, no field access)

Two Python parse-only modes are benchmarked, corresponding to the two Rust variants:

| Certificate set | `synta` lazy (`from_der`) | `synta_full` eager (`full_from_der`) | `cryptography.x509` | Rust `rust_shallow` | Rust `rust_typed` |
|---|---|---|---|---|---|
| Traditional (~900 B, 5 certs) | 0.16 µs | 0.72–0.76 µs | 1.62–1.68 µs | 0.019–0.020 µs | 0.50–0.51 µs |
| ML-DSA-44 (3,992 B) | 0.17 µs | 0.74 µs | 1.55 µs | 0.020 µs | 0.54 µs |
| ML-DSA-65 (5,521 B) | 0.17 µs | 0.73 µs | 1.55 µs | 0.019 µs | 0.49 µs |
| ML-DSA-87 (7,479 B) | 0.17 µs | 0.73 µs | 1.54 µs | 0.020 µs | 0.49 µs |

Comparison note: synta (from_der) and Rust rust_shallow perform the same 4-operation envelope scan. The ~0.14 µs gap between them (0.16 µs vs 0.019 µs) is pure PyO3 overhead: GIL acquisition, Py<PyBytes> setup, and constructing a PyCertificate struct with 17 OnceLock fields. synta_full (full_from_der) and Rust rust_typed both do a complete RFC 5280 decode; the ~0.23 µs gap is the same PyO3 overhead amortised over the full parse.

For parse-only workloads the fair comparison is synta vs cryptography.x509, since both are lazy: synta is ~10× faster for traditional certs and ~9× faster for ML-DSA certs. For full-decode workloads the apples-to-apples comparison is synta_full vs cryptography.x509, where synta_full is ~2.1–2.3× faster.
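The ratios quoted here follow directly from the parse-only table above. The sketch below recomputes them from the traditional and ML-DSA-44 rows; speedup_range is a hypothetical helper, and the numbers are copied from the table rather than measured.

```python
# Parse-only timings in µs as (low, high), copied from the table above.
timings = {
    "traditional": {
        "synta": (0.16, 0.16),
        "synta_full": (0.72, 0.76),
        "cryptography": (1.62, 1.68),
    },
    "ml_dsa_44": {
        "synta": (0.17, 0.17),
        "synta_full": (0.74, 0.74),
        "cryptography": (1.55, 1.55),
    },
}


def speedup_range(slow, fast):
    """Smallest and largest slow/fast ratio for two (low, high) ranges."""
    return slow[0] / fast[1], slow[1] / fast[0]


for name, row in timings.items():
    for mode in ("synta", "synta_full"):
        lo, hi = speedup_range(row["cryptography"], row[mode])
        print(f"{name:12s} {mode:10s} {lo:.1f}x to {hi:.1f}x")
```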

from_der holds a Py<PyBytes> reference to the caller’s bytes object — no copy of the DER data. Parse time is flat across ML-DSA cert sizes: the large signature BIT STRING is not decoded at all during the shallow scan.

Parse + access all fields (cold cache — fresh cert each iteration)

| Certificate set | `synta` (PyO3, 19 fields) | `cryptography.x509` (9 fields) |
|---|---|---|
| Traditional (~900 B, 5 certs) | 3.30–3.42 µs | 15.90–16.29 µs |
| ML-DSA-44 (3,992 B) | 3.71 µs | 13.41 µs |
| ML-DSA-65 (5,521 B) | 3.63 µs | 13.14 µs |
| ML-DSA-87 (7,479 B) | 3.71 µs | 13.38 µs |

synta is ~4.8× faster than cryptography.x509 for traditional certs (parse+fields, 19 vs 9 fields). ML-DSA certs measure ~3.6× faster — the LAMPS WG test certs are simpler (fewer extensions) than the NIST PKITS certs, so cryptography.x509 processes fewer extension fields, reducing its parse+fields time. Requires cryptography ≥ 44.0 (FIPS 204) for ML-DSA public_key() decoding; older versions raise UnsupportedAlgorithm.

Field access only (warm cache — same cert, repeated access)

| Certificate set | `synta` (PyO3, 19 fields) | `cryptography.x509` stock | `cryptography.x509` perf-opt |
|---|---|---|---|
| Traditional (~900 B, 5 certs) | 0.41–0.43 µs | ~14.8 µs | 0.91–0.95 µs |
| ML-DSA-44 (3,992 B) | 0.41 µs | ~11.2 µs | 2.21–2.28 µs |
| ML-DSA-65 (5,521 B) | 0.41 µs | ~11.2 µs | 2.21–2.28 µs |
| ML-DSA-87 (7,479 B) | 0.42 µs | ~11.1 µs | 2.21–2.28 µs |

synta warm-cache access (~0.42 µs) is 19 clone_ref calls, or about 22 ns per field.

Stock cryptography.x509 memoises only extensions (via PyOnceLock). The other 8 fields (subject, issuer, serial_number, not_valid_before_utc, not_valid_after_utc, signature, signature_hash_algorithm, public_key) are re-derived from the zero-copy in-memory ASN.1 structure on every Python access, each allocating a new Python object. Warm and cold times are therefore close (~11–16 µs across both tables).

cryptography PR #14441 adds OnceLock caching for issuer, subject, public_key, and signature_algorithm on Certificate, and caching on CertificationRequest, CertificateList, and OCSPResponse. This brings traditional warm access from ~14.8 µs down to 0.91–0.95 µs — a ~16× improvement. synta remains ~2.2× faster for traditional warm access and ~5× faster for ML-DSA (larger key objects have higher PyBytes copy cost even on the cached path). Parse-only and cold parse+fields times are unaffected by the caching changes.

PKCS#7 and PKCS#12 extraction

Measured with python/bench_pkcs.py. Benchmark IDs match the Rust Criterion IDs in synta-bench/benches/pkcs_formats.rs exactly (pkcs7/{synta,cryptography}/value, pkcs12/{synta,cryptography}/value).

| Benchmark ID | Input | `synta` | `cryptography` | Speedup (Py vs `cryptography`) |
|---|---|---|---|---|
| `pkcs7/…/amazon_roots` | 1,848 B DER/BER, 2 certs | 0.82 µs (Rust) / 1.27 µs (Py) | 41.7 µs | ~33× |
| `pkcs7/…/pem_isrg` | 1,992 B PEM, 1 cert | 3.58 µs (Rust) / 4.14 µs (Py) | 32.1 µs | ~8× |
| `pkcs12/…/unencrypted_3certs` | 3,539 B, 3 certs | 1.10 µs (Rust) / 1.80 µs (Py) | 134 µs | ~75× |
| `pkcs12/…/unencrypted_1cert_with_key` | 756 B, 1 cert + key | 0.64 µs (Rust) / 0.96 µs (Py) | n/a | n/a |

The unencrypted_1cert_with_key vector (cert-none-key-none.p12) uses a non-standard format that cryptography cannot parse; it is benchmarked as synta-only to exercise the key-bag-skipping code path.

The ~0.3–0.7 µs gap between the Rust and Python times is the usual PyO3 overhead: GIL acquisition, PyBytes allocation for each returned certificate, and PyList construction. The pem_isrg Python time (4.14 µs) is higher than the pure PKCS#7 parse (1.27 µs) because PEM decoding and base64 allocation are included in the timed loop.
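To make the extra PEM step concrete, the sketch below shows the work a PEM input needs before any ASN.1 parsing can start: dropping the armor lines and base64-decoding the body into a fresh bytes object. pem_to_der is a hypothetical helper written with the standard library only; it is not how synta decodes PEM.

```python
import base64


def pem_to_der(pem: str) -> bytes:
    """Drop the -----BEGIN/END----- armor lines and base64-decode the body."""
    lines = (line.strip() for line in pem.splitlines())
    body = "".join(l for l in lines if l and not l.startswith("-----"))
    # b64decode allocates a new bytes object holding the DER data.
    return base64.b64decode(body)
```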

cryptography comparison note: The amazon_roots DER benchmark triggers a UserWarning from cryptography — the file uses BER indefinite-length encoding (0x30 0x80…), which cryptography handles via an internal fallback with a deprecation warning. synta accepts BER transparently with no warning.
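Whether an input will take the BER fallback path can be checked before parsing by looking at the first two bytes. A minimal sketch, assuming the outermost element is a SEQUENCE; uses_indefinite_length is a hypothetical helper and not part of either library.

```python
def uses_indefinite_length(data: bytes) -> bool:
    """True if data starts with a SEQUENCE tag (0x30) followed by the
    BER indefinite-length marker (0x80)."""
    return len(data) >= 2 and data[0] == 0x30 and data[1] == 0x80
```

DER forbids the indefinite form, so a True result means the file is BER rather than strict DER.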

Architecture notes

Parse-only: synta.Certificate.from_der() holds a Py<PyBytes> strong reference to the caller’s bytes object (no copy). It performs only a 4-operation shallow envelope scan (outer SEQUENCE tag+length, TBSCertificate SEQUENCE tag+length) — equivalent to synta_certificate::validate_envelope() in Rust — and records the TBS byte range. The full recursive Certificate::decode() is deferred to the first getter call. synta.Certificate.full_from_der() performs the same shallow scan and then immediately triggers the full decode, so all 19 field caches are warm before any Python code accesses them.
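The four operations of the shallow scan can be sketched in pure Python as two tag reads and two length reads. This is an illustration only, not synta's implementation: read_tag_length and scan_envelope are hypothetical names, and rejecting indefinite lengths and trailing bytes is an assumption of this sketch.

```python
def read_tag_length(data: bytes, pos: int):
    """Read one tag byte and a definite length.

    Returns (tag, length, content_start).
    """
    if pos + 2 > len(data):
        raise ValueError("truncated header")
    tag = data[pos]
    first = data[pos + 1]
    pos += 2
    if first < 0x80:                 # short form: one length byte
        return tag, first, pos
    n = first & 0x7F                 # long form: next n bytes hold the length
    if n == 0 or pos + n > len(data):
        raise ValueError("indefinite or truncated length")
    return tag, int.from_bytes(data[pos:pos + n], "big"), pos + n


def scan_envelope(der: bytes):
    """Return the (start, end) byte range of the TBSCertificate element."""
    tag, length, body = read_tag_length(der, 0)              # outer SEQUENCE
    if tag != 0x30 or body + length != len(der):
        raise ValueError("bad outer SEQUENCE")
    tbs_start = body
    tag, length, tbs_body = read_tag_length(der, tbs_start)  # TBSCertificate
    if tag != 0x30 or tbs_body + length > len(der):
        raise ValueError("bad TBSCertificate SEQUENCE")
    return tbs_start, tbs_body + length


# Toy input: SEQUENCE { SEQUENCE { INTEGER 5 }, NULL }
toy = bytes([0x30, 0x07, 0x30, 0x03, 0x02, 0x01, 0x05, 0x05, 0x00])
print(scan_envelope(toy))  # (2, 7)
```

Nothing inside the TBSCertificate or after it is inspected, which is consistent with parse time staying flat as the signature grows.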

Parse+fields: All 19 getters cache their Python object in an OnceLock<Py<T>>. String-returning getters (issuer, subject, signature_algorithm, signature_algorithm_oid, public_key_algorithm, public_key_algorithm_oid, not_before, not_after) store Py<PyString>. Bytes-returning getters (signature_value, public_key, tbs_bytes, issuer_raw_der, subject_raw_der) store Py<PyBytes>. Optional getters (signature_algorithm_params, public_key_algorithm_params, extensions_der) store Option<Py<PyBytes>>. to_der skips the lock entirely — direct clone_ref of the stored Py<PyBytes> that was passed to from_der.
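The compute-once getter pattern has a close Python analogue in functools.cached_property. The sketch below uses a hypothetical LazyRecord class (not synta's API) to show the two behaviours described above: a getter whose result is built on first access and then reused, and a raw accessor that hands back the caller's original bytes object without copying.

```python
from functools import cached_property


class LazyRecord:
    """Hypothetical stand-in for a lazily decoded object."""

    def __init__(self, raw: bytes):
        self._raw = raw            # reference to the caller's bytes, no copy
        self.decode_calls = 0

    @cached_property
    def payload(self) -> bytes:
        # Runs on first access only; later accesses reuse the cached object.
        self.decode_calls += 1
        return self._raw[2:]       # stand-in "decode": drop a 2-byte header

    def to_raw(self) -> bytes:
        return self._raw           # the same object that was passed in


raw = b"\x04\x03abc"
rec = LazyRecord(raw)
first = rec.payload
second = rec.payload
print(first is second, rec.decode_calls, rec.to_raw() is raw)  # True 1 True
```

Unlike OnceLock, cached_property does not guarantee single initialisation under concurrent first access, so this only mirrors the caching behaviour, not the thread-safety.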

#[pyclass(frozen)]: PyCertificate is declared frozen, which removes the 8-byte PyO3 borrow-tracking field (borrow_flag: Cell<isize>) from every instance and eliminates per-getter Cell::get/set calls. This gives a ~26% speedup on the warm-path field-access benchmark.

ML-DSA parse+fields: The first-call PyBytes copies of signature_value (2,420–4,627 bytes) and public_key (1,312–2,592 bytes) add ~0.3 µs over traditional certs (3.63–3.71 µs vs 3.30–3.42 µs). Subsequent accesses use clone_ref at ~0.41 µs total regardless of cert size.

For the full binding-layer comparison (Rust rust_shallow / rust_typed / rust_element / c_ffi / Python synta / synta_full / cryptography_x509, parse-only and parse+fields) see docs/performance.md.

See also Development for how to run benchmarks.