PSI

PSI lets two parties learn which elements they have in common without revealing the rest. Conceptually:

\[ \text{PSI}(A, B) = A \cap B \]

In practice, the AP3 PSI variant returns a result scoped to the initiator's question — typically "is element X in B?" (a boolean) or "how many of my elements are in B?" (a count) — without ever exposing the contents of \(B\) to the initiator or the contents of \(A\) to the receiver.

Interactive explanation

Why does the B stamp survive when A is peeled off?

The wrapping/stamping picture works beautifully for the journey but cracks at this exact moment. The fix is a small upgrade to the metaphor.

Imagine Provider's stamp B isn't a postage sticker laid on top of the wrapper. It's a steel die pressed firmly against the wrapped lump — pressed hard enough that it punches through wrapper A and leaves a permanent impression on the ball underneath. The wrapper is just a soft envelope. The stamp's mark lives on the ball itself.

The stamp pushes through the wrapper, therefore it leaves a push mark. Peel the wrapper off and the push mark is still there.

So when Consumer peels A away, the wrapper comes off clean — but the indentation B punched into the ball stays right where it was. And critically: it's the same indentation Provider would have left if it had stamped the bare ball directly. Which is exactly what Provider did to every ball in its own list. Same procedure on both sides → directly comparable outputs. That equality check at the end is only possible because of this.

Underneath the metaphor

The ball is a point \(P\) on an elliptic curve. \(A\) is Consumer's secret scalar \(a\); \(B\) is Provider's secret scalar \(b\). The protocol computes:

\[ \begin{aligned} \text{Consumer:} \quad & a \cdot P && \text{(blind)} \\ \text{Provider:} \quad & b \cdot (a \cdot P) = a \cdot b \cdot P && \text{(evaluate — scalar mul commutes)} \\ \text{Consumer:} \quad & a^{-1} \cdot (a \cdot b \cdot P) = b \cdot P && \text{(unblind)} \end{aligned} \]

The \(a\) and \(a^{-1}\) annihilate. \(b\) was never tangled with \(a\), so it survives untouched. What's left is \(b \cdot P\) — identical to what Provider produces on every ball in its own list (each is \(b \cdot P_i\)). Equality comparison works because both sides went through the same \(b\)-multiplication, on the same hash-to-curve output, and nothing else.

Without commutativity, peeling A would smear B in some recoverable way (\(a^{-1} \cdot b \cdot a \cdot P\) wouldn't simplify to anything useful) and the comparison would silently fail. Every OPRF in the wild — 2HashDH, the constructions in RFC 9497 — is built on a commutative group operation for exactly this reason.

How it runs (4 envelopes, end to end)

PSI exchanges four envelopes per session. The two messages from the initiator (OB) each carry their own signed PrivacyIntentDirective bound to that envelope's payload — every initiator→receiver message is independently authenticated.

#	Direction	Phase	Payload	What happens
1	OB → BB	`init`	`commit(sid_0, blind)`	Session kick-off. OB picks `sid_0` and a random blind value, then sends a hiding commitment to them. The signed intent rides here and authenticates OB.
2	BB → OB	`msg0`	`sid_1`	BB picks its half of the session ID (`sid_1`) and sends it in the clear. BB has no way to see `sid_0` yet, so it can't grind.
3	OB → BB	`msg1`	`sid_0 ‖ blind ‖ psc1`	OB opens the commit (revealing `sid_0` + `blind`), derives `session_id = H(sid_0, sid_1)`, blinds its query, and sends `psc_msg1`. BB verifies the commit opens correctly. A fresh signed intent on this envelope binds the actual payload.
4	BB → OB	`msg2`	`psc2`	BB runs its half of the cryptographic protocol against its private set, returns the response. BB learns nothing about OB's query.
—	(local)	—	result	OB processes `msg2` locally and learns the answer. BB never learns the answer.

The contributory session_id (H(sid_0, sid_1) with a commit-then-reveal exchange) means neither party alone chooses the session_id: OB is locked into sid_0 by the commit before seeing sid_1, and BB has to pick sid_1 before seeing sid_0. This prevents either party from grinding for a session_id that produces a favorable transcript.

The full cryptographic details are encapsulated by the operation implementation; from the SDK side, you call PSIOperation.start(...) / .receive(...) / .process(...) and the framework drives the four envelopes for you (or use PrivacyAgent.run_intent(...) to drive the whole exchange end-to-end over A2A).

Use cases

Customer overlap analysis between companies that don't want to swap customer lists.
Fraudulent account detection across institutions, surfacing only the matches.
Supply chain partner verification — confirming a counterparty exists in an approved-supplier list without exposing the list.
Sanctions / blacklist screening without disclosing either side.

What PSI does not do

It does not reveal the receiver's full set to the initiator.
It does not reveal the initiator's query to the receiver.
It does not, by itself, prove that the receiver actually used the dataset it advertised in its commitment. That gap is what proof-of-computation work is intended to close — see the Distribution section and the Roadmap.

For a hands-on tutorial, see the psi_simple and psi_adk_simple examples in the main repository.

PSI - Performance Benchmarks

Environment: Python 3.11.9, arm64-darwin, pure-Python Ristretto255 via rbcl + merlin_transcripts.
Methodology: time.perf_counter wall-clock, 5-iteration warmup. Each row shows mean with p95 and min/max.

1. Functional layer

Direct calls into the pure-Python cryptographic primitives (Ristretto255 via rbcl, Merlin transcripts). This is the minimum cost for each operation.

Operation	n	Mean	p95	Min	Max
`generate_hash()` — customer	1000	0.30 µs	0.33 µs	0.25 µs	1.25 µs
`generate_hash()` — sanction	1000	0.30 µs	0.38 µs	0.25 µs	0.58 µs
`compute_session_id()`	1000	0.31 µs	0.33 µs	0.25 µs	11.21 µs
`create_commitment()`	1000	0.30 µs	0.33 µs	0.25 µs	7.42 µs
`verify_commitment()`	1000	0.34 µs	0.38 µs	0.29 µs	0.92 µs
`create_psc_msg1()`	500	655.40 µs	669.08 µs	628.71 µs	711.29 µs
`process_psc_msg1()` — 1 entries	100	2.62 ms	2.65 ms	2.49 ms	3.59 ms
`process_psc_msg1()` — 10 entries	100	16.86 ms	16.96 ms	16.56 ms	18.00 ms
`process_psc_msg1()` — 50 entries	100	80.23 ms	81.80 ms	79.82 ms	83.24 ms
`process_psc_msg1()` — 100 entries	100	159.49 ms	161.88 ms	158.62 ms	162.60 ms
`process_psc_msg1()` — 500 entries	50	794.31 ms	796.96 ms	790.05 ms	819.49 ms
`process_psc_msg1()` — 1,000 entries	20	1583.00 ms	1597.86 ms	1579.11 ms	1597.86 ms
`process_psc_msg2()`	1000	1.98 ms	2.01 ms	1.95 ms	3.15 ms

Notes:

generate_hash, compute_session_id, create_commitment, and verify_commitment are SHA-256 only — no curve operations.
create_psc_msg1 performs one h1_function (hash-to-curve via Merlin + from_hash) + one scalar multiply.
process_psc_msg1 is O(n) in sanction list size: per entry: one h1_function + one scalar multiply + one h2_function, plus one Schnorr DLog proof.
process_psc_msg2 is fixed cost: one DLog proof verification, one scalar invert, one h2_function, one O(n) hmac.compare_digest scan.

2. PSI Operation Layer

PSIOperation wraps the cryptographic core in the ap3.Operation contract (4-round contributory session ID, base64 serialisation, role/phase validation).

Operation	n	Mean	p95	Min	Max	FFI baseline	Overhead
`on_start()` — OB init	200	2.56 µs	2.71 µs	2.46 µs	4.50 µs	1.93 µs	~0.64 µs
`_receiver_step(init)` — BB, 50 entries	100	19.12 µs	19.33 µs	18.79 µs	21.96 µs	12.38 µs	~6.74 µs
`_initiator_step(msg0)` — OB creates msg1	200	660.09 µs	671.71 µs	653.04 µs	697.42 µs	657.67 µs	~2.42 µs
`_receiver_step(msg1)` — BB, 50 entries	100	80.34 ms	82.40 ms	79.79 ms	83.53 ms	80.57 ms	~0.00 µs
`_initiator_step(msg2)` — OB finalise	200	1.99 ms	2.01 ms	1.96 ms	2.96 ms	1.98 ms	~10.03 µs

Notes:

Operation-layer overhead (base64 encode/decode, phase dispatch, state dict ops) is small relative to the underlying crypto.
_receiver_step(init) hashes the entire sanction list upfront and caches results in session state, so _receiver_step(msg1) skips re-hashing.
The 4-round commitment exchange adds only cheap SHA-256 operations (create_commitment, verify_commitment, compute_session_id).

3. Full PSI Round-Trip

End-to-end protocol execution across all 4 rounds: on_start → _receiver_step(init) → _initiator_step(msg0) → _receiver_step(msg1) → _initiator_step(msg2). Single-threaded sequential execution; no network overhead.

Sanction list size	n	Mean	p95	Min	Max	Throughput
1	100	5.29 ms	5.33 ms	5.26 ms	5.37 ms	~189 checks/s
10	100	19.62 ms	19.80 ms	19.51 ms	21.06 ms	~51 checks/s
50	100	82.90 ms	84.56 ms	82.29 ms	85.86 ms	~12 checks/s
100	50	161.59 ms	164.06 ms	160.76 ms	164.57 ms	~6 checks/s
500	10	792.71 ms	795.34 ms	790.79 ms	795.34 ms	~1 checks/s
1,000	10	1581.07 ms	1583.44 ms	1578.23 ms	1583.44 ms	~1 checks/s

Scaling characteristic: Round-trip time grows linearly with sanction list size. The dominant cost is process_psc_msg1 (receiver side), O(n): one h1_function (hash-to-curve via Merlin) + one scalar multiply + one h2_function per entry.

4. Session Management

Pure-Python overhead for Operation session bookkeeping (no cryptography).

Operation	n	Mean	p95	Min	Max
`has_session()` — hit	10000	0.07 µs	0.08 µs	0.00 µs	0.50 µs
`has_session()` — miss	10000	0.06 µs	0.08 µs	0.04 µs	0.29 µs
`_save_session()`	10000	0.74 µs	0.33 µs	0.21 µs	3.78 ms

Session operations are sub-microsecond for lookups; they contribute < 1% of total round-trip cost.

Summary

Layer	Dominant cost	Notes
SHA-256 helpers	single `hashlib.sha256` call	sub-microsecond
`create_psc_msg1`	1× hash-to-curve + 1× scalar mul	~650 µs
`process_psc_msg1`	O(n): hash-to-curve + scalar mul + H2 per entry	~2.6 ms @ n=1, ~1.6 s @ n=1000
`process_psc_msg2`	DLog verify + scalar invert + H2 + O(n) compare	~2 ms fixed
Operation overhead	base64, phase dispatch, state dict ops	< 100 µs per step
Session management	Python dict lookup / insert	< 5 µs