Simulating a Pauli tensor faster than as a generalised permutation matrix #154
Replies: 1 comment 1 reply
-
Hi @TysonRayJones, Thank you so much for your feedback. I'd like to add to the computation cost of |
Beta Was this translation helpful? Give feedback.
-
Hi there,
Imagine that we seek to apply a tensor of Pauli operators (rather than just calculate their expectation values). That is, they appear as same-column gates in our circuit like might appear due to twirling, or Monte Carlo decomposition of Pauli channels, etc.
We could inefficiently effect these in-turn as one-qubit gates. It turns out that's a factor
#paulis
slower than necessary, and is especially a shame when the operators have common controls:It is ideal to simulate them as a single operator:
Tensors of Pauli operators have a special form owed to each being diagonal or anti-diagonal, and can be simulated as quickly as a single Pauli operator (with no necessary matrix allocations) as outlined in Sec IV E here. It is my understanding that it is not currently possible to leverage this in cuStateVec.
Their Z-basis matrices are simple, specific instances of generalised permutation matrices:
They can ergo be simulated using cuStateVec's
custatevecApplyGeneralizedPermutationMatrix
, however a few issues remain:custatevecApplyPauliRotation
, if it is using an algorithm similar to Sec IV G.4^#paulis
)! Essentially, the Pauli tensor simply swaps pairs of amplitudes and never moves them "in a loop" like a general permutation matrix might, so the final destination index of the amplitudes is simply and cheaply encoded using a fixed number of bitmasks, rather than an exponentially big matrix.Ergo a new, dedicated function for simulating Pauli tensors in cuStateVec would be very valuable and much faster than the existing method (linearly faster than in-turn simulation, and exponentially faster than via permutation matrices).
And notably, it will be easy; the logic for simulating the Pauli tensor is identical to optimal simulation of the Pauli gadget (
custatevecApplyPauliRotation
); they differ only by constant coefficients multiplied onto each amplitude (courtesy of Euler's formula).You can view a non-distributed implementation of the algorithm here with arguments prepared here, which is derived in Sec IV E here. It is trivial to introduce control qubits in arbitrary states, which merely eliminate iterations of the main loop (each of which modifies one amplitude).
In my application (which I'll soon have the public code to link), I will have to fall-back to using a custom kernel implementing the above algorithm - it would be fantastic to update it to use a cuStateVec function in the future!
Beta Was this translation helpful? Give feedback.
All reactions