Skip to content

Fix Quantum Shannon Decomposition #6756

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Oct 11, 2024

Conversation

NoureldinYosri
Copy link
Collaborator

@NoureldinYosri NoureldinYosri commented Oct 9, 2024

This PR:

  • fixes how _single_qubit_decomposition handles global phase and reduces the number of operations it returns when possible.
  • Makes the eigendecomposition part resilient to numerical precision issues by:
    1. resorting to the more accurateeigh instead of eig when the unitary is hermitian.
    2. if the unitary is not hermitian then use Gram–Schmidt to make sure the eigenvectors are orthonormal (mathematically they should but we could get non orthognoal vectors either because of numerical precision or because an eigenvector is repeated and as a result the the eigenvector spanning that subspace are arbitary)
  • Reduce the number of operations by removing operations equaivalent to identity (e.g. rz($\phi$) with $\phi \approx 0$)
  • Adds tests for the reported failures making sure they are fixed.
  • Adds tests for the corner cases of the decomposition.

update: also add two_qubit_matrix_to_cz_operations and three_qubit_matrix_to_operations as base cases to reduce the depth of the circuit.


fixes #6666
fixes #6725


@ikd-sci let me know if this works for you.


@ACE07-Sev sorry I had to dive into this task because we had an urgent need for the decomposition. Does this PR fix the issues you found? also Does this fix the depth issue you found #6666 (comment)?

Copy link

codecov bot commented Oct 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.83%. Comparing base (12dcd01) to head (5c7d12b).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #6756   +/-   ##
=======================================
  Coverage   97.83%   97.83%           
=======================================
  Files        1077     1077           
  Lines       92577    92672   +95     
=======================================
+ Hits        90570    90668   +98     
+ Misses       2007     2004    -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ACE07-Sev
Copy link
Contributor

Before merging, it'd be great if we can add two qubit decomposition as well. It would definitely improve the circuit depth.

@ACE07-Sev
Copy link
Contributor

@NoureldinYosri Testing for the items you mentioned now.

@ACE07-Sev
Copy link
Contributor

ACE07-Sev commented Oct 9, 2024

@NoureldinYosri So, I re-did all the tests that were faulty last time, and it seems to be ok now. As for depth, here is a side-by-side comparison:

[3, 21, 99, 435, 1827, 7491, 30339, 122115]
[1, 7, 45, 225, 1009, 4273, 17585, 71345]

image
I think the issue is the missing two-qubit decomposition. Additionally, I'd strongly recommend using more conventional gates compared to Pauli power ones, as it would make the overall synthesis ideally shallower. I would say using RY, RZ, CX, and global phase should suffice.

If you can kindly update for these two, I can test again and see if the depth issue remains.

@NoureldinYosri
Copy link
Collaborator Author

@ACE07-Sev I added the 2q and 3q analytical decompositions as base cases. does this make the depth better?

@pavoljuhas
Copy link
Collaborator

It might be worthwhile to investigate why does this fail the is_unitary check on mac.
I ran the quantum_shannon_decomposition_test.py on Linux with the patch below,
and most of the eye_diff_max values were between 3e-18 to 6e-14, well below the atol
tolerance in is_unitary which is 1e-8. Do you get a similar range on Mac or are the
differences systematically much larger?

diff --git a/cirq-core/cirq/transformers/analytical_decompositions/quantum_shannon_decomposition.py b/cirq-core/cirq/transformers/analytical_decompositions/quantum_shannon_decomposition.py
index 3c747092..4350006b 100644
--- a/cirq-core/cirq/transformers/analytical_decompositions/quantum_shannon_decomposition.py
+++ b/cirq-core/cirq/transformers/analytical_decompositions/quantum_shannon_decomposition.py
@@ -80,6 +80,8 @@ def quantum_shannon_decomposition(
         ValueError: If the u matrix is non-unitary
         ValueError: If the u matrix is not of shape (2^n,2^n)
     """
+    eye_diff_max = np.abs(u.dot(u.T.conj()) - np.eye(len(u))).max()
+    print(f"\nDB: {eye_diff_max=}")
     if not predicates.is_unitary(u, atol=atol):  # Check that u is unitary
         raise ValueError(
             "Expected input matrix u to be unitary, \

def quantum_shannon_decomposition(qubits: 'List[cirq.Qid]', u: np.ndarray) -> 'op_tree.OpTree':
"""Decomposes n-qubit unitary into CX/YPow/ZPow/CNOT gates, preserving global phase.
def quantum_shannon_decomposition(
qubits: 'List[cirq.Qid]', u: np.ndarray, atol: float = 1e-8
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should first find out why is there out-of-tolerance problem on Mac, and based on that it may or may not be necessary to add the atol argument.

Copy link
Collaborator Author

@NoureldinYosri NoureldinYosri Oct 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is need ... the atol value needed to be passed down to the analytical decomposition methods

Copy link
Collaborator

@pavoljuhas pavoljuhas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see inline comments.

@ACE07-Sev
Copy link
Contributor

@ACE07-Sev I added the 2q and 3q analytical decompositions as base cases. does this make the depth better?

@NoureldinYosri Checking now sir.

@ACE07-Sev
Copy link
Contributor

Much better.

[3, 8, 45, 219, 963, 4035, 16515, 66819]
[1, 7, 45, 225, 1009, 4273, 17585, 71345]

image

@ACE07-Sev
Copy link
Contributor

I also did a test with transpile. Here are the results:

cirq = [3, 8, 45, 219, 963, 4035, 16515, 66819, 268803]
qiskit = [1, 7, 45, 225, 1009, 4273, 17585, 71345, 287409]
qiskit with transpile = [1, 7, 41, 197, 869, 3653, 14981, 60677, 244229]

image
I am not sure what type of optimization the transpile does here that reduces the depth that much.

@NoureldinYosri
Copy link
Collaborator Author

@ACE07-Sev thank you. could you pass the output of shannon decomposition to optimize_for_target_gateset and see what happens? optimize_for_target_gateset is our circuit optimization method. for example cirq.optimize_for_target_gateset(circuit, gateset=cirq.CZTargetGateset())

@ACE07-Sev
Copy link
Contributor

ACE07-Sev commented Oct 10, 2024

@NoureldinYosri Of course, I'll test it out. A bit of a lazy question hehe, but does cirq have a gateset for just U3 and CX for a reliable comparison?

@NoureldinYosri
Copy link
Collaborator Author

NoureldinYosri commented Oct 10, 2024

does cirq have a gateset for just U3 and CX for a reliable comparison?

U3 is analgous to cirq.PhasedXZ gate and CX is equivalent to CZ up to hadamards. so the analgous gateset is the CZ gateset

@ACE07-Sev
Copy link
Contributor

ACE07-Sev commented Oct 10, 2024

It became worse. I'll use my converter and make it into a qiskit circuit and compare like that for safety.

cirq = [1, 7, 49, 241, 1057, 4417, 18049, 72961, 293377]
qiskit = [1, 7, 45, 225, 1009, 4273, 17585, 71345, 287409]
qiskit with transpile = [1, 7, 41, 197, 869, 3653, 14981, 60677, 244229]

@NoureldinYosri
Copy link
Collaborator Author

@ACE07-Sev that's probably because CZTargetGateset by default preserves moment structure which will increase the depth. Could you run the optimizer with the following args

cirq.optimize_for_target_gateset(circuit, gateset=cirq.CZTargetGateset(preserve_moment_structure=False), max_num_passes=None)

@ACE07-Sev
Copy link
Contributor

Of course, running the test now.

@ACE07-Sev
Copy link
Contributor

ACE07-Sev commented Oct 10, 2024

@NoureldinYosri Improved, but very slightly.

[1, 7, 49, 235, 1027, 4291, 17539, 70915, 285185]

Something I'll say is that maybe the two qubit decomposition is not the same as Qiskit's? When I remove the three qubit decomposition, the depth becomes really large again, whereas with qiskit they only use one and two qubit decomposition. It would be definitely worthwhile to try and implement it like that as the three qubit decomposition and these additional optimizations are really slowing the code down (it took 20 minutes to run as opposed to 1:45-ish) and still are not matching that depth scaling which makes or breaks larger scale algorithms.

I understand it's a bit tedious (I have been trying to translate Qiskit's approach for a while now, the rust porting makes it a bit tricky with all the dependencies), but definitely worth the one-time hassle for the future. I will keep trying to make some proper code additions and make a PR or update here.

@NoureldinYosri
Copy link
Collaborator Author

@ACE07-Sev thanks for the invesitgation and for running the tests. I really appreciate it. As for the decompositions or compilation we usually implement the best algorithms from the literature. It's just that it has been sometime since we implemented newer algorithms. If you are aware of new decompositions or compilation algorithms that are better than what we have then please open an issue with links to those papers.

@ACE07-Sev
Copy link
Contributor

ACE07-Sev commented Oct 10, 2024

I see. Of course, I'll re-read the Qiskit implementation and see if it's based on any new literature (the only new PR they had was for multiplexor gates, but still not merged so it's not from that), and will inform ASAP.

In the meantime, for your kind reference I'd recommend checking qiskit's approach:
https://github.com/Qiskit/qiskit/blob/main/qiskit/synthesis/unitary/qsd.py

I'll have a look to see what optimization transpile does as well. I'd love to see cirq reach that scaling.

Copy link
Collaborator

@pavoljuhas pavoljuhas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@ACE07-Sev
Copy link
Contributor

ACE07-Sev commented Oct 11, 2024

I think I have an idea about why the depths are not matching with qiskit's. I ran your QSD with target gateset of U3 and CX, and compared the number of U3 and CX gates. So, yours is:

OrderedDict([('u', 1)])
OrderedDict([('u', 14), ('cx', 3)])
OrderedDict([('u', 97), ('cx', 24)])
OrderedDict([('u', 475), ('cx', 120)])

For 1 to 4 qubits respectively. Qiskit's QSD has two parameters opt_a1 and opt_a2. If I don't use them, qiskit's result is:

OrderedDict([('u', 1)])
OrderedDict([('u', 8), ('cx', 3)])
OrderedDict([('u', 44), ('cx', 24)])
OrderedDict([('u', 200), ('cx', 120)])

We can see the number of CX match (as expected, given both are based on the Shende et al paper). The number of u doesn't match which I imagine is because of your use of pow gates, and is a rather easy fix. So, I think if you add opt_a1 and opt_a2 you'll match the same depth scaling as qiskit's. As for the reference, it's A.1 and A.2 from Shende's paper.
https://arxiv.org/pdf/quant-ph/0406176
Additionally, I think if you also keep the three qubit decomposition, you'd beat qiskit's by a slight percentage, but less is less nonetheless hehe.

@ACE07-Sev
Copy link
Contributor

ACE07-Sev commented Oct 11, 2024

I also had a quick review of benchmarks on general unitary preparation from this 2023 paper and have written the following code for calculating the number of CX gates for your kind reference.

def calculate_cost(num_qubits: int) -> None:
    original_decomposition = num_qubits ** 3 * 4 ** num_qubits
    asymptotic_decomposition = num_qubits * 4 ** num_qubits
    gray_codes = 4 ** num_qubits
    cosine_sine_decomposition = 4 ** num_qubits - 2 ** (num_qubits + 1)
    optimized_qsd = 23/48 * 4 ** num_qubits - 3/2 * 2 ** num_qubits + 4/3
    kg_cartan_decomposition = 21/16 * 4 ** num_qubits - 3 * (num_qubits * 2 ** (num_qubits - 2) + 2 ** num_qubits)
    theoretical_lower_bound = np.ceil(1/4 * (4 ** num_qubits - 3 * num_qubits - 1))

    print(f"Original decomposition: {original_decomposition}")
    print(f"Asymptotic decomposition: {asymptotic_decomposition}")
    print(f"Gray codes: {gray_codes}")
    print(f"Cosine-sine decomposition: {cosine_sine_decomposition}")
    print(f"Optimized QSD: {optimized_qsd}")
    print(f"KG-Cartan decomposition: {kg_cartan_decomposition}")
    print(f"Theoretical lower bound: {theoretical_lower_bound}")

The approach by Shende et al. with A.1 and A.2 is still SOTA for exact encoding. I will research further to be sure.

P.S : Here's a quick plot for reference.
image
For clarity, the gray code is overlapping with cosine-sine approach, hence why it's hard to see.

@NoureldinYosri NoureldinYosri enabled auto-merge (squash) October 11, 2024 14:43
@NoureldinYosri
Copy link
Collaborator Author

@ACE07-Sev Thanks for the analysis and the effort


The number of u doesn't match which I imagine is because of your use of pow gates

but this is should not be the issue after running the optimize_for_target_gateset method since it will replace all of these with PhasedXZGates ... what does the circuit look like after the optimizer?

The approach by Shende et al. with A.1 and A.2 is still SOTA for exact encoding.

Good to know

@ACE07-Sev
Copy link
Contributor

@NoureldinYosri

but this is should not be the issue after running the optimize_for_target_gateset method since it will replace all of these with PhasedXZGates ... what does the circuit look like after the optimizer?

I did a quick check for two qubits sir.
image
By the way, I noticed the unitary is changing after the target gateset optimization. It shouldn't be doing that.

@NoureldinYosri
Copy link
Collaborator Author

By the way, I noticed the unitary is changing after the target gateset optimization

#6517

@NoureldinYosri NoureldinYosri merged commit 5718365 into quantumlib:main Oct 11, 2024
35 checks passed
@pavoljuhas
Copy link
Collaborator

cirq-core/cirq/transformers/analytical_decompositions/quantum_shannon_decomposition_test.py::test_qft5 seems to be failing on the CI, for example,
https://github.com/quantumlib/Cirq/actions/runs/11335648680/job/31524196578 or
https://github.com/quantumlib/Cirq/actions/workflows/ci-daily.yml

@NoureldinYosri - can you PTAL?

@NoureldinYosri
Copy link
Collaborator Author

@ACE07-Sev can you run the transpiling test after the changes and the arguments introduced in #6776?

@ACE07-Sev
Copy link
Contributor

@NoureldinYosri Of course. I'll start testing now.

@ACE07-Sev
Copy link
Contributor

Good improvement. I don't know if this is what A.2 does, but it is actually quite close to qiskit with A.2 on (A.1 off).

# cirq
OrderedDict([('u3', 1)])
OrderedDict([('u3', 8), ('cx', 3)])
OrderedDict([('u3', 42), ('cx', 20)])
OrderedDict([('u3', 208), ('cx', 104)])
OrderedDict([('u3', 917), ('cx', 464)])
OrderedDict([('u3', 3846), ('cx', 1952)])
OrderedDict([('u3', 15751), ('cx', 8000)])
OrderedDict([('u3', 63752), ('cx', 32384)])
OrderedDict([('u3', 256519), ('cx', 130303)])
# Qiskit with A.2 (no A.1)
OrderedDict([('u3', 1)])
OrderedDict([('u3', 8), ('cx', 3)])
OrderedDict([('u3', 38), ('cx', 21)])
OrderedDict([('u3', 170), ('cx', 105)])
OrderedDict([('u3', 722), ('cx', 465)])
OrderedDict([('u3', 2978), ('cx', 1953)])
OrderedDict([('u3', 12098), ('cx', 8001)])
OrderedDict([('u3', 48770), ('cx', 32385)])
OrderedDict([('u3', 195842), ('cx', 130305)])

@NoureldinYosri
Copy link
Collaborator Author

@ACE07-Sev thank you! I don't know what A.2. In quantum compilation reording operations can help decrease the size of the final circuit, the objective can be depth or number of operations (or width in case of ancillas).

The problem is -I think- NP-hard (though I might be wrong) but several heuristics exist in literature ranging from simple greedy heuristics to ML models. The issue with optimize_for_target_gateset is that it didn't use any at all. My PR #6776 simply adds the simpliest possible heuristic, just sorting. We may consider adding better heuristics in the feature if this becomes an issue. for now it looks like this is good enough.

@ACE07-Sev
Copy link
Contributor

A.2 is from the Shende paper sir. I believe I mentioned A.1 and A.2 (the two optimizations qiskit's QSD does that cirq doesn't currently have). They are from the Shende paper.
https://arxiv.org/pdf/quant-ph/0406176

harry-phasecraft pushed a commit to PhaseCraft/Cirq that referenced this pull request Oct 31, 2024
This PR:
- fixes how `_single_qubit_decomposition` handles global phase and reduces the number of operations it returns when possible.
- Makes the eigendecomposition  part resilient to numerical precision issues by:
    1. resorting to the more accurate`eigh` instead of `eig` when the unitary is hermitian.
    2. if the unitary is not hermitian then use Gram–Schmidt to make sure the eigenvectors are orthonormal (mathematically they should but we could get non orthognoal vectors either because of numerical precision or because an eigenvector is repeated and as a result the the eigenvector spanning that subspace are arbitary) 
- Reduce the number of operations by removing operations equaivalent to identity (e.g. rz($\phi$) with $\phi \approx 0$)
- Adds tests for the reported failures making sure they are fixed.
- Adds tests for the corner cases of the decomposition.

---
update: also add `two_qubit_matrix_to_cz_operations` and `three_qubit_matrix_to_operations` as base cases to reduce the depth of the circuit.

---

fixes quantumlib#6666 
fixes quantumlib#6725
BichengYing pushed a commit to BichengYing/Cirq that referenced this pull request Jun 20, 2025
This PR:
- fixes how `_single_qubit_decomposition` handles global phase and reduces the number of operations it returns when possible.
- Makes the eigendecomposition  part resilient to numerical precision issues by:
    1. resorting to the more accurate`eigh` instead of `eig` when the unitary is hermitian.
    2. if the unitary is not hermitian then use Gram–Schmidt to make sure the eigenvectors are orthonormal (mathematically they should but we could get non orthognoal vectors either because of numerical precision or because an eigenvector is repeated and as a result the the eigenvector spanning that subspace are arbitary) 
- Reduce the number of operations by removing operations equaivalent to identity (e.g. rz($\phi$) with $\phi \approx 0$)
- Adds tests for the reported failures making sure they are fixed.
- Adds tests for the corner cases of the decomposition.

---
update: also add `two_qubit_matrix_to_cz_operations` and `three_qubit_matrix_to_operations` as base cases to reduce the depth of the circuit.

---

fixes quantumlib#6666 
fixes quantumlib#6725
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Shannon Decomposition error when synthesizing the GHZ unitary matrix Quantum shannon decomposition fails for QFT with non-unitary matrix error
3 participants