Towards a New Way of Benchmarking Quantum Devices

Some of the largest quantum devices today have reached the scale of more than 50 qubits, which heralds a new era with quantum computers whose quantum states may be intractable to simulate by any classical means. If that happens, how do we know that the quantum computer is doing what we think it should be doing? In a typical approach, the confidence that any algorithm will be executed with high fidelity is extrapolated from the accuracy of its constituent single-qubit and two-qubit gates.

For near-term devices with noisy qubits, however, this approach works only partially in practice. Quantum errors are devious and diverse in nature. Sometimes they interfere destructively and cancel each other out, making the theoretical prediction overly pessimistic. Sometimes they act on multiple qubits and are beyond the reach of existing benchmarking techniques. They can also change over time, making it difficult to keep track of the impact of errors in a quantum circuit. These aspects render many of the current benchmarking techniques unscalable with respect to the number of qubits. A quantum computer with 50+ qubits, for example, is already out of reach for these techniques. Beyond 50-60 qubits, the cost of simulating the experiments classically becomes prohibitive, and it is unlikely that we will have classical simulations available as a comparison. How can we find out whether useful computations can be extracted from devices beyond quantum supremacy, which cannot be simulated classically?

We firmly believe that as physical qubits and quantum error correction technology develop and mature, the challenges posed by noise and error will be gradually and systematically mitigated. At Zapata, we aim to push forward what can be done with quantum devices today. This means that, on the one hand, we cannot afford to ignore the above-mentioned subtleties of quantum errors on physical hardware. On the other hand, we are developing algorithms tailored for specific applications such as quantum simulation. When these two aspects come together, it seems clear that we need a new way of benchmarking quantum devices that is both scalable with respect to the number of qubits and relevant to the application problems that we are interested in solving.

This brings us to the paper that we recently released on the arXiv, where we propose an example that embodies this new way of benchmarking quantum devices (“application benchmark”). Here we focus on fermionic simulation, which for many good reasons is a promising application of quantum computers. Using the Sycamore quantum processor produced by Google as an example, we aim to use the native capabilities of a quantum device to solve a model whose analytical solution is known: the 1D Fermi-Hubbard model. The 1D Fermi-Hubbard model is a prototype for systems with correlated electrons. For the purpose of benchmarking quantum algorithms, it is also interesting that the model sits at the frontier of what can be simulated efficiently with a classical computer, as it gives us access to complexity knobs that can easily be tuned. For example, the interaction can be set to zero to obtain a very easy classical problem, but we can also glue chains together into a 2D structure to obtain a difficult quantum problem.
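To make the "interaction set to zero" knob concrete, here is a minimal sketch in Python. It is not taken from the paper. It assumes the standard open-boundary form of the model, with a nearest-neighbor hopping amplitude t and an on-site interaction U, and it only treats the U = 0 case, where the ground state is obtained by filling the lowest single-particle levels. The function names and the default t = 1.0 are illustrative choices.

```python
import numpy as np


def free_chain_levels(num_sites, t=1.0):
    """Single-particle levels of an open 1D chain at U = 0 (numerical)."""
    # Hopping matrix: -t between nearest neighbors, zero elsewhere.
    hopping = np.zeros((num_sites, num_sites))
    idx = np.arange(num_sites - 1)
    hopping[idx, idx + 1] = -t
    hopping[idx + 1, idx] = -t
    return np.linalg.eigvalsh(hopping)  # returned in ascending order


def analytic_levels(num_sites, t=1.0):
    """Closed-form levels of the same chain: -2 t cos(k pi / (L + 1))."""
    k = np.arange(1, num_sites + 1)
    return np.sort(-2.0 * t * np.cos(np.pi * k / (num_sites + 1)))


def free_chain_ground_energy(num_sites, num_up, num_down, t=1.0):
    """Ground-state energy at U = 0: fill the lowest levels for each spin."""
    levels = free_chain_levels(num_sites, t)
    return levels[:num_up].sum() + levels[:num_down].sum()
```

The cost here grows only polynomially with the chain length, which is what makes the non-interacting limit a convenient sanity check before the interaction is switched on.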

The particular metric of interest in our application benchmark is what we call the effective fermionic length of the device. This metric tells us the longest chain of the 1D Fermi-Hubbard model that can be simulated on the device before noise dominates. It is a quantity that can be directly measured on a given quantum device, and we show that it can be efficiently measured even for cases where the number of qubits far exceeds what can be simulated classically (e.g., hundreds of qubits). We chose the 1D Fermi-Hubbard model because it is exactly solvable but also describes some of the physics of correlated systems and is thus representative of chemistry and materials. In other words, the metric provides a sense of how large a fermionic system can be simulated on a quantum device. This metric, like quantum volume, is global and reflective of the entire device, including its noise channels, qubit connectivity, etc.
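As a rough illustration of how "the longest chain before noise dominates" could be turned into a number, the sketch below applies a simple threshold rule: it compares a measured quantity (for example, the energy per site) against the exact value for each chain length and returns the longest chain for which all chains up to that length agree within a tolerance. The threshold rule, the function name, and the parameters are assumptions made for illustration only. They are not the definition used in the paper.

```python
def effective_length(chain_lengths, measured, exact, tolerance):
    """Longest chain such that it and every shorter chain match the
    exact value within `tolerance` (hypothetical threshold criterion)."""
    best = 0
    # Walk through the chains from shortest to longest.
    for length, m, e in sorted(zip(chain_lengths, measured, exact)):
        if abs(m - e) > tolerance:
            break  # noise dominates from this length onward
        best = length
    return best
```

A return value of 0 means that even the shortest chain tested did not meet the tolerance. Because the exact reference comes from the analytical solution rather than from classical simulation, a check of this kind stays cheap as the chain grows.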

There are many interesting technical details to be discussed regarding fermionic length benchmarks in general as well as the proposal described in this paper. These details often change from one hardware system (superconducting qubits, trapped ions, etc.) to another. Together with partners in the quantum ecosystem, we look forward to diving deeper into this new territory of application benchmarking.

We will be hosting several videoconferences to go deeper on our methods, what we learned, and how we used our platform, Orquestra®, to scale the experiment. For anyone interested in learning more, please reach out to us.