Why Generative AI Is Leading the Race to Practical Quantum Advantage
Over the last 15 years, AI and machine learning (ML) have gone from speculative research to some of the biggest buzzwords in digital transformation. Today, most enterprises have ML-powered solutions in place, with applications including fraud detection, automated chatbots, financial risk prediction, cancer detection, and many more.
Yet even though ML hasn’t been around for very long, we are already on the cusp of its next evolution: quantum machine learning (QML).
In short, QML incorporates quantum computing to expand what’s possible with classical ML. For a longer, more detailed explanation of how quantum computing works and how it can be used for ML tasks, check out the video below. The most important points of the video will also be discussed in further detail in this blog post.
Before we continue, it’s important to note that when we say “QML,” we’re referring specifically to quantum generative modeling, also known as generative AI. We have focused our work on quantum generative modeling because our research suggests it is a strong candidate for achieving quantum advantage.
Generative modeling refers to models that generate synthetic data falling within the probability distribution of some existing data of interest. A good example of generative modeling is the famous This Person Does Not Exist website, which uses a classical generative model known as a Generative Adversarial Network (GAN) to generate synthetic images of faces that look like real faces (in a fun twist, whichfaceisreal.com challenges you to pick which face is real and which was generated by AI). The video below by Luis provides a good explanation of how a GAN works.
As of today, generative modeling is a very promising near-term approach for QML. We get into the reasons for this in the video above, but in short, the discriminative models (models that distinguish between different types of data) that classical computers excel at are much less complex to implement than generative models. It’s the difference between deciding whether a painting is a Van Gogh and generating a painting that could convincingly have been made by Van Gogh; the latter is much more difficult. For more on this distinction, check out this excellent Google developers blog post.
Whereas classical computers are great at the input-output calculations needed for discriminative models, they aren’t as good at learning the complex probability distributions required for generative modeling. These probability distributions are essential for generating novel samples, and quantum computers have the potential to excel at learning them and generating data from them, thanks to the inherently probabilistic nature of measuring a quantum state. This makes generative models a prime candidate for achieving quantum advantage.
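To make that last point concrete, here is a minimal, purely illustrative sketch (simulated with NumPy, not run on quantum hardware) of how a quantum state defines a probability distribution over bitstrings via the Born rule; the state vector below is an arbitrary toy example:

```python
import numpy as np

# A 2-qubit state is a vector of 4 complex amplitudes over the bitstrings 00, 01, 10, 11.
# Measuring the state returns a bitstring with probability |amplitude|^2 (the Born rule),
# so every measurement is effectively a fresh draw from the distribution the state encodes.
state = np.array([0.6, 0.0, 0.0, 0.8j])   # toy state, already normalized (0.36 + 0.64 = 1)
probs = np.abs(state) ** 2                 # [0.36, 0.0, 0.0, 0.64]

bitstrings = ["00", "01", "10", "11"]
samples = np.random.choice(bitstrings, size=5, p=probs)
print(samples)                             # e.g. ['11' '00' '11' '11' '11']
```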
As This Person Does Not Exist makes clear, classical computers are certainly competent when it comes to generative modeling. But that’s not to say they can’t be improved with quantum methods. In fact, we’ve already done promising research that demonstrates how quantum hardware could augment a classical GAN to generate high-resolution images of handwritten digits better than the classical model alone — the first successful example of synergy between quantum and classical methods for ML.
When it comes to generative modeling, quantum computing has the potential to outperform or boost classical ML in three key areas: expressibility, generalization, and trainability. Let’s explore these concepts in more detail.
There is theoretical research suggesting that quantum neural networks can encode probability distributions that are not classically tractable. This points to expressibility, the ability of a model to implement complex distributions, as a potential path toward practical quantum advantage in ML.
Another way of thinking about expressibility is the ability to train new models that capture the full range of possibilities for the data of interest. To give an example, let’s consider This Person Does Not Exist again. If this algorithm had poor expressibility, it might only include models which generate faces with eyes a certain distance apart, or with full lips, or with glasses. In other words, it might not have access to models that generate the full range of possibilities in human faces.
As complex as human faces are, they are relatively simple compared to other kinds of data. After all, a picture of a face can only have so many pixels, and faces can only be so different from one another. Clearly, classical GANs like the ones used by This Person Does Not Exist have no problem with expressibility. Let’s consider a more complex problem — a problem with enterprise value.
Say you wanted a model that could generate investment portfolios with high returns and minimal risk, made up of 50 out of the 500 companies on the S&P 500. The number of possible combinations exceeds the number of atoms in the Milky Way, and there are countless variables that influence the values of different stocks. What’s more, the values of certain stocks will be correlated.
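If you want to sanity-check that claim, the arithmetic takes one line of Python; the comparison figure of roughly 10^68 atoms is a commonly cited order-of-magnitude estimate for the Milky Way:

```python
import math

# Number of ways to choose 50 stocks out of 500 (order doesn't matter).
portfolios = math.comb(500, 50)
print(f"{portfolios:.2e}")    # roughly 2.3e+69 possible 50-stock portfolios
print(portfolios > 10**68)    # True: more than a typical estimate of atoms in the Milky Way
```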
With this level of complexity, available classical models might not be able to capture the necessary correlations to generate the best portfolios — especially when it’s not just neighboring data entries that are correlated, as in the case of pixels inside images of human faces. On the other hand, a quantum or quantum-inspired model could encode hidden correlations that a classical computer would struggle to find. That’s because in a quantum model, quantum bits (also known as qubits) can be entangled and thus correlate information throughout the entire model. This allows QML models to generate data outputs from probability distributions that are not accessible with classical models.
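As a toy illustration of that point (not tied to any of our actual models), compare the measurement statistics of an entangled two-qubit Bell state with the best a model of two independent bits can do:

```python
import numpy as np

# Bell state (|00> + |11>) / sqrt(2): the two qubits are entangled, so their measurement
# outcomes are perfectly correlated -- you only ever observe 00 or 11.
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
joint = np.abs(bell) ** 2                      # P(00)=0.5, P(01)=0, P(10)=0, P(11)=0.5

# A model that treats the two bits as independent can only produce a product of marginals,
# which spreads probability over all four outcomes and loses the correlation entirely.
p0_first = joint[0] + joint[1]                 # P(first bit = 0)  = 0.5
p0_second = joint[0] + joint[2]                # P(second bit = 0) = 0.5
product = np.outer([p0_first, 1 - p0_first], [p0_second, 1 - p0_second]).ravel()

print(joint)     # [0.5 0.  0.  0.5]
print(product)   # [0.25 0.25 0.25 0.25]
```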
This S&P 500 problem isn’t hypothetical; we’ve already shown how quantum-inspired models can add value here. This work was the first demonstration of our Generator-Enhanced Optimization (GEO) strategy, which uses generative modeling to solve optimization problems, such as finding an optimal investment portfolio.
In our research, we showed how a quantum-inspired generative model could improve on results generated by a classical solver in a portfolio optimization problem. The quantum-inspired model generated candidate portfolios with lower risk than those generated by the purely classical solver, for the same level of return. In other words, our quantum-inspired GEO strategy was on par with state-of-the-art classical solvers that have been fine-tuned for decades.
This research was the first demonstration of generalization in quantum-inspired generative models, the next advantage for QML we’ll discuss here.
Generalization is another important benchmark for measuring the quality of generative algorithms. At its core, generalization measures a model’s ability to generate data that is novel, high-quality, and a valid solution to the problem in question. In other words, is the model learning or is it just memorizing? To understand the distinction, think about studying for an exam. If you just memorize the questions in the textbook and can’t generalize what you’ve learned, you won’t be able to answer new questions on the exam that cover the same material.
Generalization is extremely important for enterprise use cases for generative modeling. For example, consider a model designed to generate new cancer treatments. The model would learn from a database of existing drugs to generate new drugs. With poor generalization, it would simply recreate the drugs that it was trained on, without understanding the essential features of these drugs that make them work. Of course, you would want a model that can generate working drugs that don’t already exist — that’s where the value is.
Generalization is closely related to expressibility. Several papers have pointed out that too much expressibility can lead to poor generalization. For example, overparameterizing a model gives it high expressibility, but the model may become harder to train and may tend to memorize the data rather than generalize from it. This is why, in the classical ML literature, techniques such as dropout and batch normalization were introduced to regularize deep neural networks, that is, to limit their expressibility, which in turn improved their generalization. Ultimately, generalization matters more than expressibility: our goal is to generate new and useful data.
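As a refresher on what that kind of regularization looks like classically, here is a minimal sketch of inverted dropout in plain NumPy; the 0.5 rate is just an illustrative choice:

```python
import numpy as np

def dropout(activations, rate=0.5, training=True):
    """Inverted dropout: randomly zero a fraction `rate` of activations during training
    and rescale the survivors, so the expected activation is unchanged at inference time."""
    if not training or rate == 0.0:
        return activations
    mask = np.random.rand(*activations.shape) >= rate
    return activations * mask / (1.0 - rate)

hidden = np.random.randn(4, 8)        # a toy batch of hidden-layer activations
print(dropout(hidden, rate=0.5))      # roughly half the entries are zeroed on each call
```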
The important question, of course, is whether quantum methods can improve upon the generalization capabilities of classical methods. Unfortunately, it’s too early to say definitively, and to date, little work has been done to understand generalization in quantum generative modeling. However, at Zapata, we have developed evaluation metrics to quantify generalization in QML models, which allows us to compare the performance of different classes of generative models. For example, one of these metrics, fidelity, distinguishes whether the samples generated are noise or valid data that solves the problem at hand. With this framework, generalization in generative models is now a concrete and quantitative path to near-term quantum advantage.
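The precise definitions live in our papers, but the spirit of such metrics is easy to sketch. Assuming a problem-specific validity check and access to the training set (both hypothetical stand-ins here, not our published definitions), a fidelity-style score and a simple novelty score might look like this:

```python
def fidelity(generated, is_valid):
    # Fraction of generated samples that are valid solutions rather than noise.
    # `is_valid` is a problem-specific check supplied by the user.
    return sum(is_valid(s) for s in generated) / len(generated)

def novelty(generated, training_set, is_valid):
    # Fraction of generated samples that are valid and not memorized copies of the training data.
    seen = set(training_set)
    return sum(is_valid(s) and s not in seen for s in generated) / len(generated)

# Toy usage: bitstrings are "valid" if they contain exactly two 1s.
train = ["0011", "0101"]
samples = ["0011", "0110", "1100", "1111"]
valid = lambda s: s.count("1") == 2

print(fidelity(samples, valid))          # 0.75 -- three of four samples are valid
print(novelty(samples, train, valid))    # 0.5  -- two of four are valid and unseen
```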
Using these metrics, our team’s research has shown for the first time that quantum generative models, particularly a class known as quantum circuit Born machines (QCBMs), can generalize. Furthermore, the research demonstrated that QCBMs can generate samples of higher quality than those in the training set.
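For readers who want something concrete, a QCBM is a parameterized quantum circuit whose measurement statistics serve as the model’s distribution, trained so those statistics match the data. Below is a minimal simulated sketch using the open-source PennyLane library (not our production tooling), with a deliberately tiny ansatz and a toy target distribution; it is meant only to convey the idea, not to reproduce our results:

```python
import pennylane as qml
from pennylane import numpy as np

n_qubits = 3
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def qcbm(params):
    # A single layer of trainable rotations followed by an entangling CNOT chain.
    for i in range(n_qubits):
        qml.RY(params[i], wires=i)
    for i in range(n_qubits - 1):
        qml.CNOT(wires=[i, i + 1])
    return qml.probs(wires=range(n_qubits))   # Born-rule probabilities over all 2^n bitstrings

def nll(params, data_probs, eps=1e-9):
    # Negative log-likelihood of the target distribution under the model.
    return -np.sum(data_probs * np.log(qcbm(params) + eps))

# Toy target: half the probability mass on 000 and half on 111.
target = np.zeros(2 ** n_qubits)
target[0] = target[-1] = 0.5

params = np.array(np.random.uniform(0.1, np.pi / 2, n_qubits), requires_grad=True)
opt = qml.GradientDescentOptimizer(stepsize=0.1)
for _ in range(300):
    params = opt.step(lambda p: nll(p, target), params)

print(np.round(qcbm(params), 3))   # probabilities should concentrate on 000 and 111
```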
The ability to generate higher quality samples has major implications for solving combinatorial optimization problems, such as the portfolio optimization problem from earlier. A QCBM could be swapped in for the quantum-inspired model used in our GEO strategy, which we showed could improve upon results from classical solvers. In other words, as QCBMs become more powerful, they could be an asset in the race for practical quantum advantage in combinatorial optimization problems.
For a deeper, more technical dive into our research on generalization in QCBMs, watch the video below:
To fully exploit the benefits of quantum computing for generative ML, we need to be able to train these models well. Trainability refers to the ability to tune the parameters of a model to reliably and accurately fit the data of interest. If we can’t train a model to generate data that aligns with the training data, even though it has the expressibility to encode it, then it has poor trainability, and is of little use. Ultimately, you can’t have expressibility and generalization without trainability.
Quantum circuits are known to have many training issues, including barren plateaus. To understand barren plateaus, imagine that training a model is like trying to climb Mount Everest. Your goal is to climb the highest mountain on Earth, but you don’t know where it’s located. What are the odds that, starting from a random point anywhere in the world, you will find your way to the summit? It is extremely unlikely: if you start ascending at random, you are far more likely to end up on a plateau somewhere, with no clue which direction leads closer to Everest. This is where training gets stuck. Modifying the parameters in any direction doesn’t yield any significant change in the output, so the model doesn’t know which way to “walk” to improve its results.
But what if you know the continent? The country? The closer your starting location is to Everest, the more likely you are to make it to the top. Similarly, we hypothesized that if you can start training your model from a good place using classical methods, you can improve the subsequent training of the quantum model.
In our most recent QML research, we have been able to show how combining classical and quantum methods can improve the training of these quantum models, overcoming barren plateaus to outperform both classical and quantum methods in isolation.
In our research, we used a classical model called a matrix product state (MPS), a quantum-inspired model that runs on a classical computer (the same model used in our GEO demonstration). An MPS can in principle faithfully model any quantum state, at the price of potentially needing exponentially many parameters if the target state is too complicated. MPSs are much easier to train than quantum circuits, but they can run out of computational resources. Quantum circuits, in contrast, can reach a much more diverse set of quantum states without exponential resources, but they are much harder to train (e.g., due to barren plateaus).
Our approach was to first train an MPS as much as we can, then map it into a quantum circuit, extend the quantum circuit with additional quantum gates that would not be feasible to execute on classical computers, and continue training on a quantum computer.
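In pseudocode, the workflow looks roughly like this; every function below is a hypothetical placeholder for the corresponding step, not an API from any particular library:

```python
# Schematic pseudocode only: each helper is a hypothetical placeholder for the step
# described above, not a real library call.

def classical_then_quantum_training(training_data, classical_budget, quantum_steps):
    # 1. Train an MPS generative model classically until it exhausts its budget;
    #    it captures mostly local correlations in the data.
    mps = train_mps(training_data, budget=classical_budget)

    # 2. Map the trained MPS onto an equivalent parameterized quantum circuit, so the
    #    circuit starts from a well-trained region rather than a random (barren) one.
    circuit = mps_to_circuit(mps)

    # 3. Extend the circuit with additional entangling gates that would be infeasible
    #    to simulate classically, adding capacity for distant correlations.
    circuit = add_entangling_layers(circuit)

    # 4. Continue training the enlarged circuit on a quantum computer.
    return train_on_quantum_hardware(circuit, training_data, steps=quantum_steps)
```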
Why does this work? The MPS handles local correlations well (e.g., correlations between neighboring pixels in an image), while the quantum circuit is much better at expressing distant correlations (e.g., correlations between faraway pixels in an image). The MPS learns those local correlations, and once it reaches the limits of its computational resources, it hands a relatively well-trained model to the quantum circuit. The quantum circuit then continues the training, capturing the distant correlations.
To give an analogy, imagine the quantum circuit is the wings of an airplane, and the MPS is the wheels. Obviously, an airplane can’t start flying immediately from rest. We use the wheels to gain speed until the plane can start using the wings. The wings can take the airplane places the wheels alone could never carry it, but it’s the combination of wheels and wings that gets the airplane to its destination.
In the same way, we’re not trying to use only quantum circuits for everything. In the end, the question is not whether quantum can outperform classical computing, but rather how it can boost classical computing. The synergy between classical and quantum is what will ultimately expand what’s possible for generative modeling, and likely other domains with high computational complexity as well.
The research we’ve done so far gives us hope that generative modeling will be the earliest use case where quantum computing can generate near-term value in tandem with classical methods. Ultimately, however, we won’t know until we get down to business and build applications for real-world problems. But the methods, models, and metrics our team is building will enable us to create these applications and evaluate their performance.