In this article we are going to show that, starting from a linear-algebra picture of quantum computation, we can end up with a dynamical picture of quantum computation that involves a Hamiltonian.

Initially, these two topics may seem difficult to reconcile. Linear algebra is about vectors, lengths of vectors, angles, distances and matrices. This is the basis for classical mechanics. Hamiltonian mechanics, however, is different: it’s all about describing the energy of a system.

We will see that there are but a few relatively simple steps to move from one setting to the other. The insights gained from this transformation will prove extremely useful and by the end of the article I hope you will be comfortable with the Hamiltonian view of quantum algorithms.

What We Already Know

Hopefully you are familiar with matrices. These are the central object of study in linear algebra. We will focus our attention on square matrices – those with an equal number of rows and columns, like this one:

\displaystyle \begin{pmatrix}a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}

In this case, the size of a square matrix is just the number of rows (or columns) and is denoted by n. Let us also denote such a square matrix of size n by the letter A. We also say that A is an element of the square matrices of dimension n with complex entries:

\displaystyle A \in M_n(\mathbb{C})

When n is large it is well-known that matrix inversion (basically matrix division) and multiplication become quite difficult. However, it does help if A is sparse. All this means is that most of the elements of the matrix are zero – a common working definition (although not an official one) being that the number of non-zero entries is only of order n, rather than the n^2 entries a dense matrix carries. Having mostly zero entries in the matrix helps the algorithms that invert it.
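To make the sparsity point concrete, here is a minimal sketch in Python (assuming NumPy and SciPy are available – neither is mentioned in the article, this is purely an illustration). It builds a tridiagonal matrix, which has roughly 3n non-zero entries out of n^2, and solves Ax = b once with a sparse solver and once with a dense one:

import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

n = 2000
# A sparse, symmetric tridiagonal matrix: about 3n non-zero entries out of n^2.
A_sparse = diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

x_sparse = spsolve(A_sparse, b)                     # exploits the sparsity
x_dense = np.linalg.solve(A_sparse.toarray(), b)    # ignores it: O(n^3) work on a dense copy

print(np.allclose(x_sparse, x_dense))               # same answer, very different cost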

The Fundamental Problem

Let us now introduce the fundamental problem.

Consider the simplest linear interaction of a known Hermitian matrix A (we know all the entries of this matrix) with an unknown vector x (we know nothing about the entries of this vector). But now suppose we know the result of this interaction: i.e. we know about a vector b. Since we have assumed linearity, this interaction can be written down as a mathematical equation:

\displaystyle Ax = b

Our aim is to find x which solves this equation. This is our fundamental problem.

Finding x is easy when A is invertible, because then there exists another n\times n matrix, called A^{-1} (with the property that AA^{-1} = A^{-1}A = I_n), which solves the above equation. This is because simple rearrangement yields the following equation:

\displaystyle x = A^{-1}b

and we are done, i.e. we have x alone by itself on the left hand side of the equals sign.

But what if A^{-1} is hard to find? Like in the above case where A is large and not sparse?

Then maybe such a direct solution is not efficiently possible. To bypass this issue, instead of trying to find the x which solves the problem, maybe we just look for some function of x, i.e. an f(x), rather than x itself.

This translates directly into the world of quantum algorithms, because the x which solves the linear problem corresponds to the quantum state |x\rangle, and this ket is often not even directly measurable.

Instead, what is measurable (and thus much more useful to us anyway) is an expectation value such as \langle x|M|x\rangle, which is a function of x rather than x itself. Note that M is a measurement operator, i.e. it is just another matrix.

When all we need to know is a function of a solution x (or equivalently, a matrix operator that acts on x) then it is often acceptable to just know an approximation of that function. This translates to only requiring an approximation of the inverse A^{-1}.
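As a toy illustration of “a function of x is enough”, here is a short sketch (Python with NumPy assumed; the matrix A, the vector b and the observable M are made up for the example). It solves a small system, normalises the solution as a quantum state would be, and reads out only the expectation value \langle x|M|x\rangle:

import numpy as np

rng = np.random.default_rng(0)
n = 4
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
A = (B + B.conj().T) / 2                   # a random Hermitian A (illustrative only)
b = rng.normal(size=n) + 1j * rng.normal(size=n)

x = np.linalg.solve(A, b)
x = x / np.linalg.norm(x)                  # normalise, as the quantum state |x> would be

M = np.diag(np.arange(n, dtype=float))     # a toy Hermitian observable
expectation = np.real(x.conj() @ M @ x)    # <x|M|x> -- a function of x, not x itself
print(expectation)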

Quick side note: we assumed that A is Hermitian (also called self-adjoint). This is to fit the Dirac–von Neumann formulation of quantum mechanics, which assumes all physical observables (such as position, momentum, spin, etc.) are represented by Hermitian matrices. But it also allows for a segue into the next topic.

Spectral Decomposition

Let us put the fundamental problem aside for a moment and think a little more about the square matrix A that we have assumed is Hermitian. Because A is Hermitian, it always has a powerful property: a spectral decomposition (this is the content of the spectral theorem).

Having a spectral decomposition means that A can be expressed as a linear combination of projectors onto its eigenvectors:

\displaystyle A = \sum_{j=1}^n \lambda_j |e_j\rangle\langle e_j|

where \{|e_1\rangle, \dots, |e_n\rangle\} is an orthonormal basis.

It turns out that \{|e_1\rangle, \dots, |e_n\rangle\} is more than just an orthonormal basis – it’s an orthonormal eigenbasis, because each basis vector satisfies the eigenvalue equation A|e_j\rangle = \lambda_j|e_j\rangle. Thus, each |e_j\rangle is an eigenvector with eigenvalue \lambda_j.

Now, of course, in our case each eigenvalue \lambda_j is a real number – the spectrum of a Hermitian matrix A is contained within the real line \mathbb{R}; and we can confine the spectrum even further, to within the real interval [-1,1], by requiring the Hermitian matrix to also be unitary (in which case the eigenvalues are in fact \pm 1). But more on that later.
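Numerically, the spectral decomposition is exactly what np.linalg.eigh hands back for a Hermitian matrix. Here is a small sketch (NumPy assumed, random A chosen only for illustration) confirming that the eigenvalues are real and that summing \lambda_j |e_j\rangle\langle e_j| rebuilds A:

import numpy as np

rng = np.random.default_rng(1)
n = 5
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
A = (B + B.conj().T) / 2                     # Hermitian A

lam, E = np.linalg.eigh(A)                   # lam: eigenvalues, E[:, j] = |e_j>
print(lam)                                   # all real, as promised for Hermitian A

A_rebuilt = sum(lam[j] * np.outer(E[:, j], E[:, j].conj()) for j in range(n))
print(np.allclose(A, A_rebuilt))             # True: A = sum_j lambda_j |e_j><e_j|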

The Hermiticity of A is what guarantees this eigenbasis representation, and it is exactly the structure we will exploit while pursuing a solution to the linear equation Ax = b.

Eigenvalues, eigenvectors, eigenbases – these are the special scalars, vectors and basis vectors that appear whenever we study a linear transformation on a vector space, and that is exactly what we are doing when we talk about the spectrum of a Hermitian matrix. Our fundamental problem is itself about such a linear transformation – the attempt to invert the square, Hermitian matrix A to find x in the linear equation Ax = b.

So why is all of this so important?

It is important because once a matrix is expressible in terms of a basis of eigenvectors (i.e. an eigenbasis) it is immediately diagonalisable – and diagonalisable matrices are really powerful. Just look at this example:

Say you have a matrix A and you want to compute A^{100}. How would you do this quickly when the size of the matrix A is huge? It’s very tough actually! Well, don’t run away just yet. What if I told you that A was also diagonalisable? Oh, OK. That’s different. If it is diagonalisable then we can find the corresponding diagonal matrix D by simply computing

\displaystyle D = P^{-1}AP

where P is invertible and the columns of P are just the eigenvectors of A, which we already have because A can be spectrally decomposed.

Now, instead of manually computing the extremely difficult A^{100}, we just compute PD^{100}P^{-1}, and since computing D^{100} is extremely easy (you just raise the numbers on the diagonal without having to worry about any off-diagonal powers), the problem is now really easy – because diagonalisable matrices give you these easily invertible matrices \{P,P^{-1}\} for free!

This is why it is so beneficial for us to assume we are dealing with Hermitian matrices: they always have this nice property that they can be decomposed into their spectrum – that is to say, into a linear combination of projectors onto their eigenbasis – and this allows us to raise these Hermitian matrices to very high powers very easily, regardless of their size n.
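Here is that trick as a sketch (NumPy assumed, with a small random symmetric matrix standing in for a “huge” one): diagonalise once, raise only the diagonal, and compare against the brute-force power.

import numpy as np

rng = np.random.default_rng(2)
n = 6
B = rng.normal(size=(n, n))
A = (B + B.T) / 2                            # real symmetric (Hermitian) A

lam, P = np.linalg.eigh(A)                   # columns of P are the eigenvectors
k = 100
A_k = P @ np.diag(lam**k) @ P.conj().T       # P D^k P^{-1}; for Hermitian A, P^{-1} = P^dagger

print(np.allclose(A_k, np.linalg.matrix_power(A, k)))   # matches the direct computation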

Being able to raise matrices to high powers is very useful in quantum algorithms because you will be doing a lot of algorithm runs, generating very long sequences of vectors, where you get each subsequent vector by matrix-multiplying the previous one by A. But then v_k = A^k v_0, and when k is large, we make use of the above diagonalisation fact to compute A^k.

Exponentiation

Let us go back to the linear combination representation (the spectrum) of our Hermitian matrix:

\displaystyle A = \sum_{j=1}^n \lambda_j |e_j\rangle\langle e_j|

Expanding the summation gives:

\displaystyle A = \lambda_1 |e_1\rangle\langle e_1| + \cdots + \lambda_n |e_n\rangle\langle e_n|

Since each \lambda_i is real, we can form this strange-looking object:

\displaystyle e^{i\lambda_1} |e_1\rangle\langle e_1| + \cdots + e^{i\lambda_n} |e_n\rangle\langle e_n|

This clearly doesn’t equal A necessarily, because we’ve exponentiated each eigenbasis coefficient. But the point of this exercise is not to equate such an object with A, but rather to say that we can do this, this is a definition – and that this object is actually some other n\times n matrix!

So there is this other matrix, call it A^{\prime}, related to A (but not equal to it), whose spectral decomposition uses the very same eigenbasis \{|e_j\rangle\} that we used to represent A, but with exponentiated coefficients.

Now, usually we don’t call this matrix A^{\prime}, and although we are free to call it anything we want, we usually denote it by the symbol e^{i A}. For now, don’t read it as “the exponential raised to the imaginary power of A” – the whole thing is a symbol that pays homage both to the fact that the coefficients are exponentiated and to the fact that the basis of the spectral decomposition is the one belonging to A. (It does, in fact, agree with the matrix exponential of iA defined by the usual power series, but we won’t need that here.)

So we have this object:

\displaystyle e^{i A} := e^{i\lambda_1} |e_1\rangle\langle e_1| + \cdots + e^{i\lambda_n} |e_n\rangle\langle e_n|

If we reverse the spectral decomposition then the symbol e^{i A} still describes an n \times n matrix. But now the entries of this matrix are in general complex, by virtue of our definition, which took each original eigenvalue coefficient \lambda and exponentiated it to e^{i\lambda}. This means, crucially, that we start to have some interesting relationships within the matrix e^{i A}.

For example, when we compute (e^{iA})^{\dagger}e^{iA}, the coefficients coming from the left-hand factor get conjugated (they used to be real for the Hermitian matrix A, but now they are complex). This gives products like:

\displaystyle (e^{i\lambda_j})^{\ast}e^{i \lambda_j} = e^{-i\lambda_j}e^{i\lambda_j} = e^{i\lambda_j - i\lambda_j} = e^0 = 1

for the terms where the two indices coincide, j = k. And when the indices are distinct, j \neq k, the cross terms vanish completely because the eigenbasis is orthonormal: \langle e_j|e_k\rangle = 0. Putting these together, (e^{iA})^{\dagger}e^{iA} = \sum_j |e_j\rangle\langle e_j| = I_n.

This is precisely the definition of a unitary matrix! Thus e^{i A} is unitary.
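A quick numerical sanity check of this construction (NumPy/SciPy assumed, random Hermitian A for illustration): build e^{iA} by exponentiating only the eigenvalue coefficients, then verify that the result is unitary and agrees with the usual matrix exponential.

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(3)
n = 4
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
A = (B + B.conj().T) / 2                       # Hermitian A

lam, E = np.linalg.eigh(A)
U = E @ np.diag(np.exp(1j * lam)) @ E.conj().T # e^{iA} built from the eigenbasis

print(np.allclose(U.conj().T @ U, np.eye(n)))  # U^dagger U = I, so U is unitary
print(np.allclose(U, expm(1j * A)))            # and it coincides with the matrix exponential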

But wait a second…if you think about this for a minute, you’ll realise that this construction always results in a unitary matrix. This is actually the result of a theorem which states that every unitary operator arises as e^{iA} for some Hermitian operator A. Let’s quickly prove this:

Proof

Let

 V := \frac{1}{2}(U + U^{\dagger}),\quad\quad W := \frac{1}{2i}(U - U^{\dagger})

then
 V + iW = \frac{1}{2}(U + U^{\dagger}) + \frac{i}{2i}(U - U^{\dagger})
 V + iW = \frac{1}{2}U + \frac{1}{2}U^{\dagger} + \frac{1}{2}U - \frac{1}{2}U^{\dagger}
 V + iW = U

Both V and W are Hermitian directly from their definitions (check: V^{\dagger} = \frac{1}{2}(U^{\dagger} + U) = V, and similarly for W), and because U is unitary (UU^{\dagger} = U^{\dagger}U = I_n) they also commute. That is, we have the following two properties:

 V = V^{\dagger},\quad\quad W = W^{\dagger}
 VW = WV

Therefore, not only do V and W each have a spectral decomposition (like A did above, since they are Hermitian), but – because they commute – they can be simultaneously diagonalised, so their decompositions share the same eigenbasis (just like A and e^{iA} did above). In other words, we can write

\displaystyle V = \lambda_1 |e_1\rangle\langle e_1| + \cdots + \lambda_n |e_n\rangle\langle e_n|
\displaystyle W = \xi_1 |e_1\rangle\langle e_1| + \cdots + \xi_n|e_n\rangle\langle e_n|

which gives us something to decompose V + iW into:

\displaystyle V + iW = (\lambda_1 + i\xi_1)|e_1\rangle\langle e_1| + \cdots + (\lambda_n + i \xi_n)|e_n\rangle\langle e_n|

But since we have already shown that V + iW = U, each coefficient (\lambda_j + i\xi_j) must be an eigenvalue of U; and since U is unitary, all of these eigenvalues have unit modulus (the main ingredient we will use in a moment to invoke the geometric picture of a circle of radius 1 in the plane).

It follows from this that the eigenvalues \lambda_j and \xi_j are real numbers with the property that

\displaystyle |\lambda_j|^2 + |\xi_j|^2 = 1

Now we have a unit circle! Then for any j, the above equation describes a point somewhere on the circumference of a unit circle.

But now this provides a shortcut!

Because instead of having to keep track of two parameters (\lambda_j, \xi_j) for all j, we can instead just consider the angle between the line drawn from the point on the circle to the origin and the horizontal. This is always a unique angle (on the unit circle) denoted by \theta_j such that \lambda_j = \cos\theta_j and \xi_j = \sin\theta_j.

We thus have

\displaystyle (\lambda_j + i\xi_j) = e^{i\theta_j}

On the left hand side is a unit complex number z_j = \lambda_j + i\xi_j with real part \mathfrak{Re}z_j = \lambda_j and imaginary part \mathfrak{Im}z_j = \xi_j; and on the right-hand side is the phase of the complex number (note: the magnitude |z_j| = 1).

Now we just reverse the exponentiation we performed in the first place and transform these coefficients back: e^{i\theta_j} \rightarrow \theta_j (remembering that originally we went the other way, \lambda_j \rightarrow e^{i\lambda_j}).

This gives us a Hermitian matrix A, expressed in the same eigenbasis but with the transformed coefficients:

\displaystyle A := \theta_1|e_1\rangle\langle e_1| + \cdots + \theta_n|e_n\rangle\langle e_n|

and, by construction, exponentiating the coefficients of A gives back exactly e^{i\theta_1}|e_1\rangle\langle e_1| + \cdots + e^{i\theta_n}|e_n\rangle\langle e_n| = U, i.e. e^{iA} = U, which is what we wanted to show.

Q.E.D.
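A numerical companion to the theorem (not a replacement for the proof): the sketch below, with NumPy/SciPy assumed, builds a random unitary U via a QR decomposition, diagonalises it directly (legitimate because U is normal – its eigenvalues are exactly the unit complex numbers \lambda_j + i\xi_j = e^{i\theta_j} from the proof), assembles A from the angles \theta_j, and checks that e^{iA} recovers U.

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(4)
n = 4
Q, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
U = Q                                        # a random unitary matrix

w, E = np.linalg.eig(U)                      # w: unit-modulus eigenvalues e^{i theta_j}
theta = np.angle(w)                          # the angles theta_j
A = E @ np.diag(theta) @ np.linalg.inv(E)    # A = sum_j theta_j |e_j><e_j|

print(np.allclose(np.abs(w), 1.0))           # eigenvalues of U lie on the unit circle
print(np.allclose(A, A.conj().T))            # A is (numerically) Hermitian
print(np.allclose(expm(1j * A), U))          # e^{iA} = U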

Introducing Dynamics

To introduce some dynamics into the picture, let us define a parameter t whose job is to track the evolution steps of some algorithm, or just to track time itself. You can first think of t as a step counter in \mathbb{N}, but shortly we will let it vary continuously, t \in \mathbb{R}. The dynamical evolution of the quantity A is to let it accumulate over and over again, say, t times, so that after that amount of time we end up with tA.

This evolved quantity tA has the same orthonormal eigenbasis \{|e_j\rangle\} but now with time-scaled eigenvalues t\lambda_j; exponentiating the coefficients exactly as before gives

\displaystyle e^{itA} = e^{it\lambda_1}|e_1\rangle\langle e_1| + \cdots + e^{it\lambda_n}|e_n\rangle\langle e_n|

But according to the above proof this is again a unitary matrix, and since e^{iA} = U and both sides are diagonal in the same eigenbasis, we must have e^{itA} = (e^{iA})^t = U^t. (For whole-number t this is literally the t-th power of U; for general real t it is how we define U^t.) This gives:

\displaystyle U^t = e^{it\lambda_1}|e_1\rangle\langle e_1| + \cdots + e^{it\lambda_n}|e_n\rangle\langle e_n|
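Here is a small check of that identity (NumPy/SciPy assumed, random Hermitian A and an arbitrary integer t chosen for illustration): U^t computed by repeated multiplication matches both the eigenbasis formula and e^{itA}.

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(5)
n = 4
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
A = (B + B.conj().T) / 2
lam, E = np.linalg.eigh(A)

U = expm(1j * A)                                       # U = e^{iA}
t = 7
lhs = np.linalg.matrix_power(U, t)                     # U^t
rhs = E @ np.diag(np.exp(1j * t * lam)) @ E.conj().T   # sum_j e^{it lambda_j} |e_j><e_j|

print(np.allclose(lhs, rhs))                           # the eigenbasis formula
print(np.allclose(lhs, expm(1j * t * A)))              # ... and it equals e^{itA}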

Since this holds for all real t, we can differentiate with respect to t to get

\displaystyle \frac{d}{dt} U^t = \frac{d}{dt}e^{it\lambda_1}|e_1\rangle\langle e_1| + \cdots + \frac{d}{dt}e^{it\lambda_n}|e_n\rangle\langle e_n|

Or, in summation notation:

\displaystyle \frac{d}{dt} U^t = \sum_{j=1}^{n}\left(\frac{d}{dt}e^{it\lambda_j}\right)|e_j\rangle\langle e_j|

Now, because the eigenbasis is orthonormal (\langle e_j|e_k\rangle = 0 whenever j \neq k), we have a whole family of additive identities to play with:

\displaystyle 0 = i\lambda_j e^{it\lambda_k} |e_j\rangle\langle e_j|e_k\rangle\langle e_k|,\quad\quad\forall\, j\neq k

…so we can insert a bunch of zeroes into the expansion of \frac{d}{dt}U^t.

First, let’s expand the derivative again…

\displaystyle \frac{d}{dt} U^t = i\lambda_1 e^{it\lambda_1}|e_1\rangle\langle e_1| + \cdots + i\lambda_n e^{it\lambda_n}|e_n\rangle\langle e_n|

…and then insert a bunch of zeroes:

\displaystyle \frac{d}{dt} U^t = i\lambda_1 e^{it\lambda_1}|e_1\rangle\langle e_1| + 0 + 0 + \cdots + 0 + 0 + i\lambda_n e^{it\lambda_n}|e_n\rangle\langle e_n|

This might look a little weird but we will make good use of these extra zeroes!

Write each zero as one of the cross terms above, making sure the indices j and k never coincide:

\displaystyle \frac{d}{dt} U^t = i\lambda_1e^{it\lambda_1}|e_1\rangle\langle e_1|e_1\rangle\langle e_1| + i\lambda_1 e^{it\lambda_2}|e_1\rangle\langle e_1|e_2\rangle\langle e_2| + \cdots +
\displaystyle + i\lambda_{n-1} e^{it\lambda_n}|e_{n-1}\rangle\langle e_{n-1}|e_n\rangle\langle e_n| + i\lambda_n e^{it\lambda_n}|e_n\rangle\langle e_n|e_n\rangle\langle e_n| = \sum_{j=1}^n \sum_{k=1}^n i\lambda_j e^{it\lambda_k}|e_j\rangle\langle e_j|e_k\rangle\langle e_k|

(Note that the diagonal terms j = k are unchanged, since \langle e_j|e_j\rangle = 1.)

OK, so now we have the same expression, except with quite a few extra terms which are identically zero – and written in this form the double sum factors.

Let us factor out iA from this equation.

Since iA = \sum_{j=1}^{n}i\lambda_j|e_j\rangle\langle e_j| and U^t = \sum_{k=1}^{n} e^{it\lambda_k} |e_k\rangle\langle e_k|, we have:

\displaystyle \sum_{j=1}^n \sum_{k=1}^n i\lambda_j e^{it\lambda_k}|e_j\rangle\langle e_j|e_k\rangle\langle e_k| = \left( \sum_{j=1}^n i\lambda_j|e_j\rangle\langle e_j| \right)\left( \sum_{k=1}^n e^{it\lambda_k} |e_k\rangle\langle e_k| \right) = iA\,U^t

Thus

\displaystyle \frac{d}{dt} U^t = iAU^t

…our main result for this article!
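The result can also be checked numerically with a finite difference (a sketch with NumPy/SciPy assumed; the Hermitian A, the time t and the step h are illustrative choices): the central-difference derivative of e^{itA} should agree with iA e^{itA}.

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(6)
n = 4
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
A = (B + B.conj().T) / 2                       # Hermitian A

def Ut(t):
    return expm(1j * t * A)                    # U^t = e^{itA}, for real t

t, h = 0.8, 1e-6
numerical = (Ut(t + h) - Ut(t - h)) / (2 * h)  # central finite difference
analytic = 1j * A @ Ut(t)                      # the claimed derivative iA U^t

print(np.max(np.abs(numerical - analytic)))    # tiny (finite-difference error only)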

Forming the Time-Dependent Schrodinger Equation

A quick recap: Start with a Hermitian matrix A and consider its spectral decomposition. This is a linear combination of eigenbasis projectors with eigenvalue coefficients \lambda_j (eigen – because we are dealing with the eigenvector-eigenvalue problem). We then take the coefficients and transform them into e^{i\lambda_j}. Undoing the decomposition forms a new matrix, given the symbol e^{iA}. This new matrix is unitary (we proved this), so we denote it U, i.e. U = e^{iA}. Then we looked at what happens when we multiply the Hermitian matrix A by a real number t, forming the matrix tA. Under the exponential this scalar multiplication turns into taking a power: e^{itA} = (e^{iA})^t = U^t. Finally, we differentiated this with respect to t and found the differential equation d/dt (U^t) = iAU^t.

Using this last fact, let us apply both sides to some initial quantum state ket |x_0\rangle. We get

\displaystyle \frac{d}{dt}U^t |x_0\rangle = iAU^t|x_0\rangle

This must hold, since the operators acting on |x_0\rangle on both sides of the equals sign are the same thing.

But we also know that when U^t operates on some ket |x_0\rangle, this just time-evolves the ket to its state at time t, i.e. |x(t)\rangle := U^t|x_0\rangle. Therefore, the above equation can be simplified to:

\displaystyle \frac{d}{dt}|x(t)\rangle = iA|x(t)\rangle

If we now define A^{\prime} := -\hbar A (because we can! or maybe because some experiment told us to), then substituting this into the above equation and rearranging slightly gives us

\displaystyle i\hbar\frac{d}{dt}|x(t)\rangle = A^{\prime}|x(t)\rangle

which is precisely the time-dependent Schrodinger equation, where normally one would see A^{\prime} = H the (observable) Hamiltonian operator, and x = \Psi, so that one has instead:

\displaystyle i\hbar\frac{d}{dt}|\Psi(t)\rangle = H|\Psi(t)\rangle

…which is more familiar to us, but identical.

The Schrodinger Equation describes the evolution of an isolated quantum state |\Psi\rangle in continuous time t, but also requires the Planck constant \hbar to be able to convert between units of time and units of energy (hence why the Planck constant is in units of Joules per Hertz).
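To close the loop, here is a sketch (NumPy/SciPy assumed, \hbar set to 1, and a made-up 2×2 Hamiltonian H used purely for illustration) that integrates the Schrodinger equation i\,d/dt|\Psi\rangle = H|\Psi\rangle as an ordinary differential equation and compares the result with the exact evolution e^{-iHt}|\Psi(0)\rangle:

import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

H = np.array([[1.0, 0.5],
              [0.5, -1.0]], dtype=complex)      # toy Hermitian Hamiltonian (hbar = 1)
psi0 = np.array([1.0, 0.0], dtype=complex)      # initial state |Psi(0)>

def rhs(t, psi):
    return -1j * (H @ psi)                      # d/dt psi = -i H psi

T = 5.0
sol = solve_ivp(rhs, (0.0, T), psi0, rtol=1e-10, atol=1e-12)
psi_numeric = sol.y[:, -1]                      # |Psi(T)> from the ODE solver
psi_exact = expm(-1j * H * T) @ psi0            # |Psi(T)> from the exact unitary

print(np.allclose(psi_numeric, psi_exact, atol=1e-6))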

Conclusion

Again, this whole derivation is contingent on the eigenvector-eigenvalue problem, i.e. the need to solve the linear equation:

\displaystyle Ax = \lambda x

I’ve written a previous article on why this problem is so fundamental, and I recommend reading it if this starting point does not sit right with you. We also noted along the way that it might be enough to look for a function of the solution x rather than x itself, and gave some reasons for this. As it turns out, the result we have arrived at can be interpreted in exactly that spirit, as we will see below.

Then, we needed the matrix A to be Hermitian because we needed all real eigenvalues, and also (probably more importantly) we needed A to be diagonalisable. We then exponentiated the Hermitian matrix to form the new matrix e^{iA}. We then saw that this exponentiated Hermitian matrix – which is now unitary – can be applied to an eigenstate |e_j\rangle and does nothing to it

\displaystyle U^t |e_j\rangle = e^{iAt}|e_j\rangle = e^{i\lambda_j t}|e_j\rangle

except make it pick up a phase – which is OK because, on a single eigenstate, this is a global phase and global phases don’t impact measurements.

Since we can write an arbitrary quantum state as a superposition over the (energy) eigenstates

\displaystyle |\Psi\rangle = \alpha_1 |e_1\rangle + \cdots + \alpha_n |e_n\rangle

from this perspective, applying the exponentiated unitary gives

\displaystyle e^{iAt}|\Psi\rangle = \alpha_1 e^{i\lambda_1 t}|e_1\rangle + \cdots + \alpha_n e^{i\lambda_n t} |e_n\rangle
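A final sketch (NumPy/SciPy assumed, random Hermitian A and random amplitudes chosen for illustration) showing exactly this: under e^{iAt} each eigen-amplitude picks up the phase e^{i\lambda_j t}, while the probabilities |\alpha_j|^2 do not change.

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(7)
n = 3
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
A = (B + B.conj().T) / 2
lam, E = np.linalg.eigh(A)                         # eigenvalues and eigenstates |e_j>

alpha = rng.normal(size=n) + 1j * rng.normal(size=n)
alpha = alpha / np.linalg.norm(alpha)              # amplitudes alpha_j
psi0 = E @ alpha                                   # |Psi> = sum_j alpha_j |e_j>

t = 2.5
psi_t = expm(1j * A * t) @ psi0                    # e^{iAt}|Psi>
alpha_t = E.conj().T @ psi_t                       # amplitudes of psi_t in the eigenbasis

print(np.allclose(alpha_t, np.exp(1j * lam * t) * alpha))  # each picks up e^{i lambda_j t}
print(np.allclose(np.abs(alpha_t)**2, np.abs(alpha)**2))   # probabilities unchanged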

and I have seen this described as: energy is the rate at which a quantum state picks up phase – not obvious to me at this point, but it does sound interesting. The reason exponentiation helps us so much here is a technical result known as Stone’s theorem (proved by Marshall Stone in 1932, and refined by John von Neumann the same year), which is out of scope for this article but is extremely important and useful.
