Legendre Transformation

From TorontoMathWiki

Jump to: navigation, search

The Legendre transformation is an operation that transforms one convex, differentiable real-valued function of a real variable into another. It can be generalized to the Legendre-Fenchel transformation, which is an involution (a map that is its own inverse) on the set of closed, convex and finite functions. The Legendre-Fenchel transformation relates smoothness (resp. strict convexity) of one function to strict convexity (resp. smoothness) of its transform. It is equivalent to the Legendre transformation on the set of strictly convex, differentiable, 1-coercive functions. The Legendre transform of $\ f(x) $ is especially well-behaved when $\ f \in C^2 $ is strictly convex. In this case, the Legendre transformation can be interpreted as a 1-1 correspondence between the graph of $\ f $ and the family of tangent lines to the graph of $\ f $. The Legendre transformation has several applications in applied mathematics, one of which is the derivation of the Hamiltonian formulation of classical mechanics from the Lagrangian one.



In classical real analysis, an object of interest is the inverse map of the gradient of a differentiable function $ f: \mathbb{R}^n \rightarrow \mathbb{R} $. Sufficient conditions for the existence of the map $ ( \nabla f)^{-1}$ in a neighbourhood $S \subset \mathbb{R}^n $ of a point $ p_0 = \nabla f(x_0)$, where $x_0 \in \mathbb{R}^n $, are that the Hessian matrix $ \nabla^2 f$ is continuous and nonsingular at $x_0$; this follows from the inverse function theorem. Let's assume that $ (\nabla f)^{-1} $ exists. Recall that the affine hyperplane in $ \mathbb{R}^n \times \mathbb{R} $ associated with $ (s,r) \in \mathbb{R}^n \times \mathbb{R} $ is the set $ H_{s,r} := \{ y \in \mathbb{R}^n : \langle s,y \rangle = r \} $ , where $ \langle \cdot , \cdot \rangle $ denotes the standard inner product on $ \mathbb{R}^n $. Geometrically, given $p \in S$, we want to find $x$ such that the affine hyperplane orthogonal $(p, -1) \in \mathbb{R}^n \times \mathbb{R}$ and passing through $(x, f(x))$ is tangent to the graph of $f$ at $x$. When this $x$ exists and is unique, then we can construct a map $x = x(p) = (\nabla f)^{-1} (p)$ that is itself a gradient mapping i.e. there is a function $f^*: S \rightarrow \mathbb{R}$ such that $x( \cdot ) = \nabla f^*$. Thus, we have the correspondence

$ p = \nabla f(x) \qquad \iff \qquad x = \nabla f^*(p) \qquad $ (1.1)

The function $ f^* $, when it exists and is well-defined, is called the Legendre transform of $f$. Computing $f^*$ gives us the following expression:

$ S \ni p \mapsto f^*(p) = \langle p, x(p) \rangle - f(x(p)) \qquad $ (1.2)

We justify this formula with the following calculation:

Recall that the differential $dh$ of a function $h: \mathbb{R}^n \rightarrow \mathbb{R} $ is given by $ dh = \langle \nabla h(x), dx \rangle $. Thus, it follows from (1.2) that

$ \begin{align} d(f^*) & = \langle p, dx \rangle + \langle dp, x \rangle - \langle \nabla f(x), dx \rangle \\ & = \langle p, dx \rangle + \langle dp, x \rangle - \langle p, dx \rangle \\ & = \langle dp, x \rangle \end{align} $

This last expression defines $x$ as the gradient of $f^*$ with respect to $p$, so that $ \nabla f^* = ( \nabla f )^{-1} $. We are led to the following:

Definition 1.1: When $\nabla f $ is invertible, the function $f^*: \mathbb{R}^n \rightarrow \mathbb{R}$ defined by the expression

$ p \mapsto f^*(p) = \langle p, ( \nabla f )^{-1} (p) \rangle - f ( ( \nabla f)^{-1} (p) ) \qquad $ (1.3)

is called the Legendre transform of $f$.

Convex analysis provides a method for extending the notion of the Legendre transform to functions that do not satisfy the rather restrictive condition that the gradient be invertible. To see this, first assume that $ f: \mathbb{R}^n \rightarrow \mathbb{R} $ is convex and differentiable. Recall that the subdifferential $ \partial f(x) $ of $f$ at $x$ is the set of all vectors $ p \in \mathbb{R}^n $ such that

$ f(y) \ge f(x) + \langle p, y - x \rangle , \quad \forall y \in \mathbb{R}^n \qquad $ (1.4)

We consider the set-valued mapping $ x \mapsto \partial f(x) $ instead of $ x \mapsto \nabla f(x) $. To invert this map means that given $p$, find $x = x(p)$ such that $ p \in \partial f(x) $; note that $x$ is not necessarily unique. By subtracting $ \langle p, y \rangle $ from either side of (1.4), we see that this choice of $x$ minimizes $ f( \cdot ) - \langle p, \cdot \rangle $ over $ \mathbb{R}^n $. Thus, $x(p)$ is a solution of the maximization problem

$ \sup \{ \langle p, x \rangle - f(x) : x \in \mathbb{R}^n \} \qquad $ (1.5)

The Legendre transform is well-defined when this problem has a unique solution. In fact, (1.5) is a possible way of defining the Legendre transform of $f$ unambiguously when $f$ satisfies the following properties:

i) differentiability - so $ \nabla f $ is defined on $ \mathbb{R}^n $;

ii) strict convexity - so (1.5) has a unique solution;

iii) $ \nabla f ( \mathbb{R}^n ) = \mathbb{R}^n $ - so (1.5) has a solution for all $p \in \mathbb{R}^n $; this condition means that when $ ||x|| \rightarrow \infty $, f(x) increases faster than any linear function of $x$. $f$ is said to be 1-coercive.

(See Corollary 4.4.) In the general case, however, the Legendre transform of $f$ is not well-defined. In this case, we adopt the following definition:

Definition 1.2: The Legendre-Fenchel transform or conjugate of a function $f$ is the function $f^*$ defined by

$ \mathbb{R}^n \ni p \mapsto f^*(p) = \sup \{ \langle p, x \rangle - f(x) : x$ in the domain of $f$ and $f(x) < + \infty \} $

Remark 1.3: In Definition 1.2, $f$ does not need to be convex or differentiable. To avoid the situations where $f^* \equiv + \infty$ or $f^* \equiv - \infty$, it is assumed that $f: \mathbb{R}^n \rightarrow \mathbb{R} \cup \{ + \infty \}$, $ f \not\equiv +\infty $, and that there is an affine function $g$ such that $ g(x) \le f(x) $ for all $x \in \mathbb{R}^n$. The last assumption implies that $f(x) > - \infty $ for all $x$.

When $f$ is twice continuously differentiable and $ \nabla^2 f$ is positive definite, then the conjugate of $f$ is just the Legendre transform.

Convexity and Involutivity

Let $ f: \mathbb{R}^n \rightarrow \mathbb{R} \cup \{ + \infty \} $ satisfy the assumptions in Remark 1.3. In order to discuss some important properties of the Legendre-Fenchel transform, we need a few notions from convex analysis:

  • The epigraph of $f$ is defined to be the nonempty set epi $f := \{ (x, r) \in \mathbb{R}^n \times \mathbb{R} : r \ge f(x) \}$. $f$ is said to be closed if its epigraph is a closed subset of $ \mathbb{R}^n \times \mathbb{R} $. Also, $f$ is convex if and only if its epigraph is a convex subset of $ \mathbb{R}^n \times \mathbb{R} $.
  • The effective domain of $f$ is defined to be the nonempty set dom $f := \{ x \in \mathbb{R}^n : f(x) < + \infty \} $. If $f$ is convex, then dom $f$ is a convex subset of $ \mathbb{R}^n $.

The claims made in the above two statements can be proven directly from the definitions of epigraph, effective domain, and convexity of functions and sets.

Proposition 2.1: The conjugate function $f^*$ is closed and convex (regardless of whether or not $f$ is).

Proof: By definition, $f^*$ is the pointwise supremum over all $x \in \mathbb{R}^n $ of affine functions $\{ g_x(p) \}_x $ of $p$, where $ g_x(p) = \langle p, x \rangle - f(x) $. Thus, epi $f^* = \bigcap_{x \in \mathbb{R}^n} $ epi $g_x$. But epi $g_x$ is a closed half-space in $\mathbb{R}^n \times \mathbb{R}$, so epi $f^*$ is closed and convex because it is an intersection of closed, convex sets. Hence, $f^*$ is closed and convex.

From convex analysis, we know that for any convex function $g$ on $ \mathbb{R}^n $, there is an affine function $h$ such that $h \le g$ on $ \mathbb{R}^n $. Since $f$ satisfies the assumptions in Remark 1.3, and $f^*$ is convex, then $f^*$ also satisfies the assumptions in Remark 1.3. Thus, we can define the biconjugate function of $f$: for all $x \in \mathbb{R}^n $,

$ f^{**} (x) = \sup \{ \langle p, x \rangle - f^*(p) : p \in \text{dom} \: f^* \} \qquad $ (2.1)

$f^{**}$ is the largest convex function that does not exceed $f$. This fact is proven in the following theorem about the biconjugate:

Theorem 2.2 (Fenchel-Moreau):

(i) $f^{**}$ is convex and $f^{**} \le f$.

(ii) If $f^{**} = f$, then $f$ is a closed and convex function. The converse holds if $f$ is also finite on $ \mathbb{R}^n$.

Proof: $f^{**}$ is closed and convex by Proposition 2.1, since $f^{**}$ is the conjugate of $f^*$. For any $x \in \mathbb{R}^n $ and for any $p \in $ dom $f^*$,

$ \begin{align} \langle p, x \rangle - f^*(p) & \le \langle p, x \rangle - ( \langle p,x \rangle - f(x) ) \\ & = f(x) \end{align} $

Taking the supremum over all $p \in $ dom $f^*$ and using the definition of $f^{**}$, it follows that $ f^{**}(x) \le f(x) $ for all $x$. This proves (i). To prove (ii), it suffices to show that if $f$ is closed, convex and finite, then $ f \le f^{**} $. Fix any $x \in \mathbb{R}^n$. Assuming that $f$ is closed, convex and finite, the subdifferential $ \partial f(x) $ of $f$ at $x$ is nonempty. Let $p \in \partial f(x) $, so that the inequality (1.4) holds for all $y \in \mathbb{R}^n$. Next, choose $y$ such that $f^*(p) = \langle p, y \rangle - f(y)$. Then (1.4) implies that

$ - f^*(p) \ge f(x) - \langle p, x \rangle \qquad \Rightarrow \qquad f(x) \le \langle p, x \rangle - f^*(p) \le f^{**}(x) $

The Fenchel-Moreau theorem implies that the Legendre-Fenchel transformation (or conjugation) is an involution on the set of closed, convex and finite functions.

Young’s Inequality, Convexity and Smoothness

Let $ f: \mathbb{R}^n \rightarrow \mathbb{R} \cup \{ + \infty \} $ satisfy the assumptions in Remark 1.3. A consequence of the definition of $f^*$ is Young's inequality:

$ f(x) + f^*(p) \ge \langle x, p \rangle \quad $ for all $x, p \in \mathbb{R}^n \qquad$ (3.1)

The following proposition states the precise condition for which (3.1) is an equality, and it involves the notion of the subdifferential of $f$:

Proposition 3.1: Let $x \in $ dom $f$. Then $p \in \partial f(x)$ if and only if (3.1) is an equality.


$ \begin{align} p \in \partial f(x) & \iff \langle p, y \rangle - f(y) \le \langle p, x \rangle - f(x) \quad \forall y \in \text{dom} \: f \\ & \iff f^*(p) \le \langle p, x \rangle - f(x) \\ & \iff f^*(p) = \langle p, x \rangle - f(x) \quad \text{by definition of} \: f^* \\ \end{align} $

This completes the proof. Note that $p \in \partial f(x)$ implies that $p \in $ dom $f^*$, since $f(x) > - \infty$. For closed convex functions, the condition for equality in (3.1) can be restated:

Corollary 3.2: If $f$ is a closed, convex and finite function, then

$ f(x) + f^*(p) = \langle p, x \rangle \qquad \iff \qquad p \in \partial f(x) \qquad \iff \qquad x \in \partial f^*(p) $

Proof: This follows from Theorem 2.2 and Proposition 3.1.

Corollary 3.3: If $f \in C^1 ( \mathbb{R}^n )$ is closed and convex, then

$ f(x) + f^*( \nabla f(x) ) = \langle \nabla f(x), x \rangle $ for all $x \in $ dom $f$

Proof: Since $f \in C^1( \mathbb{R}^n)$ and $f$ is convex, then $ f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle \: \forall y \in \mathbb{R}^n $. So $ \nabla f(x) \in \partial f(x) $ and Proposition 3.1 implies the result.

Requiring that (3.1) be an equality is a sufficient condition for the existence of the Legendre transform of $f$ when $f$ is continuously differentiable:

Proposition 3.4: If $f \in C^1( \mathbb{R}^n )$ and

$ f(x) + f^*(p) = \langle p, x \rangle \qquad $ (3.2)


$ p = \nabla f(x) \quad $ and $ \quad x = \nabla f^*(p) $.

(This is the correspondence (1.1) obtained earlier.)

Proof: Take the gradients of equation (3.2) with respect to $x$ and $p$ to get the result.

Smoothness and strict convexity as conjugate properties

Let $ f: \mathbb{R}^n \rightarrow \mathbb{R} \cup \{ + \infty \} $ satisfy the assumptions in Remark 1.3. Smoothness and strict convexity (of a function) are conjugate properties in the sense that

(a) if $f$ is a closed, strictly convex function, then $f^*$ is continuously differentiable, and

(b) if $f$ is a closed, convex and differentiable function, then $f^*$ is strictly convex.

We acknowledge the following result:

Proposition 4.1: If $f$ is closed and convex, then $f$ is differentiable at a point $x$ in the interior of dom $f$ if and only if $ \partial f(x) $ is a singleton.

Proof: See Chapter D of [6].

We now state (a) and (b) more precisely:

Theorem 4.2: Let $f$ be a closed, strictly convex function such that dom $f^*$ has nonempty interior. Then $f^*$ is continuously differentiable on the interior of dom $f^*$.

Proof: By Proposition 4.1, it suffices to show that if $p$ in the interior of dom $f^*$, then $\partial f^*(p)$ is a singleton. Suppose towards a contradiction that there exists $p$ in the interior of dom $f^*$ such that $ \partial f^*(p)$ contains two distinct points $x_1$ and $x_2$. By Corollary 3.2, we have the equations

$ f^*(p) + f(x_i) = \langle p, x_i \rangle \quad $ for $i = 1, \: 2$

Adding $\alpha$ times the first equation to $(1 - \alpha )$ times the second, and using Young's inequality, we have

$ f^*(p) + \alpha f(x_1) + (1 - \alpha ) f(x_2) = \langle s, \alpha x_1 + (1 - \alpha ) x_2 \rangle \le f^*(p) + f( \alpha x_1 + (1 - \alpha ) x_2)$,

which implies that $f$ is affine on the line segment from $x_1$ to $x_2$, a contradiction. This completes the proof.

Theorem 4.3: Let $f$ be a closed, convex function that is differentiable on the set $\Omega := $ (interior of dom $f$). Then $f^*$ is strictly convex on each convex subset $C \subset \nabla f( \Omega )$.

Proof: Let $C$ be a convex set as stated above. Since $f^*$ is convex on $C$, it suffices to show that $f^*$ is not affine. Suppose that there are two distinct points $p_1$ and $p_2$ in $C$ such that $f^*$ is affine on the line segment from $p_1$ to $p_2$. Set $p = (p_1 + p_2) / 2 \in C \subset \nabla f( \Omega )$. Then there is $x \in \Omega$ such that $p = \nabla f(x)$. Using Corollary 3.2 and the affine character of $f$, we have

$ 0 = f(x) + f^*(p) - \langle p, x \rangle = \frac{1}{2} \displaystyle{\sum_{i=1}^2 [f(x) + f^*(p_i) - \langle p_i, x \rangle ]}$

Young's inequality implies that $f(x) + f^*(p_i) - \langle p_i, x \rangle = 0$ for $i = 1, \: 2$, so $x \in \partial f^*(p_1) \cap \partial f^*(p_2)$. Then $ \partial f(x) $ contains $p_1$ and $p_2$, which contradicts the existence of $ \nabla f(x)$. Therefore, $f^*$ is strictly convex on $C$.

Theorems 4.2 and 4.3 together describe a situation in which the Legendre transform is well-defined:

Corollary 4.4: Let $f: \mathbb{R}^n \rightarrow \mathbb{R}$ be strictly convex, differentiable and 1-coercive. Then

(i) $f^*$ is also finite-valued on $\mathbb{R}^n $, strictly convex, differentiable and 1-coercive;

(ii) $ \nabla f$ is a continuous, one-to-one mapping from $ \mathbb{R}^n $ onto $ \mathbb{R}^n $, and $ (\nabla f)^{-1} $ is continuous;

(iii) $f^*(p) = \langle p, (\nabla f)^{-1} (p) \rangle - f( (\nabla f)^{-1} (p) ) $ for all $p \in \mathbb{R}^n$.

Geometric Interpretation

One-dimensional case

Let $ f \in C^2 (\mathbb{R}, \mathbb{R}) $ and suppose that $ f $ is strictly convex. Then $f$ is sufficiently well-behaved for all of the above considerations to apply. Any point $ (x, f(x)) $ on the graph of $ f$ is uniquely determined by the slope of the tangent line to the graph of $ \ f $ at $x$. This is equivalent to saying that the equation $ p = f ^\prime(x) $ can be inverted to obtain $ x $ as a function of $ p $. The Legendre transformation replaces the independent variable $ x $ with the variable $ p = f ^\prime(x) $, the slope of the tangent line at $ (x, f(x)) $. Letting $ f^*(p) = px - f(x) $ be the Legendre transform of $ f $, it follows that $ (0, - f ^{* \prime}(f ^\prime(x))) $ is the y-intercept of the tangent line at $ (x, f(x)) $. So the Legendre transform is a 1-1 correspondence between a point $ (x, f(x)) $ on the graph of $ f $ and the slope and y-intercept of the tangent line to the graph at $ (x, f(x)) $ i.e. a 1-1 correspondence between the graph of $ f $ and the set of its tangent lines.

Since the Legendre transformation is an involution on the set of strictly convex, differentiable functions, knowing only the slopes and y-intercepts of all tangent lines to the graph of $ f $ would be sufficient to reconstruct the function $ f $. The relevant equations for this reconstruction are

$ x = f ^{* \prime}(p), \quad f(x) = px - f^*(p) \quad $ (5.1)

where $ p$ is the slope of a tangent line and $ - f^*(p) $ is defined to be the y-coordinate of the tangent line's y-intercept.

Recall that the Legendre transform $ f^*$ of $ f$ can also be given by the formula

$ f^*(p) = \displaystyle{\sup_{x \in \mathbb{R}}} (px - f(x)) \quad $ (5.2)

This formula is consistent with our geometric interpretation. Let $ p \in \mathbb{R} $ and let $ x_0 $ be the unique point in $ \mathbb{R}$ such that $ p = f ^\prime(x_0)$. Since $ f$ is convex, any line with slope $ p$ that passes through a point $ (x, f(x)), \: x \ne x_0 $ will have a y-intercept higher than $ (0, - f^*(p)) $, so that - $ f^*(p)$ is the infimum of the y-intercepts of all such lines, i.e. (5.2) holds. By definition, $f^*(p)$ also gives the largest vertical distance between the line $y(x) = px$ and the graph of $f$ wherever $f < y$. The point $x = x(p)$ for which $f^*(p) = (y – f)(x(p))$ is just the point $x_0$ described earlier.

$n$-dimensional case

Suppose that $f \in C^2 ( \mathbb{R}^n, \mathbb{R} )$ and $f$ is strictly convex. Consider the computation of $f^*$ in the graph-space $ \mathbb{R}^n \times \mathbb{R} $. Given $p \in \mathbb{R}^n$, consider the family of affine functions $x \mapsto \langle p, x \rangle - r$, parametrized by $r \in \mathbb{R}$. They correspond to affine hyperplanes $H_{p,r}$ orthogonal to $(p, -1) \in \mathbb{R}^n \times \mathbb{R}$. $H_{p,r}$ lies below the graph of $f$ whenever $p \in $ dom $f^*$ and $r$ is large enough. To construct $f^*(p)$, we lift $H_{p, r}$ as much as possible subject to lying below the graph of $f$. When $H_{p,r}$ intersects the graph of $f$ at a single point $(x, f(x))$, we conclude that

$ \langle p, x \rangle - r = f(x) $

$\Rightarrow \; r = \langle p, x \rangle - f(x)$

$\Rightarrow \; r = f^*(p) \quad $ since by construction, $x$ is a local maximum of $\langle p, \cdot \rangle - f( \cdot )$.

Thus, $H_{p,r}$ intersects the vertical axis $ \{ 0 \} \times \mathbb{R}$ at the altitude $- f^*(p)$. The Legendre transformation is a 1-1 correspondence between the graph of $f$ and the family of tangent, affine hyperplanes.

We can express $f^*(p)$ in another way: for all $x \in \mathbb{R}^n$,

$ \begin{align} f^*(p) & = \displaystyle{\sup_x} [\langle p, x - f(x)] \\ & = \displaystyle{\sup_x} \displaystyle{\sup_{r \ge f(x)}} [\langle p, x \rangle - r] \\ & = \sup \{ \langle p, x \rangle - r: (x, r) \in \text{epi} \: f \} \end{align} $

where epi $f$ is the epigraph of $f$. In other words, of all affine hyperplanes orthogonal to $(p, -1)$, $-f^*(p)$ is the infimum of the $(n+1)^{th}$ coordinates of the intersection points of the hyperplanes with $\{ 0 \} \times \mathbb{R}$.

Other properties

The functions $f$, $f_j$ and $g$ appearing below are assumed to satisfy the assumptions in Remark 1.3.

Behaviour under translation

  • The conjugate of the function $g(x) = f(x) + r$ is $g^*(p) = f^*(p) - r$.
  • The conjugate of the function $g(x) = f(x - x_0)$ is $g^*(p) = f^*(p) + \langle p, x_0 \rangle $.
  • The conjugate of the function $g(x) = f(x) + \langle p_0, x \rangle $ is $g^*(p) = f^*(p - p_0)$.

Behaviour under scaling

  • For $t > 0$, the conjugate of the function $g(x) = tf(x)$ is $g^*(p) = t f^*(p / t)$.
  • For $t \ne 0$, the conjugate of the function $g(x) = f(tx)$ is $g^*(p) = f^*(p / t)$.

Composition with linear transformations

  • If $A$ is an invertible linear operator on $\mathbb{R}^n$, then $ (f \circ A)^* = f^* \circ (A^{-1})^*$, where $(A^{-1})^*$ is the adjoint of $A^{-1}$.
  • A closed, convex function $f$ is said to be symmetric with respect to an orthogonal linear operator $A$ on $\mathbb{R}^n$ if $f(Ax) = f(x)$ for all $x \in \mathbb{R}^n$. $f$ is symmetric with respect to $A$ if and only if $f^*$ is.

Monotonicity of conjugation: If $f_1 \le f_2$, then $f_1^* \ge f_2^*$.

Convexity of conjugation:

If dom $f_1 \cap$ dom $f_2 \ne \emptyset $ and $ 0 \le \alpha \le 1$, then

$ [ \alpha f_1 + (1 - \alpha ) f_2]^* \le \alpha f_1^* + (1 - \alpha ) f_2^* $

Preservation of decomposition

Let $\mathbb{R}^n := \mathbb{R}^{n_1} \times \cdots \times \mathbb{R}^{n_m}$. Let $ \langle \cdot, \cdot \rangle_j $ denote the inner product on $\mathbb{R}^{n_j}$, and let $ \langle \cdot, \cdot \rangle := \sum_{j=1}^m \langle \cdot, \cdot \rangle_j$ be an inner product on $\mathbb{R}^n$. Denote $x \in \mathbb{R}^n$ by $x = (x_1,...,x_m)$ where $x_j \in \mathbb{R}^{n_j}$. Let $f_j : \mathbb{R}^{n_j} \rightarrow \mathbb{R}$ satisfy the assumptions in Remark 1.3. Define $f: \mathbb{R}^n \rightarrow \mathbb{R}$ by $f(x) = \sum_{j=1}^m f_j(x_j)$. Then

$f^*(p_1,...,p_m) = \displaystyle{ \sum_{j=1}^m} f_j^*(p_j)$.


The Legendre-Fenchel transformation exhibits special behaviour on certain types of functions.

Parabolas and quadratic forms

(i) Let $ f(x)= mx^2 / 2$. Then $ p = mx$ and $ f^*(p) = p^2 / 4$.

(ii) Let $ A$ be a symmetric invertible n x n matrix. Let $ f(x) = \frac{1}{2} x^T A x $. Then $x = A^{-1} p$ and $ f^*(p) = \frac{1}{2} p^T A^{-1} p $.


  • The conjugate of a parabola (resp. quadratic form) is a parabola (resp. quadratic form).
  • In example (i), if $ x$ is the velocity of a particle, then $f(x)$ and $ f^*(p)$ both express the particle's kinetic energy in terms of the velocity and the momentum respectively.
  • An example of the Legendre transformation acting on a parabola is the following: let $ \vec x (t)$ be the position of a particle, let $ \dot{\vec x} = d \vec x / dt $ be its velocity, and let $ V(\vec x)$ be its potential energy. Then its Lagrangian is $ L(\vec x, \dot{\vec x}) = \frac{1}{2}m \dot{\vec x}^2 - V(\vec x) $. The Legendre transform of $ L$ (that replaces $ \dot{\vec x} $ with $ \vec p$) is the Hamiltonian of the particle, $ H = \vec p^2 / 2m + V(\vec x) $.

Homogeneous functions

Let $ f(x) = x^a / a $, where $a > 1$. Then $ f^*(p) = p^b / b $, where $ b$ is related to $ a $ by the formula $ 1 / a + 1 / b = 1 $.

Proof: Let $ a>1$.

$ p = f ^\prime(x) = x^{a-1} \qquad \Rightarrow \qquad x = p^{1/(a-1)} $

It follows that

$ f^*(p) = - x^a / a + px = - p^{a/(a-1)} / a + pp^{1/(a-1)} = p^{a/(a-1)} \left( 1 - 1 / a \right) $,

and therefore

$ f^*(p) = p^b / b $,

where $ b$ is related to $ a$ by the formula

$ 1 / a + 1 / b = 1 $.

Remark: In general, the conjugate of a homogeneous function of degree $a$ is a homogeneous function of degree $b$. When $a = b = 2$, we obtain the familiar result that a quadratic function’s conjugate is again a quadratic function.

Hamiltonian formulation of classical mechanics

The Legendre transformation can be used to derive the Hamiltonian formulation of classical mechanics from the Lagrangian formulation, which shifts the action functional dependence from generalized velocities to generalized momenta. It is assumed that the Hessian matrix of the Lagrangian, denoted $ \nabla_{vv}^2 L(t, q, v)$, is nonsingular, so that $L(t,q,v)$ is strictly convex with respect to $v$ and the Hamiltonian $H(t,q,p) = sup_{\dot{q}}\left \{ \langle p, q \rangle - L(t,q,\dot{q}(q,p)) \right \} $ is well-defined. $H(t,q,p)$ is the Legendre transform of $L(t,q,\dot{q})$ with respect to $\dot{q}$, and $p = (\partial L / \partial \dot{q})$ is the generalized momentum, often called the momentum conjugate to $q$. Using the fact that the Legendre transformation is an involution on convex, differentiable functions, one can demonstrate that Lagrangian Mechanics and Hamiltonian Mechanics are equivalent descriptions of the dynamics of a system.


[1] Arnold, V.I. (1989). Mathematical Methods of Classical Mechanics (second edition).

[2] Cannarsa, P. et al. (2004). Semiconcave functions, Hamilton-Jacobi equations, and optimal control.

[3] Carathéodory, C. et al. (2001). Advances in convex analysis and global optimization.

[4] Dacorogna, R. (2004). Introduction to the calculus of variations.

[5] Fomin, S.V. et al. (1991). Calculus of variations.

[6] Hiriart-Urruty, J.B. et al. (2001). Fundamentals of Convex Analysis.

[7] Magaril-Il'yaev G.G. et al. (2003). Convex Analysis: theory and applications.

[8] Legendre transformation. http://en.wikipedia.org/wiki/Legendre_transformation

Personal tools